Powering growth. Expanding possibilities.

At SGH, we design and develop high-performance, high-availability, enterprise solutions that help our customers solve for the future. Across our computing, memory, and LED lines of business, we focus on serving our customers by providing deep technical knowledge and expertise, custom design engineering, build-to-order flexibility, and a commitment to best-in-class quality.

We come from a broad collection of experiences and diverse backgrounds, but we’re united by a drive to raise the bar for the impactful technologies we design and manufacture, our customers, and each other. With an open and inclusive culture, we help one another think creatively and look beyond – because when we’re each at our best, we’re even more powerful together.

HPC DevOps Engineer

Date Posted:  Apr 10, 2024
Requisition ID:  1079
Location:  VA, US

Brand:  PenguinSolutions

The Penguin Solutions™ portfolio, which includes Penguin Computing™ and Penguin Edge™, accelerates customers’ digital transformation with the power of emerging technologies in HPC, AI, and IoT with solutions and services that span the continuum of edge, core, and cloud. By designing highly advanced infrastructure, machines, and networked systems we enable the world’s most innovative enterprises and government institutions to build the autonomous future, drive discovery and amplify human potential.

 

Overview:

You will play a critical role in designing, implementing, and maintaining our HPC infrastructure to support scientific and computational workloads. The ideal candidate has a strong background in both high-performance computing and DevOps practices, with expert-level experience managing HPC clusters and the Ansible proficiency needed to ensure the reliability, scalability, and efficiency of our HPC systems.

 

Responsibilities:

  • Support, improve, document, and test Ansible configuration management code for Linux-based high-performance computing (HPC) and AI/ML environments.
  • Collaborate with cross-functional teams to design and implement HPC infrastructure solutions.
  • Demonstrate expert-level proficiency in Ansible for automation, configuration management, and orchestration.
  • Implement and manage CI/CD pipelines for deploying HPC applications and workflows.
  • Automate system provisioning, configuration, and orchestration processes using Ansible and Cobbler.
  • Analyze and optimize configurations to maximize cluster utilization and performance.
  • Develop and implement monitoring solutions to proactively identify issues and ensure system reliability.
  • Respond to and resolve incidents related to HPC infrastructure within SLA.
  • Work closely with researchers, scientists, and other stakeholders to understand their computational needs.
  • Create and maintain comprehensive documentation for HPC infrastructure and processes.
  • Make recommendations for hardware and software upgrades as needed.
  • Provide training and support to end-users on HPC best practices.
  • Some after-hours support for critical events may be required.

 

Qualifications:

  • BS in Computer Science, Information Technology, or equivalent experience
  • 5+ years of HPC, software development, and/or systems experience
  • Expert-level proficiency in Ansible for automation and configuration management
  • Strong background in DevOps practices and tools
  • Linux experience, scripting, Python
  • Working with technical break/fix resources
  • MS Office / Google Suite
  • Proficiency in scripting languages such as Python, Bash, or similar.
  • Cross-functional leadership experience
  • Ability to align, motivate, and lead a team, including creating accountability
  • Comfortable running presentations, both remote and onsite, with internal and external senior leaders and team members
  • Knowledge of parallel file systems and storage solutions.
  • Understanding of networking principles and configurations in an HPC environment.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills
  • Flexible and able to function in a high-growth environment

 

Preferred Qualifications:

  • Experience mentoring developing team members
  • Professional Certifications such as RHCE, CISSP, CCNA/CCIE
  • Working knowledge of HPC/AI systems and components
  • Working knowledge of Agile methodologies
  • In-depth experience with Slurm workload manager.
  • Experience with containerization technologies (e.g., Docker, Singularity).

 

Location:

This is a hybrid role where you will be onsite at least 3 days per week in the Richmond, VA area.

 

Travel:

Estimated 10-25% travel required

 

Compensation & Benefits

The pay range that the Company reasonably expects to pay for this position in Virginia is $124,000 - $171,000; the pay ultimately offered within the expected range may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus-eligible, and medical, dental, and vision benefits are available. There is a 401(k) savings plan and other benefits, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.

 

Diversity and Inclusion Statement

SGH, together with its affiliates, is committed to creating a diverse environment that embraces differences and fosters inclusion.

 

Equal Opportunity Statement

We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.