Senior HPC Engineer, Infrastructure Specialist Team

2025-06-30 | NVIDIA | All cities, CA
Description:

NVIDIA is looking for a Senior HPC Engineer to join its Professional Services team. Academic, commercial, and government groups around the world use NVIDIA products to revolutionize deep learning and data analytics and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world!

NVIDIA is seeking someone who thrives on a dynamic, customer-focused team; the role requires excellent interpersonal skills. You will work with customers, partners, and internal teams to analyze, define, and implement large-scale AI/HPC projects. This work combines networking, system design, automation, and validation.

What you will be doing:

  • Deploying, managing, and validating AI/HPC infrastructure in Linux-based environments for new and existing customers.
  • Serving as the domain expert from planning calls through implementation.
  • Providing handover-related documentation and performing knowledge transfers to support customers in rolling out sophisticated systems.
  • Providing feedback to internal teams, such as reporting bugs, documenting workarounds, and suggesting improvements.

What we need to see:

  • 5+ years of experience providing in-depth support and deployment services and solving problems with hardware and software products.
  • Knowledge of and experience with Linux system administration, process management, package management, task scheduling, kernel management, boot procedures and troubleshooting, performance reporting, optimization, and logging, network routing, and advanced networking (tuning and monitoring).
  • Experience with cluster management technologies (BCM experience is a plus).
  • A minimum of a four-year degree from an accredited university or college in Computer Science, Electrical or Computer Engineering, or equivalent experience.
  • Excellent interpersonal skills and the ability to resolve customer issues effectively.
  • Strong organizational skills with the ability to prioritize and multitask with limited supervision.
  • Experience with schedulers such as SLURM, LSF, UGE, etc.
  • Willingness to travel within the United States to customer sites.
  • Background in automation tools (Ansible, Puppet, etc.).
  • Experience with benchmarking tools such as HPL, NCCL tests, and MLPerf.
  • Kubernetes experience.

Ways to stand out from the crowd:

  • InfiniBand experience.
  • Experience with GPU-focused hardware/software.
  • Experience with MPI (Message Passing Interface).
  • Experience with storage technologies such as Lustre or GPFS.
  • Familiarity with Dell and Supermicro GPU platforms.

NVIDIA is widely considered one of the most desirable employers in the tech industry. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

The base salary range is $116,000 to $230,000 USD. Your salary will be determined based on your location, experience, and the pay scale for similar roles.

You will also be eligible for equity and other benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We value diversity and do not discriminate in hiring or promotion practices based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability, or any other characteristic protected by law.
