Work type: onsite
Location: US, CA, Santa Clara | US, WA, Seattle
Type: Full-time
**Who this is for** This role is for a senior software engineer with a strong background in AI/ML systems and hardware performance, specifically focused on optimizing large-scale distributed training stacks for LLMs.

**Key highlights** You will work on co-designing and benchmarking R&D tools for AI networking, building machine learning-based optimization techniques to improve resource utilization across massive data center GPU and CPU clusters.

**You might be a good fit if you...**
- Have 4+ years of experience applying machine learning to system architecture or networking.
- Possess strong programming skills in Python, C++, and Bash.
- Are proficient in PyTorch, TensorFlow, or JAX, with a deep understanding of CUDA and collective communication libraries like NCCL.
NVIDIA seeks a senior software engineer to join the AI Networking co-design and benchmark R&D team. In this role, the candidate is responsible for building and productizing machine learning tools, including tools that use ML-based combinatorial optimization and design space exploration (DSE) techniques. These tools will be used to optimize AI workloads across large GPU and CPU clusters, ensuring efficient and productive use of system resources at data center scale.

The role involves working on distributed deep learning, particularly within LLM training and inference stacks. A strong passion for collective communication and networking is desirable. The candidate will interact with diverse hardware and platforms, such as Host Channel Adapters (HCAs), switches, CPUs, GPUs, and complete systems, and will work across multiple software layers, from LLM applications and machine learning frameworks down to communication and computing libraries.

The candidate will develop tools and methodologies that use machine learning for comprehensive performance analysis and optimization, potentially incorporating learning-based agentic techniques. This position offers an opportunity to make significant contributions to the core infrastructure powering the next generation of large-scale AI systems.
What you'll be doing:
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
You will also be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).
Applications for this job will be accepted at least until April 25, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.