Member of Technical Staff - GPU Infrastructure at Prime Intellect

Work type: Remote

Location: San Francisco | Remote

Salary: $150,000 – $300,000/yr

Type: Full-time

Summary

**Who this is for**

This position is for a seasoned technical expert passionate about building and optimizing the GPU infrastructure that powers advanced AI model training. You will be the go-to person for designing and implementing high-performance computing environments for Prime Intellect's clients.

**Key highlights**

You will collaborate with clients to architect and deploy optimal GPU cluster solutions, ranging from hundreds to thousands of GPUs. This role involves hands-on deployment, configuration, and performance tuning of complex infrastructure, including networking and storage, ensuring maximum efficiency for LLM training and HPC workloads.

**You might be a good fit if you...**

- Have 3+ years of hands-on experience with GPU clusters and HPC environments.
- Possess deep expertise in SLURM and Kubernetes within production GPU settings.
- Have proven experience with InfiniBand configuration and troubleshooting.
- Understand NVIDIA GPU architecture, the CUDA ecosystem, and driver stack.
- Are proficient in Python, Bash, and systems programming.

Job Description

# Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infrastructure that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

As our Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms customer requirements into production-ready systems capable of training the world's most advanced AI models.

We recently raised $15mm in funding (total of $20mm raised) led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI) and many others.

Core Technical Responsibilities

This customer-facing role combines deep technical expertise with hands-on implementation. You'll be instrumental in:

Customer Architecture & Design

Infrastructure Deployment & Optimization

Production Operations & Support

Technical Requirements

Required Experience

Infrastructure Skills

Nice to Have
Growth Opportunity

You'll work directly with customers pushing the boundaries of AI, from startups training foundation models to enterprises deploying massive inference infrastructure. You'll collaborate with our world-class engineering team while having direct impact on systems powering the next generation of AI breakthroughs.

We value expertise and customer obsession - if you're passionate about building reliable, high-performance GPU infrastructure and have a track record of successful large-scale deployments, we want to talk to you.

Apply now and join us in our mission to democratize access to planetary scale computing.

Compensation

Cash compensation range of $150,000–$300,000, plus equity incentives
