AI Evaluation Engineer at Distyl AI

Work type: hybrid

Location: San Francisco | New York

Salary: $150,000 – $250,000/yr

Type: Full-time

Summary

You are a mid-level software engineer with at least two years of experience building and deploying production-grade systems. You possess a specialized background in Evaluation-Driven Development and can translate human subject matter expertise into scalable automated test suites.

**What makes it worth a look...**

Distyl AI offers a base salary between $150,000 and $250,000 plus equity for this hybrid role based in San Francisco or New York. The position requires three days per week in the office and includes 100% employer-covered medical, dental, and vision insurance for you and your dependents.

**You might be a good fit if you...**

* Write production-ready, maintainable Python code for evaluation pipelines.
* Have direct experience using structured evaluation frameworks to iterate on AI system behavior.
* Can design and calibrate LLM-based graders to measure system quality against human judgment.
* Are comfortable traveling 25 to 50 percent of the time.

Job Description

# About Distyl AI

Distyl is an applied AI technology company partnering with the world’s most ambitious institutions to rearchitect critical operations for the frontier of AI. Our customers include the largest companies in telecom, healthcare, insurance, manufacturing, consumer goods, and global social organizations.

We research and deploy technologies that power AI-native operations — both for our partners and for Distyl itself. Our work spans research into self-constructing systems, the development of the most reliable execution of AI systems, and products that transform mission-critical workflows. As a result, Distyl's technologies affect some of the world's largest operations — from hundreds of millions of consumer interactions to tens of millions of supply chain transactions and millions of patient journeys.

Distyl is backed by leading investors including Lightspeed Venture Partners, Khosla Ventures, Coatue, DST Global, and board members of 20+ Fortune 500 companies. The results reflect this approach: a 100% production deployment success rate for our customers, and a place among the few enterprise AI companies that run a profitable business.

# What We Are Looking For

At Distyl, we build AI systems using Evaluation-Driven Development—an approach where evaluation is not an afterthought, but the primary mechanism for iterating, improving, and trusting AI behavior in production.

AI Evaluation Engineers focus on designing and implementing the evaluation systems that drive this process. They are hands-on engineers who write production Python code, build evaluation pipelines, and use structured signals to guide system design, prompt iteration, and deployment decisions for real customer-facing AI systems.

This role is for engineers who believe that AI systems only improve when measurement is tightly coupled to development—and who want to apply that philosophy directly to systems that matter.

# Key Responsibilities







# What We Require








# What We Offer







Distyl has offices in San Francisco and New York. This role follows a hybrid collaboration model with 3+ days per week (Tuesday–Thursday) in-office.

We’re grateful for the strong interest in this role. The best way to get your profile in front of our team is to apply directly through our careers page, where all applications are reviewed. Due to the high volume of interest, we’re not able to review or respond to all direct emails or LinkedIn messages. We will be in touch with every applicant once we’ve completed our review, regardless of the decision.
