You are a mid-level software engineer with at least two years of experience building and deploying production-grade systems. You have a specialized background in Evaluation-Driven Development and can translate human subject-matter expertise into scalable automated test suites.

**What makes it worth a look...**

Distyl AI offers a base salary between $150,000 and $250,000 plus equity for this hybrid role based in San Francisco or New York. The position requires three days per week in the office and includes 100% employer-covered medical, dental, and vision insurance for you and your dependents.

**You might be a good fit if you...**

* Write production-ready, maintainable Python code for evaluation pipelines.
* Have direct experience using structured evaluation frameworks to iterate on AI system behavior.
* Can design and calibrate LLM-based graders to measure system quality against human judgment.
* Are comfortable traveling 25 to 50 percent of the time.

AI Evaluation Engineer at Distyl AI