Senior AI Data Engineer at Veeva


Work type: remote

Location: Massachusetts - Boston

Salary: $110,000 – $270,000/yr

Type: Full-time

Summary

This role is ideal for a senior-level data or software engineer who has spent the last few years specializing in the "quality" side of Generative AI. Beyond general coding, you need 5+ years of experience with Python and automated evaluation pipelines. You are someone who finds "breaking" AI models as satisfying as building them, with a deep understanding of how to detect hallucinations, biases, and agentic reasoning failures.

The compensation range is exceptionally wide ($110k - $270k), offering high upside for top-tier talent. As a Public Benefit Corporation, the company offers a unique "Work Anywhere" policy that truly supports remote flexibility across the US and Canada. You’ll be joining a mission-driven environment in the life sciences sector, focusing on high-stakes AI Agents where accuracy is critical.

**You might be a good fit if you:**

* Have a "test-to-break" mindset and can programmatically identify why an LLM is hallucinating.
* Are highly proficient in advanced prompt engineering and debugging complex agentic workflows.
* Can build custom Python frameworks to automate evaluation metrics like task success rate and semantic similarity.
* Value a high-work-ethic culture and want to work for a company legally committed to social balance.

Job Description

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster. As one of the fastest-growing SaaS companies in history, we surpassed $3B in revenue in our last fiscal year with extensive growth potential ahead.

At the heart of Veeva are our values: Do the Right Thing, Customer Success, Employee Success, and Speed. We're not just any public company – we made history in 2021 by becoming a [public benefit corporation](https://medium.com/@peter.gassner/public-benefit-corporation-pbc-486e1e9508a6) (PBC), legally bound to balancing the interests of customers, employees, society, and investors.

As a [Work Anywhere](https://careers.veeva.com/work-anywhere/) company, we support your flexibility to work from home or in the office, so you can thrive in your ideal environment.

Join us in [transforming the life sciences industry](https://www.youtube.com/watch?v=TaPSP8cCSKY&ab_channel=Forbes), committed to making a positive impact on its customers, employees, and communities.

## What You'll Do

Evaluation Strategy & Planning: Define and establish comprehensive evaluation strategies for new AI Agents. Prioritize the integrity and coverage of test data sets to reflect real-world usage and potential failure modes

LLM Output Integrity Assessment: Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics (e.g., factual accuracy, contextual relevance, coherence, and safety standards)

Creating High-Fidelity Datasets: Design, curate, and generate diverse, high-quality test data sets, including challenging prompts and scenarios. Evaluate LLM outputs to proactively identify system biases, unsafe content, hallucinations, and critical edge cases

Automation of Evaluation Pipelines: Develop, implement, and maintain scalable automated evaluations to ensure efficient, continuous validation of agent behavior and prevent regressions with new features and model updates

Root Cause Analysis: Understand model behaviors and assist in the trace and root-cause analysis of identified defects or performance degradations

Reporting & Performance Metrics: Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams

## Requirements

Data Integrity & Validation: A strong, specialized understanding of data quality principles, including methods for validating datasets against bias, integrity concerns, and quality standards. Ability to craft diverse and adversarial test data to uncover AI edge cases

Prompt Engineering & Model Expertise: Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions. Deep knowledge of common LLM failure modes (hallucination, incoherence, jailbreaking)

Automated Evaluation Implementation: 5+ years of experience designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors. Familiarity with quality metrics such as task success rate, semantic similarity, and sentiment analysis for output measurement

Debugging Agentic Systems: Must be comfortable with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequence to pinpoint failure points

Programming & Frameworks: 5+ years of experience using Python to develop custom evaluation frameworks, writing scripts, and integrating pipelines with CI/CD systems. Familiarity with standard test automation tools (e.g., Pytest, modern web automation tools)

Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI / LLMs

High work ethic. Veeva is a hard-working company

High integrity and honesty. Veeva is a PBC and a “do the right thing” company. We expect that from all employees

Applicants must have the unrestricted right to work in the United States or Canada. Veeva will not provide sponsorship at this time

## Perks & Benefits

Medical, dental, vision, and basic life insurance

Flexible PTO and company paid holidays

Retirement programs

1% charitable giving program

## Compensation

Base pay: $110,000 - $270,000

The salary range listed here has been provided to comply with local regulations and represents a potential base salary range for this role. Please note that actual salaries may vary within the range, or fall above or below it, depending on experience and location. We look at compensation for each individual and base our offer on your unique qualifications, experience, and expected contributions. This position may also be eligible for other types of compensation in addition to base salary, such as variable bonus and/or stock bonus.
