Applied ML Engineer at Knowtex

Work type: hybrid

Location: San Francisco

Type: Full-time

Summary

This role is ideal for a mid-to-senior level Engineer (3–7+ years) who thrives at the intersection of machine learning research and production software engineering. You should have a deep mastery of Python and PyTorch, with a proven track record of moving generative AI models beyond the notebook and into low-latency, high-throughput production environments.

The most compelling aspect of this position is the opportunity to work at a high-growth startup founded by Stanford AI scientists. While an exact salary isn't listed, the package includes "meaningful equity," which offers significant upside potential as the platform scales across federal and commercial health systems. You’ll be solving complex technical challenges like model quantization and distributed inference while making a tangible impact on clinician burnout.

You might be a good fit if you...

Have extensive experience deploying transformer-based LLMs or speech-to-text systems on AWS.

Are comfortable building automated evaluation pipelines to ensure model reliability in high-stakes healthcare settings.

Enjoy the "applied" side of ML: optimizing for speed, batching, and caching rather than just training.

Prefer a hybrid work environment in San Francisco and want to work on a product with clear social utility.

Job Description

About Knowtex

Knowtex is building the future of voice AI operating systems for clinicians, transforming how healthcare documentation happens at the point of care. Founded by Stanford AI scientists with deep clinical experience, we're experiencing explosive growth across both commercial health systems and federal healthcare, with our ambient documentation platform scaling rapidly to thousands of clinicians across hundreds of specialties. We're at an inflection point where cutting-edge AI meets real clinical impact, giving clinicians hours back each day to focus on what matters most - their patients.

Position Overview

We are seeking an Applied ML Engineer to productionize and scale machine learning systems powering our voice AI platform. This role bridges research and engineering — transforming models into reliable, low-latency, production-grade systems deployed across enterprise healthcare environments.

You will work closely with ML Scientists, Backend Engineers, and Platform teams to optimize inference performance, build evaluation pipelines, and ensure robust model deployment in regulated environments.

Key Responsibilities

Productionize ML models for real-time clinical applications

Optimize inference pipelines for low latency and high throughput

Deploy and scale models using AWS-based infrastructure

Build automated evaluation and regression testing frameworks for LLM outputs

Implement monitoring systems for model performance and drift detection

Collaborate with Backend teams to integrate ML services into APIs and workflows

Improve model efficiency through quantization, batching, caching, and optimization techniques

Support specialty-level model evaluation and performance analysis

Contribute to CI/CD workflows for ML deployment

Required Qualifications

3–7+ years of experience in machine learning engineering or applied ML roles

Strong proficiency in Python and PyTorch (or TensorFlow)

Experience deploying ML models in production environments

Familiarity with transformer architectures and large language models

Experience with model optimization techniques (quantization, distillation, pruning)

Experience working with cloud infrastructure (AWS preferred)

Strong software engineering fundamentals and debugging skills

Preferred Qualifications

Experience with speech recognition systems or NLP pipelines

Experience with Triton Inference Server or similar deployment frameworks

Familiarity with healthcare data or clinical documentation workflows

Experience working in regulated environments (HIPAA, GovCloud, etc.)

Knowledge of medical coding systems (ICD-10, CPT)

Technical Environment

Python, PyTorch / TensorFlow

Transformer-based LLM architectures

AWS (SageMaker, ECS, Lambda, S3)

Triton Inference Server

CI/CD pipelines for ML deployment

Observability tools for performance and drift monitoring

Compensation & Benefits

Meaningful equity compensation

Unlimited PTO

Premium health, dental, and vision coverage

401(k) plan
