Multimodal Generative AI Researcher at Stability AI

Location: Remote

Type: Full-time

Summary

**Who this is for**

This role is for a research-oriented scientist with a strong engineering background, focused on advancing the capabilities of large Vision-Language and Multimodal Models.

**Key highlights**

You will lead the development and fine-tuning of hybrid architectures for tasks like 3D understanding and visual reasoning, bridging the gap between cutting-edge research and production-grade pipelines.

**You might be a good fit if you...**

- Possess a PhD in Machine Learning, Computer Vision, or a related field with a record of impactful research.
- Have deep experience training and scaling VLMs/LLMs using distributed frameworks like PyTorch or Ray.
- Bring a strong engineering mindset to design end-to-end training and evaluation systems.
- Are familiar with recent trends like Mixture-of-Experts (MoE) and 3D-aware multimodal generation.

Job Description

About the Role

We’re looking for a Research Scientist with deep expertise in training and fine-tuning large Vision-Language and Language Models (VLMs / LLMs) for downstream multimodal tasks. You’ll help push the next frontier of models that reason across vision, language, and 3D, bridging research breakthroughs with scalable engineering.

What You’ll Do

- Lead the development and fine-tuning of hybrid VLM/LLM architectures for downstream multimodal tasks such as 3D understanding and visual reasoning.
- Bridge research breakthroughs with scalable, production-grade training and evaluation pipelines.
What You Bring

- A PhD in Machine Learning, Computer Vision, or a related field, with a record of impactful research.
- Deep experience training and scaling VLMs/LLMs using distributed frameworks such as PyTorch or Ray.
- A strong engineering mindset for designing end-to-end training and evaluation systems.
Bonus / Preferred

- Familiarity with recent trends such as Mixture-of-Experts (MoE) and 3D-aware multimodal generation.
Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or other legally protected statuses.
