This role is designed for a seasoned engineer with deep expertise in large-scale distributed systems and a passion for the "inner workings" of a global platform. The ideal candidate has already spent several years running mission-critical services at scale and understands how to architect self-healing infrastructure. Since you’ll be working on MongoDB’s internal observability stack, you should be highly proficient in at least one modern programming language (such as Go or Python) and comfortable managing complex telemetry pipelines.

Located in Dublin, this position offers a hybrid work model that balances collaborative in-office time with remote flexibility. You will have a high degree of independence to define standards for a mission-critical platform used by the entire engineering organization. The company emphasizes a blameless culture and provides a generous compensation package, along with dedicated time for upskilling in new technologies.

**You might be a good fit if you...**

* Have hands-on experience with modern observability tools such as VictoriaMetrics or Splunk, and with Kubernetes-based environments.
* Excel at designing systems for high availability, fault tolerance, and self-healing across multiple cloud providers (AWS, Azure, or GCP).
* Enjoy a mix of heads-down architecture work and collaborative consulting with other teams to improve their monitoring and instrumentation.
* Are comfortable participating in a week-long on-call rotation and contributing to blameless post-mortems.

Site Reliability Engineer (Senior or Staff), Observability at MongoDB