Work type: remote
Location: United States (EST)
Salary: $180,000 – $220,000/yr
Type: Full-time
You're a seasoned Site Reliability Engineer with over 7 years of experience operating production services at scale, ready to be a founding member of a new SRE team.

**What makes it worth a look...**

Gradle is offering a full-time, remote position in the US EST timezone with a salary of $180,000 – $220,000 annually. You'll have the chance to shape a new SRE team from the ground up and gain real ownership of production systems.

**You might be a good fit if you...**

* Have strong Kubernetes experience in production environments and cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
* Are proficient with observability tools like Prometheus and Grafana, and with Infrastructure as Code using Terraform.
* Possess scripting proficiency in Python or Bash for automation and a track record of incident management in a 24/7 on-call setting.
* Have experience designing and operating systems with SLOs and error budgets.
## Who We Are
AI is changing how software gets built. Code production is becoming a commodity. The focus is shifting from writing code to orchestrating, verifying, and governing change – and the toolchain is the new constraint.
Gradle is at the center of this shift. We build Develocity, a toolchain observability and intelligence platform used by some of the world's leading software organizations – Netflix, Airbnb, Spotify, SAP, major global banks, and hundreds more. Develocity helps software teams achieve delivery excellence through deep observability, build and test acceleration, and AI-powered intelligence across the entire toolchain – with current support for Gradle Build Tool, Apache Maven™, sbt, npm, and Python.
We are an AI-native company. AI is not a feature we're bolting on – it's central to how we work, how we think about our product, and where we're heading. We're investing deeply in making Develocity's unique data and decades of domain expertise accessible to both humans and AI agents, with trust, evidence, and explainability at the core of everything we build.
We have partnered with the Apache Software Foundation, the Commonhaus Foundation, the Micronaut Foundation, and other OSS projects such as Spring, Quarkus, Kotlin, JUnit, AndroidX, and many more to bring the value of Develocity to the OSS community as well.
## Our Values
Seek to Understand: Everything starts with listening and understanding; we strive to understand diverse viewpoints, problems, and motivations. Before we take action, we ensure we truly grasp the challenges, perspectives, and goals.
Know the Why: We approach our work with a clear sense of purpose, ensuring every step is deliberate and focused. We take meaningful action with urgency, but never at the expense of thoughtful consideration.
Innovate & Iterate: We embrace challenges and are not afraid to try new things, even if they might fail. With a deep understanding and a clear purpose, we can develop creative, bold solutions to tackle challenges.
Own the Outcome: We are empowered to take initiative, and we maintain transparency in our work and its outcomes. When we execute, we take responsibility for our decisions, measure the success of our innovations, and learn from the results.
## Who You Are
We're building a new SRE team and looking for founding members to help shape how we operate. As a Lead SRE, you’ll be a technical and operational leader for reliability across Develocity. You’ll help define our SRE vision, set standards for how we operate production services, and mentor other SREs as the team grows. This is a hands-on role with broad influence across engineering, cloud platform, and customer-facing teams.
The SRE team will be responsible for the reliability, performance, and availability of Develocity instances serving paying customers, open-source projects, and public-facing services, plus supporting infrastructure like artifact registries.
You'll work on our internally built Cloud Application Platform (Kubernetes on AWS) and develop deep expertise in it. When incidents happen, you'll troubleshoot issues across the stack, from application to infrastructure. You'll collaborate with the Cloud Platform team to improve the tooling you depend on, and with engineering teams to build reliability into how we ship software. If you like automating things and hate doing the same task twice, you'll fit in well.
You'll be part of a distributed, remote-first team that values asynchronous communication and written documentation. Strong self-direction and clear communication across time zones are essential.
## Responsibilities
The US salary range for this position is $180,000 – $220,000, which reflects the target range across all US locations. Within this range, individual pay is determined by geographic location and additional factors including, but not limited to, experience, relevant skills, qualifications, seniority, performance, and travel requirements. Our recruiting team can share more information about the specific salary range for your location during the hiring process.
## Location