Senior Reliability Engineer - Technology at Truelogic

Work type: remote

Location: Santo Domingo

Type: Full-time

Summary

This role is designed for a senior SRE or Platform Engineer with at least 5 years of experience who thrives on high-scale observability and system resilience. You aren't just building infrastructure from scratch; you are the "detective and doctor" of the system, focusing on how services behave in production, tuning Kubernetes clusters, and evolving AWS CDK constructs to ensure everything stays up and scales automatically.

The most attractive part of this offer is the **fully remote, USD-denominated salary**, which provides significant financial upside for talent in Latin America. You will be working with a high-growth U.S. data-tech company that scales Shopify brands, meaning you'll handle complex, high-traffic environments involving Spark on K8s and Kafka. The culture emphasizes result-oriented autonomy rather than micromanagement.

**You might be a good fit if you...**

* Are an expert in **AWS (VPC, EKS, RDS)** and have deep experience with **Kubernetes** operations.
* Fluently code in **Python** and have used **AWS CDK** or **CDK8s** for Infrastructure-as-Code.
* Obsess over SLIs/SLOs and believe in "observability-first" design using Prometheus and Grafana.
* Prefer optimizing existing distributed systems and automating recovery over basic provisioning.

Job Description

# About Truelogic

At Truelogic, we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we’ve been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals.

Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects. Whether collaborating with Fortune 500 giants or scaling startups, we deliver results that make a difference.

By applying for this position, you’re taking the first step in joining a dynamic team that values your expertise and aspirations. We aim to align your skills with opportunities that foster exceptional career growth and success while contributing to transformative projects that shape the future.

# Our Client

Our client is a data-driven technology company that partners with high-growth brands to optimize customer acquisition and retention. It specializes in delivering high-LTV audiences and enrichment data to increase repeat purchase rates. The company collaborates with major platforms and agencies such as Shopify, Experian, TransUnion, and top media partners, all focused on driving profitable revenue growth.

# Job Summary

The Site Reliability Engineer plays a key role in operating, observing, and improving the reliability of existing distributed systems running on AWS and Kubernetes, with a strong emphasis on observability, operational maturity, and automated responses to system behavior. Rather than focusing on provisioning infrastructure from scratch, this role concentrates on understanding how services behave in production, detecting when they are not operating correctly, and enabling automated scaling, recovery, and remediation using existing platforms and tooling. The engineer partners closely with backend and platform teams to evolve observability practices, define reliability signals, and improve how the platform responds to operational and performance concerns, driving overall system resilience and reliability.

## Responsibilities













## Qualifications and Job Requirements










## What We Offer






## Why You’ll Like Working Here




Apply now!
