Staff Site Reliability Engineer

Company	Uber Freight
Location	Frisco, TX, USA
Salary	Not Provided
Type	Full-Time
Degrees	Bachelor’s
Experience Level	Senior, Expert or higher

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field.
8+ years in platform engineering, DevOps, Site Reliability Engineering (SRE), or release engineering roles.

Responsibilities

Lead the design and architecture of major software components, systems, and applications to enhance availability, scalability, latency, and resource efficiency within a hybrid cloud environment.
Demonstrate expert-level proficiency, backed by more than 5 years of experience, in managing high-availability, fault-tolerant, scalable distributed software systems in production.
Serve as the primary engineer for shared Kubernetes and Kafka platforms, managing their design, development, and operational performance.
Design and implement multi-region solutions in public cloud environments (GCP, Azure, or AWS) to improve scalability, reliability, and redundancy.
Develop and implement platform solutions for microservices and event-driven architectures on Kubernetes, applying infrastructure-as-code principles with Terraform.
Lead the implementation of Kubernetes-based application security and compliance solutions, ensuring adherence to industry and organizational best practices.
Create and maintain reusable Helm charts for supporting microservices, accommodating diverse programming languages and architectures.
Build, optimize, and manage CI/CD pipelines tailored for a variety of software applications and target platforms.
Configure and oversee CI/CD tools and service infrastructure using Terraform or API-driven stateless environments.
Develop and maintain Gradle build scripts to streamline and optimize the software development lifecycle.
Architect and manage traffic-routing strategies that support business continuity planning and disaster recovery (BCP/DR) for enterprise-scale organizations.
Oversee all aspects of release engineering, including version control systems, software builds, and deployment pipelines.
Collaborate with cross-functional teams to define and standardize best practices, reusable modules, and tooling frameworks across the organization.
Develop and maintain comprehensive technical documentation to outline platform engineering roadmaps and future strategies.
Develop observability solutions by defining metrics, KPIs, dashboards, alerts, and runbooks, ensuring the platform meets operational excellence standards.

Preferred Qualifications

Proven expertise in public cloud platforms (AWS, Azure, GCP) and hybrid infrastructure management.
Proficiency in the software development lifecycle (SDLC) and Agile/Scrum methodologies, with a strong background in cross-functional team collaboration.
Experience designing and enhancing observability platforms such as Datadog, Dynatrace, or New Relic to support microservices and infrastructure monitoring.
Familiarity with Service-Oriented Architecture (SOA) and event-based architectures (e.g., Kafka, Pub/Sub, SQS).
Hands-on experience implementing event-driven microservices using modern architectural approaches.