Staff Site Reliability Engineer

Company: Uber Freight
Location: Frisco, TX, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior, Expert or higher

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 8+ years in platform engineering, DevOps, Site Reliability Engineering (SRE), or release engineering roles.

Responsibilities

  • Lead the design and architecture of major software components, systems, and applications to enhance availability, scalability, latency, and resource efficiency within a hybrid cloud environment.
  • Bring expert-level proficiency, backed by 5+ years of experience managing high-availability, fault-tolerant, scalable, distributed software systems in production.
  • Serve as the primary engineer for shared Kubernetes and Kafka platforms, managing their design, development, and operational performance.
  • Design and implement multi-region solutions in public cloud environments (GCP, Azure, or AWS) to improve scalability, reliability, and redundancy.
  • Develop and implement platform solutions for microservices and event-driven architectures on Kubernetes, applying infrastructure-as-code principles with Terraform.
  • Lead the implementation of Kubernetes-based application security and compliance solutions, ensuring adherence to industry and organizational best practices.
  • Create and maintain reusable Helm charts that support microservices across diverse programming languages and architectures.
  • Build, optimize, and manage CI/CD pipelines tailored for a variety of software applications and target platforms.
  • Configure and oversee CI/CD tools and service infrastructure using Terraform or API-driven, stateless environments.
  • Develop and maintain Gradle build scripts to streamline and optimize the software development lifecycle.
  • Architect and manage traffic routing strategies to support business continuity planning and disaster recovery (BCP/DR) for enterprise-scale organizations.
  • Oversee all aspects of release engineering, including version control systems, software builds, and deployment pipelines.
  • Collaborate with cross-functional teams to define and standardize best practices, reusable modules, and tooling frameworks across the organization.
  • Develop and maintain comprehensive technical documentation to outline platform engineering roadmaps and future strategies.
  • Develop observability solutions by defining metrics, KPIs, dashboards, alerts, and runbooks, ensuring the platform meets operational excellence standards.

Preferred Qualifications

  • Proven expertise in public cloud platforms (AWS, Azure, GCP) and hybrid infrastructure management.
  • Proficiency in the software development lifecycle (SDLC) and Agile/Scrum methodologies with a strong background in cross-functional team collaboration.
  • Experience designing and enhancing observability platforms such as Datadog, Dynatrace, or New Relic to support microservices and infrastructure monitoring.
  • Familiarity with Service-Oriented Architecture (SOA) and event-based architectures (e.g., Kafka, Pub/Sub, SQS).
  • Hands-on experience with implementing event-driven microservices using modern architectural approaches.