Staff Site Reliability Engineer
Company | Uber Freight |
---|---|
Location | Frisco, TX, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior, Expert or higher |
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years in platform engineering, DevOps, Site Reliability Engineering (SRE), or release engineering roles.
Responsibilities
- Lead the design and architecture of major software components, systems, and applications to enhance availability, scalability, latency, and resource efficiency within a hybrid cloud environment.
- Demonstrate expert-level proficiency, backed by more than 5 years of experience, in managing high-availability, fault-tolerant, scalable distributed software systems in production.
- Serve as the primary engineer for shared Kubernetes and Kafka platforms, managing their design, development, and operational performance.
- Design and implement multi-region solutions in public cloud environments (GCP, Azure, or AWS) to improve scalability, reliability, and redundancy.
- Develop and implement platform solutions for microservices and event-driven architectures using Kubernetes, with infrastructure as code principles driven by Terraform.
- Lead the implementation of Kubernetes-based application security and compliance solutions, ensuring adherence to industry and organizational best practices.
- Create and maintain reusable Helm charts for supporting microservices, accommodating diverse programming languages and architectures.
- Build, optimize, and manage CI/CD pipelines tailored for a variety of software applications and target platforms.
- Configure and oversee CI/CD tools and service infrastructure using Terraform or API-driven stateless environments.
- Develop and maintain Gradle build scripts to streamline and optimize the software development lifecycle.
- Architect and manage traffic routing strategies that support business continuity planning and disaster recovery (BCP/DR) for enterprise-scale organizations.
- Oversee all aspects of release engineering, including version control systems, software builds, and deployment pipelines.
- Collaborate with cross-functional teams to define and standardize best practices, reusable modules, and tooling frameworks across the organization.
- Develop and maintain comprehensive technical documentation to outline platform engineering roadmaps and future strategies.
- Develop observability solutions by defining metrics, KPIs, dashboards, alerts, and runbooks, ensuring the platform meets operational excellence standards.
Preferred Qualifications
- Proven expertise in public cloud platforms (AWS, Azure, GCP) and hybrid infrastructure management.
- Proficiency in the software development lifecycle (SDLC) and Agile/Scrum methodologies with a strong background in cross-functional team collaboration.
- Experience designing and enhancing observability platforms such as Datadog, Dynatrace, or New Relic to support microservices and infrastructure monitoring.
- Familiarity with Service-Oriented Architecture (SOA) and event-based architectures (e.g., Kafka, Pub/Sub, SQS).
- Hands-on experience implementing event-driven microservices using modern architectural approaches.