Site Reliability Engineer
Company | Autostore |
---|---|
Location | Denver, CO, USA |
Salary | $130,000 – $160,000 |
Type | Full-Time |
Degrees | |
Experience Level | Senior |

Requirements
- 5+ years of experience in a Site Reliability Engineering or related role
- 2+ years of experience focusing on improving observability and performance of applications
- Mindful of the tradeoffs among infrastructure choices and how they affect uptime
- Focused on delighting customers by establishing clear expectations
- Experience evangelizing technical concepts is a must
Responsibilities
- Collaborate to achieve highly available and scalable Azure cloud infrastructure supporting 24/7 warehouse automation applications
- Lead teams in defining healthy observability practices with tools such as New Relic, Datadog, Sentry, Prometheus, and Grafana
- Work with application engineers to establish and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for core application features
- Apply a firm understanding of root cause analysis and coach teams on improving existing practices
- Use Terraform to ensure consistent and repeatable deployments
- Work with multiple CNCF projects such as Helm, Flux, Argo, Kubernetes, and Prometheus, alongside related tooling such as Grafana
- Create tools and automation to streamline development workflows and enable safe, efficient application deployments
- Collaborate with product squads to assess risks and develop mitigation strategies for system reliability
- Implement security best practices and ensure compliance with industry standards across cloud infrastructure
- Serve as a technical evangelist for reliability engineering principles and best practices across the organization
- Mentor software engineers on building reliable, observable applications while continuously improving operational efficiency
Preferred Qualifications
No preferred qualifications provided.