Site-Reliability Engineer

Company: Autostore
Location: Denver, CO, USA
Salary: $130,000 – $160,000
Type: Full-Time
Experience Level: Senior

Requirements

  • 5+ years of experience in a Site Reliability Engineering or related role
  • 2+ years of experience focusing on improving observability and performance of applications
  • Mindful of the tradeoffs among infrastructure choices and how they affect uptime
  • Focused on delighting customers by establishing clear expectations
  • Experience evangelizing technical concepts is a must

Responsibilities

  • Collaborate to achieve highly available and scalable Azure cloud infrastructure supporting 24/7 warehouse automation applications
  • Lead teams in defining healthy observability practices with tools such as New Relic, Datadog, Sentry, Prometheus, and Grafana
  • Work with application engineers to establish and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for core application features
  • Apply a firm understanding of root cause analysis and coach teams on improving existing practices
  • Use Terraform to ensure consistent and repeatable deployments
  • Work with multiple CNCF projects and related cloud-native tools such as Helm, Flux, Argo, Kubernetes, Prometheus, and Grafana
  • Create tools and automation to streamline development workflows and enable safe, efficient application deployments
  • Collaborate with product squads to assess risks and develop mitigation strategies for system reliability
  • Implement security best practices and ensure compliance with industry standards across cloud infrastructure
  • Serve as a technical evangelist for reliability engineering principles and best practices across the organization
  • Mentor software engineers on building reliable, observable applications while continuously improving operational efficiency
