Senior Site Reliability Engineer

Company: CarGurus
Location: Boston, MA, USA
Salary: Not provided
Type: Full-Time
Degrees: Not provided
Experience Level: Senior

Requirements

  • Linux administration
  • SRE theory and vocabulary
  • basic coding and scripting
  • production experience
  • incident management experience
  • proven background in software engineering with multiple languages
  • significant relevant operational experience running revenue-critical services at scale
  • understanding of technologies beyond coding, such as load balancing, configuration management, Kubernetes, Terraform, and observability systems
  • comfort in dealing with incidents and availability issues under pressure
  • familiarity and experience working with cloud infrastructure in an AWS environment
  • familiarity with modern Site Reliability Engineering best practices and theory
  • comfort and skill in written and verbal communication across teams and organizations
  • excitement about solving puzzles and discovering how a new service or tool works by identifying the individual components, libraries, and relationships it is built upon
  • a bias for action, balanced by the emotional intelligence to approach colleagues with positive regard and an understanding of their challenges and decisions
  • curiosity and the acceptance that there are always ways to learn and grow
  • the desire to be an active contributor in a collaborative and fast-paced environment

Responsibilities

  • Linux administration, site reliability best practices, incident management, and critical on-call duty
  • Collaborating with Engineering and Product Managers to define SLOs and monitor well-designed SLIs
  • Embedding with Engineering teams and independently addressing issues or collaborating to improve operational excellence
  • Serving as the primary point of escalation, and participating in the on-call rotation, for major engineering incidents
  • Owning our Incident Response Process, including conducting blameless Postmortems
  • Partnering with Engineering teams to ensure new services are production-ready
  • Championing our organizational standards for architecting, observing, deploying, and scaling our products
  • Evolving and maintaining our tracing, logging, monitoring, alerting, and other observability systems to increase visibility and transparency
  • Educating the company on observability tools and troubleshooting techniques and practices
  • Making data-driven decisions to drive continuous improvement
  • Refusing to accept manual work as a solution to areas of weakness

Preferred Qualifications

    No preferred qualifications provided.