Senior Site Reliability Engineer
Company | CarGurus |
---|---|
Location | Boston, MA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Linux administration
- SRE theory and vocabulary
- basic coding and scripting
- production experience
- incident management experience
- proven background in software engineering with multiple languages
- significant relevant operational experience running revenue-critical services at scale
- understanding of technologies beyond coding such as Load Balancing, Configuration Management, Kubernetes, Terraform and Observability Systems
- comfort in dealing with Incidents and Availability Issues under pressure
- familiarity and experience working with cloud infrastructure in an AWS environment
- familiarity with modern Site Reliability Engineering best practices and theory
- comfort and skill in written and verbal communication across teams and organizations
- excitement in solving puzzles and discovering how a new service or tool works by identifying the individual components, libraries, and relationships it is built upon
- a bias for action, balanced by the emotional intelligence to approach colleagues with positive regard and an understanding of their challenges and decisions
- curiosity and the acceptance that there are always ways to learn and grow
- the desire to be an active contributor in a collaborative and fast-paced environment
Responsibilities
- Linux administration, site reliability best practices, incident management, and critical on-call duties
- Collaborating with Engineering and Product Managers to define SLOs and monitor well-designed SLIs
- Embedding with Engineering teams and independently addressing issues or collaborating to improve operational excellence
- Serving as the primary point of escalation and participating in the on-call rotation for major engineering incidents
- Owning our Incident Response Process, including conducting blameless Postmortems
- Partnering with Engineering teams to ensure new services are production-ready
- Championing our organizational standards for architecting, observing, deploying, and scaling our products
- Evolving and maintaining our tracing, logging, monitoring, alerting, and other observability systems to increase visibility and transparency
- Educating the company on observability tools and troubleshooting techniques and practices
- Making data-driven decisions to drive continuous improvement
- Refusing to accept manual work as a solution to areas of weakness
Preferred Qualifications
No preferred qualifications provided.