Senior Site Reliability Engineer

Linux administration
SRE theory and vocabulary
basic coding and scripting
production experience
incident management experience
proven background in software engineering with multiple languages
significant relevant operational experience running revenue-critical services at scale
understanding of technologies beyond coding such as Load Balancing, Configuration Management, Kubernetes, Terraform and Observability Systems
comfort in dealing with Incidents and Availability Issues under pressure
familiarity and experience working with cloud infrastructure in an AWS environment
familiarity with modern Site Reliability Engineering best practices and theory
comfort and skill in written and verbal communication across teams and organizations
excitement in solving puzzles and discovering how a new service or tool works by identifying the individual components, libraries, and relationships it is built upon
a bias for action, balanced by the emotional intelligence to approach colleagues with positive regard and to understand their challenges and decisions
curiosity and the acceptance that there are always ways to learn and grow
the desire to be an active contributor in a collaborative and fast-paced environment

Linux administration, site reliability best practices, incident management, critical on-call
Collaborating with Engineering and Product Managers to define SLOs and monitor well-designed SLIs
Embedding with Engineering teams and independently addressing issues or collaborating to improve operational excellence
Being the primary point of escalation, and a member of the on-call rotation, for major engineering incidents
Owning our Incident Response Process, including conducting blameless Postmortems
Partnering with Engineering teams to ensure new services are production-ready
Championing our organizational standards for architecting, observing, deploying, and scaling our products
Evolving and maintaining our tracing, logging, monitoring, alerting, and other observability systems to increase visibility and transparency
Educating the company on observability tools and troubleshooting techniques and practices
Making data-driven decisions to drive continuous improvement
Refusing to accept manual work as a solution to areas of weakness

No preferred qualifications provided.