Staff Site Reliability Engineer

Company	Illumio
Location	Sunnyvale, CA, USA
Salary	$192,000 – $230,000
Type	Full-Time
Degrees	Bachelor’s
Experience Level	Senior, Expert or higher

Requirements

Bachelor’s degree in Computer Science, Engineering, or related field; or equivalent work experience
6+ years of relevant SRE, DevOps, Platform or Infrastructure Engineering experience
4+ years in a production support role in a fast-paced industry/organization
Experience deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant web platforms in public cloud providers such as AWS, Azure, and GCP
Experience with common monitoring, log aggregation, and metrics-gathering platforms (Icinga, Sensu, Splunk, Telegraf/InfluxDB, etc.)
Experience with configuration management and orchestration tools such as Chef, Ansible, and AWS services and APIs, or equivalent
Experience scripting/coding with Python, Java, Ruby, and/or Go
Experience with MySQL, PostgreSQL, Redis, or similar
Solid knowledge of Linux operating systems (Ubuntu, RHEL, OEL7) is required
Experience with managed Kubernetes services such as EKS and/or AKS
Knowledge of or experience with incident management and on-call tooling (e.g., PagerDuty)
Knowledge of database technologies, release management, REST, SRE practices, etc.
Knowledge of load balancers and traffic managers
Experience working with Kubernetes, Docker, or other virtualization & containerization technologies
Networking fundamentals and troubleshooting skills
Good understanding of production deployments and distributed environments is required
Strong problem-solving and operational process skills, with attention to detail
Application support and debugging experience in a dynamic, fast-paced production environment
Experience with SDLC principles, architecture, and operations
Experience working with senior leadership both inside and outside of engineering
Ability to manage multiple tasks and competing priorities to deliver projects on schedule

Responsibilities

Drive reliability improvements back into applications
Write code to resolve reliability/resiliency issues
Mentor and educate team members to strengthen their technical expertise
Collaborate closely with cloud architects to drive cloud solutions
Define appropriate SLIs and SLOs to accurately measure and assess error budgets
Embed with development teams and assist with cloud methodologies during product development so that deliverables are as reliable as possible
Work with development teams to build and strengthen application security and compliance
Manage high-impact situations involving technically challenging issues across diverse audiences; drive to find the root cause, mitigate the impact, and identify a solution
Focus on observability

Preferred Qualifications

Azure certifications (such as Azure Administrator or Azure Developer) or AWS/GCP certifications are a plus