Staff Site Reliability Engineer
| Company | Illumio |
|---|---|
| Location | Sunnyvale, CA, USA |
| Salary | $192,000 – $230,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior, Expert or higher |
Requirements
- Bachelor’s degree in Computer Science, Engineering, or related field; or equivalent work experience
- 6+ years of relevant SRE, DevOps, Platform or Infrastructure Engineering experience
- 4+ years in a production support role in a fast-paced industry/organization
- Experience deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant web platforms in public cloud providers such as AWS, Azure, and GCP
- Experience with common monitoring, log aggregation, and metrics-gathering platforms (Icinga, Sensu, Splunk, Telegraf/InfluxDB, et al.)
- Experience with configuration management and orchestration tools such as Chef, Ansible, and AWS services and APIs, or equivalent
- Experience scripting/coding with Python, Java, Ruby and/or Go
- Experience with MySQL, PostgreSQL, Redis, or similar
- Solid knowledge of the Linux operating system (Ubuntu, RHEL, OEL7) is required
- Experience with EKS and/or AKS
- Knowledge of or experience with incident management and on-call tooling such as PagerDuty
- Knowledge of Database Technologies, Release Management, REST, SRE, etc.
- Knowledge of load balancers and traffic managers
- Experience working with Kubernetes, Docker, or other virtualization & containerization technologies
- Networking basics and troubleshooting skills
- Good understanding of production deployment and distributed environments is required
- Strong problem solving and operational process skills, attention to detail
- Application support and debugging experience in a dynamic fast-paced production environment
- Experience with SDLC principles, architecture and operations
- Experience working with senior leadership both inside and outside of engineering
- Ability to manage multiple tasks and competing priorities to deliver projects on schedule
Responsibilities
- Drive reliability improvements back into applications
- Build code to resolve reliability/resiliency issues
- Mentor and educate team members to aid in strengthening technical expertise
- Collaborate closely with cloud architects to drive cloud solutions
- Curate proper SLIs/SLOs to accurately measure and assess error budgets
- Embed with the development teams to assist with cloud methodologies when developing products to ensure that the deliverable is as reliable as possible
- Work with development teams to build and strengthen application security and compliance
- Manage high impact situations that involve technically challenging issues across diverse audiences and drive to find the root cause, mitigate, and identify a solution
- Focus on observability
Preferred Qualifications
- Azure certifications (such as Azure Administrator or Azure Developer) or AWS/GCP certifications are a plus