Senior Site Reliability Engineer
| Field | Value |
|---|---|
| Company | Salesforce |
| Location | San Francisco, CA, USA |
| Salary | $172,000 – $236,500 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior |

Requirements
- 3+ years of experience in SRE/DevOps/Systems Engineering roles
- Experience operating large-scale cluster management systems (e.g. Kubernetes) for a mission-critical service
- Strong working experience with Kubernetes, Docker, container orchestration, service mesh, and ingress gateways
- Good knowledge of network technologies, such as TCP/IP, DNS, TLS termination, HTTP proxies, and load balancers
- Excellent troubleshooting skills with the ability to learn new technologies in complex distributed systems
- Strong experience with observability tools such as Prometheus, Grafana, Splunk, and Elasticsearch
- Strong working experience with Linux systems administration and good knowledge of Linux internals
- Good experience with scripting/programming languages such as Python and Go
- Experience with AWS, Terraform, Spinnaker, ArgoCD
- Ability to manage multiple projects simultaneously, meet deadlines and adapt to shifting priorities
- Excellent problem-solving, analytical and communication skills, with a strong ability to work effectively in a team environment
Responsibilities
- You will be responsible for the high availability of the microservices that support the service mesh and ingress gateway across a fleet of 1000+ clusters running technologies such as Kubernetes, Docker, network load balancers, and the Istio service mesh.
- You will gain valuable experience troubleshooting real production issues, which will expand your knowledge of the architecture.
- You will contribute code that improves service availability.
- You will help improve the platform’s visibility by implementing necessary monitoring and metrics with Prometheus, Grafana and other monitoring frameworks.
- You will drive automation efforts in Python/Golang/Puppet/Jenkins to eliminate manual work in day-to-day operations.
- You will drive improvements to CI/CD pipelines built on Terraform, Spinnaker and Argo.
- You will implement AIOps automation, monitoring, and self-healing mechanisms that proactively fix issues, reducing MTTR and operational toil.
- You will get a chance to improve your communication and collaboration skills by working with other infrastructure teams across Salesforce.
- You will interact with a highly innovative and creative team of developers and architects.
- You will evaluate new technologies to solve problems as needed.
Preferred Qualifications
No preferred qualifications provided.