Senior Site Reliability Engineer

Company: Salesforce
Location: San Francisco, CA, USA
Salary: $172,000 – $236,500
Type: Full-Time
Experience Level: Senior

Requirements

  • 3+ years of experience in SRE/DevOps/Systems Engineering roles
  • Experience operating large-scale cluster management systems (e.g. Kubernetes) for a mission-critical service
  • Strong working experience with Kubernetes, Docker, container orchestration, service mesh, and ingress gateways
  • Good knowledge of network technologies such as TCP/IP, DNS, TLS termination, HTTP proxies, and load balancers
  • Excellent troubleshooting skills with the ability to learn new technologies in complex distributed systems
  • Strong experience with observability tools such as Prometheus, Grafana, Splunk, and Elasticsearch
  • Strong working experience with Linux systems administration and good knowledge of Linux internals
  • Good experience with scripting/programming languages such as Python and Go
  • Experience with AWS, Terraform, Spinnaker, and Argo CD
  • Ability to manage multiple projects simultaneously, meet deadlines and adapt to shifting priorities
  • Excellent problem-solving, analytical and communication skills, with a strong ability to work effectively in a team environment

Responsibilities

  • You will be responsible for the high availability of the microservices supporting the service mesh and ingress gateway across a fleet of 1,000+ clusters running technologies such as Kubernetes, Docker, network load balancers, service mesh, and Istio.
  • You’ll gain valuable experience troubleshooting real production issues, which will expand your knowledge of the architecture.
  • You will contribute code to improve service availability.
  • You will help improve the platform’s visibility by implementing necessary monitoring and metrics with Prometheus, Grafana and other monitoring frameworks.
  • You will drive automation efforts in Python/Golang/Puppet/Jenkins to eliminate manual work in day-to-day operations.
  • You will drive improvements to CI/CD pipelines built on Terraform, Spinnaker and Argo.
  • You’ll implement AIOps automation, monitoring, and self-healing mechanisms that proactively fix issues and reduce MTTR and operational toil.
  • You will get a chance to improve your communication and collaboration skills working with various other Infrastructure teams across Salesforce.
  • You will interact with a highly innovative and creative team of developers and architects.
  • You will evaluate new technologies to solve problems as needed.

Preferred Qualifications

    No preferred qualifications provided.