Site Reliability Engineer III
Company | JP Morgan Chase |
---|---|
Location | Columbus, OH, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | |
Experience Level | Mid Level, Senior |
Requirements
- Formal training or certification in site reliability concepts and 3+ years of applied experience.
- Proficiency in at least one programming language or framework, such as Python, Java/Spring Boot, or .NET.
- Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and virtualization technologies.
- Experience with monitoring and logging tools such as Prometheus, Grafana, Splunk, Dynatrace, and Datadog.
- Proficiency in continuous integration, continuous delivery, and infrastructure-as-code tools such as Jenkins, GitLab, or Terraform.
- Experience with container and container orchestration technologies such as ECS, Kubernetes, and Docker.
- Experience implementing and managing error budgets and familiarity with site reliability culture and principles.
- Proficient knowledge of software applications and technical processes within disciplines such as cloud and artificial intelligence.
- Familiarity with troubleshooting common networking technologies and issues.
- Ability to contribute to large and collaborative teams by presenting information logically and effectively.
- Ability to proactively recognize roadblocks, identify new technologies, and implement innovative solutions to solve business problems.
Responsibilities
- Collaborate with cross-functional teams to design, implement, and maintain scalable and reliable systems.
- Develop and maintain monitoring, alerting, and incident response systems to ensure high availability, performance, and quality.
- Implement and manage service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs) to measure and improve system reliability and customer satisfaction.
- Utilize error budgets to manage delivery and prioritize reliability improvements for the applications and platforms we own and support.
- Automate repetitive tasks and processes to reduce toil, improve efficiency, and minimize manual intervention.
- Participate in on-call rotations to provide 24/7 support for critical systems and services.
- Conduct root cause analysis and blameless post-mortem reviews to prevent future incidents and improve system reliability.
Preferred Qualifications
- Experience with Virtual Desktop Infrastructure (VDI) solutions.
- Knowledge of networking and security best practices.
- Certifications in AWS, Splunk, Dynatrace, or Terraform.
- Experience with data science tools, methodologies, and techniques.