Site Reliability Engineer III

Company: JP Morgan Chase
Location: Columbus, OH, USA
Salary: Not Provided
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • Formal training or certification in Site Reliability concepts and 3+ years of applied experience.
  • Proficiency in at least one programming language or framework, such as Python, Java/Spring Boot, or .NET.
  • Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and virtualization technologies.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, Splunk, Dynatrace, and Datadog.
  • Proficiency in continuous integration, continuous delivery, and infrastructure-as-code tools such as Jenkins, GitLab, or Terraform.
  • Experience with container and container orchestration technologies such as ECS, Kubernetes, and Docker.
  • Experience implementing and managing error budgets and familiarity with site reliability culture and principles.
  • Proficient knowledge of software applications and technical processes within disciplines such as cloud and artificial intelligence.
  • Familiarity with troubleshooting common networking technologies and issues.
  • Ability to contribute to large and collaborative teams by presenting information logically and effectively.
  • Ability to proactively recognize roadblocks, identify new technologies, and implement innovative solutions to solve business problems.

Responsibilities

  • Collaborate with cross-functional teams to design, implement, and maintain scalable and reliable systems.
  • Develop and maintain monitoring, alerting, and incident response systems to ensure high availability, performance, and quality.
  • Implement and manage service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs) to measure and improve system reliability and customer satisfaction.
  • Utilize error budgets to manage delivery and prioritize reliability improvements for the applications and platforms we own and support.
  • Automate repetitive tasks and processes to improve efficiency, reduce toil, and minimize manual intervention.
  • Participate in on-call rotations to provide 24/7 support for critical systems and services.
  • Conduct root cause analysis and blameless post-mortem reviews to prevent future incidents and improve system reliability.
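
For candidates less familiar with the SLO and error budget responsibilities above, the arithmetic can be sketched roughly as follows. This is a minimal illustration, not a description of the team's actual tooling: the function name, the returned field names, and the "freeze releases once the budget is spent" policy are all assumptions made for the example.

```python
def error_budget_status(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error budget usage for one SLO window (hypothetical helper).

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # The error budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests

    # Fraction of the budget already spent; 0.0 when there was no traffic.
    consumed = failed_requests / allowed_failures if allowed_failures > 0 else 0.0

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": consumed,
        # Assumed policy: pause feature releases once the budget is exhausted.
        "freeze_releases": consumed >= 1.0,
    }


if __name__ == "__main__":
    # A 99.9% SLO over 1,000,000 requests tolerates roughly 1,000 failures.
    status = error_budget_status(0.999, 1_000_000, 250)
    print(f"budget consumed: {status['budget_consumed']:.0%}")
    print(f"freeze releases: {status['freeze_releases']}")
```

In practice the request counts would come from a monitoring system such as those listed under Requirements, and the release policy would be agreed between the reliability and delivery teams.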

Preferred Qualifications

  • Experience with Virtual Desktop Infrastructure (VDI) solutions.
  • Knowledge of networking and security best practices.
  • Certifications in AWS, Splunk, Dynatrace, or Terraform.
  • Experience with data science tools, methodologies, and techniques.