Site Reliability Engineer II – Real-Time
Company | Esri |
---|---|
Location | West Redlands, Redlands, CA, USA |
Salary | $82,160 – $138,320 |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior |
Requirements
- 5+ years of experience managing Kubernetes (EKS), logging and monitoring (ELK, Prometheus), and container technologies (Docker, ECS)
- Proficient in using Terraform for automating infrastructure provisioning and management
- Ability to design and automate Git workflows for streamlined code integration, testing, and infrastructure deployment
- Ability to write scripts to deploy infrastructure and/or applications (Bash, Python, Terraform)
- Strong understanding of and experience with cloud computing platforms (AWS)
- Strong knowledge of Linux operating system administration, including troubleshooting, performance tuning, and shell scripting
- Proficient in cloud networking, including VPCs, subnets, security groups, and VPNs, on platforms such as AWS
- Skilled in identifying and resolving system and application issues through effective troubleshooting and root cause analysis
- Working knowledge of a source control and issue management system, preferably GitHub
- Working knowledge of authoring, deploying, and troubleshooting Java applications on AWS Lambda
- Bachelor’s in computer science, computer engineering, GIS, or information systems
Responsibilities
- Collaborate with a team of site reliability engineers to operate SaaS capabilities across multiple regions on the cloud platform
- Design, implement, configure, and use monitoring systems to track the health of SaaS products
- Manage infrastructure used for ArcGIS Velocity and ArcGIS Workflow Manager, respond to alerts, and troubleshoot problems to resolution
- Develop, implement, and maintain automation solutions for repetitive operational tasks, such as deployment pipelines, incident resolution, and scaling processes
- Design and implement the deployment and upgrade of containerized microservice components that, when combined, power Esri’s SaaS offerings
- Create and automate Git workflows to simplify code integration, testing, and infrastructure deployments
- Participate in technical spike efforts, bringing innovative ideas to future versions of our software
- Troubleshoot system incidents and provide root cause analysis reports
- Provide rotational on-call technical support
Preferred Qualifications
- 5+ years of experience designing, administering, and/or maintaining cloud environments, such as AWS, supporting 24×7 high-availability production environments
- Interest in working with GitOps principles to automate the deployment of applications on Kubernetes clusters
- Certifications: AWS Certified Solutions Architect Associate, CKA/CKAD, or similar
- Experience managing OpenSearch (as a datastore or log store) and Kafka to handle distributed data streams and ensure high availability in large-scale systems
- Ability to work with continuous integration and delivery best practices
- Knowledge of operating resilient, highly available, scalable, and high-performance SaaS capabilities
- Knowledge of Esri ArcGIS or other web mapping technologies
- Master’s in computer science, computer engineering, GIS, or information systems