Site Reliability Engineer
| Company | Pano AI |
|---|---|
| Location | San Francisco, CA, USA |
| Salary | $150,000 – $205,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior |
Requirements
- 5+ years of professional experience in a fast-paced SaaS or similar business environment
- 3+ years of hands-on experience supporting production systems as a Site Reliability Engineer (SRE) or a DevOps Engineer
- 3+ years of hands-on experience with cloud services and technologies (GCP, AWS, Azure, etc.)
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Proficiency with Infrastructure as Code (IaC) tools and methodologies (e.g., Terraform, Pulumi, Puppet)
- Proven ability to troubleshoot and resolve complex technical issues in distributed systems
- Ability to communicate effectively within the team and across the organization, sharing insights and updates and collaborating to achieve project goals
Responsibilities
- Implement and maintain monitoring systems to proactively identify and address potential issues before they impact users.
- Automate repetitive tasks and processes, such as deployments, infrastructure management, and incident response, to improve efficiency and reduce manual effort.
- Respond to incidents, diagnose problems, and implement solutions to restore service quickly.
- Improve the performance and scalability of systems and applications, ensuring they can handle peak loads and user traffic.
- Help plan future capacity needs, ensuring that systems can accommodate growth and evolving requirements, while remaining cost-efficient.
- Work closely with development teams to understand their needs, guide them, and ensure that systems are designed and deployed reliably.
- Build tools to codify and automate infrastructure operations.
- Define and track SLIs and SLOs to measure the performance and reliability of services.
- Assess and mitigate risks associated with deployments and infrastructure changes.
- Assist with the release and deployment processes, ensuring that changes are rolled out smoothly and reliably.
Preferred Qualifications
- Advanced working knowledge of GCP services such as GKE, GCS, and IAM
- Professional experience supporting containerized Java/JVM/Python services
- Experience with relational databases, particularly PostgreSQL
- 3+ years of professional experience designing, implementing, and/or administering CI/CD solutions (e.g., GitHub Actions, Buildkite, Jenkins)
- Strong SRE mindset with a focus on cloud networking and security best practices
- Strong software development skills, particularly with scripting languages (e.g., Python, Bash)
- Experience with system administration in general and Linux in particular
- Familiarity with the SOC 2 and ISO 27001 security frameworks
- Preference for someone in the Pacific / Mountain time zone