Site Reliability Engineer

Company: Pano AI
Location: San Francisco, CA, USA
Salary: $150,000 – $205,000
Type: Full-Time
Experience Level: Senior

Requirements

  • 5+ years of professional experience in a fast-paced SaaS or similar business environment
  • 3+ years of hands-on experience supporting production systems as a Site Reliability Engineer (SRE) or a DevOps Engineer
  • 3+ years of hands-on experience with cloud services and technologies (GCP, AWS, Azure, etc.)
  • Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
  • Proficiency with Infrastructure as Code (IaC) tools and methodologies (e.g., Terraform, Pulumi, Puppet)
  • Proven ability to troubleshoot and resolve complex technical issues in distributed systems
  • Ability to communicate effectively within the team and across the organization, sharing insights and updates and collaborating to achieve project goals

Responsibilities

  • Implement and maintain monitoring systems to proactively identify and address potential issues before they impact users.
  • Automate repetitive tasks and processes, such as deployments, infrastructure management, and incident response, to improve efficiency and reduce manual effort.
  • Respond to incidents, diagnose problems, and implement solutions to restore service quickly.
  • Improve the performance and scalability of systems and applications, ensuring they can handle peak loads and user traffic.
  • Help plan future capacity needs, ensuring that systems can accommodate growth and evolving requirements, while remaining cost-efficient.
  • Work closely with development teams to understand their needs, guide them, and ensure that systems are designed and deployed reliably.
  • Build tools to codify and automate infrastructure operations.
  • Define and track SLIs and SLOs to measure the performance and reliability of services.
  • Assess and mitigate risks associated with deployments and infrastructure changes.
  • Assist with the release and deployment processes, ensuring that changes are rolled out smoothly and reliably.

Preferred Qualifications

  • Advanced working knowledge of GCP services such as GKE, GCS, and IAM
  • Professional experience supporting containerized Java/JVM/Python services
  • Experience with relational databases, particularly PostgreSQL
  • 3+ years of professional experience designing, implementing, and/or administering CI/CD solutions (e.g., GitHub Actions, Buildkite, Jenkins)
  • Strong SRE mindset with a focus on cloud networking and security best practices
  • Strong software development skills, particularly with scripting languages (e.g., Python, Bash)
  • Experience with system administration in general and Linux in particular
  • Familiarity with the SOC 2 and ISO 27001 security frameworks
  • Preference for someone in the Pacific / Mountain time zone