Software Engineer – Evals Infrastructure – Preparedness
Company | OpenAI
---|---
Location | San Francisco, CA, USA
Salary | $325,000
Type | Full-Time
Degrees | Bachelor’s
Experience Level | Senior, Expert or higher
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent work experience)
- At least 7 years of professional software engineering experience
- Proven experience as a reliability engineer, or in a similar role, at a fast-paced, rapidly scaling company
- Strong proficiency in cloud infrastructure
- Proficiency in programming/scripting languages
- Experience with containerization technologies and container orchestration platforms like Kubernetes
- Knowledge of IaC tools such as Terraform or CloudFormation
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills
- Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack
- Experience with microservices architecture and service mesh technologies
- Knowledge of security best practices in cloud environments
Responsibilities
- Scale our infrastructure to support a wide variety of evaluations, supporting systems, and automation
- Collaborate with development teams to make our systems more reliable (owning Production Readiness Reviews)
- Implement and manage monitoring systems to proactively identify issues and anomalies in our production environment
- Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure and ensure system reliability
- Implement fault-tolerant and resilient design patterns to minimize service disruptions
- Build and maintain automation tools to streamline repetitive tasks and improve system reliability
- Partner with engineers and researchers at OpenAI to help bring frontier research capabilities to the world
- Participate in an on-call rotation to respond to critical incidents and ensure 24/7 system availability
Preferred Qualifications
- Enjoy seeking out and addressing bottlenecks and areas for performance improvement in our systems
- Use Infrastructure as Code (IaC) principles to automate infrastructure provisioning and configuration management
- Experience collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services
- Track record of accelerating engineering reliability by empowering fellow engineers with excellent tooling and systems
- Help create a diverse, equitable, and inclusive culture that makes everyone feel welcome while enabling radical candor and the challenging of groupthink
- Humble attitude, eagerness to help colleagues, and a desire to do whatever it takes to make the team succeed
- Own problems end to end and are willing to pick up whatever knowledge you’re missing to get the job done