Site Reliability Engineer – Public Sector
Company | OpenAI |
---|---|
Location | Washington, DC, USA; San Francisco, CA, USA |
Salary | $279,000 – $385,000 |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Hold an active US security clearance
- 5+ years of experience operating infrastructure and systems at scale
- Hands-on experience with containers (Docker) and orchestration platforms (Kubernetes)
- Scripting experience with Python or equivalents for automating routine tasks
- Strong troubleshooting skills across the entire stack (infrastructure, systems, and applications)
Responsibilities
- Design and build performant, reliable, and scalable infrastructure, both on-premises and in the cloud, for our public sector customers.
- Administer the systems from the hardware up to Kubernetes, ensuring our teams have standardized infrastructure on which to deploy OpenAI’s technology.
- Own the reliability of these systems by being on-site with the customer, utilizing observability tooling, and directly troubleshooting issues that arise as the first line of support.
- Partner with teams across engineering and security to ensure the product supports the unique needs of the infrastructure and use cases.
- Automate routine tasks and standardize our infrastructure offerings to allow our team to scale as we continue to grow.
- Partner with teams across the business, including engineering, security, and compliance, to enable our products to work within the unique constraints of new environments.
Preferred Qualifications
- Have worked in secure environments, collaborating closely with both on-site clients and remote colleagues.
- Own problems end-to-end and are willing to pick up whatever knowledge you’re missing to get the job done, so that both your team and our customers succeed.
- Thrive in dynamic environments and can navigate ambiguity with ease.