Site Reliability Engineer – Public Sector

Company: OpenAI
Location: Washington, DC, USA; San Francisco, CA, USA
Salary: $279,000 – $385,000
Type: Full-Time
Experience Level: Senior

Requirements

  • Hold an active US security clearance
  • 5+ years of experience operating infrastructure and systems at scale
  • Hands-on experience with containers (Docker) and orchestration platforms (Kubernetes)
  • Scripting experience with Python or an equivalent language for automating routine tasks
  • Strong troubleshooting skills across the entire stack (infrastructure, systems, and applications)

Responsibilities

  • Design and build performant, reliable, and scalable infrastructure, both on-premises and in the cloud, for our public sector customers.
  • Administer the systems from the hardware up to Kubernetes, ensuring our teams have a standardized infrastructure on which to deploy OpenAI’s technology.
  • Own the reliability of these systems by being on-site with the customer, utilizing observability tooling, and directly troubleshooting issues that arise as the first line of support.
  • Partner with teams across engineering and security to ensure the product supports the unique needs of the infrastructure and use-cases.
  • Automate routine tasks and standardize our infrastructure offerings to allow our team to scale as we continue to grow.
  • Partner with teams across the business, including engineering, security, and compliance, to enable our products to work within the unique constraints of new environments.

Preferred Qualifications

  • Have worked in secure environments, collaborating closely with both on-site clients and remote colleagues.
  • Own problems end-to-end and are willing to pick up whatever knowledge you’re missing so that both your team and our customers succeed.
  • Thrive in dynamic environments and can navigate ambiguity with ease.