Software Engineer – Power Management – Hardware Health

Company: OpenAI
Location: San Francisco, CA, USA
Salary: $310,000 – $460,000
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 7+ years of software engineering experience with a focus on solving large-scale, system-level challenges.
  • Strong proficiency in Python and familiarity with automation and scripting tools (e.g., shell scripting).
  • Experience building distributed systems that efficiently aggregate and analyze streaming data.
  • Knowledge of electrical engineering concepts including digital signal processing, power systems, Fast Fourier Transforms, or related areas.
  • Experience in system-level investigations and development of automated solutions to address power management, fault detection, and remediation.
  • Strong analytical skills and the ability to dig into noisy data (experience with SQL, PromQL, Pandas, etc.).
  • Comfort working with both hardware and software teams to solve multidisciplinary problems.

Responsibilities

  • Develop and implement system-level and software-level solutions to optimize power usage in large-scale supercomputers, ensuring efficient and reliable operations.
  • Build automation to monitor power consumption patterns during training workloads, and design algorithms to stabilize fluctuations in power draw, preventing grid reliability issues.
  • Work with researchers and engineers to design tools for real-time monitoring, detection, and remediation of power-related hardware and system faults.
  • Collaborate cross-functionally to translate complex electrical system requirements into code, while driving continuous improvements in power management solutions.
  • Drive the development of power throttling mechanisms at the IT system level to dynamically adjust power usage based on workload demands and infrastructure limitations.
  • Collaborate with hardware design teams to integrate system-level power control requirements into IT hardware design, ensuring seamless coordination between software-driven power management and hardware capabilities.

Preferred Qualifications

  • Deep expertise in the power characteristics of synchronous workloads (as seen in supercomputing or model training environments).
  • Knowledge of power control requirements in IT hardware design, with the ability to drive cross-functional collaboration to integrate power management features into hardware systems effectively.
  • Working knowledge of control system fundamentals and how physical systems respond to control strategies.