Machine Learning Performance Engineer

Company: Wayve
Location: Sunnyvale, CA, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior

Requirements

  • 5+ years of experience in performance optimization or ML engineering.
  • Experience optimizing large-scale training jobs on GPU compute clusters (e.g. with PyTorch and CUDA).
  • Experience working in platform teams and collaborating with research teams.
  • Experience reporting and tracking benchmarked performance over time in an open and accessible way.
  • Ability to write high-quality, well-structured and tested Python code.
  • BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline, or equivalent experience.

Responsibilities

  • Maximising the MFU (model FLOPs utilisation) of our large-scale training jobs.
  • Profiling and identifying bottlenecks in training code.
  • Implementing GPU kernels to improve training throughput.
  • Working closely with Research teams to integrate and test training efficiency improvements.
  • Owning and improving our GPU training clusters.

Preferred Qualifications

  • Solid experience working with concurrent, parallel and distributed computing.
  • Experience using NVIDIA Nsight Systems.
  • Experience implementing GPU kernels.
  • Knowledge of computing fundamentals – what makes code fast, secure and reliable.