Machine Learning Performance Engineer
Company | Wayve |
---|---|
Location | Sunnyvale, CA, USA |
Salary | Not Provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Senior |
Requirements
- 5+ years of experience in performance optimization or ML engineering.
- Experience optimizing large-scale training jobs on GPU compute clusters (e.g. PyTorch, CUDA).
- Experience working in platform teams and collaborating with research teams.
- Experience tracking and reporting benchmarked performance over time in an open and accessible way.
- Ability to write high-quality, well-structured and tested Python code.
- BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline, or equivalent experience.
Responsibilities
- Maximising the Model FLOPs Utilization (MFU) of our large-scale training jobs.
- Profiling and identifying bottlenecks in training code.
- Implementing GPU kernels to improve training throughput.
- Working closely with Research teams to integrate and test training efficiency improvements.
- Owning and improving our GPU training clusters.
Preferred Qualifications
- Solid experience working with concurrent, parallel and distributed computing.
- Experience using NVIDIA Nsight Systems.
- Experience implementing GPU kernels.
- Knowledge of computing fundamentals – what makes code fast, secure and reliable.