Machine Learning Performance Engineer

5+ years experience in performance optimization or ML engineering.
Experience optimize large scale training jobs on GPU compute clusters (e.g. PyTorch, CUDA)
Experience in working in platform teams and working with research teams.
Experience in reporting and tracking over time benchmarked performance in an open and accessible way.
Ability to write high quality, well-structured and tested Python code
BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline or equivalent experience

Maximising the MFU of our large scale training jobs.
Profiling and identifying bottlenecks in training code.
Implementing GPU kernels to improve training throughput.
Working closely with Research teams to integrate and test training efficiency improvements.
Owning and improving our GPU training clusters.

Solid experience working with concurrent, parallel and distributed computing.
Experience using Nvidia NSight Systems.
Experience implementing GPU kernels.
Knowledge of computing fundamentals – what makes code fast, secure and reliable.