Senior Machine Learning Engineer - Hardware Acceleration

Bachelor’s degree in computer science, data science, artificial intelligence, or related field with 6+ years of professional experience or a master’s degree with 3+ years of experience
Mastery of modern C++ (C++14 or later) and Python, with the ability to write efficient and maintainable code for both performance and flexibility
Familiarity with object-oriented software design patterns, and their implementation in C++
In-depth knowledge of CUDA programming and experience with optimizing deep learning kernels
Excellent understanding of parallel computing (GPGPU) and high-performance computing (HPC) concepts
Ability to excel in a highly collaborative environment
Familiarity with Agile development practices
Comfortable using collaborative development tools such as Git and Jira
Ability to adhere to company coding standards
Proven dedication to writing production-quality code that is robust, efficient, portable, maintainable, and bug-free

Optimize machine learning inference models for NVIDIA Orin execution
Leverage data parallelism and CUDA programming
Implement TensorRT plugins
Stay abreast of the latest advancements in PyTorch, maximizing their potential for target hardware execution
Collaborate with machine learning engineers to develop innovative and performant deep learning solutions
Analyze and optimize deep learning inference using profiling and optimization tools, identifying and eliminating performance bottlenecks
Contribute to the development of internal tools and libraries to further enhance deep learning performance on the target hardware
Document your work clearly and concisely, sharing knowledge effectively with team members

PhD with 1+ years of experience
Experience working on safety-critical systems
Experience with other relevant NVIDIA libraries and frameworks, such as cuBLAS, cuDNN, and NPP
Experience with deep learning frameworks such as TensorFlow, PyTorch, or Caffe