Senior Machine Learning Engineer – Hardware Acceleration
Company | Torc Robotics |
---|---|
Location | Ann Arbor, MI, USA |
Salary | $177,300 – $212,800 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s, PhD |
Experience Level | Senior |
Requirements
- Bachelor’s degree in computer science, data science, artificial intelligence, or related field with 6+ years of professional experience or a master’s degree with 3+ years of experience
- Mastery of modern C++ (C++14 or later) and Python, with the ability to write efficient, maintainable code that balances performance and flexibility
- Familiarity with object-oriented software design patterns and their implementation in C++
- In-depth knowledge of CUDA programming and experience with optimizing deep learning kernels
- Excellent understanding of parallel computing (GPGPU) and high-performance computing (HPC) concepts
- Ability to excel in a highly collaborative environment
- Familiarity with Agile development practices
- Comfortable using collaborative development tools such as Git and Jira
- Ability to adhere to company coding standards
- Proven dedication to writing production-quality code that is robust, efficient, portable, maintainable, and bug-free
Responsibilities
- Optimize machine learning inference models for NVIDIA Orin execution
- Leverage data parallelism and CUDA programming
- Implement TensorRT plugins
- Stay abreast of the latest advancements in PyTorch, maximizing their potential for target hardware execution
- Collaborate with machine learning engineers to develop innovative and performant deep learning solutions
- Analyze and optimize deep learning inference using profiling and optimization tools, identifying and eliminating performance bottlenecks
- Contribute to the development of internal tools and libraries to further enhance deep learning performance on the target hardware
- Document your work clearly and concisely, sharing knowledge effectively with team members
Preferred Qualifications
- PhD with 1+ years of experience
- Experience working on safety-critical systems
- Experience with other relevant NVIDIA libraries and frameworks, such as cuBLAS, cuDNN, and NPP
- Experience with deep learning frameworks such as TensorFlow, PyTorch, or Caffe