Machine Learning Intern

Company: d-Matrix
Location: Santa Clara, CA, USA
Salary: Not Provided
Type: Internship
Degrees: Bachelor’s, Master’s
Experience Level: Internship

Requirements

  • Currently pursuing a degree in Computer Science, Electrical Engineering, Machine Learning, or a related field.
  • Familiarity with PyTorch and deep learning concepts, particularly regarding model optimization and memory management.
  • Understanding of CUDA programming and hardware-accelerated computation (hands-on CUDA experience is a plus).
  • Strong programming skills in Python, with experience in PyTorch.
  • Analytical mindset with the ability to approach problems creatively.

Responsibilities

  • Research and analyze existing KV-Cache implementations used in LLM inference, particularly those that keep lists of past key/value PyTorch tensors (a minimal sketch of this pattern follows this list).
  • Investigate ‘Paged Attention’ mechanisms that leverage dedicated CUDA data structures to optimize memory for variable sequence lengths.
  • Design and implement a torch-native dynamic KV-Cache that integrates seamlessly into PyTorch (a second sketch after this list illustrates one graph-friendly approach).
  • Model KV-Cache behavior within the PyTorch compute graph to improve compatibility with torch.compile and facilitate the export of the compute graph.
  • Conduct experiments to evaluate memory utilization and inference efficiency on d-Matrix hardware.
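
The first bullet refers to the common pattern of caching attention keys and values as Python lists of tensors that grow with the sequence. As a point of reference only, here is a minimal sketch of that pattern; the class name, tensor shapes (batch, num_heads, seq_len, head_dim), and the memory accounting are illustrative assumptions, not d-Matrix code.

```python
# Illustrative sketch: a KV-Cache kept as per-layer lists of past key/value
# tensors, grown by concatenation at each decoding step.
import torch


class NaiveKVCache:
    """Per-layer past key/value tensors, shaped (batch, num_heads, seq_len, head_dim)."""

    def __init__(self, num_layers: int):
        self.keys = [None] * num_layers
        self.values = [None] * num_layers

    def update(self, layer: int, k: torch.Tensor, v: torch.Tensor):
        # Concatenate this step's keys/values along the sequence dimension.
        if self.keys[layer] is None:
            self.keys[layer], self.values[layer] = k, v
        else:
            self.keys[layer] = torch.cat([self.keys[layer], k], dim=2)
            self.values[layer] = torch.cat([self.values[layer], v], dim=2)
        return self.keys[layer], self.values[layer]

    def memory_bytes(self) -> int:
        # Rough accounting of the cache footprint -- the quantity that paged
        # or pre-allocated designs try to keep proportional to real sequence lengths.
        return sum(
            t.element_size() * t.numel()
            for t in (*self.keys, *self.values)
            if t is not None
        )


if __name__ == "__main__":
    batch, heads, head_dim, layers = 2, 8, 64, 4
    cache = NaiveKVCache(layers)
    for step in range(16):  # simulate 16 decode steps of one new token each
        for layer in range(layers):
            k = torch.randn(batch, heads, 1, head_dim)
            v = torch.randn(batch, heads, 1, head_dim)
            cache.update(layer, k, v)
    print(f"cache size after 16 steps: {cache.memory_bytes() / 1024:.1f} KiB")
```

The concatenation-per-step update, and the repeated reallocation it implies, is the memory behavior that paged or pre-allocated cache designs aim to avoid.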
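
For the torch-native, graph-friendly direction in the later bullets, one possible approach (an assumption here, not necessarily the team's design) is to pre-allocate the cache to a maximum length as module buffers and update it in place, so shapes stay static and the update can be captured by torch.compile or torch.export. Names such as StaticKVCache and max_seq_len are hypothetical.

```python
# Illustrative sketch: a pre-allocated, in-place-updated KV-Cache held as
# module buffers, so cache updates stay inside the traced compute graph.
import torch
from torch import nn


class StaticKVCache(nn.Module):
    def __init__(self, batch, num_heads, max_seq_len, head_dim, dtype=torch.float32):
        super().__init__()
        shape = (batch, num_heads, max_seq_len, head_dim)
        # Buffers make the cache part of module state rather than Python lists.
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype))

    def update(self, k: torch.Tensor, v: torch.Tensor, pos: torch.Tensor):
        # Write this step's keys/values in place at the given sequence positions.
        self.k_cache.index_copy_(2, pos, k)
        self.v_cache.index_copy_(2, pos, v)
        # Return the full buffers; attention masks out positions not yet written,
        # which keeps every shape static and avoids graph breaks.
        return self.k_cache, self.v_cache


if __name__ == "__main__":
    cache = StaticKVCache(batch=1, num_heads=8, max_seq_len=128, head_dim=64)
    for step in range(4):
        k = torch.randn(1, 8, 1, 64)
        v = torch.randn(1, 8, 1, 64)
        pos = torch.tensor([step])
        keys, values = cache.update(k, v, pos)
    print(keys.shape)  # torch.Size([1, 8, 128, 64]) regardless of step
```

Returning the full buffers and masking unwritten positions in attention keeps every shape static, which is what makes this formulation amenable to graph capture and export.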

Preferred Qualifications

  • Experience with deep learning model inference optimization.
  • Knowledge of data structures used in machine learning for memory and compute efficiency.
  • Experience with hardware-specific optimization, especially on custom accelerators such as d-Matrix hardware.