Machine Learning Intern
| Company | d-Matrix |
|---|---|
| Location | Santa Clara, CA, USA |
| Salary | Not provided |
| Type | Internship |
| Degrees | Bachelor's, Master's |
| Experience Level | Internship |
Requirements
- Currently pursuing a degree in Computer Science, Electrical Engineering, Machine Learning, or a related field.
- Familiarity with PyTorch and deep learning concepts, particularly regarding model optimization and memory management.
- Understanding of hardware-accelerated computation (hands-on CUDA programming experience is a plus).
- Strong programming skills in Python, with experience in PyTorch.
- Analytical mindset with the ability to approach problems creatively.
Responsibilities
- Research and analyze existing KV-Cache implementations used in LLM inference, particularly those that store past key/value states as lists of PyTorch tensors (the `past_key_values` format).
- Investigate ‘Paged Attention’ mechanisms that leverage dedicated CUDA data structures to optimize memory for variable sequence lengths.
- Design and implement a torch-native dynamic KV-Cache model that can be integrated seamlessly within PyTorch.
- Model KV-Cache behavior within the PyTorch compute graph to improve compatibility with torch.compile and facilitate the export of the compute graph.
- Conduct experiments to evaluate memory utilization and inference efficiency on d-Matrix hardware.
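For candidates unfamiliar with the memory-management problem these responsibilities target: Paged Attention avoids reserving one contiguous KV buffer per sequence by storing the cache in fixed-size blocks and keeping a per-sequence block table that maps logical token positions to physical blocks. The sketch below is a minimal pure-Python illustration of that bookkeeping only; the block size, free-list allocator, and class names are illustrative assumptions, not d-Matrix's actual design.

```python
# Illustrative sketch of the block-table idea behind Paged Attention.
# Real implementations store K/V tensors in the blocks; here we only
# model the logical-to-physical mapping.

BLOCK_SIZE = 4  # tokens per physical block (assumed for illustration)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # physical block free list
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of cached tokens

    def append_token(self, seq_id):
        """Reserve cache space for one new token of `seq_id`."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:  # current block is full: grab a new one
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def physical_slot(self, seq_id, pos):
        """Translate a logical token position into (block_id, offset)."""
        block = self.block_tables[seq_id][pos // BLOCK_SIZE]
        return block, pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for _ in range(6):                # cache six tokens for one sequence
    cache.append_token(seq_id=0)
print(cache.physical_slot(0, 5))  # token 5 sits at offset 1 of the 2nd block
```

Because blocks are allocated on demand, sequences of very different lengths share one physical pool without per-sequence over-allocation, which is the memory win the internship's "variable sequence lengths" bullet refers to.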
Preferred Qualifications
- Experience with deep learning model inference optimization.
- Knowledge of data structures used in machine learning for memory and compute efficiency.
- Experience with hardware-specific optimization, especially on custom accelerators such as d-Matrix's, is an advantage.