Machine Learning Engineer – Staff – Model Factory

Company: d-Matrix
Location: Santa Clara, CA, USA
Salary: $155,000 – $250,000
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior, Expert or higher

Requirements

  • BS in Computer Science with 7+ years of experience, or MS in Computer Science with 4+ years
  • Strong programming skills in Python and experience with ML frameworks like PyTorch, TensorFlow, or JAX
  • Hands-on experience with model optimization, quantization, and inference acceleration
  • Deep understanding of Transformer architectures, attention mechanisms, and distributed inference (Tensor Parallel, Pipeline Parallel, Sequence Parallel)
  • Knowledge of quantization and reduced-precision formats (INT8, BF16, FP16) and memory-efficient inference techniques
  • Solid grasp of software engineering best practices, including CI/CD, containerization (Docker, Kubernetes), and MLOps
  • Strong problem-solving skills and ability to work in a fast-paced, iterative development environment

Responsibilities

  • Design, build, and optimize machine learning deployment pipelines for large-scale models
  • Implement and enhance model inference frameworks
  • Develop automated workflows for model development, experimentation, and deployment
  • Collaborate with research, architecture, and engineering teams to improve model performance and efficiency
  • Work with distributed computing frameworks (e.g., PyTorch/XLA, JAX, TensorFlow, Ray) to optimize model parallelism and deployment
  • Implement scalable KV caching and memory-efficient inference techniques for transformer-based models
  • Monitor and optimize infrastructure performance across the custom hardware hierarchy (cards, servers, and racks) powered by d-Matrix custom AI chips
  • Ensure best practices in ML model versioning, evaluation, and monitoring

Preferred Qualifications

  • Experience working with cloud-based ML pipelines (AWS, GCP, or Azure)
  • Experience with LLM fine-tuning, LoRA, PEFT, and KV cache optimizations
  • Contributions to open-source ML projects or research publications
  • Experience with low-level optimizations using CUDA, Triton, or XLA