Machine Learning Engineer – Staff – Model Factory
Company | d-Matrix
---|---
Location | Santa Clara, CA, USA
Salary | $155,000 – $250,000
Type | Full-Time
Degrees | Bachelor’s, Master’s
Experience Level | Senior, Expert or higher
Requirements
- BS in Computer Science with 7+ years or MS in Computer Science with 4+ years
- Strong programming skills in Python and experience with ML frameworks like PyTorch, TensorFlow, or JAX
- Hands-on experience with model optimization, quantization, and inference acceleration
- Deep understanding of Transformer architectures, attention mechanisms, and distributed inference (Tensor Parallel, Pipeline Parallel, Sequence Parallel)
- Knowledge of reduced-precision formats (INT8, BF16, FP16) and memory-efficient inference techniques
- Solid grasp of software engineering best practices, including CI/CD, containerization and orchestration (Docker, Kubernetes), and MLOps
- Strong problem-solving skills and ability to work in a fast-paced, iterative development environment
Responsibilities
- Design, build, and optimize machine learning deployment pipelines for large-scale models
- Implement and enhance model inference frameworks
- Develop automated workflows for model development, experimentation, and deployment
- Collaborate with research, architecture, and engineering teams to improve model performance and efficiency
- Work with distributed computing frameworks (e.g., PyTorch/XLA, JAX, TensorFlow, Ray) to optimize model parallelism and deployment
- Implement scalable KV caching and memory-efficient inference techniques for transformer-based models
- Monitor and optimize infrastructure performance across the custom hardware hierarchy (cards, servers, and racks) powered by d-Matrix custom AI chips
- Ensure best practices in ML model versioning, evaluation, and monitoring
Preferred Qualifications
- Experience working with cloud-based ML pipelines (AWS, GCP, or Azure)
- Experience with LLM fine-tuning, LoRA, PEFT, and KV cache optimizations
- Contributions to open-source ML projects or research publications
- Experience with low-level optimizations using CUDA, Triton, or XLA