Senior Machine Learning Operations Developer – Inference Optimization
| Company | Cerence |
| --- | --- |
| Location | Montreal, QC, Canada |
| Salary | Not Provided |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Senior |
Requirements
- 6+ years of experience in software engineering, with a focus on AI/ML.
- Deep expertise in AI model optimization techniques, including quantization, pruning, knowledge distillation, and hardware-aware model design (a minimal quantization sketch follows this list).
- Proficiency in programming languages such as Python, C++, or Rust.
- Experience with AI/ML frameworks such as TensorFlow, PyTorch, and ONNX.
- Hands-on experience with GPU/TPU acceleration and deployment in cloud and edge environments.
- Strong DevOps mindset with experience in Kubernetes, containers, deployments, dashboards, high availability, autoscaling, metrics, and logs.
- Strong problem-solving skills and the ability to make data-driven decisions.
- Excellent communication skills and the ability to articulate complex technical concepts to a diverse audience.
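
To give a concrete flavor of the model-optimization expertise above, here is a minimal sketch of post-training dynamic quantization in PyTorch; the toy model and tensor shapes are placeholders, not anything specific to this role.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real inference workload (hypothetical).
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 128),
).eval()

# Post-training dynamic quantization: Linear weights are stored as int8
# and activations are quantized on the fly at inference time (CPU path).
quantized = torch.ao.quantization.quantize_dynamic(
    model,
    {nn.Linear},        # restrict quantization to Linear layers
    dtype=torch.qint8,
)

x = torch.randn(1, 512)
with torch.no_grad():
    baseline = model(x)
    optimized = quantized(x)

# The outputs should agree closely; the quantized model trades a small
# accuracy loss for a smaller memory footprint and faster CPU inference.
print(torch.max(torch.abs(baseline - optimized)).item())
```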
Responsibilities
- Design, develop, and implement strategies to optimize AI/ML inference pipelines for performance, scalability, and cost efficiency.
- Collaborate closely with other Principal and Senior Engineers on the team, fostering a culture of knowledge-sharing and joint problem-solving.
- Work with cross-functional teams, including MLOps, data science, and software engineering, to integrate optimized inference solutions into production environments.
- Drive innovation in hardware acceleration, quantization, model compression, and distributed inference techniques.
- Stay up to date with LLM hosting frameworks and their configuration at both the machine and cluster level (e.g., vLLM, TensorRT, Kubeflow).
- Optimize systems using techniques such as batching, caching, and speculative decoding (a micro-batching sketch follows this list).
- Conduct performance tuning, benchmarking, and profiling for inference systems, with expertise in memory management, threading, concurrency, and GPU optimization (a torch.profiler sketch also follows this list).
- Manage model repositories, artifact delivery, and related infrastructure.
- Develop and maintain logging mechanisms for diagnostics and research purposes.
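
To make the batching technique above concrete, here is a minimal micro-batching sketch in asyncio Python: requests arriving within a short window are grouped and run through the model as one batch. `MicroBatcher` and `run_model` are hypothetical names for illustration, not part of any framework named in this posting.

```python
import asyncio

async def run_model(batch):
    # Stand-in for a real batched forward pass over padded inputs.
    await asyncio.sleep(0.001)
    return [f"result:{req}" for req in batch]

class MicroBatcher:
    """Group requests into batches of up to max_batch, flushing after window_s."""

    def __init__(self, max_batch=8, window_s=0.005):
        self.max_batch = max_batch
        self.window_s = window_s
        self.pending = []  # (request, future) pairs awaiting a flush

    async def submit(self, request):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((request, fut))
        if len(self.pending) >= self.max_batch:
            await self._flush()  # full batch: run immediately
        elif len(self.pending) == 1:
            # First request in an empty queue arms the flush timer.
            loop.call_later(self.window_s,
                            lambda: asyncio.ensure_future(self._flush()))
        return await fut

    async def _flush(self):
        if not self.pending:
            return  # a timer fired after the batch was already flushed
        batch, self.pending = self.pending, []
        results = await run_model([req for req, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

async def main():
    batcher = MicroBatcher()
    print(await asyncio.gather(*(batcher.submit(i) for i in range(10))))

asyncio.run(main())
```

Batching amortizes per-call overhead and keeps accelerators fed; the window size trades a few milliseconds of latency for higher throughput.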
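
And for the profiling responsibility, here is a minimal sketch using PyTorch's torch.profiler to find operator-level hotspots; the model and input shapes are placeholders.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input, in (seq_len, batch, d_model) layout.
model = nn.TransformerEncoderLayer(d_model=256, nhead=8).eval()
x = torch.randn(16, 4, 256)

with torch.no_grad(), profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on GPU
    record_shapes=True,
) as prof:
    for _ in range(10):
        model(x)

# Aggregate operator-level timings to see where inference time goes.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```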
Preferred Qualifications
- Experience with Kubernetes, Docker, and CI/CD pipelines for AI/ML workloads.
- Familiarity with MLOps practices and tools, including model versioning and monitoring.
- Familiarity with performance tuning of inference engines like vLLM and techniques such as LoRA adapters (a vLLM sketch follows this list).
- Understanding of LLM architecture and optimization.
- Contributions to open-source AI/ML projects.
- Familiarity with automotive or transportation industry applications.
- Master’s or PhD in Computer Science, Machine Learning, or a related field.
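
For the vLLM and LoRA familiarity above, here is a minimal sketch following vLLM's documented multi-LoRA pattern; the base model name and adapter path are placeholders, and exact arguments can vary across vLLM versions.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; enable_lora allows per-request adapters.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_lora=True,
    gpu_memory_utilization=0.90,  # one of the knobs tuned during profiling
)

params = SamplingParams(temperature=0.0, max_tokens=128)

outputs = llm.generate(
    ["Summarize: inference optimization for in-car voice assistants."],
    params,
    # Adapter name, id, and local path are hypothetical.
    lora_request=LoRARequest("demo_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```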