Senior Machine Learning Operations Developer – Inference Optimization

Company: Cerence
Location: Montreal, QC, Canada
Salary: Not Provided
Type: Full-Time
Degrees: Master's, PhD
Experience Level: Senior

Requirements

  • 6+ years of experience in software engineering, with a focus on AI/ML.
  • Deep expertise in AI model optimization techniques, including quantization, pruning, knowledge distillation, and hardware-aware model design.
  • Proficiency in programming languages such as Python, C++, or Rust.
  • Experience with AI/ML frameworks such as TensorFlow, PyTorch, and ONNX.
  • Hands-on experience with GPU/TPU acceleration and deployment in cloud and edge environments.
  • Strong DevOps mindset with experience in Kubernetes, containers, deployments, dashboards, high availability, autoscaling, metrics, and logs.
  • Strong problem-solving skills and the ability to make data-driven decisions.
  • Excellent communication skills and the ability to articulate complex technical concepts to a diverse audience.

Responsibilities

  • Design, develop, and implement strategies to optimize AI/ML inference pipelines for performance, scalability, and cost efficiency.
  • Collaborate closely with other Principal and Senior Engineers on the team, fostering a culture of knowledge-sharing and joint problem-solving.
  • Work with cross-functional teams, including MLOps, data science, and software engineering, to integrate optimized inference solutions into production environments.
  • Drive innovation in hardware acceleration, quantization, model compression, and distributed inference techniques.
  • Stay up to date with LLM hosting frameworks and their configuration at both the machine and cluster level (e.g., vLLM, TensorRT, Kubeflow).
  • Optimize systems using techniques such as batching, caching, and speculative decoding.
  • Conduct performance tuning, benchmarking, and profiling for inference systems, with expertise in memory management, threading, concurrency, and GPU optimization.
  • Manage model repositories, artifact delivery, and related infrastructure.
  • Develop and maintain logging mechanisms for diagnostics and research purposes.

Preferred Qualifications

  • Experience with Kubernetes, Docker, and CI/CD pipelines for AI/ML workloads.
  • Familiarity with MLOps practices and tools, including model versioning and monitoring.
  • Familiarity with performance tuning of inference engines like vLLM and techniques such as LoRA adapters.
  • Understanding of LLM architecture and optimization.
  • Contributions to open-source AI/ML projects.
  • Familiarity with automotive or transportation industry applications.
  • Master’s or Ph.D. in Computer Science, Machine Learning, or a related field.