ML Operations Engineer
| Company | Maxar Technologies |
|---|---|
| Location | Reston, VA, USA |
| Salary | $131,000 – $219,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior, Expert or higher |

Requirements
- Eight (8) years’ experience in machine learning, data science, software engineering, data analytics, or DevOps
- Bachelor of Science (BS) degree in a technical field from an accredited university is required. Five (5) additional years of experience in storage operations may be considered in lieu of a degree.
- Experience building and deploying machine learning models
- Experience with Docker or Kubernetes for production-grade solutions
- Proficiency in programming languages for data science, including Python, Java, and C++
- Experience architecting and deploying AI solutions with generative models, including large language models (LLMs)
- Familiarity with CI/CD pipelines, including Jenkins or GitLab
- Experience with enterprise data platforms
- Knowledge of automation technologies, including Ansible and/or Terraform
- Must meet DoD 8570 IAT Level II requirements including one of the following: Security+ CE, CND, SSCP, GSEC, GICSP, CySA+, or CCNA Security
- Top Secret/SCI clearance with a CI polygraph
Responsibilities
- Architect, develop, and implement systems that enhance software development and machine learning processes
- Streamline the software development life cycle from requirements to monitoring in production
- Incorporate open-source tools and automation to reduce tedious tasks
- Implement continuous integration and delivery to limit manual testing and troubleshooting
- Enhance workflows and processes by building an enterprise-scale environment using DevOps methodologies
- Collaborate with software engineers to deploy machine learning models, ensuring optimal performance and resource utilization
- Architect and implement solutions to scale machine learning inference for large workloads
- Monitor and fine-tune model inference for optimal speed and resource utilization
- Implement automation tools and processes for model deployment, monitoring, and scaling
- Develop robust monitoring and logging solutions to track model performance and system health in real time
- Maintain detailed documentation of machine learning operations processes and best practices
- Provide technical support for debugging and resolving issues related to model deployment and inference
Preferred Qualifications
- Ability to understand ML code that uses modern ML and data frameworks such as PyTorch and TensorFlow
- Data engineering experience, including distributed data processing and distributed training
- Experience with MLOps frameworks such as Kubeflow, MLflow, and Airflow
- Familiarity with containerization and orchestration tools such as Docker and Kubernetes
- Knowledge of A/B testing and benchmarking model performance in production
- Experience architecting and deploying machine learning cybersecurity tools such as Morpheus