ML Operations Engineer

Company: Maxar Technologies
Location: Reston, VA, USA
Salary: $131,000 – $219,000
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior, Expert or higher

Requirements

  • Eight (8) years’ experience in machine learning, data science, software engineering, data analytics, or DevOps
  • Bachelor of Science (BS) degree from an accredited university in a technical field is required. Five (5) additional years of experience in storage operations may be considered in lieu of a degree.
  • Experience building and deploying machine learning models
  • Experience with Docker or Kubernetes for production-grade solutions
  • Proficiency in programming languages for data science, including Python, Java, and C++
  • Experience architecting and deploying AI solutions with generative models, including large language models (LLMs)
  • Familiarity with CI/CD pipelines, including Jenkins or GitLab
  • Experience with enterprise data platforms
  • Knowledge of automation technologies, including Ansible and/or Terraform
  • Must meet DoD 8570 IAT Level II requirements including one of the following: Security+ CE, CND, SSCP, GSEC, GICSP, CySA+, or CCNA Security
  • Top Secret/SCI security clearance with a CI polygraph

Responsibilities

  • Architect, develop, and implement systems that enhance software development and machine learning processes
  • Streamline the software development life cycle from requirements to monitoring in production
  • Incorporate open-source tools and automation to reduce tedious tasks
  • Implement continuous integration and delivery to limit manual testing and troubleshooting
  • Enhance workflows and processes by building an enterprise-scale environment using DevOps methodologies
  • Collaborate with software engineers to deploy machine learning models, ensuring optimal performance and resource utilization
  • Architect and implement solutions to scale machine learning inference for large workloads
  • Monitor and fine-tune model inference for optimal speed and resource utilization
  • Implement automation tools and processes for model deployment, monitoring, and scaling
  • Develop robust monitoring and logging solutions to track model performance and system health in real-time
  • Maintain detailed documentation of machine learning operations processes and best practices
  • Provide technical support for debugging and resolving issues related to model deployment and inference

Preferred Qualifications

  • Ability to read and understand ML code written with modern ML and data frameworks such as PyTorch and TensorFlow
  • Data engineering experience, including distributed data processing and distributed training
  • Experience with MLOps frameworks such as Kubeflow, MLflow, and Airflow
  • Familiarity with containerization and orchestration tools such as Docker and Kubernetes
  • Knowledge of A/B testing and benchmarking model performance in production
  • Experience architecting and deploying machine learning cybersecurity tools such as Morpheus