Software Engineer – Machine Learning Infrastructure

Company: Character.AI
Location: Menlo Park, CA, USA; New York, NY, USA
Salary: $150,000 – $350,000
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • 4+ years of experience supporting infrastructure in an ML environment
  • Experience developing tools to diagnose ML infrastructure problems and failures
  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)
  • Experience working with GPUs

Responsibilities

  • Provide infrastructure support for our ML research and product
  • Build tooling to diagnose cluster issues and hardware failures
  • Monitor deployments, manage experiments, and generally support our research
  • Maximize GPU allocation and utilization for both serving and training

Preferred Qualifications

  • Experience with large GPU clusters and high-performance computing/networking
  • Experience supporting large language model training
  • Experience with ML frameworks such as PyTorch, TensorFlow, or JAX
  • Experience with GPU kernel development