Staff Software Engineer – Machine Learning Infrastructure – Slack

Company: Salesforce
Location: Dallas, TX, USA; Atlanta, GA, USA
Salary: Not provided
Type: Full-Time
Experience Level: Senior

Requirements

  • 5+ years of software engineering experience, including 3+ years in machine learning
  • Built large-scale, distributed, production ML/AI systems professionally
  • Worked on complex issues requiring in-depth knowledge of the company and existing architecture
  • Familiarity with modern methodologies for unit tests, code review, design documentation, debugging, and troubleshooting
  • Experience developing, monitoring, and deploying systems in cloud environments like AWS, GCP, and Azure
  • Experience with ops tools and frameworks such as Terraform, Chef, and Kubernetes
  • Experience with ML model serving frameworks/toolkits such as Kubeflow, MLflow, AWS Bedrock, and SageMaker
  • Experience with functional or imperative programming languages such as PHP, Python, Ruby, Go, C, Scala, or Java
  • Experience with Grafana, Prometheus, Honeycomb, or other monitoring software

Responsibilities

  • Managing deployments of machine learning models in our own Kubernetes-based deployment system and through AWS Bedrock and SageMaker, working with tools like Chef, HashiCorp Terraform, and KubeRay
  • Optimizing our models and infrastructure to reduce latency and handle spikes in traffic
  • Constantly evaluating and improving our infrastructure to maximize efficiency and minimize costs
  • Setting up our model training infrastructure to fine-tune embedding models while keeping our customers’ data secure
  • Working with our search team to generate embeddings at scale to power semantic search and enterprise search
  • Working with our ML Modeling and AI teams to support development of AI features and deployment at scale
  • Building and supporting an AI Platform
  • Participating in a 24/7 on-call rotation

Preferred Qualifications

  • You’re analytical and data-driven
  • Experience developing machine learning models in PyTorch, TensorFlow, XGBoost, Scikit-learn or similar
  • Experience building data pipelines with Airflow, Spark, or similar tools
  • Experience with vector-based retrieval using systems such as Vespa, Milvus, or Solr