Research Engineer – Tokens ML Infra

| Company | Anthropic |
|---|---|
| Location | San Francisco, CA, USA |
| Salary | $315,000 – $425,000 |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Mid Level, Senior |

Requirements
- Strong software engineering skills with experience building distributed systems
- Expertise in Python and experience with distributed computing frameworks
- Deep understanding of cloud computing platforms and distributed systems architecture
- Experience with high-throughput, fault-tolerant system design
- Strong background in performance optimization and system scaling
- Excellent problem-solving skills and attention to detail
- Strong communication skills and ability to work in a collaborative environment
Responsibilities
- Design and implement high-performance ML training infrastructure for large language model research
- Develop and maintain core primitives in ML frameworks such as JAX and PyTorch
- Create robust automated evaluation and benchmarking systems for model performance
- Implement comprehensive monitoring and debugging tools for ML workflows
- Design and optimize data loading pipelines that maximize training throughput
- Build MLOps tooling to support reproducible research and experimentation
- Collaborate with research teams to prototype and scale novel training architectures
- Develop infrastructure for efficient hyperparameter sweeps and architecture search
Preferred Qualifications
- Advanced degree (MS or PhD) in Computer Science or a related field
- Experience with language model training infrastructure
- Strong background in distributed systems and parallel computing
- Expertise in tokenization algorithms and techniques
- Experience building high-throughput, fault-tolerant systems
- Deep knowledge of monitoring and observability practices
- Experience with infrastructure-as-code and configuration management
- Background in MLOps or ML infrastructure