Research Engineer / Scientist – Safeguards
| Company | Anthropic |
|---|---|
| Location | San Francisco, CA, USA; New York, NY, USA |
| Salary | $320,000 – $560,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Mid Level, Senior |
Requirements
- Have significant software, ML, or research engineering experience
- Have some experience contributing to empirical AI research projects
- Have some familiarity with technical AI safety research
- Education requirements: at least a Bachelor’s degree in a related field, or equivalent experience.
Responsibilities
- Test the robustness of our safety techniques by training language models to subvert them, and measure how effective these adversarial models are.
- Run multi-agent reinforcement learning experiments to test techniques such as AI Debate.
- Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
- Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts.
- Contribute ideas, figures, and writing to research papers, blog posts, and talks.
- Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy.
Preferred Qualifications
- Have experience authoring research papers in machine learning, NLP, or AI safety
- Have experience with LLMs
- Have experience with reinforcement learning
- Have experience with Kubernetes clusters and complex shared codebases