AI Evaluation Engineer
| Company | Trunk Tools |
|---|---|
| Location | Austin, TX, USA / New York, NY, USA |
| Salary | Not Provided |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Mid Level |
Requirements
- MS/PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field
- 2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
- Hands-on experience with observability tooling, analytics platforms, or data engineering for building robust monitoring pipelines
- Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, or PyTorch
- Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
- Experience with synthetic data generation or test automation to validate model robustness
- Strong problem-solving skills, a collaborative mindset, and eagerness to work in a fast-paced environment
Responsibilities
- Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
- Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
- Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
- Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
- Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities
Preferred Qualifications
- Bonus: Experience with reinforcement learning, reward function design, and policy optimization
- Bonus: Construction industry knowledge or an interest in automating complex, large-scale processes