AI Evaluation Engineer

Company: Trunk Tools
Location: Austin, TX, USA; New York, NY, USA
Salary: Not provided
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Mid Level

Requirements

  • MS/PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field
  • 2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
  • Hands-on experience with observability, analytics platforms, or data engineering to create robust monitoring pipelines
  • Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, and PyTorch
  • Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
  • Experience with synthetic data generation or test automation to validate model robustness
  • Strong problem-solving skills, a collaborative mindset, and eagerness to work in a fast-paced environment

Responsibilities

  • Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
  • Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
  • Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
  • Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
  • Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities

Preferred Qualifications

  • Bonus: Experience with reinforcement learning, reward function design, and policy optimization
  • Bonus: Construction industry knowledge or an interest in automating complex, large-scale processes