AI Evaluation Engineer
| Company | Trunk Tools |
|---|---|
| Location | Austin, TX, USA / New York, NY, USA |
| Salary | Not Provided |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Mid Level |
Requirements
- MS/PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field
- 2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
- Hands-on experience with observability tooling, analytics platforms, or data engineering for building robust monitoring pipelines
- Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, or PyTorch
- Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
- Experience with synthetic data generation or test automation to validate model robustness
- Strong problem-solving skills, a collaborative mindset, and eagerness to work in a fast-paced environment
Responsibilities
- Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
- Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
- Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
- Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
- Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities
Preferred Qualifications
- Bonus: Experience with reinforcement learning, reward function design, and policy optimization
- Bonus: Construction industry knowledge or an interest in automating complex, large-scale processes