The Right Way to Do AI Evals (ft Freddie Vargus) - Ep 44



AI Summary

This video is episode 44 of Tool Use, hosted by Mike Bird, with guest Freddie Vargus, CTO and co-founder of Quotient AI. Freddie shares insights on how to use evals (evaluations) to systematically build and improve AI products faster. The conversation covers why evals matter for product improvement, especially as AI apps grow in complexity toward multi-turn interactions and tool calling.

Key points discussed include:

  • The role of evals in measuring AI system performance and avoiding regressions
  • Different levels of app complexity, from single-turn prompts to multi-turn conversations with tool calls
  • The importance of testing function executability and state dependency in tool calls
  • Strategies for designing effective evals, including milestones (desired states) and minefields (undesired states) in AI behavior (see the first sketch after this list)
  • Practical ways to collect and store eval data, such as in databases and spreadsheets
  • How to bootstrap eval sets from role-played scenarios or early user data (both points are illustrated in the second sketch after this list)
  • Using human annotators versus AI judges for labeling and evaluation quality
  • Challenges around complexity and the need for intentional, thoughtful eval design
  • Practical advice on prioritizing and improving AI systems using continuous eval feedback
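
To make the milestones-and-minefields idea concrete, here is a minimal sketch of that eval pattern in Python. All names here (EvalSpec, evaluate, the example spec and transcript) are hypothetical illustrations for this summary, not Quotient AI's API, and real checks would typically use richer predicates than substring matching:

```python
# Hypothetical sketch of the "milestones and minefields" eval pattern:
# milestones are states the run SHOULD reach, minefields are states it
# should NEVER reach. Not Quotient AI's API; names are illustrative.

from dataclasses import dataclass, field

@dataclass
class EvalSpec:
    milestones: list[str] = field(default_factory=list)  # desired states
    minefields: list[str] = field(default_factory=list)  # undesired states

def evaluate(transcript: list[str], spec: EvalSpec) -> dict:
    """Score one multi-turn transcript against milestones and minefields."""
    text = "\n".join(transcript).lower()
    hit_milestones = [m for m in spec.milestones if m.lower() in text]
    hit_minefields = [m for m in spec.minefields if m.lower() in text]
    return {
        "milestones_hit": hit_milestones,
        "milestones_missed": [m for m in spec.milestones if m not in hit_milestones],
        "minefields_hit": hit_minefields,  # any hit here is a failure
        "passed": len(hit_milestones) == len(spec.milestones) and not hit_minefields,
    }

if __name__ == "__main__":
    spec = EvalSpec(
        milestones=["booking confirmed"],              # desired end state
        minefields=["card declined", "i don't know"],  # states we never want
    )
    transcript = [
        "user: book me a flight to Denver",
        "assistant: calling search_flights(destination='DEN')",
        "assistant: Booking confirmed for Friday at 9am.",
    ]
    print(evaluate(transcript, spec))
```

Keeping milestones and minefields as plain data makes it easy to rerun the same spec against every new prompt or model version and catch regressions.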
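
And here is a similarly hedged sketch of bootstrapping an eval set from hand-written, role-played scenarios and serializing it to CSV so it can live in a spreadsheet or be loaded into a database. The schema and scenarios are invented for illustration:

```python
# Hypothetical sketch of bootstrapping an eval set before real user data
# exists, then storing it as plain CSV (spreadsheet-friendly). The schema
# and example cases are illustrative, not from the episode.

import csv
import io

# Role-played scenarios written by hand to seed the eval set.
SEED_CASES = [
    {"input": "Cancel my order #1234", "expected_tool": "cancel_order",
     "milestone": "cancellation confirmed", "minefield": "order shipped anyway"},
    {"input": "What's your refund policy?", "expected_tool": "",
     "milestone": "refund window stated", "minefield": "invented policy details"},
]

def to_csv(cases: list[dict]) -> str:
    """Serialize eval cases to CSV so they can live in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(cases[0].keys()))
    writer.writeheader()
    writer.writerows(cases)
    return buf.getvalue()

if __name__ == "__main__":
    print(to_csv(SEED_CASES))
```

As real user data arrives, the same flat schema lets you append observed failures alongside the role-played seeds and grow the eval set continuously.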

Freddie emphasizes the value of taking the eval process seriously as a systematic method for improving AI products, and invites viewers to learn more or collaborate via Quotient AI.

The episode provides deep practical guidance for developers building AI tools and agents, focusing on evaluation as a key factor in product success.