Evaluating GenAI Applications in Healthcare
Reliability, accuracy, observability, and auditability are crucial when building LLM workflows in healthcare, and all of them rely on the ability to measure LLM automations at scale. However, because the metrics we care about in GenAI applications (e.g. hallucinations or adherence to a policy) are complex, traditional machine learning and NLP metrics are no longer relevant. These measurements can only be conducted by other LLMs that are tuned specifically for judging, i.e., LLM‑Judges. Because the evaluators in this paradigm are LLMs themselves, they also need to be observed, measured, and tuned to prevent drift from expected behaviour. This talk will delve into LLM‑Judges in the context of healthcare LLM workflows and agents.
About the speaker

Oğuz Gencoglu
Co-Founder & Head of AI at Root Signals
When
Sessions: April 2nd – 3rd 2024
Trainings: April 15th – 19th 2024