Evaluating GenAI Applications in Healthcare

Reliability, accuracy, observability, and auditability are crucial when building LLM workflows in healthcare. All of these rely on the ability to measure LLM automations at scale. But because the metrics we care about in GenAI applications (e.g., hallucinations or adherence to a policy) are complex, traditional machine learning and NLP metrics are no longer relevant. These measurements can only be conducted by other LLMs that are tuned specifically for judging, i.e., LLM‑Judges. Because the evaluators in this paradigm are LLMs themselves, they also need to be observed, measured, and tuned to prevent drift from expected behaviour. This talk will delve into LLM‑Judges in the context of healthcare LLM workflows and agents.

About the speaker

Oğuz Gencoglu

Co-Founder & Head of AI at Root Signals

When

Online Event | April 1-2, 2025

Contact

nlpsummit@johnsnowlabs.com
