Development and evaluation of an artificial intelligence chatbot for menopause information

We developed and evaluated an AI chatbot that provides reliable menopause information based on trusted, peer-reviewed sources, such as medical guidelines and position statements from The Menopause Society (TMS).

The chatbot was built using retrieval-augmented generation (RAG), which improves response accuracy by incorporating relevant content from TMS position statements. Responses were scored on a scale from 1 (lowest) to 5 (highest) for faithfulness to TMS content, relevance, potential harmfulness, and clinical correctness across a diverse range of test inputs. The evaluation was conducted both automatically and manually by clinicians, with scores averaged across criteria.
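As a rough illustration of the retrieval step in RAG, the sketch below ranks a few passages against a question and assembles a grounded prompt. It is a minimal sketch under stated assumptions, not the system described in the talk: the bag-of-words cosine similarity, the function names (`retrieve`, `build_prompt`), the prompt wording, and the placeholder passages are all hypothetical, and a production system would more likely use embedding-based search over the actual position statements.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine_similarity(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical placeholder passages, NOT real TMS text.
PASSAGES = [
    "Placeholder excerpt about hot flashes and night sweats.",
    "Placeholder excerpt about bone health after menopause.",
    "Placeholder excerpt about sleep and mood changes.",
]

if __name__ == "__main__":
    print(build_prompt("What helps with hot flashes?", PASSAGES, k=1))
```

The resulting prompt would then be passed to a language model, so that the answer draws on the retrieved source text rather than on the model's general knowledge alone.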

The chatbot demonstrated high faithfulness (average score 4.43) and relevance (4.59), with a 95% faithfulness score in automated claims analysis. Clinical correctness scored 4.44, and potential harmfulness was minimal (4.93). Our chatbot shows promise for clinical use due to high accuracy and low harm risk, but further research is required to build additional guardrails and to support the integration of other types of trusted, peer-reviewed documents.
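The abstract does not say how these figures were computed. One common reading, assumed here, is that a claims-based faithfulness score is the fraction of claims in the responses that the source text supports, and that each rated criterion is reported as the mean of its 1 to 5 ratings. The sketch below shows that arithmetic; the function names and the input values are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def claim_faithfulness(claim_supported):
    """Fraction of claims judged supported by the source text (assumed definition)."""
    if not claim_supported:
        raise ValueError("no claims to score")
    return sum(claim_supported) / len(claim_supported)


def average_rating(ratings):
    """Mean of 1-5 ratings, rejecting values outside the scale."""
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    print(claim_faithfulness([True, True, True, False]))
    print(average_rating([4, 5]))
```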

About the speaker
Alex Handy

Head of Data Science at Vira Health

NLP Summit

When

Online Event: September 25, 2024

 

Contact

nlpsummit@johnsnowlabs.com

Presented by

John Snow Labs