Development and evaluation of an artificial intelligence chatbot for menopause information
We developed and evaluated an AI chatbot that provides reliable menopause information based on trusted, peer-reviewed sources, such as medical guidelines and position statements from The Menopause Society (TMS).
The chatbot was created using retrieval-augmented generation (RAG) to enhance response accuracy by incorporating relevant content from TMS position statements. It was evaluated on a 1 to 5 scale (1 = lowest, 5 = highest) for faithfulness to TMS content, relevance, potential harmfulness, and clinical correctness over a diverse range of test inputs. The evaluation was conducted both automatically and manually by clinicians, with scores averaged for each criterion.
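To illustrate the general shape of a RAG pipeline, the sketch below retrieves the passages most similar to a user question and places them in a prompt. It is not the authors' implementation: the bag-of-words cosine retriever, the function names (`retrieve`, `build_prompt`), and the placeholder passages are all assumptions made for illustration, and a production system would more likely use learned embeddings and a vector store. No language model is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return up to k passages with non-zero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(query, context):
    """Assemble a prompt that restricts the answer to retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}"
    )


# Hypothetical placeholder passages; these are NOT real TMS content.
passages = [
    "Placeholder passage A: discussion of vasomotor symptoms such as hot flashes.",
    "Placeholder passage B: discussion of bone health after menopause.",
    "Placeholder passage C: discussion of sleep disturbance.",
]

question = "What helps with hot flashes?"
retrieved = retrieve(question, passages)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Grounding the prompt in retrieved passages is what allows faithfulness to the source documents to be measured at all: each claim in a response can be checked against the context that was supplied.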
The chatbot demonstrated high faithfulness (average score 4.43) and relevance (4.59), and scored 95% for faithfulness in an automated claims analysis. Clinical correctness scored 4.44, and potential harmfulness was minimal (4.93, where a higher score indicates lower potential for harm). The chatbot shows promise for clinical use owing to its high accuracy and low risk of harm, but further research is required to build additional guardrails and to support the integration of other types of trusted, peer-reviewed documents.
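The abstract does not specify how these figures were computed. One plausible reading, assumed here, is that the claims-based faithfulness score is the fraction of extracted claims supported by the source content, and that each 1 to 5 criterion score is a simple mean of individual ratings. The sketch below shows that arithmetic; the function names and the sample data are hypothetical and are not the study's data.

```python
from statistics import mean


def faithfulness_rate(claim_supported):
    """Fraction of claims judged as supported by the source content.

    claim_supported: list of booleans, one per extracted claim.
    """
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


def mean_scores(ratings):
    """Average the 1-5 ratings for each criterion, rounded to 2 decimals.

    ratings: dict mapping criterion name to a list of integer ratings.
    """
    return {name: round(mean(values), 2) for name, values in ratings.items()}


# Illustrative values only (assumed, not taken from the study).
example_claims = [True] * 19 + [False]
example_ratings = {
    "faithfulness": [5, 4, 4],
    "relevance": [5, 4, 5],
}

print(f"{faithfulness_rate(example_claims):.0%}")
print(mean_scores(example_ratings))
```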
Alex Handy
Head of Data Science at Vira Health
I enjoy developing integrated data, insights and scientific teams that identify and solve real business problems and make products to improve people’s lives. I currently run the data science team at Vira Health, where I am responsible for all aspects of data including strategy, data platform, engineering and governance, product analytics and BI, customer insights and data science and machine learning. I also hold a senior research fellowship at University College London where I conduct research on applying AI in healthcare.