Lessons learned deploying an LLM document processing system at scale

Vendr and Extend built an LLM document processing system to analyze more than 3 million pages from 100,000 documents across 20+ categories, from highly unstructured sales contracts to 50-page legal agreements. This session shares lessons learned from transforming these unstructured documents into structured data:

Techniques for reliably using LLMs in accuracy-intensive use cases, including LLM confidence signals, logprobs, data validations, and human-in-the-loop tooling
Using evals to determine the best model for the job, and when you should use OpenAI vs Anthropic vs open-source models
How to improve performance over time via prompt optimizations, fine-tuning, and few-shot feedback loops
Challenges overcome in mapping LLM outputs into a structured data catalog
Employing text embeddings and targeted data reviews to build a trustworthy, high-quality dataset
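
As a rough illustration of the first topic above, the sketch below shows one common way token logprobs can be turned into a per-field confidence signal that routes low-confidence extractions to human review. It is not the speakers' implementation: the function names, the geometric-mean scoring, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def field_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability for one extracted field.

    Hypothetical helper: assumes the caller already has the log
    probabilities of the tokens that make up the field's value.
    """
    if not token_logprobs:
        # No tokens means no evidence, so treat it as zero confidence.
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


def needs_review(token_logprobs: Sequence[float], threshold: float = 0.9) -> bool:
    """Flag a field for human review when confidence is below the threshold.

    The 0.9 default is an illustrative assumption, not a recommended value.
    """
    return field_confidence(token_logprobs) < threshold


# Example: one confident field and one uncertain field.
confident = [math.log(0.99), math.log(0.98)]
uncertain = [math.log(0.60), math.log(0.70)]
print(needs_review(confident), needs_review(uncertain))
```

In practice the threshold would be tuned against labeled examples, and a score like this would typically sit alongside data validations rather than replace them.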

By the end of this presentation, you’ll be armed with knowledge to deliver an LLM document processing system at scale – and get a glimpse into the future of unstructured data processing.

About the speakers

Mark Andersen

VP, Data Science & Analytics at Vendr

Stefan Jol

Director of Machine Learning at Vendr

Kushal Byatnaik

CEO at Extend

When

Online Event: September 26, 2024

Contact

nlpsummit@johnsnowlabs.com
