Lessons learned deploying an LLM document processing system at scale
Vendr and Extend built an LLM document processing system to analyze more than 3 million pages from 100,000 documents across 20+ categories, from highly unstructured sales contracts to 50-page legal agreements. This session shares lessons learned from transforming these unstructured documents into structured data:
- Techniques for using LLMs reliably in accuracy-intensive use cases, including LLM confidence signals, logprobs, data validations, and human-in-the-loop tooling
- Using evals to determine the best model for the job, and when you should use OpenAI vs Anthropic vs open-source models
- How to improve performance over time via prompt optimizations, fine-tuning, and few-shot feedback loops
- Challenges overcome in mapping LLM outputs into a structured data catalog
- Employing text embeddings and targeted data reviews to build a trustworthy, high-quality dataset
By the end of this presentation, you’ll have the knowledge to deliver an LLM document processing system at scale, along with a glimpse into the future of unstructured data processing.
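To make the first topic concrete, here is a minimal sketch of how logprobs, data validations, and human-in-the-loop review might fit together. It is an illustration under stated assumptions, not the system Vendr and Extend built: the function names, the date validator, and the 0.90 threshold are all hypothetical, and the token logprobs are assumed to come from whichever model API is in use.

```python
import math
import re

# Hypothetical cutoff; in practice it would be tuned against an eval set.
REVIEW_THRESHOLD = 0.90


def field_confidence(token_logprobs):
    """Geometric-mean token probability of an extracted value.

    token_logprobs: natural-log probabilities of the tokens that make up
    the value, as reported by the model API (assumed input).
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def is_valid_date(value):
    """Example data validation: the value must look like YYYY-MM-DD."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


def route(value, token_logprobs, validator):
    """Send a field to human review if it fails validation or confidence is low."""
    if not validator(value):
        return "human_review"
    if field_confidence(token_logprobs) < REVIEW_THRESHOLD:
        return "human_review"
    return "auto_accept"


if __name__ == "__main__":
    print(route("2024-01-31", [-0.01, -0.02, -0.01], is_valid_date))
    print(route("2024-01-31", [-0.5, -0.4], is_valid_date))
    print(route("Jan 31", [-0.001], is_valid_date))
```

The geometric mean is one of several reasonable ways to turn token logprobs into a single score; taking the minimum token probability is a stricter alternative that flags any single uncertain token.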
About the speakers
Mark Andersen
VP, Data Science & Analytics at Vendr
Stefan Jol
Director of Machine Learning at Vendr
Kushal Byatnaik
CEO at Extend