Lessons learned deploying an LLM document processing system at scale

Vendr and Extend built an LLM document processing system to analyze more than 3 million pages from 100,000 documents across 20+ categories, from highly unstructured sales contracts to 50-page legal agreements. This session shares lessons learned from transforming these unstructured documents into structured data:

  • Techniques for using LLMs reliably in accuracy-critical use cases, including LLM confidence signals, logprobs, data validations, and human-in-the-loop tooling
  • Using evals to determine the best model for the job, and when you should use OpenAI vs Anthropic vs open-source models
  • How to improve performance over time via prompt optimizations, fine-tuning, and few-shot feedback loops
  • Challenges overcome in mapping LLM outputs into a structured data catalog
  • Employing text embeddings and targeted data reviews to build a trustworthy, high-quality dataset
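As a rough illustration of the first bullet (a minimal sketch, not the speakers' actual implementation), token logprobs can be turned into a per-field confidence score and used to decide which extracted values go to human review. The field names, the sample logprob values, the 0.9 threshold, and the helper names `field_confidence` and `route` are all hypothetical:

```python
import math

def field_confidence(token_logprobs):
    """Joint probability of the tokens that make up one extracted value."""
    return math.exp(sum(token_logprobs))

def route(fields, threshold=0.9):
    """Split extracted fields into auto-accepted and needs-human-review.

    `fields` maps a field name to (value, token_logprobs).
    The threshold is an assumed value; in practice it would be tuned on evals.
    """
    accepted, review = {}, {}
    for name, (value, logprobs) in fields.items():
        conf = field_confidence(logprobs)
        target = accepted if conf >= threshold else review
        target[name] = (value, conf)
    return accepted, review

# Hypothetical extraction output: value plus the logprobs of its tokens.
fields = {
    "contract_term_months": ("36", [-0.01, -0.02]),
    "renewal_date": ("2025-01-31", [-0.4, -0.9, -0.3]),
}
accepted, review = route(fields)
```

In this sketch the low-confidence `renewal_date` lands in the review queue, while `contract_term_months` is accepted automatically.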

By the end of this presentation, you’ll be armed with the knowledge to deliver an LLM document processing system at scale – and get a glimpse into the future of unstructured data processing.

About the speakers

Mark Andersen

VP, Data Science & Analytics at Vendr

Stefan Jol

Director of Machine Learning at Vendr

Kushal Byatnaik

CEO at Extend

NLP-Summit

When

Online Event: September 26, 2024

 

Contact

nlpsummit@johnsnowlabs.com

Presented by

John Snow Labs