Quantizing Large Language Models

Quantization is an effective technique for compressing Large Language Models (LLMs) and accelerating their inference. In this session, we will explore different quantization methods and techniques and the libraries commonly used to apply them. We will also discuss how to evaluate the performance and quality of quantized LLMs using standard metrics.

About the speaker

Supriya Raman

AI Engineering Manager at IBM WatsonX Labs

With 17+ years of experience, Supriya Raman is currently Senior Vice President, Data Engineering at JPMorgan Chase. In this role, she is responsible for defining, leading, and evangelizing analytics solutions that leverage Data Science offerings. She directs the end-to-end model development and deployment lifecycle, delivering mission-critical, user-centric solutions using Scrum methodology.

Her key skills are in NLP and Generative AI. She is also actively involved in career mentoring, upskilling, and coaching, working with students from reputed universities as well as with working professionals. She is a Google Women Tech Ambassador and part of the IEEE ICWITE Organizing Committee.

When

Online Event: September 26, 2024

Contact

nlpsummit@johnsnowlabs.com
