Quantizing Large Language Models

Quantization is a widely used technique for compressing large language models (LLMs) and accelerating their inference. In this session, let's explore different quantization methods, the common libraries used to apply them, and how to evaluate the performance and quality of quantized LLMs using standard metrics.

 

About the speaker

Supriya Raman

Senior Vice President, MLOps at JPMorgan Chase & Co.

NLP-Summit

When

Online Event: September 26, 2024

 

Contact

nlpsummit@johnsnowlabs.com

Presented by

John Snow Labs