Quantizing Large Language Models
Quantization is an effective technique for compressing Large Language Models (LLMs) and accelerating their inference. In this session, let's explore different quantization methods, the common libraries used to apply them, and how to evaluate the performance and quality of quantized LLMs using standard metrics.
About the speaker
Supriya Raman
Senior Vice President, MLOps at JPMorgan Chase & Co.