Quantizing Large Language Models
Quantization is an effective technique for compressing Large Language Models (LLMs) and accelerating their inference. In this session, let's explore different quantization methods and techniques and the common libraries used, and discuss how to evaluate the performance and quality of quantized LLMs using standard metrics.
Supriya Raman
AI Engineering Manager at IBM WatsonX Labs
With 17+ years of experience, Supriya Raman is currently working as Senior Vice President, Data Engineering at JPMorgan Chase. In her current role, she is responsible for defining, leading and evangelizing analytics solutions that leverage Data Science offerings. She directs the end-to-end model development and deployment lifecycle, delivering mission-critical, user-centric solutions using Scrum methodology.
Her key skills are in NLP and Generative AI. She is also actively involved in career mentoring, upskilling and coaching students from reputed universities as well as working professionals. She is a Google Women Tech Ambassador and part of the IEEE ICWITE Organizing Committee.