Deploying BLOOM: A 176B Parameter Multi-Lingual Large Language Model
In this talk, we will present the technology and infrastructure that enabled the deployment of BLOOM, the largest open-access multilingual language model released to date.
We will begin by outlining the challenges posed by large language models (LLMs) and the design decisions taken to address them. We will cover how we scaled our modelling code through 2D parameter and activation partitioning. We will present our hardware considerations and how we optimised for throughput. Following this, we will outline how we ported the system from a stand-alone model to a public Hugging Face demo that accepts user requests and returns generated text. We will conclude by discussing how we open-sourced our code and the platform this provides for practitioners in the community.
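To make the partitioning idea concrete, here is a minimal sketch of 2D parameter and activation partitioning using JAX's named-sharding API. It is an illustration under stated assumptions, not the BLOOM deployment code: it assumes an 8-device host, and the mesh axis names ("dp", "mp") and all array shapes are placeholders chosen for the example.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative only: arrange 8 devices into a 2x4 mesh with a
# data-parallel ("dp") axis and a model-parallel ("mp") axis.
devices = np.asarray(jax.devices()).reshape(2, 4)
mesh = Mesh(devices, axis_names=("dp", "mp"))

# 2D parameter partitioning: split the weight's rows across "dp" and
# its columns across "mp", so each device holds a 512x512 tile.
w = jax.device_put(
    jnp.zeros((1024, 2048)),
    NamedSharding(mesh, P("dp", "mp")),
)

# Activation partitioning: split the batch dimension across "dp".
x = jax.device_put(
    jnp.zeros((8, 1024)),
    NamedSharding(mesh, P("dp", None)),
)

@jax.jit
def forward(x, w):
    # Under jit, the XLA partitioner inserts the collectives
    # needed to compute the matmul over the sharded operands.
    return x @ w

y = forward(x, w)
print(y.sharding)  # the compiler chooses an output sharding over the mesh
```

The appeal of this style of scaling is that the model code stays a plain matrix multiply; only the sharding annotations change as the model grows.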
Sanchit Gandhi
Research Engineer at Hugging Face
An ML engineer in the open-source speech team, Sanchit is a contributor to and maintainer of Hugging Face Transformers, currently the most popular repository for state-of-the-art machine learning. Sanchit is pioneering the integration of JAX-based models into Transformers, enabling efficient and scalable inference for large language models.
Sanchit’s research interests lie in robust speech recognition, in particular the use of pre-trained encoder/decoder checkpoints for generalisable and extensible speech systems.
Prior to working at Hugging Face, Sanchit completed his Master’s degree at the University of Cambridge, writing his thesis on “Interpretability for Deep Learning” under the supervision of Professor Mark Gales.
Suraj Patil
Machine Learning Engineer at Hugging Face