Using Vector Databases to Scale Multimodal Embeddings and Search
Many real-world problems are inherently multimodal, from the communicative modalities humans use, such as spoken language and gestures, to the force, proprioception, and visual sensors ubiquitous in robotics. For machine learning models to address these problems, interact more naturally and holistically with the world around them, and ultimately become more general and powerful reasoning engines, they need to understand data across all of its corresponding image, video, text, audio, and tactile representations.
In this talk, I will discuss how we can use multimodal models that can see, hear, read, and feel data (!) to perform cross-modal retrieval and search (searching audio with images, videos with text, etc.) at the billion-object scale with the help of vector databases. I will also demonstrate, with live code demos and large-scale datasets, how performing this cross-modal retrieval in real time can help us guide the generative capabilities of large language models by grounding them in the relevant source material.
This talk will revolve around how we scaled the use of multimodal embedding models in production, and how our users and customers leverage them for cross-modal search and subsequent retrieval-augmented generation over their large-scale multimodal datasets.
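To make the idea of cross-modal retrieval concrete, here is a minimal sketch, not the implementation used in the talk. It assumes a multimodal model has already mapped every object, whatever its modality, into one shared embedding space. The three-dimensional vectors, the file names, and the `search` helper are all hypothetical, and the brute-force cosine scan stands in for the approximate nearest-neighbor index a vector database would use at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy index: in practice each vector would come from a
# multimodal embedding model and have hundreds of dimensions.
INDEX = [
    {"id": "dog_bark.wav", "modality": "audio", "vector": [0.9, 0.1, 0.0]},
    {"id": "dog_photo.jpg", "modality": "image", "vector": [0.8, 0.2, 0.1]},
    {"id": "piano_clip.mp4", "modality": "video", "vector": [0.0, 0.1, 0.9]},
    {"id": "piano_review.txt", "modality": "text", "vector": [0.1, 0.0, 0.95]},
]


def search(query_vector, index, top_k=2, modality=None):
    """Return the ids of the top_k objects most similar to the query.

    Brute-force scan for illustration only; a vector database would
    replace this loop with an approximate nearest-neighbor index.
    """
    candidates = [
        obj for obj in index
        if modality is None or obj["modality"] == modality
    ]
    scored = [
        (cosine_similarity(query_vector, obj["vector"]), obj["id"])
        for obj in candidates
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [obj_id for _, obj_id in scored[:top_k]]


# Hypothetical embedding of a text query such as "a dog barking".
text_query = [0.85, 0.15, 0.05]

# Search across all modalities, then restrict the search to audio only.
results = search(text_query, INDEX, top_k=2)
audio_results = search(text_query, INDEX, top_k=1, modality="audio")
print(results, audio_results)
```

Because the query and the stored objects live in the same space, a text query can return audio and image objects directly. The retrieved items could then be passed to a language model as grounding context, which is the retrieval-augmented generation step the abstract mentions.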
Zain Hasan
Senior ML Developer Advocate at Weaviate
Zain Hasan is a senior ML developer advocate at Weaviate, an open-source vector database. An engineer and data scientist by training, he completed his undergraduate and graduate work at the University of Toronto St. George, building artificially intelligent assistive technologies. He then founded his company, VinciLabs, which operated at the intersection of digital health and machine learning.
More recently, he worked as a consulting senior data scientist in Toronto. Zain is passionate about machine learning, education, and public speaking.