voyage-multimodal-3 Embedding Model

MongoDB, Inc.

Multimodal embedding model that can vectorize interleaved text and content-rich images, with a 32K-token context length.

Multimodal embedding models are neural networks that transform inputs from multiple modalities, such as text and images, into numerical vectors. They are a crucial building block for semantic search and retrieval systems and for retrieval-augmented generation (RAG), and they largely determine retrieval quality.
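To make the retrieval step concrete, here is a minimal sketch of how stored embeddings are commonly searched: every document vector is scored against the query vector by cosine similarity, and the highest-scoring documents are returned. The document IDs and 3-dimensional vectors are hypothetical placeholders; real embeddings produced by a model have many more dimensions, and production systems typically use a vector database rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top_k best matches."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real model output.
docs = {
    "slide_revenue_chart": [0.9, 0.1, 0.0],
    "pdf_table_of_contents": [0.1, 0.8, 0.2],
    "figure_architecture": [0.2, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

for doc_id, score in rank_documents(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

Because cosine similarity ignores vector length, only the direction of each embedding matters, which is why it is a common default for comparing embeddings.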

voyage-multimodal-3 is a state-of-the-art multimodal embedding model that uniquely vectorizes interleaved text and images while capturing visual features from PDFs, slides, tables, figures, and more, eliminating the need for complex document parsing. Across 3 multimodal retrieval tasks (20 datasets in total), it improves retrieval accuracy by an average of 19.63% over the next best-performing multimodal embedding model.
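The following is a minimal sketch of embedding interleaved text and images, assuming the `voyageai` Python client and its `multimodal_embed` method; verify the exact signature against the current Voyage AI documentation before relying on it. Each input is a list that mixes strings and `PIL.Image` objects in reading order, so a slide's caption and its screenshot become a single vector without a separate parsing step. The helper names `build_interleaved_inputs` and `embed_pages` and the file `slide.png` are hypothetical.

```python
import os
from typing import Any, List, Sequence

MODEL = "voyage-multimodal-3"


def build_interleaved_inputs(pages: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Copy each page's segments (str or PIL.Image) into a plain list, preserving order."""
    inputs = []
    for segments in pages:
        if not segments:
            raise ValueError("each input needs at least one text or image segment")
        inputs.append(list(segments))
    return inputs


def embed_pages(client: Any, pages: Sequence[Sequence[Any]], input_type: str = "document") -> List[List[float]]:
    """Embed each interleaved page with one request.

    input_type is assumed to be "document" for corpus items and "query" for search queries.
    """
    result = client.multimodal_embed(
        inputs=build_interleaved_inputs(pages),
        model=MODEL,
        input_type=input_type,
    )
    return result.embeddings


def main() -> None:
    # Assumes `pip install voyageai pillow`; the client reads VOYAGE_API_KEY from the environment.
    import voyageai
    from PIL import Image

    client = voyageai.Client()
    pages = [["Quarterly revenue by region", Image.open("slide.png")]]  # hypothetical file
    vectors = embed_pages(client, pages)
    print(f"{len(vectors)} vector(s) of dimension {len(vectors[0])}")


# Only attempt the live API call when credentials are configured.
if __name__ == "__main__" and os.environ.get("VOYAGE_API_KEY"):
    main()
```

Passing the client in as a parameter keeps `embed_pages` independent of the network, so it can be exercised with a stub client in tests.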