Encrypted Multi-Modal Search with enVector

In this tutorial, we show how enVector improves the security of multi-modal search.

Recent advances in AI have enabled LLMs and multi-modal models to understand user intent and to process text, images, audio, and video in an integrated manner. At the core of this capability are embedding vectors, which represent data numerically and allow semantic similarity to be compared across modalities. Although embeddings are expressive, they also pose a security risk: the original data can potentially be reconstructed from them.

To address this risk, enVector performs encrypted vector search using fully homomorphic encryption (FHE), which allows computation directly on encrypted data. Because the server works only on encrypted vectors and never sees queries or search results in plaintext, enVector protects data privacy even in untrusted environments while still enabling powerful multi-modal search.
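The core idea (the server scores an encrypted query without being able to read it) can be illustrated with a toy example. This tutorial does not describe enVector's FHE scheme, so the sketch below swaps in a different and much weaker technique: the Paillier cryptosystem, which is only additively homomorphic (not fully homomorphic) and uses insecurely small primes here. Only the query is encrypted; the database vectors stay in plaintext, unlike an encrypted enVector index. All names (`encrypt`, `decrypt`, `quantize`, `encrypted_dot`) and the toy embeddings are hypothetical, not enVector API. Requires Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy Paillier cryptosystem. It is only ADDITIVELY homomorphic (not FHE) and is
# NOT the scheme enVector uses; the tiny primes below are insecure.
P, Q = 1_000_003, 1_000_033
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because we fix the generator g = N + 1


def encrypt(m: int) -> int:
    """Client: c = g^m * r^N mod N^2 with g = N + 1 (negatives encoded mod N)."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Client: m = L(c^LAM mod N^2) * MU mod N, with L(x) = (x - 1) // N."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m  # map back to a signed integer


def quantize(vec, scale=100):
    # The scheme works on integers, so embeddings are scaled and rounded.
    return [round(x * scale) for x in vec]


def encrypted_dot(enc_query, plain_vec):
    """Server: Enc(a)^k = Enc(k*a) and Enc(a)*Enc(b) = Enc(a+b), so the product
    of Enc(q_i)^(v_i) is Enc(sum_i q_i * v_i). No secret key is needed."""
    acc = 1  # 1 is a (non-randomized) encryption of 0
    for c, v in zip(enc_query, plain_vec):
        acc = acc * pow(c, v % N, N2) % N2
    return acc


# Hypothetical toy embeddings (4-d instead of hundreds of dimensions).
query = [0.12, -0.40, 0.88, 0.05]
database = {"img_a": [0.10, -0.35, 0.90, 0.00],
            "img_b": [-0.70, 0.20, 0.10, 0.60]}

enc_q = [encrypt(x) for x in quantize(query)]  # client encrypts the query
enc_scores = {k: encrypted_dot(enc_q, quantize(v))  # server scores blindly
              for k, v in database.items()}
scores = {k: decrypt(c) for k, c in enc_scores.items()}  # client decrypts
best = max(scores, key=scores.get)
print(best, scores)
```

The server only ever handles ciphertexts and returns encrypted scores; ranking happens after the client decrypts. A fully homomorphic scheme such as the one enVector relies on goes further, since it can also operate on encrypted index vectors.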

Understanding Multi-Modal Search with enVector

Multi-modal models align embeddings from different modalities, such as text and images, into a shared semantic space. Each modality is encoded by its own encoder, but the resulting embeddings live in the same space, which enables cross-modal understanding and retrieval: a text embedding can be compared directly with an image embedding. Because all modalities share this space, simple similarity measures such as cosine similarity are enough for efficient, scalable multi-modal search.
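Cross-modal retrieval in a shared space comes down to ranking by cosine similarity. The sketch below uses hypothetical 3-dimensional vectors standing in for the outputs of a text encoder and an image encoder; they are made up for illustration and do not come from a real model.

```python
import math


def normalize(v):
    # Scale to unit length so that the dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Hypothetical embeddings: one text query and three images in the same space.
text_query = [0.9, 0.1, 0.0]  # e.g. "a photo of a dog"
image_index = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "cat.jpg": [0.1, 0.9, 0.2],
    "car.jpg": [0.0, 0.1, 0.95],
}

# Rank images by similarity to the text query (most similar first).
ranked = sorted(image_index,
                key=lambda k: cosine(text_query, image_index[k]),
                reverse=True)
print(ranked)
```

Normalizing vectors once at indexing time lets a search system use a plain dot product as its similarity score.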

Multi-Modal Search Overview

As an example, we demonstrate Text-to-Image Search with enVector. Text-to-Image Search retrieves the images whose embeddings are most similar to a user's text query and is widely used in applications such as e-commerce and photo search. With enVector, the query embedding is encrypted, and the server performs similarity search directly on the encrypted vectors in the index without ever seeing the query or the results in plaintext. For efficiency, we use an approximate nearest neighbor (ANN) method that scales to large datasets.
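The tutorial does not say which ANN method enVector uses. As one common example, the sketch below implements a toy inverted-file (IVF) index in plain Python: each vector is assigned to its nearest coarse centroid, and a query scans only the `nprobe` closest lists instead of the whole index. The class `IVFIndex`, its hand-picked centroids (a real IVF index would learn them with k-means), and the image vectors are all hypothetical, and everything runs in plaintext, whereas enVector would compute the similarities over encrypted vectors.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


class IVFIndex:
    """Toy inverted-file index over unit vectors (cosine = dot product)."""

    def __init__(self, centroids):
        # Hand-picked coarse centroids; a real index would train them with k-means.
        self.centroids = [unit(c) for c in centroids]
        self.lists = defaultdict(list)  # centroid id -> [(item_id, vector)]

    def _nearest_centroids(self, v, k):
        scores = [(dot(v, c), i) for i, c in enumerate(self.centroids)]
        return [i for _, i in sorted(scores, reverse=True)[:k]]

    def add(self, item_id, vec):
        v = unit(vec)
        cid = self._nearest_centroids(v, 1)[0]
        self.lists[cid].append((item_id, v))

    def search(self, query, top_k=2, nprobe=1):
        q = unit(query)
        candidates = []
        for cid in self._nearest_centroids(q, nprobe):
            # Only probed lists are scanned: faster than a full scan, but
            # neighbors stored in unprobed lists can be missed.
            candidates.extend((dot(q, v), item_id) for item_id, v in self.lists[cid])
        candidates.sort(reverse=True)
        return [item_id for _, item_id in candidates[:top_k]]


index = IVFIndex(centroids=[[1, 0, 0], [0, 1, 0], [0, 0, 1]])
images = {
    "dog1.jpg": [0.9, 0.2, 0.1],
    "dog2.jpg": [0.8, 0.3, 0.0],
    "cat1.jpg": [0.2, 0.9, 0.1],
    "car1.jpg": [0.1, 0.1, 0.9],
}
for name, vec in images.items():
    index.add(name, vec)

result = index.search([0.85, 0.25, 0.05], top_k=2, nprobe=1)
print(result)
```

Raising `nprobe` trades speed for recall: with `nprobe` equal to the number of centroids, the search degenerates into an exact full scan.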

Load Dataset

enVector Initialization

Index Creation

Get Query


enVector enables powerful yet secure cross-modal retrieval applications such as multi-modal search. Multi-modal models align vectors from different modalities, such as text, images, and audio, into a shared embedding space, so semantic search across modalities reduces to simple similarity metrics like cosine similarity. By running that search over encrypted vectors, enVector can serve as a core component of modern retrieval systems, enabling flexible, scalable, and modality-agnostic search without exposing data in plaintext.
