Encrypted RAG with enVector

In this tutorial, we will walk through the steps to use the enVector SDK for Encrypted Retrieval-Augmented Generation (Encrypted RAG) using fully homomorphic encryption (FHE).

Import SDK

First, install and import the pyenvector package to use the enVector Python APIs. Before installing, make sure Python 3 and a virtual environment are set up on your system.

For more details, see the SDK installation section.

import pyenvector as ev

Connect and Initialize

Initialization is required before using the enVector service. This step establishes a connection to the enVector server and configures the cryptographic settings needed for vector search.

For more details, see the Initialize section.

ev.init(
    address="localhost:50050",
    # access_token="...", # if needed
    key_path="./keys",
    key_id="rag_key_id",
)
<pyenvector.client.client.EnvectorClient at 0x7ff4fdd8a7b0>

Prepare Data

Prepare Plaintext Vectors

To perform RAG, we first need plaintext embedding vectors for the text.

Note that these vectors should be normalized, because the similarity metric is cosine similarity. This tutorial uses sentence-transformers for text embedding as one example; you can also use your own embedding model to generate vectors from your text dataset.
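Whichever embedding model produces the vectors, the normalization step itself can be sketched in plain Python. This is a minimal illustration, not enVector code: the helper name l2_normalize and the toy 3-dimensional vectors are assumptions made for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Toy 3-dimensional "embeddings" (hypothetical); real ones come from your embedding model.
raw_vectors = [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
vectors = [l2_normalize(v) for v in raw_vectors]
```

If you embed with sentence-transformers, passing normalize_embeddings=True to the model's encode method applies the same normalization for you.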

Create Index

For encrypted similarity search, we first prepare a vector index, called Index, to store encrypted vectors and their metadata in the enVector system.

For more details, see the Index Creation section.

Insert Data

Once the index is ready, you can encrypt and insert data into it. The insert operation first encrypts the vectors with the generated encryption keys and then stores them in the created index. The inserted data can consist of vectors plus associated metadata that provides additional context for RAG.
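Before handing a batch to the SDK, it can help to check that vectors and metadata line up. The sketch below is a client-side sanity check only and does not call enVector; the helper check_batch, the dimension 3, and the sample metadata strings are assumptions made for illustration.

```python
import math


def check_batch(vectors, metadata, dim, tol=1e-6):
    """Validate a batch before it is encrypted and inserted."""
    if len(vectors) != len(metadata):
        raise ValueError("each vector needs exactly one metadata entry")
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {dim}")
        norm = math.sqrt(sum(x * x for x in vec))
        if abs(norm - 1.0) > tol:
            raise ValueError(f"vector {i} is not unit-normalized (norm={norm:.6f})")
    return True


# Hypothetical batch: two unit vectors and the text each one was embedded from.
vectors = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
metadata = ["enVector stores encrypted vectors.", "FHE allows computing on ciphertexts."]
batch_ok = check_batch(vectors, metadata, dim=3)
```

Catching a length or dimension mismatch here is cheaper than discovering it after encryption.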

For more details, see the Insert section.

Prepare Query

First, prepare a query for encrypted search.

Encrypted Search on the Index

Let's perform an encrypted similarity search for encrypted RAG.

Once the encrypted vector index and the encrypted query vectors are ready, we can perform a similarity search on the encrypted data without decrypting it. The enVector server returns encrypted scores, and the client, whose index object holds the decryption key, decrypts them and selects the top-k relevant results along with their indices. Using those indices, we then retrieve the corresponding encrypted documents and decrypt them to obtain the plaintext.
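The client-side top-k selection that follows decryption can be sketched as below. The example assumes the scores have already been decrypted into a plain list of floats, one per indexed vector; the score values and the helper name top_k are hypothetical, and the enVector calls that produce the scores are not shown.

```python
import heapq


def top_k(scores, k):
    """Return (index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Hypothetical decrypted similarity scores, one per vector in the index.
decrypted_scores = [0.12, 0.87, 0.45, 0.91, 0.30]
hits = top_k(decrypted_scores, k=2)

# Indices of the best matches, used to fetch the matching encrypted documents.
top_indices = [idx for idx, _ in hits]
```

Because the vectors are normalized, a higher score means a higher cosine similarity, so taking the largest values gives the most relevant documents.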

For more details, see the Search section.

Generate Answers with Retrieval-Augmented Context

Once the decrypted documents are retrieved, we can use an LLM (e.g. OpenAI's GPT) to generate answers based on the retrieved documents.

Note: Make sure your LLM is served via an API.

In this example, we run the gemma3:270m model locally with ollama.

Then generate an answer using your LLM with the retrieved context.
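One way to do this is sketched below: build a prompt from the decrypted documents and send it to the local ollama server over its REST API. The prompt wording, the helper names build_prompt and generate_answer, the sample document, and the default address http://localhost:11434 are assumptions for illustration; adjust them to your own setup.

```python
import json
import urllib.request


def build_prompt(question, documents):
    """Combine retrieved documents and the user question into one prompt."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_answer(prompt, model="gemma3:270m",
                    url="http://localhost:11434/api/generate"):
    """Send the prompt to a local ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Hypothetical decrypted document returned by the encrypted search.
documents = ["enVector performs similarity search on encrypted vectors."]
prompt = build_prompt("What does enVector do?", documents)
# answer = generate_answer(prompt)  # requires a running ollama server
```

Numbering the context entries makes it easier to trace which retrieved document an answer draws on.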

Clean Up

We can delete the created index and the registered key when they are no longer needed.
