Encrypted RAG with enVector

In this tutorial, we will walk through the steps to use the enVector SDK for Encrypted Retrieval-Augmented Generation (Encrypted RAG) using fully homomorphic encryption (FHE).

Import SDK

First, install and import the pyenvector package to use the enVector Python APIs. Before installing, make sure Python 3 and a virtual environment are set up on your system.

For more details, see the SDK installation section.

import pyenvector as ev

Connect and Initialize

Before you can use the enVector service, you must initialize the client. Initialization establishes a connection to the enVector server and configures the cryptographic settings needed for vector search.

For more details, see the Initialize section.

ev.init(
    address="localhost:50050",
    # access_token="...", # if needed
    key_path="./keys",
    key_id="rag_key_id",
)

<pyenvector.client.client.EnvectorClient at 0x7ff4fdd8a7b0>
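If you prefer not to hard-code connection settings, one option is to read them from the environment and pass them to `ev.init()`. The sketch below is only an illustration: the helper `load_init_kwargs` and the `ENVECTOR_*` variable names are hypothetical and not part of pyenvector, while the keyword names (`address`, `key_path`, `key_id`, `access_token`) are the ones used in the call above.

```python
import os
from typing import Dict, Mapping, Optional

def load_init_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for ev.init() from environment variables.

    The ENVECTOR_* names are hypothetical; use whatever fits your deployment.
    Defaults mirror the values used in this tutorial.
    """
    env = os.environ if env is None else env
    kwargs = {
        "address": env.get("ENVECTOR_ADDRESS", "localhost:50050"),
        "key_path": env.get("ENVECTOR_KEY_PATH", "./keys"),
        "key_id": env.get("ENVECTOR_KEY_ID", "rag_key_id"),
    }
    token = env.get("ENVECTOR_ACCESS_TOKEN")
    if token:  # only pass access_token when one is configured
        kwargs["access_token"] = token
    return kwargs

# Passing a plain dict instead of os.environ keeps the example reproducible.
init_kwargs = load_init_kwargs({"ENVECTOR_ADDRESS": "envector.example.internal:50050"})
print(init_kwargs)
# With pyenvector installed and a server reachable: ev.init(**init_kwargs)
```

Keeping the settings in a plain dictionary makes it easy to inspect what will be sent to `ev.init()` before a connection is attempted.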

Prepare Data

Prepare Plaintext Vectors

To perform RAG, we need to prepare the plaintext text embedding vectors.

Note that these vectors should be normalized, because the similarity metric is cosine similarity. This example loads the sentence-transformers/all-MiniLM-L6-v2 model through fastembed; you can also use your own embedding model to generate vectors from your text dataset.

from typing import List, Union
from fastembed import TextEmbedding
import numpy as np

# 1. Load a pretrained text embedding model
model = TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
dim = model.embedding_size

# 2. Calculate embeddings by calling model.embed()
def get_embedding(texts: Union[str, List[str]]) -> np.ndarray:
    BATCH_SIZE=128
    if isinstance(texts, str):
        texts = [texts]
    embeddings = np.empty((0, model.embedding_size))
    for i in range(0, len(texts), BATCH_SIZE):
        batch_texts = texts[i : i + BATCH_SIZE]
        batch_embeddings = list(model.embed(batch_texts))
        embeddings = np.vstack([embeddings, batch_embeddings])
    embeddings = np.squeeze(embeddings)
    return embeddings

# Prepare vectors to be indexed
texts = [
    "The capital of USA is Washington, D.C.",
    "The capital of South Korea is Seoul.",
    "The capital of France is Paris.",
    "The capital of Germany is Berlin.",
    "The capital of Italy is Rome.",
    "The capital of Canada is Ottawa.",
]

# Get embeddings
vectors = get_embedding(texts)
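The normalization requirement mentioned above can be checked directly. The following is a minimal sketch, using only the standard library and hypothetical helper names (`l2_normalize`, `dot`, `cosine_similarity`), that shows why unit-length vectors matter: once two vectors are normalized, their inner product equals their cosine similarity. Whether a given embedding model already returns unit-length vectors is something to verify rather than assume.

```python
import math
from typing import List, Sequence

def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (hypothetical helper, not part of pyenvector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed from the raw, unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# A normalized vector has unit length.
print(math.isclose(dot(a_unit, a_unit), 1.0))
# For normalized vectors, the inner product equals the cosine similarity.
print(math.isclose(dot(a_unit, b_unit), cosine_similarity(a, b)))
```

For a two-dimensional NumPy array such as `vectors`, the same normalization can be written as `vectors / np.linalg.norm(vectors, axis=1, keepdims=True)`.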

Create Index

For encrypted similarity search, we first prepare a vector index, called Index, to store encrypted vectors and their metadata in the enVector system.

For more details, see the Index Creation section.

index = ev.create_index("rag_index", dim=dim)

Insert Data

Once the index is ready, you can encrypt and insert data into it. The insert call first encrypts the vectors with the generated encryption keys and then inserts them into the index. Each vector can carry associated metadata that provides additional context for RAG.

For more details, see the Insert section.

index.insert(vectors, metadata=texts)

Encrypt and Insert: 100%|██████████| 1/1 [00:00<00:00, 10.10it/s]

[1, 2, 3, 4, 5, 6]
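Because each vector is paired with one metadata entry, it can help to check the batch before inserting it. The sketch below is an assumption-laden illustration: `check_batch` is a hypothetical helper, not a pyenvector function, and the library may well perform its own validation.

```python
from typing import Sequence

def check_batch(vectors: Sequence[Sequence[float]], metadata: Sequence[str], dim: int) -> None:
    """Raise ValueError if the batch is inconsistent (hypothetical pre-insert check)."""
    if len(vectors) != len(metadata):
        raise ValueError(
            f"{len(vectors)} vectors but {len(metadata)} metadata entries"
        )
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {dim}")

# Toy three-dimensional batch; real embeddings would use the model's dimension.
toy_vectors = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
toy_metadata = ["first document", "second document"]
check_batch(toy_vectors, toy_metadata, dim=3)  # passes silently
```

The check expects a two-dimensional batch. Note that `get_embedding` above squeezes a single text into a one-dimensional array, so a one-item batch would need reshaping (for example with `vectors.reshape(1, -1)`) before a check like this.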

Encrypted Similarity Search

Prepare query

First, prepare a query for encrypted search.

query_text = "What is the capital of France?"

query_vector = get_embedding(query_text)

Encrypted search on the index

Let's perform an encrypted similarity search for encrypted RAG.

Once the encrypted vector index and encrypted query vectors are ready, we can perform a similarity search on the encrypted data without decrypting it. The enVector server returns the similarity scores in encrypted form. The index object on the client holds the decryption key, so the client decrypts the scores and selects the top-k relevant results along with their indices. With those indices identified, we retrieve the corresponding encrypted documents and decrypt them to obtain the plaintext.

For more details, see the Search section.

result = index.search(query_vector, top_k=1, output_fields=["metadata"])[0]

retrieved_docs = [res["metadata"] for res in result]
retrieved_docs

['The capital of France is Paris.']
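The client-side top-k selection described above can be sketched in isolation. This is not how pyenvector implements it internally; it only illustrates the step that happens after decryption, using a hypothetical `top_k` helper and made-up score values.

```python
import heapq
from typing import List, Sequence, Tuple

def top_k(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (position, score) pairs for the k highest scores, best first."""
    if k <= 0:
        return []
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])

# Made-up decrypted scores for six indexed items (illustrative values only).
decrypted_scores = [0.21, 0.35, 0.91, 0.40, 0.38, 0.29]
print(top_k(decrypted_scores, k=2))
# prints [(2, 0.91), (3, 0.4)]
```

Positions here are zero-based list offsets, whereas the list returned by `index.insert` in the example above starts at 1, so a mapping step may be needed depending on how results are identified.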

Generate Answers with Retrieval-augmented Context

Once the decrypted documents are retrieved, we can use an LLM (e.g. OpenAI's GPT) to generate answers based on the retrieved documents.

Note: Make sure your LLM is served via an API.

In this example, we run the gemma3:270m model locally with ollama as follows:

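# Example: run the LLM with ollama in your shell (bash)
# `ollama serve` runs in the foreground, so keep it running and use a second terminal for `ollama run`
ollama serve
ollama run gemma3:270m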

Then generate an answer using your LLM with the retrieved context.

import requests

def generate_answer(docs, query, model="gemma3:270m"):
    instruction = "You are an assistant that answers questions based on the provided documents."
    prompt = f"""{instruction}:\n\n[Documents]\n"""
    for doc in docs:
        prompt += f"- {doc}\n"
    prompt += f"\n[Question]\n{query}\n[Answer]\n"
    
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": instruction},
                {"role": "user", "content": prompt}
            ],
            "stream": False
        },
        timeout=60,  # fail instead of hanging if the LLM server does not respond
    )
    response.raise_for_status()
    return response.json()["message"]["content"].strip()

answer = generate_answer(retrieved_docs, query_text)
print(f"Generated Answer: \n{answer}")

Generated Answer: 
The capital of France is Paris.
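To inspect exactly what is sent to the LLM without a running server, the prompt assembly can be separated from the HTTP call. The sketch below uses hypothetical helper names (`build_prompt`, `build_chat_payload`) and differs from the function above in one respect: the instruction appears only in the system message instead of being repeated in the user prompt.

```python
from typing import Dict, Iterable

INSTRUCTION = "You are an assistant that answers questions based on the provided documents."

def build_prompt(docs: Iterable[str], query: str) -> str:
    """Assemble the user prompt: one bullet per retrieved document, then the question."""
    doc_lines = "".join(f"- {doc}\n" for doc in docs)
    return f"[Documents]\n{doc_lines}\n[Question]\n{query}\n[Answer]\n"

def build_chat_payload(docs: Iterable[str], query: str, model: str = "gemma3:270m") -> Dict:
    """Build the JSON body for the chat endpoint used above (no network access here)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": INSTRUCTION},
            {"role": "user", "content": build_prompt(docs, query)},
        ],
        "stream": False,
    }

payload = build_chat_payload(
    ["The capital of France is Paris."],
    "What is the capital of France?",
)
print(payload["messages"][1]["content"])
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, which keeps the prompt format easy to test on its own.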

Clean Up

We can delete the created index and the registered key when they are no longer needed.

ev.drop_index("rag_index")

<pyenvector.client.client.EnvectorClient at 0x7ff4fdd8a7b0>

ev.delete_key("rag_key_id")

<pyenvector.client.client.EnvectorClient at 0x7ff4fdd8a7b0>


