This example demonstrates the complete workflow of the enVector Python SDK for Encrypted Retrieval-Augmented Generation (Encrypted RAG) with fully homomorphic encryption (FHE). It shows:
How text data is stored and encrypted in the index for RAG
How the encrypted similarity search is performed with FHE
How the LLM (gpt-oss running locally via Ollama) leverages the retrieved context, with search results kept encrypted until the SDK decrypts them
Prerequisites
enVector server reachable from this notebook environment
Registered key path and key ID for the target index
pyenvector, langchain, langchain-community, langchain-text-splitters, and sentence-transformers packages installed (an install sketch follows this list)
A PDF document accessible from the working directory
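If any of these packages are missing, they can typically be installed from the notebook itself. A minimal sketch, using the package names listed above with versions unpinned:
%pip install pyenvector langchain langchain-community langchain-text-splitters sentence-transformers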
Import langchain_envector to use enVector with the LangChain framework.
import langchain_envector
First, load a sample document to search.
In this example, we use a NIST report. This report evaluates how accurate and reliable common empirical formulas are when used to predict fire behavior in various scenarios. For more details about the report, see NIST Report and download the PDF from Link.
Load the PDF and split into chunks
We rely on LangChain community loaders and text splitters to turn the PDF pages into retrieval-friendly passages.
Prepare text and metadata payloads
enVector expects parallel lists of texts and metadata dictionaries. Here we keep track of the original page number for traceability.
Set embedding model
We'll use HuggingFace embeddings to convert our text chunks into numerical vectors that can be encrypted and searched. The embeddings model will transform each text chunk into a high-dimensional vector that captures semantic meaning. These vectors will be encrypted before being stored in the enVector index.
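The embedding setup is not shown in the code listing, so here is a minimal sketch, assuming a sentence-transformers model loaded through langchain-community (the model name below is only an example). The later cells expect the variables embeddings and vector_dim:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Any sentence-transformers model works; this small model keeps the example fast
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Derive the vector dimension by embedding a short probe string
vector_dim = len(embeddings.embed_query("dimension probe"))
print(f"Embedding dimension: {vector_dim}")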
Initialize the enVector store
Configure the encrypted vector index and instantiate the LangChain-compatible store. The vector dimension comes from the embedding model configured above.
The initialization step includes:
ConnectionConfig: establishing a connection to the enVector server,
IndexSettings: configuring index settings necessary for vector search, including query and metadata encryption, and
KeyConfig: registering evaluation keys to enable the enVector server to perform secure operations.
Insert chunks (batched)
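The insert cell below sends all chunks in a single add_texts call. For larger corpora you may prefer true batching; a minimal sketch (the batch size of 128 is an arbitrary choice, not an enVector requirement):
# Insert the chunks in fixed-size batches instead of one call
batch_size = 128
for start in range(0, len(texts), batch_size):
    store.add_texts(texts[start:start + batch_size], metadatas[start:start + batch_size])
print(f"Inserted {len(texts)} chunks in batches of {batch_size}")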
Encrypted search on the index
Let's perform an encrypted similarity search using LangChain-enVector.
The enVector vectorstore provides a simple interface through LangChain to perform similarity search on encrypted data. Under the hood, enVector handles all the encryption, decryption, and secure search operations automatically. When we call similarity_search(), the query is encrypted (or sent in plaintext, depending on the index's query_encryption setting), the similarity search is performed over the encrypted vectors, and the results are decrypted before being returned.
The store.similarity_search() method returns the top-k most relevant documents along with their similarity scores, making it easy to build secure RAG applications without having to manage encryption directly.
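If the Envector class implements the standard LangChain VectorStore interface (an assumption here, not something the code below relies on), the store can also be exposed as a retriever for use in chains:
# Assumes Envector inherits the standard VectorStore.as_retriever() behavior
retriever = store.as_retriever(search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("How accurate are the empirical fire models?")
print(f"Retriever returned {len(retrieved_docs)} documents")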
Generate Answers with Retrieval-augmented Context
Once the decrypted documents are retrieved, we can use an LLM (e.g. OpenAI's GPT) to generate answers based on the retrieved documents.
In this example, we use the gpt-oss model running locally with ollama.
import os
from pathlib import Path
PDF_PATH = Path("./NIST.SP.1169.pdf") # Update with your PDF path
assert PDF_PATH.exists(), f"PDF file not found: {PDF_PATH}"
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
loader = PyPDFLoader(str(PDF_PATH))
raw_docs = loader.load()
print(f"Loaded {len(raw_docs)} pages from {PDF_PATH.name}")
splitter = RecursiveCharacterTextSplitter(chunk_size=900, chunk_overlap=150)
chunked_docs = splitter.split_documents(raw_docs)
print(f"Created {len(chunked_docs)} chunks")
texts = []
metadatas = []
for doc in chunked_docs:
texts.append(doc.page_content)
meta = dict(doc.metadata)
meta.setdefault("source", PDF_PATH.name)
metadatas.append(meta)
print(texts[0][:200])
print(metadatas[0])
print(f"Prepared {len(texts)} text chunks")
from langchain_envector.config import ConnectionConfig, EnvectorConfig, IndexSettings, KeyConfig
from langchain_envector.vectorstore import Envector
# enVector server address and access token; set these environment variables for your deployment
ENVECTOR_ADDRESS = os.environ["ENVECTOR_ADDRESS"]
ENVECTOR_ACCESS_TOKEN = os.environ["ENVECTOR_ACCESS_TOKEN"]
config = EnvectorConfig(
connection=ConnectionConfig(address=ENVECTOR_ADDRESS, access_token=ENVECTOR_ACCESS_TOKEN),
key=KeyConfig(key_path="./keys", key_id="rag_key_id"),
index=IndexSettings(index_name="nist", dim=vector_dim, query_encryption="plain"),
create_if_missing=True,
)
store = Envector(config=config, embeddings=embeddings)
_ = store.add_texts(texts, metadatas)
query = "Which organizations collaborated on NIST SP 1169’s fire model verification and validation study, and what larger NRC report summarizes the results?"
# Query in plaintext
results = store.similarity_search(query, k=3)
for idx, doc in enumerate(results, start=1):
print(f"--- Result {idx} (score={doc.metadata.get('_score'):.4f}) ---")
print(doc.page_content[:400], "...")
print({k: v for k, v in doc.metadata.items() if not k.startswith('_')})
print()
import requests
def generate_answer(docs, query, model="gpt-oss"):
instruction = "You are an assistant that answers questions based on the provided documents."
prompt = f"""{instruction}:\n\n[Documents]\n"""
for doc in docs:
prompt += f"- {doc}\n"
prompt += f"\n[Question]\n{query}\n[Answer]\n"
response = requests.post(
"http://localhost:11434/api/chat",
json={
"model": model,
"messages": [
{"role": "system", "content": instruction},
{"role": "user", "content": prompt}
],
"stream": False
}
)
response.raise_for_status()
return response.json()["message"]["content"].strip()
# Example usage
answer = generate_answer([doc.page_content for doc in results], query)
print(f"Generated Answer: \n{answer}")