This example demonstrates the complete workflow of the enVector Python SDK for Encrypted Retrieval-Augmented Generation (Encrypted RAG) with fully homomorphic encryption (FHE). It shows:
How text data is stored and encrypted in the index for RAG
How the encrypted similarity search is performed with FHE
How the LLM (gpt-oss running locally via Ollama) leverages the retrieved context, with search results kept encrypted until the SDK decrypts them
Prerequisites
enVector server reachable from this notebook environment
Registered key path and key ID for the target index
pyenvector, langchain, langchain-community, langchain-text-splitters, and sentence-transformers packages installed (an install sketch follows this list)
A PDF document accessible from the working directory
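If any of these packages are missing, they can typically be installed from the notebook itself. A minimal sketch, using the package names listed above with versions unpinned:
%pip install pyenvector langchain langchain-community langchain-text-splitters sentence-transformers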
Import langchain_envector to use enVector with the LangChain framework.
import langchain_envector
First, load a sample document to search.
In this example, we use a NIST report. This report evaluates how accurate and reliable common empirical formulas are when used to predict fire behavior in various scenarios. For more details about the report, see NIST Report and download the PDF from Link.
Load the PDF and split into chunks
We rely on LangChain community loaders and text splitters to turn the PDF pages into retrieval-friendly passages.
Prepare text and metadata payloads
enVector expects parallel lists of texts and metadata dictionaries. Here we keep track of the original page number for traceability.
Set embedding model
We'll use HuggingFace embeddings to convert our text chunks into numerical vectors that can be encrypted and searched. The embeddings model will transform each text chunk into a high-dimensional vector that captures semantic meaning. These vectors will be encrypted before being stored in the enVector index.
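The embedding setup is not shown in the code listing, so here is a minimal sketch, assuming a sentence-transformers model loaded through langchain-community (the model name below is only an example). The later cells expect the variables embeddings and vector_dim:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Any sentence-transformers model works; this small model keeps the example fast
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Derive the vector dimension by embedding a short probe string
vector_dim = len(embeddings.embed_query("dimension probe"))
print(f"Embedding dimension: {vector_dim}")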
Initialize the enVector store
Configure the encrypted vector index and instantiate the LangChain-compatible store. The vector dimension comes from the embedding model configured above.
The initialization step includes:
ConnectionConfig: establishing a connection to the enVector server,
IndexSettings: configuring index settings necessary for vector search, including query and metadata encryption, and
KeyConfig: registering evaluation keys to enable the enVector server to perform secure operations.
Insert chunks (batched)
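The insert cell below sends all chunks in a single add_texts call. For larger corpora you may prefer true batching; a minimal sketch (the batch size of 128 is an arbitrary choice, not an enVector requirement):
# Insert the chunks in fixed-size batches instead of one call
batch_size = 128
for start in range(0, len(texts), batch_size):
    store.add_texts(texts[start:start + batch_size], metadatas[start:start + batch_size])
print(f"Inserted {len(texts)} chunks in batches of {batch_size}")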
Encrypted search on the index
Let's perform an encrypted similarity search using LangChain-enVector.
The enVector vectorstore provides a simple interface through LangChain to perform similarity search on encrypted data. Under the hood, enVector handles all the encryption, decryption, and secure search operations automatically. When we call similarity_search(), the query is encrypted (or sent in plaintext, depending on the index's query_encryption setting), the similarity search is performed over the encrypted vectors, and the results are decrypted before being returned.
The store.similarity_search() method returns the top-k most relevant documents along with their similarity scores, making it easy to build secure RAG applications without having to manage encryption directly.
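If the Envector class implements the standard LangChain VectorStore interface (an assumption here, not something the code below relies on), the store can also be exposed as a retriever for use in chains:
# Assumes Envector inherits the standard VectorStore.as_retriever() behavior
retriever = store.as_retriever(search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("How accurate are the empirical fire models?")
print(f"Retriever returned {len(retrieved_docs)} documents")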
Generate Answers with Retrieval-augmented Context
Once the decrypted documents are retrieved, we can use an LLM (e.g. OpenAI's GPT) to generate answers based on the retrieved documents.
In this example, we use the gpt-oss model running locally with ollama.
import os
from pathlib import Path
PDF_PATH = Path("./NIST.SP.1169.pdf") # Update with your PDF path
assert PDF_PATH.exists(), f"PDF file not found: {PDF_PATH}"
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
loader = PyPDFLoader(str(PDF_PATH))
raw_docs = loader.load()
print(f"Loaded {len(raw_docs)} pages from {PDF_PATH.name}")
splitter = RecursiveCharacterTextSplitter(chunk_size=900, chunk_overlap=150)
chunked_docs = splitter.split_documents(raw_docs)
print(f"Created {len(chunked_docs)} chunks")
texts = []
metadatas = []
for doc in chunked_docs:
texts.append(doc.page_content)
meta = dict(doc.metadata)
meta.setdefault("source", PDF_PATH.name)
metadatas.append(meta)
print(texts[0][:200])
print(metadatas[0])
print(f"Prepared {len(texts)} text chunks")
from langchain_envector.config import ConnectionConfig, EnvectorConfig, IndexSettings, KeyConfig
from langchain_envector.vectorstore import Envector
# enVector server address and access token; set these environment variables for your deployment
ENVECTOR_ADDRESS = os.environ["ENVECTOR_ADDRESS"]
ENVECTOR_ACCESS_TOKEN = os.environ["ENVECTOR_ACCESS_TOKEN"]
config = EnvectorConfig(
connection=ConnectionConfig(address=ENVECTOR_ADDRESS, access_token=ENVECTOR_ACCESS_TOKEN),
key=KeyConfig(key_path="./keys", key_id="rag_key_id"),
index=IndexSettings(index_name="nist", dim=vector_dim, query_encryption="plain"),
create_if_missing=True,
)
store = Envector(config=config, embeddings=embeddings)
_ = store.add_texts(texts, metadatas)
query = "Which organizations collaborated on NIST SP 1169’s fire model verification and validation study, and what larger NRC report summarizes the results?"
# Query in plaintext
results = store.similarity_search(query, k=3)
for idx, doc in enumerate(results, start=1):
print(f"--- Result {idx} (score={doc.metadata.get('_score'):.4f}) ---")
print(doc.page_content[:400], "...")
print({k: v for k, v in doc.metadata.items() if not k.startswith('_')})
print()
import requests
def generate_answer(docs, query, model="gpt-oss"):
instruction = "You are an assistant that answers questions based on the provided documents."
prompt = f"""{instruction}:\n\n[Documents]\n"""
for doc in docs:
prompt += f"- {doc}\n"
prompt += f"\n[Question]\n{query}\n[Answer]\n"
response = requests.post(
"http://localhost:11434/api/chat",
json={
"model": model,
"messages": [
{"role": "system", "content": instruction},
{"role": "user", "content": prompt}
],
"stream": False
}
)
response.raise_for_status()
return response.json()["message"]["content"].strip()
# Example usage
answer = generate_answer([doc.page_content for doc in results], query)
print(f"Generated Answer: \n{answer}")