Building a Production-Ready RAG Application with Milvus and LangChain
- Rajesh
Retrieval-augmented generation (RAG) is a modern approach that merges information retrieval and natural language generation to deliver precise, contextually fitting answers. This article builds a RAG application with Milvus and LangChain, integrates advanced techniques such as query rewriting and re-ranking to improve answer quality, and walks through practical code examples with sample documents.
Introduction to RAG
RAG integrates a retrieval model with a generative model to improve performance on natural language processing tasks. The retriever supplies relevant documents and the generator conditions its answer on them, which improves the accuracy and contextual grounding of responses. This approach is especially valuable for demanding tasks such as question answering, where precision and context are crucial.
Overview of Milvus and LangChain
Milvus
Milvus is an open-source vector database built for efficient, scalable similarity search over large collections of vector data. It is well suited to applications that require fast and accurate vector search, which makes it a natural choice for the retrieval component of a RAG system.
LangChain
LangChain is a library for building language model-powered applications. It provides tools and frameworks to create applications that combine natural language understanding, generation, and retrieval. LangChain simplifies the process of integrating language models with other components, such as databases and search engines.
Architecture of the RAG Application
The architecture of our RAG application will consist of the following components:
Document Store: A database to store and manage the documents. We’ll use Milvus for this purpose.
Retriever: A component to fetch relevant documents from the document store based on the user’s query.
Re-ranker: An advanced module to re-rank the retrieved documents to improve the relevance.
Query Rewriter: A module to rewrite queries to enhance retrieval accuracy.
Generator: A generative model to create responses based on the retrieved and re-ranked documents.
Orchestrator: A workflow manager to coordinate the interaction between the components.
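The way these components fit together can be sketched in plain Python. This is a minimal illustration rather than the implementation used later in this article: the `RagPipeline` class and the stub stages are hypothetical stand-ins, and each stage is simply a callable that the real Milvus and model code would replace.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RagPipeline:
    """Hypothetical container wiring the stages together in order."""
    rewrite: Callable[[str], str]
    retrieve: Callable[[str], List[str]]
    rerank: Callable[[str, List[str]], List[str]]
    generate: Callable[[str, List[str]], str]

    def answer(self, query: str) -> str:
        # Orchestrator: rewrite -> retrieve -> re-rank -> generate
        rewritten = self.rewrite(query)
        candidates = self.retrieve(rewritten)
        ranked = self.rerank(rewritten, candidates)
        return self.generate(rewritten, ranked)


# Stub stages (assumptions for illustration only)
rag = RagPipeline(
    rewrite=lambda q: q.strip().lower(),
    retrieve=lambda q: ["doc about nlp", "doc about deep learning"],
    rerank=lambda q, docs: sorted(docs, key=lambda d: q in d, reverse=True),
    generate=lambda q, docs: f"Answer to '{q}' using {len(docs)} documents",
)

print(rag.answer("  Deep Learning "))
```

Keeping each stage behind a plain callable makes it easy to swap one component, for example a different re-ranker, without touching the rest of the workflow.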
Setting Up Milvus and LangChain
Installing Milvus
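First, let’s install Milvus. We’ll use Docker to set up Milvus quickly. The commands below sketch a three-container setup (etcd for metadata, MinIO for object storage, and Milvus standalone); image names and tags change between releases, so check the official Milvus installation guide for the currently supported Docker setup.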
docker run -d --name milvus-etcd \
-p 2379:2379 \
-p 2380:2380 \
milvusdb/etcd:latest
docker run -d --name milvus-minio \
-p 9000:9000 \
milvusdb/minio:latest
docker run -d --name milvus-standalone \
-p 19530:19530 \
--link milvus-etcd:etcd \
--link milvus-minio:minio \
milvusdb/milvus:latest
Installing LangChain
Next, let’s install LangChain and other required Python libraries.
pip install langchain pymilvus transformers sentence-transformers scikit-learn torch
Sample Documents
For this example, let’s create a few sample documents.
documents = [
    {
        "id": 1,
        "title": "Introduction - Natural language processing",
        "content": "The field of artificial intelligence known as natural language processing concentrates on how computers and humans communicate using natural language."
    },
    {
        "id": 2,
        "title": "Advanced Techniques in ML",
        "content": "ML is a branch of artificial intelligence that enables systems to learn from data and improve from experience without being explicitly programmed."
    },
    {
        "id": 3,
        "title": "Understanding-Deep Learning",
        "content": "Deep learning involves utilizing neural networks with multiple layers to analyze diverse data types and is considered a subset of machine learning."
    }
]
Storing Documents in Milvus
We’ll store the documents in Milvus as vectors.
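from pymilvus import FieldSchema, CollectionSchema, DataType, Collection, connections
from sentence_transformers import SentenceTransformer

# Connect to the local Milvus instance (default host and port)
connections.connect(host="localhost", port="19530")

# Define the schema for the collection.
# all-MiniLM-L6-v2 produces 384-dimensional embeddings, so dim must be 384.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=255),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=4096),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=384),
]
schema = CollectionSchema(fields, "Document collection schema")

# Create the collection
collection = Collection("documents", schema)

# Use a pre-trained model to generate document embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# Prepare column-based data: one list per field, in schema order
contents = [doc['content'] for doc in documents]
embeddings = model.encode(contents, normalize_embeddings=True).tolist()
data = [
    [doc['id'] for doc in documents],
    [doc['title'] for doc in documents],
    contents,
    embeddings,
]

# Insert data into Milvus and persist it
collection.insert(data)
collection.flush()

# Build an index on the vector field (needed before loading), then load
index_params = {"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 128}}
collection.create_index("vector", index_params)
collection.load()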
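Because the schema caps `title` at 255 and `content` at 4096, it can help to validate documents before inserting them. The helper below is a hypothetical sketch (`validate_documents` is not part of pymilvus), and it measures length in UTF-8 bytes as a conservative assumption.

```python
# Field limits copied from the schema above
MAX_LENGTHS = {"title": 255, "content": 4096}


def validate_documents(docs):
    """Return a list of (doc id, field) pairs that exceed the schema limits."""
    problems = []
    for doc in docs:
        for field, limit in MAX_LENGTHS.items():
            # Measure UTF-8 bytes: never smaller than the character count,
            # so passing this check is safe under either interpretation.
            if len(doc[field].encode("utf-8")) > limit:
                problems.append((doc["id"], field))
    return problems


sample = [
    {"id": 1, "title": "Short title", "content": "Short content"},
    {"id": 2, "title": "x" * 300, "content": "ok"},
]
print(validate_documents(sample))
```

An empty result means every document fits the declared field sizes; anything else can be truncated or split before insertion.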
Query Rewriting
To improve retrieval accuracy, we can implement a query rewriting mechanism. This can involve synonym expansion, query paraphrasing, or other NLP techniques.
from transformers import pipeline

# Load a pre-trained model for query rewriting.
# Note: t5-base is not fine-tuned on a "rewrite:" task, so treat this as a
# placeholder and swap in a paraphrase-tuned model for real use.
query_rewriter = pipeline('text2text-generation', model='t5-base')

def rewrite_query(query):
    rewritten_query = query_rewriter(f"rewrite: {query}")[0]['generated_text']
    return rewritten_query

# Example query rewriting
query = "What is NLP?"
rewritten_query = rewrite_query(query)
print("Original Query:", query)
print("Rewritten Query:", rewritten_query)
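Synonym expansion, one of the techniques mentioned above, can be sketched without any model at all. The synonym table here is a made-up example and `expand_query` is a hypothetical helper; a real system would draw its expansions from a curated vocabulary or thesaurus.

```python
import re

# Hypothetical abbreviation/synonym table (illustrative only)
SYNONYMS = {
    "nlp": "natural language processing",
    "ml": "machine learning",
}


def expand_query(query):
    """Append the expansion of every known term found in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    expansions = [SYNONYMS[t] for t in tokens if t in SYNONYMS]
    if not expansions:
        return query
    return f"{query} ({'; '.join(expansions)})"


print(expand_query("What is NLP?"))
```

Appending the expansion, instead of replacing the original term, keeps the user's wording in the query while giving the embedding model more context to match against.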
Retrieving Documents
Next, we’ll use the rewritten query to retrieve relevant documents from Milvus.
def retrieve_documents(query):
    query_embedding = model.encode(query).tolist()
    search_params = {
        "metric_type": "IP",  # Inner Product for similarity search
        "params": {"nprobe": 10}
    }
    results = collection.search(
        [query_embedding], "vector", param=search_params, limit=5,
        output_fields=["title", "content"]
    )
    # search() returns one list of hits per query vector; we sent a single query
    return results[0]

# Retrieve documents
retrieved_docs = retrieve_documents(rewritten_query)
for doc in retrieved_docs:
    print(f"Title: {doc.entity.get('title')}, Content: {doc.entity.get('content')}")
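The search above uses the inner product (`IP`) metric. Inner product equals cosine similarity only when the vectors have unit length, which is why embeddings are often normalized before indexing. A small self-contained check with toy vectors (not real embeddings) illustrates the relationship:

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: inner product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query_vec = [3.0, 4.0]
doc_vec = [4.0, 3.0]

print("raw inner product:", dot(query_vec, doc_vec))
print("cosine similarity:", cosine(query_vec, doc_vec))
print("normalized inner product:", dot(normalize(query_vec), normalize(doc_vec)))
```

The raw inner product grows with vector length, while the normalized inner product matches the cosine similarity up to floating-point rounding.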
Re-ranking Documents
To improve the relevance of the retrieved documents, we re-rank them: each document is scored by how well it matches the query, and the list is sorted by that score.
from sklearn.metrics.pairwise import cosine_similarity

def rerank_documents(query, retrieved_docs):
    # Embed the query and the retrieved document texts with the same model
    query_embedding = model.encode(query).reshape(1, -1)
    doc_embeddings = model.encode([doc.entity.get('content') for doc in retrieved_docs])
    # Cosine similarity between the query and each document
    similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
    # Sort documents from most to least similar
    ranked_docs = sorted(zip(retrieved_docs, similarities), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked_docs]

# Re-rank documents
ranked_docs = rerank_documents(rewritten_query, retrieved_docs)
for doc in ranked_docs:
    print(f"Title: {doc.entity.get('title')}, Content: {doc.entity.get('content')}")
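When a re-ranker reuses the same embedding model as retrieval, it tends to reproduce the retrieval order, so re-rankers usually bring in a second signal. As a hedged illustration, the sketch below swaps in a simple token-overlap (Jaccard) score. This is not the method used in this article, and `jaccard` and `rerank_by_overlap` are hypothetical names that operate on plain dictionaries rather than Milvus hits.

```python
def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rerank_by_overlap(query, docs):
    """Sort docs (dicts with a 'content' key) by overlap with the query."""
    return sorted(docs, key=lambda d: jaccard(query, d["content"]), reverse=True)


candidates = [
    {"title": "ML", "content": "machine learning learns from data"},
    {"title": "DL", "content": "deep learning uses neural networks"},
]
for doc in rerank_by_overlap("explain deep learning", candidates):
    print(doc["title"])
```

In practice a lexical score like this would typically be combined with the embedding score rather than replace it.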
Generating Responses
Finally, we’ll use a generative model to create responses based on the top-ranked documents.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a pre-trained generative model
gpt2_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2')

def generate_response(documents, query):
    context = " ".join([doc.entity.get('content') for doc in documents])
    input_text = f"Query: {query}\nContext: {context}\nAnswer:"
    inputs = gpt2_tokenizer.encode(input_text, return_tensors='pt')
    # max_new_tokens bounds the answer length independently of the prompt length;
    # GPT-2 has no pad token, so reuse the end-of-sequence token
    outputs = gpt2_model.generate(
        inputs, max_new_tokens=100, num_return_sequences=1,
        pad_token_id=gpt2_tokenizer.eos_token_id
    )
    # Decode only the newly generated tokens, not the prompt
    response = gpt2_tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return response

# Generate response
response = generate_response(ranked_docs, rewritten_query)
print("Generated Response:", response)
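GPT-2 accepts at most 1024 tokens, so concatenating every retrieved document can overflow the model's context window. The sketch below trims the context to a budget before building the prompt. The helper name `build_prompt` and the character budget are assumptions for illustration; a production version would count tokens with the tokenizer instead of characters.

```python
def build_prompt(query, contents, max_context_chars=2000):
    """Join document texts in ranked order until the character budget is spent."""
    selected, used = [], 0
    for text in contents:
        if used + len(text) > max_context_chars:
            break  # stop at the first document that does not fit
        selected.append(text)
        used += len(text)
    context = " ".join(selected)
    return f"Query: {query}\nContext: {context}\nAnswer:"


prompt = build_prompt(
    "Explain deep learning",
    ["Deep learning uses multi-layer neural networks.", "x" * 5000],
    max_context_chars=100,
)
print(prompt)
```

Because the documents arrive in ranked order, stopping at the first one that does not fit keeps the most relevant text and drops the tail.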
Orchestrating the Workflow
We’ll create a function to orchestrate the entire workflow from query input to response generation.
def answer_query(query):
    rewritten_query = rewrite_query(query)
    retrieved_docs = retrieve_documents(rewritten_query)
    ranked_docs = rerank_documents(rewritten_query, retrieved_docs)
    response = generate_response(ranked_docs, rewritten_query)
    return response

# Example usage
query = "Explain deep learning"
response = answer_query(query)
print("Response:", response)
Conclusion
In this article, we built a RAG pipeline backed by Milvus, with query rewriting and re-ranking added to improve the precision and relevance of the answers. Milvus provides efficient vector retrieval, and LangChain can be layered on top for smoother integration with language models, which together give a solid foundation for building more sophisticated NLP applications.