Building a Production-Ready RAG Application with Milvus and LangChain

This article walks through building a Retrieval-Augmented Generation (RAG) application, a modern approach that merges information retrieval and natural language generation to deliver precise, contextually fitting answers. Advanced techniques such as query rewriting and re-ranking are integrated to improve the quality of the retrieved context. Practical code examples with sample documents demonstrate the implementation.

Introduction to RAG

Retrieval-Augmented Generation (RAG) integrates a retrieval model with a generative model to improve performance on natural language processing tasks. The retriever supplies relevant passages and the generator conditions its answer on them, which improves the accuracy and contextual grounding of responses. This approach is especially valuable for demanding tasks such as question answering, where precision and context are crucial.

Overview of Milvus and LangChain

Milvus

Milvus is an open-source vector database built for efficient, scalable similarity search over large collections of vector data. It is well suited to applications that require fast and accurate vector search, making it a natural choice for the retrieval component of a RAG system.
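Conceptually, a similarity search scores a query vector against the stored vectors and keeps the best k. The sketch below shows that idea with an exhaustive scan in plain Python. It is only an illustration: the names `inner_product` and `top_k` are hypothetical, and a vector database such as Milvus relies on index structures rather than scanning every vector.

```python
from typing import List, Tuple


def inner_product(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: List[float], vectors: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors with the highest inner product."""
    scored = [(i, inner_product(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
print(top_k([1.0, 0.0, 0.0], toy_vectors, k=2))
```

The scan costs time proportional to the number of stored vectors, which is the cost that approximate indexes are designed to reduce.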

LangChain

LangChain is a library for building language model-powered applications. It provides tools and frameworks to create applications that combine natural language understanding, generation, and retrieval. LangChain simplifies the process of integrating language models with other components, such as databases and search engines.

Architecture of the RAG Application

The architecture of our RAG application will consist of the following components:

Document Store: A database to store and manage the documents. We’ll use Milvus for this purpose.

Retriever: A component to fetch relevant documents from the document store based on the user’s query.

Re-ranker: An advanced module to re-rank the retrieved documents to improve the relevance.

Query Rewriter: A module to rewrite queries to enhance retrieval accuracy.

Generator: A generative model to create responses based on the retrieved and re-ranked documents.

Orchestrator: A workflow manager to coordinate the interaction between the components.
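One way to picture how these components fit together is as a small pipeline object whose fields are plain callables. This is a sketch under stated assumptions, not the article's implementation: `RagPipeline` and the stub lambdas are hypothetical stand-ins for the real rewriter, retriever, re-ranker, and generator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RagPipeline:
    """Wires the components together; each field is a plain callable."""
    rewrite: Callable[[str], str]
    retrieve: Callable[[str], List[str]]
    rerank: Callable[[str, List[str]], List[str]]
    generate: Callable[[str, List[str]], str]

    def run(self, query: str) -> str:
        rewritten = self.rewrite(query)
        candidates = self.retrieve(rewritten)
        ranked = self.rerank(rewritten, candidates)
        return self.generate(rewritten, ranked)


# Stub components standing in for the real ones.
corpus = ["deep learning uses neural networks", "nlp studies language"]
rag = RagPipeline(
    rewrite=lambda q: q.lower().strip(),
    retrieve=lambda q: [d for d in corpus if any(w in d for w in q.split())],
    rerank=lambda q, docs: sorted(docs, key=len),
    generate=lambda q, docs: f"{q} -> {docs[0] if docs else 'no context'}",
)
print(rag.run("  Deep Learning "))
```

Keeping each stage behind a simple function signature makes it possible to swap a stub for a real component without touching the orchestration logic.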

Setting Up Milvus and LangChain

Installing Milvus

First, let’s install Milvus. We’ll use Docker to set up Milvus quickly.

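# Image names and commands below follow the Milvus standalone docker-compose file;
# pin the versions recommended in the Milvus installation guide for real deployments.

# etcd stores Milvus metadata
docker run -d --name milvus-etcd \
  -p 2379:2379 \
  quay.io/coreos/etcd:v3.5.5 \
  etcd -advertise-client-urls=http://127.0.0.1:2379 \
  -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

# MinIO provides object storage for segments and logs
docker run -d --name milvus-minio \
  -p 9000:9000 \
  minio/minio:latest \
  minio server /minio_data

# Milvus standalone, pointed at the two services above
docker run -d --name milvus-standalone \
  -p 19530:19530 \
  --link milvus-etcd:etcd \
  --link milvus-minio:minio \
  -e ETCD_ENDPOINTS=etcd:2379 \
  -e MINIO_ADDRESS=minio:9000 \
  milvusdb/milvus:latest \
  milvus run standalone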
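Once the containers are started, it can help to confirm that the Milvus port accepts connections before running any client code. The helper below is a minimal sketch using only the standard library; `is_port_open` is a hypothetical name, and a successful TCP connection only shows that something is listening on the port, not that Milvus has finished initializing.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 19530 is the port published by the milvus-standalone container.
    print("Milvus reachable:", is_port_open("127.0.0.1", 19530))
```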

Installing LangChain

Next, let’s install LangChain and other required Python libraries.

pip install langchain pymilvus transformers sentence-transformers scikit-learn torch
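After installation, a quick check that the modules imported later in this article can be found may save some debugging time. This is a small sketch; `missing_packages` is a hypothetical helper, and the list uses import names, which can differ from pip package names.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names, not pip names (for example, scikit-learn is imported as sklearn).
required = ["langchain", "pymilvus", "transformers", "sentence_transformers", "sklearn"]
print("Missing:", missing_packages(required))
```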

Sample Documents

For this example, let’s create a few sample documents.

documents = [
    {
        "id": 1,
        "title": "Introduction - Natural language processing",
        "content": "The field of artificial intelligence known as natural language processing concentrates on how computers and humans communicate using natural language."
    },
    {
        "id": 2,
        "title": "Advanced Techniques in ML",
        "content": "ML is a branch of artificial intelligence that enables systems to learn from data and improve from experience without being explicitly programmed."
    },
    {
        "id": 3,
        "title": "Understanding-Deep Learning",
        "content": "Deep learning involves utilizing neural networks with multiple layers to analyze diverse data types and is considered a subset of machine learning."
    }
]
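Before inserting documents into a collection with fixed-length text fields, it can be useful to validate them. The sketch below checks for duplicate ids, empty content, and over-long fields. The limits of 255 and 4096 mirror the schema used in the next section; measuring length in UTF-8 bytes is a conservative assumption, and `validate_documents` is a hypothetical helper.

```python
from typing import Dict, List

MAX_TITLE_BYTES = 255     # matches max_length of the title field in the schema
MAX_CONTENT_BYTES = 4096  # matches max_length of the content field in the schema


def validate_documents(docs: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means the batch looks clean."""
    problems: List[str] = []
    seen_ids = set()
    for position, doc in enumerate(docs):
        label = f"document at position {position}"
        doc_id = doc.get("id")
        if not isinstance(doc_id, int):
            problems.append(f"{label}: id must be an integer")
        elif doc_id in seen_ids:
            problems.append(f"{label}: duplicate id {doc_id}")
        else:
            seen_ids.add(doc_id)
        title = doc.get("title") or ""
        content = doc.get("content") or ""
        if len(title.encode("utf-8")) > MAX_TITLE_BYTES:
            problems.append(f"{label}: title exceeds {MAX_TITLE_BYTES} bytes")
        if not content:
            problems.append(f"{label}: content is empty")
        elif len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
            problems.append(f"{label}: content exceeds {MAX_CONTENT_BYTES} bytes")
    return problems


# A deliberately flawed batch to show the checks firing.
sample_batch = [
    {"id": 1, "title": "Short title", "content": "Some content."},
    {"id": 1, "title": "Duplicate id", "content": ""},
]
for problem in validate_documents(sample_batch):
    print(problem)
```

The same function can be called on the `documents` list above, where it should report no problems.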

Storing Documents in Milvus

We’ll store the documents in Milvus as vectors.

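from pymilvus import FieldSchema, CollectionSchema, DataType, Collection, connections
from sentence_transformers import SentenceTransformer

# Connect to the local Milvus instance (default host and port)
connections.connect(alias="default", host="localhost", port="19530")

# Use a pre-trained model to generate document embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding_dim = model.get_sentence_embedding_dimension()  # 384 for this model

# Define the schema for the collection; the vector dimension must match the model
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=255),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=4096),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=embedding_dim)
]
schema = CollectionSchema(fields, "Document collection schema")

# Create the collection
collection = Collection("documents", schema)

# Prepare column-ordered data: one list per field, in schema order.
# Embeddings are L2-normalized so that inner product behaves like cosine similarity.
contents = [doc['content'] for doc in documents]
embeddings = model.encode(contents, normalize_embeddings=True).tolist()
data = [
    [doc['id'] for doc in documents],
    [doc['title'] for doc in documents],
    contents,
    embeddings,
]

# Insert data, build an index on the vector field, then load the collection for search
collection.insert(data)
collection.flush()
collection.create_index(
    "vector",
    {"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 128}},
)
collection.load()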

Query Rewriting

To improve retrieval accuracy, we can implement a query rewriting mechanism. This can involve synonym expansion, query paraphrasing, or other NLP techniques.

from transformers import pipeline

# Load a pre-trained model for query rewriting.
# Note: the stock t5-base checkpoint is not fine-tuned on a "rewrite:" task,
# so a paraphrase-tuned checkpoint is likely to produce more useful rewrites.
query_rewriter = pipeline('text2text-generation', model='t5-base')

def rewrite_query(query):
    rewritten_query = query_rewriter(f"rewrite: {query}")[0]['generated_text']
    return rewritten_query

# Example query rewriting
query = "What is NLP?"
rewritten_query = rewrite_query(query)
print("Original Query:", query)
print("Rewritten Query:", rewritten_query)
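Synonym expansion, one of the techniques mentioned above, can be sketched without any model. The example below appends known expansions for abbreviations found in the query. The `SYNONYMS` table and the `expand_query` helper are hypothetical; a real system might derive the table from a thesaurus or a domain glossary.

```python
import re
from typing import Dict, List

# Hypothetical, hand-written synonym table used only for illustration.
SYNONYMS: Dict[str, List[str]] = {
    "nlp": ["natural language processing"],
    "ml": ["machine learning"],
    "dl": ["deep learning"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """Append known expansions for tokens in the query, skipping duplicates."""
    lowered = query.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)
    extras: List[str] = []
    for token in tokens:
        for phrase in synonyms.get(token, []):
            if phrase not in extras and phrase not in lowered:
                extras.append(phrase)
    if not extras:
        return query
    return f"{query} ({'; '.join(extras)})"


print(expand_query("What is NLP?"))
```

Because the expansion is appended rather than substituted, the original wording still contributes to the query embedding.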

Retrieving Documents

Next, we’ll use the rewritten query to retrieve relevant documents from Milvus.

def retrieve_documents(query):
    # Embed the query with the same model used for the documents (unit-length vector)
    query_embedding = model.encode(query, normalize_embeddings=True).tolist()
    search_params = {
        "metric_type": "IP",  # Inner Product for similarity search
        "params": {"nprobe": 10}
    }
    results = collection.search(
        [query_embedding], "vector", param=search_params, limit=5,
        output_fields=["title", "content"]
    )
    # search() returns one list of hits per query vector; we sent a single query
    return results[0]

# Retrieve documents
retrieved_docs = retrieve_documents(rewritten_query)
for doc in retrieved_docs:
    print(f"Title: {doc.entity.get('title')}, Content: {doc.entity.get('content')}")
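The search above uses the inner product ("IP") metric. Inner product ranks results like cosine similarity only when the vectors have unit length, so it is worth seeing the relationship on a small example. The sketch below uses plain Python; the helper names are illustrative.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
ip_raw = dot(a, b)                               # depends on vector magnitudes
ip_unit = dot(l2_normalize(a), l2_normalize(b))  # magnitudes removed
print("raw inner product:", ip_raw)
print("unit inner product matches cosine:", math.isclose(ip_unit, cosine(a, b)))
```

If embeddings are not normalized, longer vectors can score higher under inner product regardless of direction, which may or may not be the intended behavior.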

Re-ranking Documents

To further improve relevance, we can re-rank the retrieved documents: score each one against the query, then sort the results so that the best matches come first.

from sklearn.metrics.pairwise import cosine_similarity

def rerank_documents(query, retrieved_docs):
    if len(retrieved_docs) == 0:
        return []
    query_embedding = model.encode(query).reshape(1, -1)
    # Re-embed the retrieved texts so this function does not depend on
    # vectors being returned by the search call
    doc_embeddings = model.encode([doc.entity.get('content') for doc in retrieved_docs])
    similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
    ranked_docs = sorted(zip(retrieved_docs, similarities), key=lambda x: x[1], reverse=True)
    return [doc[0] for doc in ranked_docs]

# Re-rank documents
ranked_docs = rerank_documents(rewritten_query, retrieved_docs)
for doc in ranked_docs:
    print(f"Title: {doc.entity.get('title')}, Content: {doc.entity.get('content')}")
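Re-ranking is most useful when the second score carries a different signal from the first-pass retrieval. As a dependency-free illustration, the sketch below re-ranks passages by lexical (Jaccard) token overlap with the query. This is a deliberately simple stand-in chosen for the example, not the method used above; production systems often use a dedicated re-ranking model instead. All names here are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank_by_overlap(query: str, passages: List[str]) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted by descending token overlap."""
    query_tokens = tokenize(query)
    scored = [(p, jaccard(query_tokens, tokenize(p))) for p in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


passages = [
    "Machine learning lets systems learn from data.",
    "Deep learning uses neural networks with multiple layers.",
]
for passage, score in rerank_by_overlap("explain deep learning", passages):
    print(f"{score:.2f}  {passage}")
```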

Generating Responses

Finally, we’ll use a generative model to create responses based on the top-ranked documents.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a pre-trained generative model
# (named gen_* so they do not shadow the embedding `model` defined earlier)
gen_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gen_model = GPT2LMHeadModel.from_pretrained('gpt2')

def generate_response(documents, query):
    context = " ".join([doc.entity.get('content') for doc in documents])
    input_text = f"Query: {query}\nContext: {context}\nAnswer:"
    inputs = gen_tokenizer.encode(input_text, return_tensors='pt')
    # max_new_tokens bounds only the generated part, independent of prompt length
    outputs = gen_model.generate(
        inputs,
        max_new_tokens=100,
        num_return_sequences=1,
        pad_token_id=gen_tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the echoed prompt
    response = gen_tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return response

# Generate response
response = generate_response(ranked_docs, rewritten_query)
print("Generated Response:", response)
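GPT-2 has a context window of 1024 tokens, so concatenating every retrieved document into the prompt can overflow it as the corpus grows. A minimal sketch of budget-aware prompt assembly follows. The `build_prompt` helper is hypothetical, and the character budget is a crude proxy for a token budget; counting tokens with the model's tokenizer would be more precise.

```python
from typing import List


def build_prompt(query: str, passages: List[str], max_context_chars: int = 1000) -> str:
    """Add passages in ranked order until the character budget would be exceeded."""
    selected: List[str] = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_context_chars:
            break
        selected.append(passage)
        used += len(passage)
    context = "\n".join(f"- {p}" for p in selected)
    return f"Query: {query}\nContext:\n{context}\nAnswer:"


print(build_prompt("q", ["aaaa", "bbbb", "cccc"], max_context_chars=8))
```

Because passages are consumed in ranked order, the budget is spent on the most relevant text first, which is one reason re-ranking before generation matters.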

Orchestrating the Workflow

We’ll create a function to orchestrate the entire workflow from query input to response generation.

def answer_query(query):
    rewritten_query = rewrite_query(query)
    retrieved_docs = retrieve_documents(rewritten_query)
    ranked_docs = rerank_documents(rewritten_query, retrieved_docs)
    response = generate_response(ranked_docs, rewritten_query)
    return response

# Example usage
query = "Explain deep learning"
response = answer_query(query)
print("Response:", response)
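In practice, the rewriter can fail or return an empty string, and retrieval can come back empty. The sketch below shows one way to guard those two points by passing the stages in as callables. `answer_with_fallbacks` and its parameters are hypothetical names, and the fallback message is an arbitrary choice.

```python
from typing import Callable, List


def answer_with_fallbacks(
    query: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    rerank: Callable[[str, List[str]], List[str]] = lambda q, docs: docs,
    no_context_message: str = "No relevant documents were found.",
) -> str:
    """Run rewrite -> retrieve -> rerank -> generate with two simple guards."""
    try:
        rewritten = rewrite(query).strip()
    except Exception:
        rewritten = ""  # fall back to the original query if rewriting fails
    effective_query = rewritten or query
    docs = retrieve(effective_query)
    if not docs:
        return no_context_message  # avoid generating an answer with no context
    return generate(effective_query, rerank(effective_query, docs))


# Stub stages to show the control flow.
print(answer_with_fallbacks(
    "Explain deep learning",
    rewrite=lambda q: "",
    retrieve=lambda q: ["Deep learning uses multi-layer neural networks."],
    generate=lambda q, docs: f"{q}: {docs[0]}",
))
```

Returning a fixed message when nothing is retrieved keeps the generator from answering without context, which is where ungrounded output is most likely.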

Conclusion

In this article, we built a working RAG pipeline around Milvus, with query rewriting and re-ranking added to improve the precision and relevance of the answers. Milvus handles efficient vector retrieval, while LangChain offers a way to integrate these steps with language models; together they provide a solid foundation for more sophisticated NLP applications.

Author

Rajesh

Rajesh Yerremshetty is an IIT Roorkee MBA graduate with 10 years of experience in Data Analytics and AI. He has worked with leading organizations, including CarDekho.com, Vansun Media Tech Pvt. Ltd., and STRIKIN.com, driving innovative solutions and business growth through data-driven insights.