In this article, we will walk through a Retrieval-Augmented Generation (RAG) chatbot implemented using LangChain, Mistral LLM, Hugging Face embeddings, and vector search. This chatbot enables users to ask questions and receive responses based on retrieved information from a custom dataset, such as a PDF document.
Overview of the Chatbot
The chatbot is built using Mistral LLM for natural language processing, Hugging Face embeddings for vector-based document retrieval, and LangChain for structuring the conversational agent. The main capabilities of this chatbot include:
- Retrieval-Augmented Generation (RAG): Combines retrieval-based search with generative AI to provide more relevant answers.
- Vector Search with In-Memory Storage: Efficiently retrieves information using semantic search on document embeddings.
- PDF Document Processing: Loads and processes PDF documents to extract and index relevant content.
- Conversational Memory: Remembers previous interactions within a session using LangGraph’s MemorySaver.
Let’s break down the components of the implementation.
Installation
pip install --quiet --upgrade langchain-text-splitters langchain-community langchain-huggingface pypdf langgraph "langchain[mistralai]"
1. Initializing the Chatbot
The chatbot is encapsulated in a ChatBot class, which initializes the necessary components:
from langchain import hub
from langchain.chat_models import init_chat_model
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import ToolNode, create_react_agent

class ChatBot:
    def __init__(self):
        # Create memory storage
        self.memory = MemorySaver()
        # Load Mistral LLM
        self.model = init_chat_model("mistral-large-latest", model_provider="mistralai")
        # Initialize vector search with Hugging Face embeddings
        self.embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
        self.vector_store = InMemoryVectorStore(self.embeddings)
        # Load and process PDF data
        self.load_data("data/administratoreecaf3b490e2d43d2e3b50c0c068b5d7.pdf")
        # Define prompt for question-answering
        self.prompt = hub.pull("rlm/rag-prompt")
        # Define retrieval tool (wrap the method so ToolNode accepts it as a LangChain tool)
        self.tools = ToolNode([tool(self.retrieve)])
        # Create the conversational agent
        self.agent_executor = create_react_agent(self.model, self.tools, checkpointer=self.memory)
Key Components
1. Mistral LLM Initialization: The chatbot uses the "mistral-large-latest" model from MistralAI to generate responses.
2. Hugging Face Embeddings for Vector Search: The chatbot uses sentence-transformers/all-mpnet-base-v2 to generate embeddings and stores them in an in-memory vector database.
3. Memory Management: MemorySaver from LangGraph stores previous interactions, enabling contextual understanding across turns.
4. Tool Integration: ToolNode integrates a retrieval function that fetches relevant information from the indexed documents.
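To build intuition for what an embedding is, here is a deliberately simplified sketch: a bag-of-words count vector over a tiny hand-picked vocabulary. This is only an illustration of "text becomes a vector" — the real all-mpnet-base-v2 model produces dense 768-dimensional vectors that capture meaning, not word counts, and the vocabulary here is invented for the example.

```python
def embed(text, vocab):
    # Toy embedding: count how often each vocabulary word appears.
    # A real model (e.g. all-mpnet-base-v2) maps text to a dense
    # 768-dimensional vector instead of sparse word counts.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

vocab = ["refund", "policy", "shipping", "days"]
print(embed("The refund policy allows refunds within 30 days", vocab))  # → [1, 1, 0, 1]
```

Texts that share vocabulary end up with similar vectors, which is the property the vector store exploits during search.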
2. Loading and Processing PDF Data
The chatbot can process a PDF document, split it into smaller chunks, and store the text embeddings for retrieval.
def load_data(self, pdf_path):
    # Load and chunk contents of the PDF
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()
    # Split text into manageable chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    all_splits = text_splitter.split_documents(docs)
    # Index chunks in vector store
    _ = self.vector_store.add_documents(documents=all_splits)
How it Works
- The PyPDFLoader extracts text from the PDF.
- The RecursiveCharacterTextSplitter splits the text into chunks of 1000 characters, with a 200-character overlap to preserve context.
- The processed text is converted into embeddings and stored in the vector database.
This ensures that when a user asks a question, the chatbot can search for relevant document sections instead of scanning the entire PDF.
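The chunking step can be sketched without LangChain. The function below is a simplified sliding-window splitter showing what chunk_size and chunk_overlap mean; the real RecursiveCharacterTextSplitter additionally prefers to break on paragraph and sentence boundaries rather than at fixed offsets.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Simplified splitter: fixed-size chunks whose ends overlap.
    (RecursiveCharacterTextSplitter also respects natural boundaries.)"""
    step = chunk_size - chunk_overlap  # advance by size minus overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text, chunk_size=1000, chunk_overlap=200)
print(len(chunks))  # → 3
```

The 200-character overlap means the tail of each chunk reappears at the head of the next, so a sentence cut by a chunk boundary is still fully present in at least one chunk.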
3. Retrieving Relevant Information
When a user asks a question, the chatbot retrieves similar text chunks from the stored PDF content.
def retrieve(self, query: str) -> str:
    """Retrieve information related to a query."""
    # Embed the query and find the most similar stored chunks
    retrieved_docs = self.vector_store.similarity_search(query)
    # Return the chunk text so the agent can use it as context
    return "\n\n".join(doc.page_content for doc in retrieved_docs)
How Retrieval Works
- User Input: The chatbot receives a question from the user.
- Vector Search: The question is converted into an embedding vector and compared against stored document vectors using similarity search.
- Top Matches Returned: The most relevant document sections are retrieved and used as context for generating an answer.
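The three steps above can be sketched in plain Python. This is a minimal stand-in for the vector store's similarity search, assuming documents have already been embedded; the invented three-dimensional vectors and sample texts are illustrative only, and the real store uses high-dimensional model embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_search(query_vec, store, k=2):
    # Rank stored (text, vector) pairs by similarity to the query vector
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store = [
    ("Refunds are issued within 14 days.", [1.0, 0.1, 0.0]),
    ("Shipping takes 3-5 business days.",  [0.0, 1.0, 0.2]),
    ("Contact support via email.",         [0.1, 0.0, 1.0]),
]
# A query whose embedding points toward the "refund" document
print(similarity_search([0.9, 0.2, 0.1], store, k=1))  # → ['Refunds are issued within 14 days.']
```

InMemoryVectorStore.similarity_search does the same ranking, except the query is embedded automatically and the vectors come from the Hugging Face model.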
4. Answering User Questions
The chatbot processes user queries and generates responses using the Mistral LLM and retrieved context.
def ask(self, message: str, thread_id: str = "abc123"):
    # Route the message through the agent, scoped to a conversation thread
    config = {"configurable": {"thread_id": thread_id}}
    response = self.agent_executor.invoke(
        {"messages": [HumanMessage(content=message)]},
        config,
    )
    return response["messages"][-1].content
How It Works
- The function receives the user’s message.
- It sends the message to the LangChain agent, along with the retrieved context.
- The Mistral model generates a response based on the context.
- The chatbot returns the final answer.
This ensures context-aware responses, improving accuracy when answering domain-specific questions.
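The role of thread_id deserves a closer look. The toy class below is a hypothetical stand-in that only illustrates the thread-scoping idea: each thread_id gets its own history, so separate conversations do not bleed into each other. LangGraph's MemorySaver checkpoints the full agent state per thread, not just a message list.

```python
class ThreadMemory:
    """Toy per-thread memory illustrating how thread_id scopes a conversation.
    (MemorySaver checkpoints complete graph state; this keeps only messages.)"""
    def __init__(self):
        self.threads = {}  # thread_id -> list of messages

    def append(self, thread_id, message):
        self.threads.setdefault(thread_id, []).append(message)

    def history(self, thread_id):
        # Unknown threads start with an empty history
        return self.threads.get(thread_id, [])

memory = ThreadMemory()
memory.append("abc123", "What is the refund policy?")
memory.append("abc123", "Refunds within 14 days.")
memory.append("xyz789", "How long is shipping?")
print(len(memory.history("abc123")))  # → 2
```

Passing a different thread_id to ask() therefore starts a fresh conversation, while reusing one continues where it left off.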
Conclusion
This chatbot combines Mistral LLM, retrieval-augmented generation, and vector search to answer questions grounded in custom data, such as internal company documents.