In this article, we’ll walk through a practical example of building an agent using LangGraph, LangChain, and an LLM backend from Groq (e.g., llama3-8b-8192). The assistant:
- Uses memory to manage conversation state.
- Integrates tools like Google search.
- Handles decision-making on whether to call tools or directly respond.
- Trims context to manage token limits.
What Is an Agent?
An agent in LangChain is a smart decision-making system built on top of a large language model (LLM) that can dynamically choose actions to take based on user input and the current conversation context. Agents reason step-by-step and decide whether to call a function (tool), ask follow-up questions, or return a final answer. Agents are particularly useful for complex workflows where decisions depend on runtime conditions, user input, or intermediate tool outputs. They simulate reasoning by thinking through a task and invoking tools as needed.
🧠 What is Tool Calling?
Tool calling is the mechanism that allows an agent to extend its abilities beyond just language generation. When an agent is prompted with a question, the LLM analyzes whether it can answer directly or if it needs external information or actions. If it needs a tool, the LLM returns a structured command (JSON) specifying the tool name and the required arguments. LangChain then parses this tool call, executes the actual Python function (e.g., a search API, calculator, or database lookup), and returns the result back to the agent. The agent continues reasoning with this new information and may perform additional steps or respond to the user.
- LangChain parses the tool call, executes it, and passes its result back into the conversation.
- LangGraph automates this flow using a graph of nodes.
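To make the dispatch step concrete, here is a minimal, framework-free sketch. The dict shape mirrors the structured tool_calls payload an LLM returns (as seen in the expected output later in this article); the `google_search` function here is an illustrative stand-in, not the real tool defined below.

```python
# Stand-in for the real search tool defined later in the article.
def google_search(query: str) -> str:
    return f"results for: {query}"

# The framework keeps a registry mapping tool names to Python functions.
TOOLS = {"google_search": google_search}

# The LLM emits a structured command naming the tool and its arguments...
tool_call = {"name": "google_search", "args": {"query": "capital of France"}}

# ...and the framework looks up the matching function and executes it.
result = TOOLS[tool_call["name"]](**tool_call["args"])
print(result)  # results for: capital of France
```

LangChain and LangGraph automate exactly this lookup-and-execute loop, plus the bookkeeping of feeding the result back into the conversation.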
📦 Imports and Initialization
# Standard Python typing.
from typing import Any, Dict, List

# Initializes a chat model like llama3 with a specified provider (Groq).
from langchain.chat_models import init_chat_model

# Provides access to Google search using the Serper API.
from langchain_community.utilities import GoogleSerperAPIWrapper

# Handles message types and trimming for context management.
from langchain_core.messages import HumanMessage, SystemMessage, trim_messages

# Used for creating a structured message prompt for the model.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Registers a function as a tool for the LLM to use.
from langchain_core.tools import tool

# Core of LangGraph: builds the state machine and controls the flow between nodes.
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition

# Adds memory to persist and recall conversation history.
from langgraph.checkpoint.memory import MemorySaver

⚙️ Model and Tool Setup
1. Chat Model Initialization
chat_model = init_chat_model("llama3-8b-8192", model_provider="groq")

This initializes the llama3 model hosted by Groq. You can replace it with other providers (e.g., OpenAI, Anthropic).
2. Google Search Tool
# Initializes a wrapper to interact with the Serper API.
search = GoogleSerperAPIWrapper()

@tool
def google_search(query: str) -> str:
    """Search the web for information related to a query.

    Args:
        query: The search query

    Returns:
        Search results as text
    """
    try:
        return search.run(query)
    except Exception:
        return "I couldn't perform the search due to a technical issue."

This registers a custom search tool. It uses the Serper API and returns search results for the query string. The LLM can call this tool when needed.
3. Trimming Context for Token Efficiency
trimmer = trim_messages(
    max_tokens=512,
    strategy="last",
    token_counter=chat_model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

This trims the message list so the prompt stays within the 512-token limit, keeping the most recent messages (the system message is retained, and the trimmed history starts on a human message).
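For intuition, here is a rough, framework-free sketch of the "last" strategy, assuming a naive word-count token counter. The real trim_messages uses the chat model's tokenizer and also honors options like include_system and start_on, which this toy version ignores.

```python
# Toy version of the "last" trimming strategy: keep the newest messages
# that fit within the token budget, in chronological order.
def trim_last(messages, max_tokens):
    count = lambda m: len(m["content"].split())  # naive "token" counter
    kept, total = [], 0
    for msg in reversed(messages):  # walk from the newest message backward
        cost = count(msg)
        if total + cost > max_tokens:  # stop once the budget would overflow
            break
        kept.append(msg)
        total += cost
    return kept[::-1]  # restore chronological order

history = [
    {"content": "old long message one two three"},
    {"content": "recent question"},
    {"content": "latest answer with a few words"},
]
print(trim_last(history, max_tokens=8))  # drops only the oldest message
```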
4. Prompt Template
instructions = "You are my personal assistant. Your role is to assist me with research, idea generation, and planning. Provide clear, concise, and actionable insights. Always prioritize accuracy and relevance in your responses. When generating ideas, aim for creativity and practicality. Help me stay organized and focused on achieving my goals efficiently."
instructions_tools = "You are my personal assistant. Use the output from the tool to answer my query accurately. Ensure your response is concise, friendly, and focused on the requested information. Use the tool's output as a reference to support your explanation. If necessary, provide a simple and clear explanation, avoiding unnecessary technical jargon."

prompt_template = ChatPromptTemplate.from_messages(
    [SystemMessage(content=instructions_tools), MessagesPlaceholder(variable_name="messages")]
)

This defines how the system message and dynamic messages are structured for each LLM call.
🛠️Tool Binding
TOOL_LIST = [google_search]
llm_with_tools = chat_model.bind_tools(TOOL_LIST)

🔍 Explanation:
- Tells the model about the tools it can call (google_search).
- Allows the LLM to dynamically decide whether to use a tool based on the prompt.
🔁 LangGraph Nodes
LangGraph is used to create a state machine that defines how data flows between nodes.
graph_builder = StateGraph(MessagesState)
# Add nodes
graph_builder.add_node("query_or_respond", query_or_respond)
graph_builder.add_node("tools", ToolNode(TOOL_LIST))
graph_builder.add_node("generate", generate)
# Set entry point
graph_builder.set_entry_point("query_or_respond")
# Define edges
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)
# Add memory
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

1. Tool-Enabled LLM
llm_with_tools = chat_model.bind_tools(TOOL_LIST)

This allows the LLM to invoke external tools (like google_search) during inference.
2. Node 1: query_or_respond
def query_or_respond(state: MessagesState) -> Dict[str, List]:
    """Generate a tool call or a direct response based on the input message.

    Args:
        state: Current conversation state

    Returns:
        Updated state with AI response
    """
    messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(messages)
    response = llm_with_tools.invoke(prompt)
    return {"messages": [response]}

This node either:
- Lets the LLM respond directly.
- Or triggers a tool call.
Steps:
- Trim the conversation.
- Create a prompt using the trimmed messages.
- Call the tool-aware model.
3. Node 2: tools (ToolNode)
Executes any required tools (e.g., google_search) requested by the previous node.
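Conceptually, a tool-executing node does something like the following simplified sketch. Plain dicts stand in for LangChain message objects here; this is not ToolNode's actual implementation, just the shape of the work it performs.

```python
# Simplified stand-in for a ToolNode-like step: execute every tool call
# attached to the last AI message and wrap each result as a tool message
# carrying the originating tool_call_id.
def run_tools(ai_message, tools):
    tool_messages = []
    for call in ai_message["tool_calls"]:
        output = tools[call["name"]](**call["args"])  # run the real function
        tool_messages.append(
            {"type": "tool", "content": output, "tool_call_id": call["id"]}
        )
    return tool_messages

def fake_search(query):
    return f"results for {query}"

ai_message = {
    "tool_calls": [
        {"id": "call_1", "name": "google_search", "args": {"query": "Paris"}}
    ]
}
print(run_tools(ai_message, {"google_search": fake_search}))
```

The tool_call_id is what lets the model match each tool result back to the call that requested it.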
4. Node 3: generate
def generate(state: MessagesState) -> Dict[str, List]:
    """Generate a comprehensive answer based on tool outputs.

    Args:
        state: Current conversation state including tool outputs

    Returns:
        Updated state with final AI response
    """
    # Collect the most recent run of tool messages
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Fold the tool output into the system instructions
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = instructions_tools + "\n\n" + docs_content

    # Prepare conversation context (excluding tool messages)
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(content=system_message_content)] + conversation_messages
    prompt = trimmer.invoke(prompt)

    # Generate response
    response = chat_model.invoke(prompt)
    return {"messages": [response]}

This node:
- Collects tool output messages.
- Appends them to the system instructions.
- Generates a final response using the raw chat model (no tool binding here).

This ensures the tool results are incorporated into a coherent answer. MemorySaver stores the conversation history during execution.
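The effect of thread-scoped memory can be illustrated with a toy store keyed by thread_id. This is a conceptual stand-in, not the MemorySaver API: each thread accumulates its own history, and separate threads never see each other's messages.

```python
# Conceptual sketch of per-thread conversation memory (not MemorySaver's API).
class ThreadMemory:
    def __init__(self):
        self._store = {}  # thread_id -> list of messages

    def append(self, thread_id, message):
        self._store.setdefault(thread_id, []).append(message)

    def history(self, thread_id):
        return list(self._store.get(thread_id, []))

memory = ThreadMemory()
memory.append("1", {"type": "human", "content": "What is the capital of France?"})
memory.append("2", {"type": "human", "content": "Plan my week."})
print(len(memory.history("1")))  # 1 -- thread "2" is isolated from thread "1"
```

Passing the same thread_id on a later graph.invoke call is what lets the assistant recall earlier turns of that conversation.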
🧾 Chat Function
thread_id = "1"
message = "What is the capital of France?"
config = {"configurable": {"thread_id": thread_id}}
output = graph.invoke({"messages": [HumanMessage(content=message)]}, config)
for msg in output["messages"]:
    print(msg.type)
    print(msg)

This is the main inference snippet for interacting with the assistant. It:
- Takes a user message and a thread ID (useful for memory isolation).
- Invokes the LangGraph engine with the message.
- Logs the response message(s) returned by the assistant.
# Expected Output
human
content='What is the capital of France?' additional_kwargs={} response_metadata={} id='fe525805-06fd-4a8e-b726-109299c84b80'
ai
content='' additional_kwargs={'tool_calls': [{'id': 'call_pc1v', 'function': {'arguments': '{"query":"What is the capital of France?"}', 'name': 'google_search'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 67, 'prompt_tokens': 998, 'total_tokens': 1065, 'completion_time': 0.055833333, 'prompt_time': 0.125277397, 'queue_time': 0.01990629299999999, 'total_time': 0.18111073}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_dadc9d6142', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-fba65750-32c8-47ff-9e2d-d40f44f5fb2c-0' tool_calls=[{'name': 'google_search', 'args': {'query': 'What is the capital of France?'}, 'id': 'call_pc1v', 'type': 'tool_call'}] usage_metadata={'input_tokens': 998, 'output_tokens': 67, 'total_tokens': 1065}
tool
content="Paris is the capital and largest city of France. With an estimated population of 2,048,472 residents in January 2025 in an area of more than 105 km2 (41 sq ... Paris is the capital of France, the largest country of Europe with 550 000 km2 (65 millions inhabitants). Paris has 2.234 million inhabitants end 2011. France is a semi-presidential republic and its capital, largest city and main cultural and economic centre is Paris. Paris, city and capital of France, located along the Seine River, in the north-central part of the country. Paris is one of the world's most important and ... Paris is the capital and most populous city of France. Situated on the Seine River, in the north of the country, it is in the centre of the Île-de-France ... The capital and by far the most important city of France is Paris, one of the world's preeminent cultural and commercial centres. Paris is the city of romance par excellence, the fashion capital and the best example of French art de vivre. Exploring Paris is an essential rite of passage ... Besides Paris, what is the capital of France? - Answer: F. Paris became capital of France because France evolved from a federation of counties to a kingdom, where the king was living a Paris. The ... What Is the Capital of France? Paris. The very name brings to mind tree-lined boulevards, grand monuments, sidewalk cafes, and so much more." name='google_search' id='5c1000a6-b172-4b77-bceb-76dfc3e3234b' tool_call_id='call_pc1v'
ai
content='According to the provided information, the answer is: Paris.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 393, 'total_tokens': 406, 'completion_time': 0.010833333, 'prompt_time': 0.049380433, 'queue_time': 0.017139350999999997, 'total_time': 0.060213766}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_a97cfe35ae', 'finish_reason': 'stop', 'logprobs': None} id='run-eb34a578-b383-4379-ae6f-ad4209ea6f1a-0' usage_metadata={'input_tokens': 393, 'output_tokens': 13, 'total_tokens': 406}

✅ Summary: What This Code Does
- init_chat_model: Initializes the LLM (Groq’s LLaMA3)
- @tool google_search: A custom Google Search tool
- trimmer: Trims message history to stay within the token limit
- query_or_respond: Decides whether to respond or call a tool
- ToolNode: Executes tools like search
- generate: Generates a final reply using tool output
- LangGraph: Manages flow using a state machine
- MemorySaver: Tracks ongoing conversations
- chat(): Unified interface to interact with the assistant
📜 Full Code

import logging
from typing import Any, Dict, List

from langchain.chat_models import init_chat_model
from langchain_community.utilities import GoogleSerperAPIWrapper
from langchain_core.messages import HumanMessage, SystemMessage, trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition

import api_keys

# os.environ["SERPER_API_KEY"] = "my-api-key"
# os.environ["GROQ_API_KEY"] = "my-api-key"

# Initialize chat model
chat_model = init_chat_model("llama3-8b-8192", model_provider="groq")

search = GoogleSerperAPIWrapper()


@tool
def google_search(query: str) -> str:
    """Search the web for information related to a query.

    Args:
        query: The search query

    Returns:
        Search results as text
    """
    try:
        return search.run(query)
    except Exception:
        logging.exception("Search failed")
        return "I couldn't perform the search due to a technical issue."


# Message trimmer to manage the context window
trimmer = trim_messages(
    max_tokens=512,
    strategy="last",
    token_counter=chat_model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

# System instructions
instructions = "You are my personal assistant. Your role is to assist me with research, idea generation, and planning. Provide clear, concise, and actionable insights. Always prioritize accuracy and relevance in your responses. When generating ideas, aim for creativity and practicality. Help me stay organized and focused on achieving my goals efficiently."
instructions_tools = "You are my personal assistant. Use the output from the tool to answer my query accurately. Ensure your response is concise, friendly, and focused on the requested information. Use the tool's output as a reference to support your explanation. If necessary, provide a simple and clear explanation, avoiding unnecessary technical jargon."

prompt_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=instructions_tools),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

# Define available tools
TOOL_LIST = [google_search]
llm_with_tools = chat_model.bind_tools(TOOL_LIST)


def query_or_respond(state: MessagesState) -> Dict[str, List]:
    """Generate a tool call or a direct response based on the input message.

    Args:
        state: Current conversation state

    Returns:
        Updated state with AI response
    """
    messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(messages)
    response = llm_with_tools.invoke(prompt)
    return {"messages": [response]}


def generate(state: MessagesState) -> Dict[str, List]:
    """Generate a comprehensive answer based on tool outputs.

    Args:
        state: Current conversation state including tool outputs

    Returns:
        Updated state with final AI response
    """
    # Collect the most recent run of tool messages
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Fold the tool output into the system instructions
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = instructions_tools + "\n\n" + docs_content

    # Prepare conversation context (excluding tool messages)
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(content=system_message_content)] + conversation_messages
    prompt = trimmer.invoke(prompt)

    # Generate response
    response = chat_model.invoke(prompt)
    return {"messages": [response]}


def build_graph():
    """Build and return the state graph for the conversation flow."""
    graph_builder = StateGraph(MessagesState)

    # Add nodes
    graph_builder.add_node("query_or_respond", query_or_respond)
    graph_builder.add_node("tools", ToolNode(TOOL_LIST))
    graph_builder.add_node("generate", generate)

    # Set entry point
    graph_builder.set_entry_point("query_or_respond")

    # Define edges
    graph_builder.add_conditional_edges(
        "query_or_respond",
        tools_condition,
        {END: END, "tools": "tools"},
    )
    graph_builder.add_edge("tools", "generate")
    graph_builder.add_edge("generate", END)

    # Add memory
    memory = MemorySaver()
    return graph_builder.compile(checkpointer=memory)


# Build graph
graph = build_graph()


def chat(message: str, thread_id: str) -> str:
    """Process a user message and return an AI response.

    Args:
        message: User input message
        thread_id: Unique identifier for the conversation thread

    Returns:
        AI response text
    """
    config = {"configurable": {"thread_id": thread_id}}
    output = graph.invoke({"messages": [HumanMessage(content=message)]}, config)

    # For debugging
    for msg in output["messages"]:
        print(msg.type)
        print(msg)

    return output["messages"][-1].content