Introduction

Voice-based AI assistants are becoming increasingly popular in applications ranging from customer support to personal assistants. This article walks through the implementation of a voice-enabled chatbot that allows users to interact with an AI model by speaking instead of typing.

The chatbot integrates speech recognition and text-to-speech (TTS) with a Retrieval-Augmented Generation (RAG) chatbot powered by the Mistral LLM. The bot listens to user queries, retrieves relevant information, generates responses, and speaks them back, creating a seamless conversational experience.

Installation

pip install pyttsx3 SpeechRecognition PyAudio
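
Before wiring anything together, it can help to confirm that the packages are importable. The sketch below is one way to do that with only the standard library. The `missing_packages` helper and the `REQUIRED` mapping are illustrative names, not part of the original project. PyAudio is listed on the assumption that the microphone is accessed through `sr.Microphone`, which relies on it.

```python
import importlib.util

# Import name -> pip package name (assumed set for this article)
REQUIRED = {
    "pyttsx3": "pyttsx3",
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "PyAudio",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All voice dependencies are installed.")
```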

1. Initializing the Voice Chatbot

The chatbot is implemented in a class called VoiceChatbot, which wraps the existing text-based chatbot (ChatBot) and adds voice input and output on top of it.

import pyttsx3
import speech_recognition as sr
from chatbot import ChatBot

Key Libraries Used

pyttsx3 → Converts text responses into speech.
speech_recognition → Captures and transcribes user speech.
ChatBot (imported from chatbot.py) → Handles text-based AI interactions. The chatbot code and its explanation are in the article linked under Reference.
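
The only ChatBot method that the voice layer calls in this article is ask(prompt), which returns the response text. While developing the voice features, a stand-in with the same method can replace the real RAG chatbot so that no LLM is needed. The `StubChatBot` class below is a hypothetical placeholder written for illustration, not the author's ChatBot.

```python
class StubChatBot:
    """Hypothetical stand-in exposing the ask(prompt) -> str method used by VoiceChatbot."""

    def __init__(self):
        self.history = []  # prompts received so far, useful when debugging

    def ask(self, prompt):
        self.history.append(prompt)
        # Return the exit phrase when the user says goodbye (assumed behavior)
        if "bye" in prompt.lower():
            return "Have a great day!"
        return f"You said: {prompt}"
```

Swapping `self.chatbot = ChatBot()` for `self.chatbot = StubChatBot()` would let the speech pipeline be tested on its own.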

2. Setting Up Speech Recognition and Text-to-Speech

class VoiceChatbot:
    def __init__(self, name):
        self.name = name
        self.chatbot = ChatBot()  # Initialize the AI chatbot
        
        # Set up text-to-speech engine
        self.engine = pyttsx3.init()
        
        # Set up speech recognition
        self.voice_recognizer = sr.Recognizer()

How It Works

The chatbot is initialized with a name (e.g., "Faheem Bot").
The pyttsx3 TTS engine is used to convert text responses into speech.
speech_recognition.Recognizer() is used to capture and process user speech.
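
The pyttsx3 engine also exposes properties such as "rate" (words per minute) and "volume" (0.0 to 1.0) through setProperty(). The following is a minimal sketch of tuning them after initialization. The `configure_voice` helper and its default values are assumptions chosen for illustration.

```python
def configure_voice(engine, rate=160, volume=0.9):
    """Apply a speaking rate and a volume to a pyttsx3-style engine."""
    engine.setProperty("rate", rate)
    # Clamp volume into the 0.0 to 1.0 range that pyttsx3 expects
    engine.setProperty("volume", max(0.0, min(1.0, volume)))
    return engine

# Usage with the real engine (requires pyttsx3 and an audio device):
#   import pyttsx3
#   engine = configure_voice(pyttsx3.init(), rate=150)
```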

3. Capturing User Speech

The listen() method records audio from the microphone and converts it into text.

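def listen(self):
    # Record from the default microphone; each phrase is capped at 5 seconds
    with sr.Microphone() as source:
        audio = self.voice_recognizer.listen(source, phrase_time_limit=5)
        print("Processing...")
    try:
        # Transcribe the audio with the Google Web Speech API
        return self.voice_recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The audio was captured but no speech could be recognized
        print("Error: could not understand the audio")
    except sr.RequestError as e:
        # The API was unreachable or rejected the request
        print("Error: " + str(e))
    return None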

How It Works

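Records audio from the microphone, cutting off each phrase after 5 seconds (phrase_time_limit=5).
Sends the audio to the Google Web Speech API through recognize_google() to convert speech into text, which requires an internet connection.
If speech recognition fails, it prints an error message and returns None.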

This ensures the chatbot can capture spoken user input and process it for generating a response.
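
In noisy rooms, recognition tends to improve if the recognizer first samples the background noise. A possible sketch uses Recognizer.adjust_for_ambient_noise() before listening. The `capture_audio` helper and the one-second calibration time are assumptions, not part of the original listen() method.

```python
def capture_audio(recognizer, source, calibrate_seconds=1.0, phrase_time_limit=5):
    """Calibrate for background noise, then record one phrase from an open source."""
    # Sample ambient noise so the recognizer's energy threshold fits the room
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # Record until a pause, or until phrase_time_limit seconds of speech
    return recognizer.listen(source, phrase_time_limit=phrase_time_limit)

# Usage with the real library (requires a microphone):
#   import speech_recognition as sr
#   recognizer = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio = capture_audio(recognizer, source)
```

Calibrating on every call adds a delay before each phrase, so calibrating once at startup may be preferable in a quiet environment.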

4. Converting AI Responses into Speech

The chatbot uses the speak() method to vocalize responses.

def speak(self, text):
    self.engine.say(text)
    self.engine.runAndWait()

How It Works

Calls self.engine.say(text) to queue the text for speech.
Uses runAndWait() to process the queue and block until the speech has fully played before moving to the next operation.

This creates a fluid, human-like conversational experience where the chatbot speaks back to the user.
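
LLM responses sometimes contain Markdown markup such as asterisks or backticks, which a TTS engine may read aloud as symbols. The sketch below shows one way to strip the most common markers before calling speak(). The `clean_for_speech` helper is a hypothetical addition, and its patterns cover only a few simple cases.

```python
import re

def clean_for_speech(text):
    """Remove common Markdown markup so symbols are not read aloud."""
    # Keep the visible text of [label](url) links and drop the URL
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Drop emphasis asterisks, heading hashes, and code backticks
    text = re.sub(r"[*#`]+", "", text)
    # Collapse runs of whitespace, including newlines, into single spaces
    return re.sub(r"\s+", " ", text).strip()

# Usage inside the conversational loop:
#   self.speak(clean_for_speech(response))
```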

5. Running the Conversational Loop

The run() method initializes the chatbot and starts an interactive voice-based conversation.

def run(self):
    self.chatbot.ask(f"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise. Your name is {self.name}.")
    self.speak(f"Hello, I am {self.name}. How can I help you today?")

    while True:
        prompt = self.listen()
        if prompt is not None:
            print("You: " + prompt)
            response = self.chatbot.ask(prompt)
            print(self.name, ":", response)
            self.speak(response)
            # Speak the farewell first, then leave the loop
            if "Have a great day!" in response:
                break
        else:
            self.speak("I'm sorry, I didn't understand that.")

How It Works

Initializes the chatbot with a system message.
Greets the user and speaks its name.
Enters a loop where:

The chatbot listens for user input.
The recognized text is printed.
The AI model generates a response based on user input.
The response is spoken aloud using pyttsx3.
If the response contains "Have a great day!", the chatbot exits the conversation.

If speech recognition fails, the chatbot apologizes and prompts the user to repeat.

This loop allows for a continuous, real-time conversation, just like talking to a human.
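
The loop logic can be exercised without a microphone, speakers, or an LLM by passing the three steps in as plain functions. The following sketch assumes that approach. `run_conversation` and the scripted stand-ins are illustrative names, and this version speaks the farewell before leaving the loop.

```python
def run_conversation(listen, ask, speak, farewell="Have a great day!"):
    """Run the listen -> ask -> speak loop until a response contains `farewell`."""
    while True:
        prompt = listen()
        if prompt is None:
            # Recognition failed: apologize and listen again
            speak("I'm sorry, I didn't understand that.")
            continue
        response = ask(prompt)
        speak(response)
        if farewell in response:
            break

# Scripted stand-ins: None simulates a failed recognition
inputs = iter(["What are your hours?", None, "Thanks, bye"])
replies = {
    "What are your hours?": "We are open 9 to 5.",
    "Thanks, bye": "Have a great day!",
}
spoken = []
run_conversation(lambda: next(inputs), replies.get, spoken.append)
print(spoken)
```

With the real components, the same function could be called as `run_conversation(bot.listen, bot.chatbot.ask, bot.speak)`.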

6. Running the Chatbot

To start the chatbot, add the following entry point to the end of the script and run it:

if __name__ == "__main__":
    bot = VoiceChatbot("Faheem Bot")
    bot.run()

This initializes and launches the voice assistant, allowing users to interact with it hands-free.
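
If the assistant's name should be configurable without editing the script, a command-line option is one possibility. The sketch below uses argparse from the standard library. The `--name` flag and the `parse_args` helper are hypothetical additions, not part of the original script.

```python
import argparse

def parse_args(argv=None):
    """Parse command-line options; argv=None reads from sys.argv."""
    parser = argparse.ArgumentParser(description="Voice-enabled chatbot")
    parser.add_argument(
        "--name",
        default="Faheem Bot",
        help="Name the assistant introduces itself with",
    )
    return parser.parse_args(argv)

# Usage in the entry point:
#   args = parse_args()
#   VoiceChatbot(args.name).run()
```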

Conclusion

This voice-enabled chatbot combines speech recognition, text-to-speech, and AI-driven responses to create a natural conversational experience. By leveraging Mistral LLM and RAG, the chatbot can retrieve information and answer questions based on custom company data.

Reference

For the ChatBot class, see the following article.

https://artificialintelligence-code.blogspot.com/2025/04/building-chatbot-with-mistral-llm-and.html

Artificial Intelligence Code

Building a Voice-Enabled AI Chatbot with Speech Recognition and LLM
