Building a Voice-Enabled AI Chatbot with Speech Recognition and LLM

 

Introduction

Voice-based AI assistants are becoming increasingly popular in applications ranging from customer support to personal assistants. This article walks through the implementation of a voice-enabled chatbot that allows users to interact with an AI model by speaking instead of typing.

The chatbot integrates speech recognition and text-to-speech (TTS) with a Retrieval-Augmented Generation (RAG) chatbot powered by Mistral LLM. This allows the bot to listen to user queries, retrieve relevant information, generate responses, and speak them back — creating a seamless conversational experience.

Installation

pip install pyttsx3 SpeechRecognition pyaudio
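sr.Microphone, which the listen() method relies on, needs PyAudio, and PyAudio in turn depends on the PortAudio system library. Installation failures are therefore common on some platforms. A quick way to confirm everything is importable before starting the bot is to check each module with importlib.util.find_spec. The helper name check_dependencies is only illustrative:

```python
import importlib.util

# Module names as they are imported in code (not the pip package names).
REQUIRED_MODULES = ["pyttsx3", "speech_recognition", "pyaudio"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

find_spec only locates a module and does not import it. A module that is present but broken, for example a PyAudio build without PortAudio, can still fail later at import time.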

1. Initializing the Voice Chatbot

The chatbot is implemented in a class called VoiceChatbot, which wraps the existing text-based chatbot (ChatBot). It holds a ChatBot instance and adds voice input and output around it.

import pyttsx3
import speech_recognition as sr
from chatbot import ChatBot

Key Libraries Used

  • pyttsx3 → Converts text responses into speech.
  • speech_recognition → Captures and transcribes user speech.
  • ChatBot (imported from chatbot.py) → Handles text-based AI interactions. Its code and explanation are in the article linked under Reference.
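The voice layer only expects ChatBot to provide an ask(prompt) method that returns the reply as a string, because that is the only way run() uses it. To try the speech parts without loading Mistral and the retriever, you can substitute a stand-in with the same method. The class below, EchoChatBot, is hypothetical and is not the article's ChatBot:

```python
class EchoChatBot:
    """Hypothetical stand-in for ChatBot: same ask(prompt) -> str shape, no LLM."""

    def __init__(self):
        self.history = []  # every prompt received, in order

    def ask(self, prompt: str) -> str:
        self.history.append(prompt)
        # Mimic the farewell phrase that run() watches for
        if "bye" in prompt.lower():
            return "Goodbye! Have a great day!"
        return f"You said: {prompt}"
```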

2. Setting Up Speech Recognition and Text-to-Speech

class VoiceChatbot:
    def __init__(self, name):
        self.name = name
        self.chatbot = ChatBot()  # Initialize the AI chatbot

        # Set up text-to-speech engine
        self.engine = pyttsx3.init()

        # Set up speech recognition
        self.voice_recognizer = sr.Recognizer()

How It Works

  • The chatbot is initialized with a name (e.g., "Faheem Bot").
  • The pyttsx3 TTS engine is used to convert text responses into speech.
  • speech_recognition.Recognizer() is used to capture and process user speech.
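Because __init__ builds its own ChatBot, TTS engine and recognizer, the class can only run on a machine that has audio hardware and the full RAG stack. One common alternative is to let callers pass these objects in. The sketch below is an assumption and not the article's code. The parameter names chatbot, engine and recognizer are illustrative, and the real libraries are imported only when no replacement is supplied:

```python
class VoiceChatbot:
    """Constructor variant that accepts its collaborators as arguments.

    Hypothetical parameters: chatbot, engine, recognizer. When engine or
    recognizer is omitted, the real pyttsx3 / speech_recognition objects
    are created, exactly as in the original constructor.
    """

    def __init__(self, name, chatbot, engine=None, recognizer=None):
        self.name = name
        self.chatbot = chatbot  # e.g. ChatBot() in production

        if engine is None:
            import pyttsx3  # imported lazily so tests can run without it
            engine = pyttsx3.init()
        self.engine = engine

        if recognizer is None:
            import speech_recognition as sr  # likewise imported on demand
            recognizer = sr.Recognizer()
        self.voice_recognizer = recognizer
```

In production you would write VoiceChatbot("Faheem Bot", ChatBot()). In tests, lightweight fakes can be passed for all three collaborators.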

3. Capturing User Speech

The listen() method records audio from the microphone and converts it into text.

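def listen(self):
    with sr.Microphone() as source:
        print("Listening...")
        # Record one phrase; phrase_time_limit caps it at 5 seconds
        audio = self.voice_recognizer.listen(source, phrase_time_limit=5)
    print("Processing...")
    try:
        # Send the audio to Google's free Web Speech API (requires internet)
        return self.voice_recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The audio was captured but no speech could be recognized
        print("Error: could not understand the audio")
        return None
    except sr.RequestError as e:
        # The API could not be reached or returned an error
        print("Error: speech service unavailable: " + str(e))
        return None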

How It Works

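  1. Waits for the user to start speaking, then records a single phrase from the microphone, capped at 5 seconds (phrase_time_limit=5).
  2. Sends the audio to Google's Web Speech API through recognize_google() to convert it into text.
  3. If speech recognition fails, it prints an error message and returns None.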

The result is a text prompt the chatbot can pass to the language model. When nothing usable was heard, listen() returns None instead.
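Because listen() returns None on failure, callers must handle that case. One option is to retry a few times before giving up. The helper below is hypothetical and is not part of speech_recognition. It accepts any zero-argument function, so inside the class you would pass self.listen:

```python
from typing import Callable, Optional


def listen_with_retries(
    listen: Callable[[], Optional[str]], attempts: int = 3
) -> Optional[str]:
    """Call listen() up to `attempts` times; return the first non-empty transcript."""
    for _ in range(attempts):
        text = listen()
        if text is not None and text.strip():
            return text.strip()
    return None  # every attempt failed or produced only whitespace
```

In noisy rooms, calling the recognizer's adjust_for_ambient_noise(source) once inside the Microphone block, before listen(), can also reduce failed recognitions.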

4. Converting AI Responses into Speech

The chatbot uses the speak() method to vocalize responses.

def speak(self, text):
    self.engine.say(text)  # queue the text for speech
    self.engine.runAndWait()  # block until the queued speech has been played

How It Works

  • Calls self.engine.say(text), which queues the text to be spoken.
  • Calls runAndWait(), which blocks until all queued speech has been played before moving to the next operation.

Because runAndWait() blocks, the bot finishes speaking before it starts listening again, so it does not record its own voice through the microphone.
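The split between say() and runAndWait() is easier to see with a fake engine that records text instead of playing it. RecordingTTSEngine is a hypothetical stand-in that imitates the queue-then-flush call pattern described above. It is not pyttsx3's implementation, and it produces no audio:

```python
class RecordingTTSEngine:
    """Hypothetical stand-in that mimics the say()/runAndWait() call pattern.

    say() only queues text; runAndWait() "speaks" (here: records) everything
    queued so far and empties the queue.
    """

    def __init__(self):
        self._queue = []
        self.spoken = []

    def say(self, text):
        self._queue.append(text)

    def runAndWait(self):
        self.spoken.extend(self._queue)
        self._queue.clear()


def speak(engine, text):
    # Same two calls as VoiceChatbot.speak()
    engine.say(text)
    engine.runAndWait()
```

With the real engine, speech can be tuned before speaking, for example engine.setProperty("rate", 150) for words per minute and engine.setProperty("volume", 0.9) on a 0.0 to 1.0 scale.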

5. Running the Conversational Loop

The run() method sends the chatbot its instructions and starts an interactive voice-based conversation.

def run(self):
    # Prime the underlying chatbot with its instructions (sent as the first message)
    self.chatbot.ask(
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, say that you don't know. "
        "Use three sentences maximum and keep the answer concise. "
        f"Your name is {self.name}."
    )
    self.speak(f"Hello, I am {self.name}. How can I help you today?")

    while True:
        prompt = self.listen()
        if prompt is not None:
            print("You: " + prompt)
            response = self.chatbot.ask(prompt)
            print(f"{self.name}: {response}")
            self.speak(response)
            # Stop once the model says its farewell (after speaking it)
            if "Have a great day!" in response:
                break
        else:
            self.speak("I'm sorry, I didn't understand that.")

How It Works

  1. Sends an instruction prompt, acting as a system message, through chatbot.ask().
  2. Greets the user and introduces itself by name.
  3. Enters a loop where:
  • The chatbot listens for user input.
  • The recognized text is printed.
  • The AI model generates a response based on user input.
  • The response is printed and spoken aloud using pyttsx3.
  • If the response contains "Have a great day!", the chatbot ends the conversation.
  4. If speech recognition fails, the chatbot apologizes and listens again.

This loop keeps the conversation going turn by turn until the model's reply contains the farewell phrase.
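The loop needs only three things: a source of transcripts, something that answers, and something that speaks. It can therefore be tested without a microphone or speakers by passing those in as plain values and functions. The sketch below is a hypothetical refactoring (run_conversation and its parameters are illustrative names). A None entry in prompts stands for a failed recognition, matching what listen() returns. The function speaks the farewell before stopping, and it also stops when the scripted prompts run out:

```python
from typing import Callable, Iterable, List, Optional

FAREWELL = "Have a great day!"  # same end-of-conversation marker as run()


def run_conversation(
    prompts: Iterable[Optional[str]],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> List[str]:
    """Drive the run() loop from scripted prompts instead of a microphone.

    Returns everything the bot said, in order.
    """
    said: List[str] = []

    def say(text: str) -> None:
        said.append(text)
        speak(text)

    for prompt in prompts:
        if prompt is None:  # recognition failed
            say("I'm sorry, I didn't understand that.")
            continue
        response = ask(prompt)
        say(response)
        if FAREWELL in response:
            break  # stop after speaking the farewell
    return said
```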

6. Running the Chatbot

To start the chatbot, add this entry point at the bottom of the script and run it:

if __name__ == "__main__":
    bot = VoiceChatbot("Faheem Bot")
    bot.run()

This initializes and launches the voice assistant, allowing users to interact with it hands-free.
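The bot's name is hard-coded in the entry point. If you want to change it without editing the file, you can read it from the command line with argparse. The --name flag shown here is a hypothetical addition, and it defaults to the article's "Faheem Bot":

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; --name is a hypothetical flag for the bot's name."""
    parser = argparse.ArgumentParser(description="Run the voice-enabled chatbot.")
    parser.add_argument(
        "--name",
        default="Faheem Bot",
        help="name the assistant uses to introduce itself",
    )
    return parser.parse_args(argv)


# Usage in the entry point (VoiceChatbot as defined in this article):
#     if __name__ == "__main__":
#         VoiceChatbot(parse_args().name).run()
```

Passing argv explicitly, as in parse_args(["--name", "Support Bot"]), is also a convenient way to test the parser.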

Conclusion

This voice-enabled chatbot combines speech recognition, text-to-speech, and AI-driven responses to create a natural conversational experience. By leveraging Mistral LLM and RAG, the chatbot can retrieve information and answer questions based on custom company data.

Reference

For the ChatBot class, see the following article:

https://artificialintelligence-code.blogspot.com/2025/04/building-chatbot-with-mistral-llm-and.html
