Introduction
Voice-based AI assistants are becoming increasingly popular in applications ranging from customer support to personal assistants. This article walks through the implementation of a voice-enabled chatbot that allows users to interact with an AI model by speaking instead of typing.
The chatbot integrates speech recognition and text-to-speech (TTS) with a Retrieval-Augmented Generation (RAG) chatbot powered by the Mistral LLM. The bot listens to the user's query, retrieves relevant information, generates a response, and speaks it back, so the whole exchange happens by voice.
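The retrieve-and-generate step itself lives in the ChatBot class from the referenced article. Purely to illustrate the idea, here is a minimal sketch that swaps in a naive word-overlap retriever (not the vector search a real RAG pipeline would use) and stitches the best matches into a prompt. `retrieve` and `build_prompt` are hypothetical helpers, not part of ChatBot.

```python
import re


def _words(text: str) -> set[str]:
    # Lowercase word tokens; a deliberately naive stand-in for embeddings
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the query (hypothetical)."""
    q = _words(query)
    ranked = sorted(documents, key=lambda d: len(q & _words(d)), reverse=True)
    # Drop documents with no overlap at all
    return [d for d in ranked[:k] if q & _words(d)]


def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user's question (hypothetical)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


docs = [
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Support tickets are answered within 24 hours.",
    "The cafeteria serves lunch from noon.",
]
top = retrieve("When is the office open?", docs, k=1)
print(build_prompt("When is the office open?", top))
```

In the real bot, the spoken question would take the place of the hard-coded query string.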
Installation
pip install pyttsx3 SpeechRecognition PyAudio
1. Initializing the Voice Chatbot
The chatbot is implemented in a class called VoiceChatbot, which wraps the existing text-based chatbot (ChatBot): it holds a ChatBot instance and adds a voice layer on top.
import pyttsx3
import speech_recognition as sr
from chatbot import ChatBot
Key Libraries Used
- pyttsx3 → Converts text responses into speech.
- speech_recognition → Captures and transcribes user speech.
- ChatBot (imported from chatbot.py) → Handles text-based AI interactions (the chatbot code and explanation are in the article listed under Reference).
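The pip package names do not always match the import names: SpeechRecognition is imported as speech_recognition, and PyAudio (which speech_recognition needs for sr.Microphone) is imported as pyaudio. A small standard-library sketch that reports which of these modules can be found before the bot starts; `check_dependencies` is a hypothetical helper.

```python
import importlib.util

# pip package name -> module name used in the import statement
REQUIRED_MODULES = {
    "pyttsx3": "pyttsx3",
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def check_dependencies(modules: dict[str, str] = REQUIRED_MODULES) -> dict[str, bool]:
    """Map each pip package name to whether its module can be imported."""
    return {pkg: importlib.util.find_spec(mod) is not None for pkg, mod in modules.items()}


for package, found in check_dependencies().items():
    status = "OK" if found else f"missing (pip install {package})"
    print(f"{package:<18} {status}")
```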
2. Setting Up Speech Recognition and Text-to-Speech
class VoiceChatbot:
    def __init__(self, name):
        self.name = name
        self.chatbot = ChatBot()  # Initialize the AI chatbot

        # Set up text-to-speech engine
        self.engine = pyttsx3.init()

        # Set up speech recognition
        self.voice_recognizer = sr.Recognizer()
How It Works
- The chatbot is initialized with a name (e.g., "Faheem Bot") and creates its own ChatBot instance for text-based answers.
- The pyttsx3 TTS engine converts text responses into speech.
- speech_recognition.Recognizer() captures and processes user speech.
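The constructor above creates its own engine and recognizer, which makes the class hard to test without a microphone and speakers. One option, sketched here as an assumption rather than the article's design, is to let callers inject them. `TestableVoiceChatbot`, its `engine`/`recognizer` parameters, and the fake classes are all hypothetical; the real pyttsx3 and speech_recognition objects are only created (and imported) when nothing is passed in.

```python
class TestableVoiceChatbot:
    """Hypothetical variant of VoiceChatbot whose dependencies can be injected."""

    def __init__(self, name, chatbot, engine=None, recognizer=None):
        self.name = name
        # Any object with an ask(prompt) -> str method, e.g. ChatBot()
        self.chatbot = chatbot
        if engine is None:
            import pyttsx3  # imported lazily so tests need no TTS driver

            engine = pyttsx3.init()
        self.engine = engine
        if recognizer is None:
            import speech_recognition as sr  # likewise only needed for real audio

            recognizer = sr.Recognizer()
        self.voice_recognizer = recognizer

    def speak(self, text):
        # Same logic as the article's speak(): queue the text, then play it
        self.engine.say(text)
        self.engine.runAndWait()


class FakeChatBot:
    """Stand-in for ChatBot that echoes the prompt instead of calling an LLM."""

    def ask(self, prompt):
        return f"echo: {prompt}"


class FakeEngine:
    """Stand-in for a pyttsx3 engine that records text instead of playing it."""

    def __init__(self):
        self.spoken = []

    def say(self, text):
        self.spoken.append(text)

    def runAndWait(self):
        pass


bot = TestableVoiceChatbot("Faheem Bot", FakeChatBot(), engine=FakeEngine(), recognizer=object())
bot.speak(bot.chatbot.ask("hello"))
print(bot.engine.spoken)
```

In production you would call TestableVoiceChatbot("Faheem Bot", ChatBot()) and let the defaults build the real engine and recognizer.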
3. Capturing User Speech
The listen() method records audio from the microphone and converts it into text.
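def listen(self):
    # Record from the default microphone; each phrase is capped at 5 seconds
    with sr.Microphone() as source:
        audio = self.voice_recognizer.listen(source, phrase_time_limit=5)
    print("Processing...")
    try:
        # Transcribe the audio with Google's Web Speech API
        return self.voice_recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        print("Error: could not understand the audio")
    except sr.RequestError as e:
        print("Error: speech service unavailable: " + str(e))
    return None
How It Works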
- Records audio from the default microphone, capping each phrase at 5 seconds (phrase_time_limit=5).
- Sends the audio to Google's Web Speech API via recognize_google() to convert it into text.
- If recognition fails, it prints an error message and returns None.
This lets the chatbot capture spoken input and pass it on for response generation.
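Because listen() returns None on failure, a caller can decide how many times to retry before giving up. A minimal sketch, assuming a hypothetical `listen_with_retries` helper that accepts any zero-argument callable with the same contract (text or None), such as bot.listen:

```python
from typing import Callable, Optional


def listen_with_retries(
    listen: Callable[[], Optional[str]],
    max_attempts: int = 3,
    on_retry: Callable[[int], None] = lambda attempt: None,
) -> Optional[str]:
    """Call listen() until it returns text or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        text = listen()
        if text:  # treat None and empty strings as failures
            return text
        if attempt < max_attempts:
            on_retry(attempt)  # e.g. speak "Please repeat that."
    return None


# Simulated microphone: fails twice, then hears a question
results = iter([None, None, "what are your opening hours"])
print(listen_with_retries(lambda: next(results)))
```

With the real class, on_retry could be something like lambda _: bot.speak("Please repeat that.").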
4. Converting AI Responses into Speech
The chatbot uses the speak() method to vocalize responses.
def speak(self, text):
    self.engine.say(text)  # queue the text for speech
    self.engine.runAndWait()  # block until the queued speech has played
How It Works
- Calls self.engine.say(text) to queue the text for speech.
- Calls runAndWait() to block until the queued speech has finished playing before moving on to the next operation.
This lets the chatbot speak its replies back to the user. Because runAndWait() blocks, each reply finishes playing before the bot starts listening again.
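speak() uses the engine's default voice settings. If the default is too fast or too loud, pyttsx3 engine properties can be adjusted once after pyttsx3.init(). A minimal sketch with a hypothetical `configure_voice` helper; `FakeEngine` is a stand-in so the sketch runs without audio drivers, and with a real engine you would pass pyttsx3.init() instead.

```python
from types import SimpleNamespace


def configure_voice(engine, rate=160, volume=0.9, voice_index=None):
    """Adjust a pyttsx3-style engine before speaking (hypothetical helper).

    rate is in words per minute (pyttsx3's default is 200) and volume
    ranges from 0.0 to 1.0.
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    if voice_index is not None:
        # Each installed voice object exposes an id to pass back to setProperty
        voices = engine.getProperty("voices")
        engine.setProperty("voice", voices[voice_index].id)
    return engine


class FakeEngine:
    """Records property changes so the sketch runs without audio drivers."""

    def __init__(self):
        self.props = {
            "rate": 200,
            "volume": 1.0,
            "voices": [SimpleNamespace(id="voice-a"), SimpleNamespace(id="voice-b")],
        }

    def setProperty(self, name, value):
        self.props[name] = value

    def getProperty(self, name):
        return self.props[name]


engine = configure_voice(FakeEngine(), rate=150, voice_index=1)
print(engine.props["rate"], engine.props["volume"], engine.props["voice"])
```

With a real engine, a call such as configure_voice(self.engine, rate=150) in __init__ would be enough; pyttsx3 applies property changes to utterances queued after them.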
5. Running the Conversational Loop
The run() method sends the chatbot its instructions and then starts an interactive voice-based conversation.
def run(self):
    # Send the instructions (and the bot's name) to the underlying chatbot first
    self.chatbot.ask(
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, say that you don't know. "
        "Use three sentences maximum and keep the answer concise. "
        f"Your name is {self.name}."
    )
    self.speak(f"Hello, I am {self.name}. How can I help you today?")
    while True:
        prompt = self.listen()
        if prompt is None:
            # Recognition failed: ask the user to repeat
            self.speak("I'm sorry, I didn't understand that.")
            continue
        print("You: " + prompt)
        response = self.chatbot.ask(prompt)
        print(f"{self.name}: {response}")
        self.speak(response)
        # End the session once the model says goodbye (after speaking it)
        if "Have a great day!" in response:
            break
How It Works
- Sends the chatbot its instructions (a system-style message) via ask().
- Greets the user aloud and introduces itself by name.
- Enters a loop where:
  - The chatbot listens for user input.
  - The recognized text is printed.
  - The AI model generates a response based on that input.
  - The response is printed and spoken aloud using pyttsx3.
  - If the response contains "Have a great day!", the chatbot ends the conversation.
  - If speech recognition fails, the chatbot apologizes so the user can repeat the question.
This loop keeps the conversation going, one spoken turn at a time, until the model's reply signals the end of the session.
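The control flow of run() can be exercised without a microphone by passing the listen, ask and speak steps in as plain functions. A sketch under that assumption; `conversation_loop`, its `max_turns` guard and the scripted inputs are hypothetical, not part of the original class.

```python
from typing import Callable, Optional

GOODBYE = "Have a great day!"


def conversation_loop(
    listen: Callable[[], Optional[str]],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
    name: str = "Faheem Bot",
    max_turns: int = 20,
) -> int:
    """Run up to max_turns listen/ask/speak turns; return the number of turns used."""
    speak(f"Hello, I am {name}. How can I help you today?")
    for turn in range(1, max_turns + 1):
        prompt = listen()
        if prompt is None:
            speak("I'm sorry, I didn't understand that.")
            continue
        response = ask(prompt)
        speak(response)  # speak before checking, so the goodbye is heard
        if GOODBYE in response:
            return turn
    return max_turns


# Scripted run: one failed recognition, one question, then a goodbye
heard = iter([None, "what time do you open", "thanks, bye"])
answers = {
    "what time do you open": "We open at 9am.",
    "thanks, bye": "You're welcome. Have a great day!",
}
spoken = []
turns = conversation_loop(lambda: next(heard), answers.get, spoken.append)
print(turns, spoken)
```

In the real bot, listen, ask and speak would be bot.listen, bot.chatbot.ask and bot.speak; max_turns simply keeps a broken microphone from looping forever.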
6. Running the Chatbot
To start the chatbot, run:
if __name__ == "__main__":
    bot = VoiceChatbot("Faheem Bot")
    bot.run()
This initializes and launches the voice assistant, allowing users to interact with it hands-free.
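Since run() loops until the model says goodbye, pressing Ctrl+C is the other way to stop it, and by default that ends with a KeyboardInterrupt traceback. A sketch of a slightly more forgiving entry point, assuming a hypothetical `--name` command-line option and a `main()` function that takes the bot class as a parameter so it can be tried with a stand-in:

```python
import argparse


def main(bot_class, argv=None):
    """Parse a hypothetical --name option, run the bot, and exit cleanly on Ctrl+C."""
    parser = argparse.ArgumentParser(description="Voice-enabled chatbot")
    parser.add_argument("--name", default="Faheem Bot",
                        help="name the bot uses to introduce itself")
    args = parser.parse_args(argv)
    bot = bot_class(args.name)
    try:
        bot.run()
    except KeyboardInterrupt:
        print("\nConversation ended by user.")
        return 130  # conventional exit status after SIGINT (128 + 2)
    return 0


# In the article's script this would be: sys.exit(main(VoiceChatbot))
# Below, a stand-in bot simulates the user pressing Ctrl+C.
class InterruptedBot:
    def __init__(self, name):
        self.name = name

    def run(self):
        raise KeyboardInterrupt


print(main(InterruptedBot, ["--name", "Test Bot"]))
```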
Conclusion
This voice-enabled chatbot combines speech recognition, text-to-speech, and AI-driven responses to create a natural conversational experience. By leveraging the Mistral LLM and RAG, the chatbot can retrieve information and answer questions based on custom company data.
Reference
For the ChatBot class, see the following article:
https://artificialintelligence-code.blogspot.com/2025/04/building-chatbot-with-mistral-llm-and.html