Are you ready to create your own personal voice assistant? With Python, this is possible! In this blog post, we’ll show you how to create a personal voice assistant that can search Google, YouTube, and Wikipedia, perform calculations, open applications, and more, all through simple voice commands. The assistant takes advantage of Python’s versatility and uses several libraries and APIs to process commands and produce output.
Let’s dive in!
Why Python for Building a Voice Assistant?
Python is ideal for creating personal voice assistants due to its simplicity and large library ecosystem. You can easily integrate Google Speech Recognition, Google Text-to-Speech (gTTS), the WolframAlpha API for computation, and Selenium for web operations. This flexibility makes Python a strong choice for developing custom automation scripts, including voice assistants.
Features of the Python Voice Assistant
With just a few lines of code, your personal voice assistant can:
- Convert speech to text and vice versa.
- Search Google, Wikipedia, or YouTube using voice commands.
- Open applications installed on your system.
- Perform mathematical calculations using the WolframAlpha API.
Required Libraries
Here is a list of the external Python packages you will need.
- gTTS: Converts text to speech using Google Text-to-Speech.
- SpeechRecognition (imported as speech_recognition): Recognizes speech and converts it to text.
- Selenium: Automates browser activities such as web searches.
- WolframAlpha: Performs mathematical calculations.
- Playsound: Plays audio files from the system.
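You can install these libraries with the command below. Capturing microphone input through SpeechRecognition also requires PyAudio, which is installed separately (pip install pyaudio):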
pip install gtts speechrecognition selenium wolframalpha playsound
Step-by-Step Code Breakdown
1. Default Settings: Speech to Text and Text to Speech. First, we need to set up the speech recognition and text-to-speech components. The basic code below accepts voice input, converts it to text, and speaks a response to the user.
import speech_recognition as sr
import playsound
from gtts import gTTS
import os
import wolframalpha
from selenium import webdriver

def assistant_speaks(output):
    # Print the reply, synthesize it with gTTS, play it, then delete the audio file.
    print("Assistant: ", output)
    toSpeak = gTTS(text=output, lang='en', slow=False)
    toSpeak.save("output.mp3")
    playsound.playsound("output.mp3", True)
    os.remove("output.mp3")

def get_audio():
    # Record up to 5 seconds from the microphone and return the recognized text,
    # or None if recognition fails.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = r.listen(source, phrase_time_limit=5)
    try:
        text = r.recognize_google(audio, language='en-US')
        print("You: ", text)
        return text
    except Exception:
        assistant_speaks("Sorry, I couldn't understand. Please try again!")
        return None
2. Main Program: User Interaction. In this section, we ask for the user’s name and start a loop that listens for the user’s commands. The assistant keeps listening until the user says “Bye” or “Exit”.
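# Place this block at the end of the script, after process_text() and the
# other functions have been defined.
if __name__ == "__main__":
    assistant_speaks("What's your name?")
    # get_audio() returns None when recognition fails, so fall back to a default.
    name = get_audio() or "there"
    assistant_speaks("Hello, " + name + ". How can I assist you today?")
    while True:
        assistant_speaks("What can I do for you?")
        command = get_audio()
        if command is None:
            continue  # nothing was recognized; ask again
        command = command.lower()
        if "bye" in command or "exit" in command:
            assistant_speaks("Goodbye!")
            break
        process_text(command)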
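Because get_audio() returns None when recognition fails, the loop has to check for that before calling .lower() on the result. The following is a minimal sketch of that guard, not the post's own code: run_assistant and max_failures are hypothetical names, and the listen, speak, and handle callables are passed in so the loop can be tried without a microphone.

```python
def run_assistant(listen, speak, handle, max_failures=3):
    """Run the command loop until the user says "bye" or "exit".

    listen() returns recognized text or None, speak(text) voices a reply,
    and handle(command) processes one lowercased command.
    """
    failures = 0
    while failures < max_failures:
        speak("What can I do for you?")
        command = listen()
        if command is None:
            # Recognition failed: count it and ask again instead of crashing.
            failures += 1
            continue
        failures = 0  # a successful recognition resets the counter
        command = command.lower()
        if "bye" in command or "exit" in command:
            speak("Goodbye!")
            return
        handle(command)
    # Too many consecutive failures: stop rather than loop forever.
    speak("I am having trouble hearing you. Shutting down.")
```

In the assistant itself, this could be called as run_assistant(get_audio, assistant_speaks, process_text).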
3. Processing User Commands: The process_text() function handles commands such as web searches, calculations, and application launches.
def process_text(text):
    # Route the command by keyword; "text" avoids shadowing the built-in input().
    if "search" in text or "play" in text:
        search_web(text)
    elif "calculate" in text:
        calculate_query(text)
    elif "open" in text:
        open_application(text)
    else:
        assistant_speaks("I can search the web for you. Do you want me to do that?")
        response = get_audio()
        # get_audio() may return None, so check before testing for "yes".
        if response and 'yes' in response.lower():
            search_web(text)
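process_text() calls calculate_query(), which is not shown in this post. The following is one possible sketch, not the original implementation. It assumes the client argument is a wolframalpha.Client (or any object with a compatible query() method) and that the first entry in the result's results holds the answer. The client and speak parameters are additions for illustration, so the function can be tried without network access.

```python
def calculate_query(query, client, speak=print):
    """Send the text after "calculate" to WolframAlpha and speak the answer.

    client: assumed to be a wolframalpha.Client or a compatible stand-in.
    speak:  callable that voices a reply (defaults to print for testing).
    """
    # Keep only what follows the trigger word,
    # e.g. "calculate 5 plus 3" becomes "5 plus 3".
    expression = query.lower().split("calculate", 1)[-1].strip()
    if not expression:
        speak("What would you like me to calculate?")
        return None
    try:
        result = client.query(expression)
        # results yields the answer pods; take the first one.
        answer = next(result.results).text
    except StopIteration:
        # The service returned no result pods for this expression.
        speak("I could not find an answer to that.")
        return None
    speak(f"The answer is {answer}")
    return answer
```

In the assistant, the client would be created once with wolframalpha.Client("YOUR_APP_ID"), where the App ID is a placeholder for your own key from WolframAlpha, and the call would pass that client along with assistant_speaks.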
4. Web Search and Application Handling
The assistant can search the web and open applications like Chrome and Word using voice commands.
def search_web(query):
    # Requires Firefox to be installed. The first word of the command
    # (for example "search" or "play") is dropped from the search terms.
    driver = webdriver.Firefox()
    terms = query.split()[1:]
    if 'youtube' in query:
        assistant_speaks("Opening YouTube")
        driver.get("https://www.youtube.com/results?search_query=" + '+'.join(terms))
    elif 'wikipedia' in query:
        assistant_speaks("Opening Wikipedia")
        driver.get("https://en.wikipedia.org/wiki/" + '_'.join(terms))
    else:
        assistant_speaks("Searching Google")
        driver.get("https://www.google.com/search?q=" + '+'.join(terms))

def open_application(query):
    # os.startfile() is Windows-only; adjust the paths to match your installation.
    if "chrome" in query:
        assistant_speaks("Opening Chrome")
        os.startfile("C:/Program Files (x86)/Google/Chrome/Application/chrome.exe")
    elif "word" in query:
        assistant_speaks("Opening Word")
        os.startfile("C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Microsoft Office/Word.lnk")
    else:
        assistant_speaks("Sorry, I don't know how to open that application.")
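search_web() builds its URLs by joining words with "+", which leaves the site name in the search terms and does not escape characters such as "+" or "&". The following is a minimal sketch of a more careful URL builder, under a few assumptions: build_search_url and FILLER_WORDS are hypothetical names, the filler list is deliberately crude, and Wikipedia's search endpoint is used here instead of guessing a page title.

```python
from urllib.parse import quote_plus

# Words that carry the command but are not part of what the user wants to find.
# This list is an illustrative assumption; tune it for your own commands.
FILLER_WORDS = {"search", "play", "for", "on", "google", "youtube", "wikipedia"}

def build_search_url(query):
    """Return a search URL for the site named in the spoken command."""
    words = query.lower().split()
    terms = " ".join(w for w in words if w not in FILLER_WORDS)
    encoded = quote_plus(terms)  # spaces become "+", special characters are escaped
    if "youtube" in words:
        return "https://www.youtube.com/results?search_query=" + encoded
    if "wikipedia" in words:
        return "https://en.wikipedia.org/w/index.php?search=" + encoded
    return "https://www.google.com/search?q=" + encoded
```

The returned string can be passed straight to driver.get(). Keeping URL construction separate from the browser call also makes it possible to check the URLs without launching Firefox.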
Wrapping It All Up
Once you have completed these steps, you can combine the functions into a fully functional personal voice assistant. Your assistant:
- Understands voice commands.
- Responds verbally.
- Searches the web, opens applications, and performs calculations.
You can further improve this project by integrating natural language processing (NLP) to understand more complex questions, or by adding more APIs for extra functionality.
Common voice commands you can use:
- “Search Google for Python tutorials”
- “Play popular YouTube encoded songs”
- “Wikipedia Python programming language”
- “Open Microsoft Word”
- “Count 5+
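Commands like these could be routed with a small keyword table that matches whole words, so that "play" does not trigger on a word like "display". This is a rough sketch rather than the post's process_text(): INTENTS and detect_intent are hypothetical names, and treating "count" as a calculation trigger is an assumption.

```python
# Ordered (intent, trigger words) pairs; the first match wins.
INTENTS = [
    ("calculate", ("calculate", "count")),
    ("open", ("open",)),
    ("search", ("search", "play", "wikipedia", "youtube", "google")),
]

def detect_intent(command):
    """Return the first intent whose trigger word appears in the command."""
    words = command.lower().split()
    for intent, triggers in INTENTS:
        # Compare whole words rather than substrings to avoid false matches.
        if any(word in triggers for word in words):
            return intent
    return "unknown"
```

The returned intent name can then be used to choose between search_web(), open_application(), and a calculation function.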
This project is highly extensible, so feel free to add features as you improve your Python skills, such as smart home integration or email management.
Conclusion
Building a personal voice assistant in Python is a great way to improve your programming skills and automate everyday tasks. With libraries like gTTS, SpeechRecognition, Selenium, and WolframAlpha, you can implement a wide range of functionalities. Happy coding!