A simple voice assistant that uses vosk
for speech-to-text and silero
for text-to-speech. The assistant can be activated using a wake word detection system from picovoice
.
It can be easilty extended to include more features like weather, news, etc.
- Wake word detection
- Configurable commands that can be set using simple
.json
file - Possibility of executing shell commands or any other scripts you want
- Easy ChatGPT integration
- Text-to-speech using open source multilanguage pretrained
silero
models
- Go to https://alphacephei.com/vosk/models and choose the model you want to use
- I recommend to first try small models like
vosk-model-small-en-us-0.15
orvosk-model-small-en-us-0.4
- Download the model and extract it to the project folder
- Add the path to the model in the
.env
file
- go to https://picovoice.ai/platform/porcupine/
- register and create a wake word for the assistant
- Grab the API key and add it to the
.env
file toPICOVOICE_API_KEY
variable - Download the
.ppn
file and add it to the project folder - Add the path to the
.ppn
file in the.env
file toPICOVOICE_KEYWORD_PATH
variable
- Go to the silero repo and choose the model you want to use
- Add needed variables to the
.env
file - Cache for the model will be created after the first launch in the
.cache/torch/hub/snakers4_silero-models_master
folder
- Clone the repository
- Install poetry using
pip install poetry
- Run
poetry install
to install the dependencies - Create a
.env
file. You can use the.env.example
file as a template - Follow the instructions above to prepare the
vosk
model andpicovoice
wake word detection - Run the assistant using
poetry run python main.py
- Say the wake word and ask the assistant to tell you
current time
- You are all set!
P.S. You can add more commands following this guide
- Create a directory in the
commands
folder with the name of the command - Create a
config.json
file with the following structure:
{
"commands": [
{
"name": "hello",
"action": "voice",
"aliases": [
"hey there",
"hi",
"hello"
],
"responses": [
"Hello, how can I help you?"
]
}
]
}
- That's basically it. You can add more commands to the
commands
array. The assistant will randomly choose one of the responses from theresponses
array
- Create a directory in the
commands
folder with the name of the command - Create a
config.json
file with the following structure:
{
"commands": [
{
"name": "open_browser",
"action": "script",
"aliases": [
"open browser",
"run browser",
"open google",
"open chrome"
],
"params": {
"url": "https://www.google.com"
}
}
]
}
- The
action
field should be set toscript
if you want to execute a script or a shell command - Create a
script.py
file in the command directory - It should contain a function with name
script
that looks like this:
def script(*args, **kwargs) -> None:
...
-
You can add a
depends_on
property to the command, so it can use another command's script. In this case you don't need to create a separate folder and a new config.For example, you can add a
open_youtube
command that depends onopen_browser
command.This is useful, because you don't need to write the same script multiple times. The
open_youtube
command can just call theopen_browser
script and then open youtube. Theconfig.json
file should look like this:
{
"commands": [
{
"name": "open_youtube",
"action": "script",
"aliases": [
"open youtube",
"run youtube"
],
"depends_on": "open_browser",
"params": {
"url": "https://www.youtube.com"
}
}
]
}
- !NOT TESTED! You can use any structure you want inside the command directory. But the main script should be named
script.py
and contain a functionscript
that will be executed when the command is called.
- Go to https://platform.openai.com/api-keys and create an API key
- Add the API key to the
.env
file to theOPENAI_API_KEY
variable - Add the
chat_gpt
command to thecommands
folder - The
config.json
file should look like this:
{
"commands": [
{
"name": "chat_gpt",
"action": "chat_gpt",
"aliases": [
"tell me",
"search for"
]
}
]
}
- Current implementation of ChatGPT is not perfect. It only works with the success responses etc. Will be improved ASAP
- You may face a problem with incorrect input device. Check
device_index
in the_init_recorder
function in thespeech_to_text.py
file - Commands
open_browser
andopen_youtube
logs WARNING messages, because of the incorrect response of the script execution. This will be fixed in the future - Any other issues can be reported in the issues section. I will try to help you as soon as possible