# Version
 0.1.6

# Updates
 New model layers, a larger and improved dataset, and an enhanced realtime program with an activation-key command

# How to use
 You can run the realtime.py file directly without any other setup. It will display a table of commands and how confident the model is that the voice contained each command (the highest confidence is marked with an asterisk).
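
For illustration, a minimal sketch of how such a confidence table could be printed (the function name, command list, and probabilities here are made up; realtime.py's actual output may differ):

```python
import numpy as np

def print_confidence_table(commands, probs):
    # Mark the most confident command with an asterisk.
    best = int(np.argmax(probs))
    for i, (cmd, p) in enumerate(zip(commands, probs)):
        marker = "*" if i == best else " "
        print(f"{marker} {cmd:<8} {p:6.1%}")

# Hypothetical output of one prediction step:
print_confidence_table(["down", "go", "left", "no"], [0.05, 0.80, 0.10, 0.05])
```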

# Dataset
 In this version, the main dataset is from http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip, plus some custom command datasets (which have very few samples, so they don't help much yet)

 You can download the original dataset into the data folder and copy the contents of data/custom_commands into the data/mini_speech_commands directory
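
As an illustration, a minimal sketch of that setup, assuming TensorFlow is installed and the directory layout above (the exact layout the scripts expect may differ):

```python
import shutil
import tensorflow as tf

# Download and unpack the mini_speech_commands dataset into data/.
tf.keras.utils.get_file(
    "mini_speech_commands.zip",
    origin="http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip",
    extract=True,
    cache_dir=".",
    cache_subdir="data",
)

# Merge the custom command recordings into the main dataset directory.
shutil.copytree(
    "data/custom_commands",
    "data/mini_speech_commands",
    dirs_exist_ok=True,  # requires Python 3.8+
)
```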

# Roadmap
- create_model.py is where I train the model and save it to a Keras file
- record_audio.py records a single one-second voice clip
- record_sequence.py records each second into a new file, saved at the path defined in the file as 0.wav, 1.wav, 2.wav, ...
- test_model.py lets you set a couple of one-second wav files to test
- Finally, realtime.py opens the mic, chops the input into one-second sequences, feeds each second to the model for prediction, and prints the likelihood of each command in the voice (see the sketch below)
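
A minimal sketch of that realtime loop, assuming the sounddevice library, 16 kHz mono audio, and the STFT spectrogram front end from TensorFlow's simple_audio tutorial (the model path and label list are placeholders, and the real realtime.py may differ):

```python
import numpy as np
import sounddevice as sd
import tensorflow as tf

SAMPLE_RATE = 16000  # mini_speech_commands clips are 16 kHz, 1 second
COMMANDS = ["down", "go", "left", "no", "right", "stop", "up", "yes"]
model = tf.keras.models.load_model("model.keras")  # placeholder path

def to_spectrogram(waveform):
    # STFT front end as in the TF simple_audio tutorial (an assumption
    # about how create_model.py preprocesses audio).
    spec = tf.abs(tf.signal.stft(waveform, frame_length=255, frame_step=128))
    return spec[tf.newaxis, ..., tf.newaxis]  # add batch and channel dims

while True:
    # Record one second of mono audio, blocking until the buffer is full.
    clip = sd.rec(SAMPLE_RATE, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()
    probs = tf.nn.softmax(model(to_spectrogram(clip[:, 0]))[0]).numpy()
    print("heard:", COMMANDS[int(np.argmax(probs))])
```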

# The problems to solve in the near future
- Higher quality model

# TODO
- Voice Activation Command
- Pick up a specific voice frequency
- Recognize all the voice frequencies in the room and what they say (but don't run the command unless allowed by the master)
- Multi-step commands (like how it waits for the next command after the activation key command; see the sketch below)
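
For the activation-key flow, a minimal sketch of the state-machine idea (the activation word and handler are hypothetical; this is not the project's actual implementation):

```python
ACTIVATION_WORD = "go"  # hypothetical activation key

armed = False  # True once the activation word has been heard

def on_prediction(command: str) -> None:
    # Ignore everything until the activation word arrives, then treat
    # the next recognized word as the actual command.
    global armed
    if not armed:
        armed = (command == ACTIVATION_WORD)
    else:
        print(f"executing: {command}")  # placeholder for the real action
        armed = False                   # require the activation word again
```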

# Challenges
- In realtime.py, where I try to record small chunks of data, put them back together to form a one-second sound, and feed it directly to the model (without saving it to a file):
    - Apparently it doesn't give the model the same quality of input to predict correctly
    - Making the chunks too small causes the prediction to repeat many times while only one word was said
- I realized that having many samples for one command and few for another makes an unfair playground for the commands with fewer samples and biases the prediction toward the larger sample set (see the sketch after this list)
- When music or other sounds create background noise, the model doesn't understand me and instead picks up the song's speech, which is not intended. So I'm trying to filter out all frequencies other than my voice frequency, which requires understanding my frequency first
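
One common mitigation for that imbalance (not necessarily what this project does) is to weight each class inversely to its sample count during training; a minimal Keras-style sketch:

```python
import numpy as np

def class_weights(labels):
    # Weight each class inversely to its sample count so commands with
    # few recordings are not drowned out by the larger command sets.
    classes, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    return {int(c): total / (len(classes) * n) for c, n in zip(classes, counts)}

# Example: command 0 has 1000 clips, command 1 only 100.
weights = class_weights(np.array([0] * 1000 + [1] * 100))
print(weights)  # {0: 0.55, 1: 5.5}
# Pass to Keras as model.fit(..., class_weight=weights).
```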