Morti-OS: Machine Operated Relationship Trained Input-Output System
Morti was built and tested on Anaconda using Python 3.5 and TensorFlow 1.8. A full list of requirements can be found in `bin/requirements.txt`, or install them directly with the command below:

```
pip install -r bin/requirements.txt
```
While Morti-OS is still a work in progress, a simple CLI is included for setting it up and running it. Launch the CLI from the root folder with:

```
bash Morti-CLI
```

Usage instructions are printed when it runs. Basic usage and setup are done through the CLI; its main features include Anaconda environment setup, Django migrations, training, monitoring, and logging. The CLI is under active development: as new modules are added to Morti-OS its functionality will change, but the main goal stays the same, making Morti foolproof.
To visualize the computational graph and the cost with TensorBoard, run:

```
tensorboard --logdir save/
```
As an added benefit, the RTC was built with Django, so Morti can be used in a simpler, more appealing way. From the Web UI you can check hyperparameters, grab graph data, and interact with the neural networks. Eventually I plan to make training and editing model parameters possible through the web interface as well.
Modules working with the Web Interface:
- RTC Chatbot
Based off of Conchylicultor/DeepQA.
This work tries to reproduce the results of A Neural Conversational Model (Google chatbot). Using a RNN (seq2seq model) for sentence predictions. It is done using Python 3.5 and TensorFlow 1.8.
The corpus-loading part of the program is inspired by macournoyer's Torch neuralconvo. For now, DeepQA supports the following dialog corpora:
- Cornell Movie Dialogs corpus (default). Already included when cloning the repository.
- OpenSubtitles (thanks to Eschnou). A much bigger corpus, but also noisier. To use it, follow the instructions and pass the flag `--corpus opensubs`.
- Supreme Court Conversation Data (thanks to julien-c). Available with `--corpus scotus`. See the instructions for installation.
- Ubuntu Dialogue Corpus (thanks to julien-c). Available with `--corpus ubuntu`. See the instructions for installation.
- Your own data (thanks to julien-c), using a simple custom conversation format (see here for more info).
To train the model, simply run `main.py`. Once trained, you can test the results with `main.py --test` (results are generated in `save/model/samples_predictions.txt`) or `main.py --test interactive` (more fun).

Here are some flags which could be useful. For more help and options, use `python main.py -h`:
- `--modelTag <name>`: give a name to the current model, to differentiate between models when testing/training.
- `--keepAll`: use this flag when training if, when testing, you want to see the predictions at different steps (it can be interesting to see the program change its name and age as training progresses). Warning: it can quickly take a lot of storage space if you don't increase the `--saveEvery` option.
- `--filterVocab 20` or `--vocabularySize 30000`: limit the vocabulary size to optimize performance and memory usage. Replaces words used fewer than 20 times with the `<unknown>` token and sets a maximum vocabulary size.
- `--verbose`: when testing, print the sentences as they are computed.
- `--playDataset`: show some dialogue samples from the dataset (can be used together with `--createDataset` if this is the only action you want to perform).
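The vocabulary filtering that `--filterVocab` and `--vocabularySize` describe can be sketched in plain Python. This is an illustrative sketch, not DeepQA's actual implementation; the function name and defaults are assumptions:

```python
from collections import Counter

def filter_vocab(tokens, min_count=20, max_size=30000):
    """Replace rare words with the <unknown> token.

    Mirrors the idea behind --filterVocab and --vocabularySize: words seen
    fewer than min_count times, or falling outside the max_size most
    frequent words, are mapped to <unknown>.
    """
    counts = Counter(tokens)
    # Keep only frequent-enough words, most common first, capped at max_size.
    kept = {w for w, c in counts.most_common(max_size) if c >= min_count}
    return [w if w in kept else "<unknown>" for w in tokens]
```

For example, `filter_vocab(["hi", "hi", "hi", "typo"], min_count=2)` keeps `"hi"` but maps the one-off `"typo"` to `"<unknown>"`.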
Based off of Kyubyong/tacotron.
A (heavily documented) TensorFlow implementation of Tacotron, a fully end-to-end text-to-speech synthesis model. While the title claims the model is well documented, I have yet to see any support for it, so I will be reverse engineering this system to work with the current Morti-OS system.
We train the model on three different speech datasets.
The LJ Speech Dataset has recently become widely used as a benchmark dataset for the TTS task because it is publicly available. It has 24 hours of reasonable-quality samples. Nick's audiobooks (18 hours) are additionally used to see if the model can learn even with less data and more variable speech samples. The World English Bible is a public-domain update of the American Standard Version of 1901 into modern English; its original audio is freely available here. Kyubyong split each chapter by verse manually and aligned the segmented audio clips to the text, 72 hours in total. You can download them at Kaggle Datasets.
- Download the LJ Speech Dataset or prepare your own data.
- Adjust hyperparameters in `hyperparams.py`. (If you want to do preprocessing, set `prepro` to `True`.)
- Run `python train.py`. (If you set `prepro` to `True`, run `python prepro.py` first.)
- Run `python eval.py` regularly during training.
We generate speech samples based on Harvard Sentences as the original paper does. It is already included in the repo.
- Run `python synthesize.py` and check the files in `data/samples/`.
RTC Chatbot:
- Train with a higher word limit (current limit is 5 words)
- Merge TTS into RTC

Text-to-speech:
- 6/20/2018: Working on a preprocessor for TTS
- Working on a way to automatically format large audio files and create a transcript based on smaller audio files.
  - Goal: given any video downloaded from YouTube, split its audio on silence, pass each file through the processor to extract subtitles, and get a formatted transcript with audio IDs.
- Currently cleaning up audio data for SM-W
- Audio file created, 16-bit PCM, 22050 Hz, stereo (needs to be mono)
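The last item, downmixing the 16-bit PCM 22050 Hz stereo file to mono, can be done with Python's standard library alone. A minimal sketch; the function name is mine, not part of Morti-OS:

```python
import struct
import wave

def stereo_to_mono(src_path, dst_path):
    """Downmix a 16-bit PCM stereo WAV to mono by averaging channels."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # Interleaved left/right 16-bit samples; average each L/R pair.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    mono = [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples), 2)]
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(struct.pack("<%dh" % len(mono), *mono))
```

Averaging the two channels keeps the output level comparable to the input while avoiding 16-bit overflow.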
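The silence-based splitting step in the transcript goal above can be sketched with a simple frame-level RMS gate. The thresholds and function name here are illustrative assumptions, not tuned values from Morti-OS:

```python
import math

def split_on_silence(samples, rate, frame_ms=20, threshold=500,
                     min_silence_ms=200):
    """Split mono 16-bit samples into voiced segments at silent gaps.

    A frame counts as silent when its RMS falls below `threshold`; once a
    run of silence reaches `min_silence_ms`, the current segment is closed
    and a new one starts at the next voiced frame.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    segments, current, silent_run = [], [], 0
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold:
            silent_run += 1
            if current and silent_run >= min_silent_frames:
                segments.append(current)
                current = []
        else:
            silent_run = 0
            current.extend(frame)
    if current:
        segments.append(current)
    return segments
```

Each returned segment could then be written out as its own WAV file and given an audio ID, matching the formatted-transcript goal above.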
Contributions:
- Conchylicultor: A TensorFlow implementation of a deep-learning-based chatbot
- Kyubyong: A TensorFlow implementation of Tacotron, a fully end-to-end text-to-speech synthesis model