Vectorized Long Short-term Memory (LSTM) using Matlab and GPU
It supports both the regular LSTM described here and the multimodal LSTM described here.
If you are interested, visit here for details of the experiments described in the multimodal LSTM paper.
To run the code, you need an NVIDIA GPU with at least 4GB of GPU memory. The code was tested on Ubuntu 14.04 and Windows 7 using MATLAB 2014b.
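Before running any of the demos, it may help to confirm that MATLAB actually sees a suitable GPU. The snippet below is a minimal sketch using the Parallel Computing Toolbox's gpuDevice; it is not part of the released code.

```matlab
% Optional check: confirm a CUDA-capable GPU with enough memory is visible.
g = gpuDevice;                                   % select/query the default GPU
fprintf('GPU: %s, %.1f GB total memory\n', g.Name, g.TotalMemory / 2^30);
assert(g.TotalMemory >= 4 * 2^30, 'At least 4GB of GPU memory is required.');
```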
The task is the same as that in the char-rnn project, which is a good indicator of whether the LSTM implementation is effective.
Open the applications/writer folder, but don't enter it. Run lstm_writer_test.m and it will start to generate. In the first few lines of lstm_writer_val.m you can adjust the starting character (a small sketch of this setting follows the sample below). Currently it starts with "I", so a typical generation looks like this:
I can be the most programmers who would be try to them. But I was anyway that the most professors and press right. It's hard to make them things like the startups that was much their fundraising the founders who was by being worth in the side of a startup would be to be the smart with good as work with an angel round by companies and funding a lot of the partners is that they want to competitive for the top was a strange could be would be a company that was will be described startups in the paper we could probably be were the same thing that they can be some to investors...
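The starting-character setting mentioned above might look roughly like this near the top of lstm_writer_val.m; the variable name shown here is hypothetical and may differ from the released script.

```matlab
% Hypothetical excerpt from the top of lstm_writer_val.m -- the released
% script may use a different variable name for the seed character.
start_char = 'I';   % change this to seed the generation with another character
```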
Paul Graham's essays are used in this sample. All the text is stored in data/writer/all_text.mat as a string; you may load it manually and inspect the content. The whole text contains about 2 million characters. To generate the training data, please run data/writer/gen_char_data_from_text_2.m. It will generate four .mat files under data/writer/graham; each file contains 10,000 character sequences of length 50, so the four files add up to 2 million characters.
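If you want to take a quick look at the data, the following sketch loads the raw text and counts the generated training files; it only assumes the file locations named above, not any particular variable names inside the .mat files.

```matlab
% Inspect the raw text (stored as one long string).
s = load(fullfile('data', 'writer', 'all_text.mat'));
vars = fieldnames(s);
txt = s.(vars{1});                                   % the stored string, whatever its name
fprintf('Total characters: %d\n', numel(txt));       % roughly 2 million

% Count the generated training files (expected: 4 files of 10,000 sequences each).
d = dir(fullfile('data', 'writer', 'graham', '*.mat'));
fprintf('Generated %d training files.\n', numel(d));
```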
Once the data is ready, you may run lstm_writer_train.m under applications/writer to start the training. During training, intermediate models will be saved under results/writer. You may launch another MATLAB instance and run lstm_writer_test.m with a newly saved model instead of writer.mat to test it, as sketched below.
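The swap amounts to changing the model file that the test script loads; the snapshot file name below is hypothetical, so use whatever name actually appears under results/writer.

```matlab
% In a second MATLAB instance, edit the load path near the top of
% lstm_writer_test.m so it picks up an intermediate snapshot instead of
% writer.mat. The snapshot file name below is hypothetical.
model_path = fullfile('results', 'writer', 'writer_iter_10000.mat');  % was: writer.mat
```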
The training procedure for the multimodal speaker-naming LSTM, as well as the pre-processed data (which you can use off-the-shelf), has been released. Please follow the instructions below to perform the training.
Please go here or here to download all the pre-processed training data and put all the files under data/speaker-naming/processed_training_data/, following the existing folder structure inside.
In addition, please go here or here to download the pre-processed multimodal validation data and put all the files under data/speaker-naming/raw_full/, following the existing folder structure inside.
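Before launching any of the training scripts below, a quick sanity check (just a sketch, not part of the release) can confirm the data landed in the expected folders:

```matlab
% The pre-processed data should sit in these two folders, keeping the
% folder structure that ships with the download.
assert(exist(fullfile('data', 'speaker-naming', 'processed_training_data'), 'dir') == 7, ...
    'Missing data/speaker-naming/processed_training_data/');
assert(exist(fullfile('data', 'speaker-naming', 'raw_full'), 'dir') == 7, ...
    'Missing data/speaker-naming/raw_full/');
```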
Once all the data is in place, you may start to train three types of models: a model that classifies only the face features, a model that classifies only the audio features, and a model that simultaneously classifies the face+audio multimodal features (the multimodal LSTM).
To train the face-only model, you may run this script.
To train the audio-only model, you may run this script.
To train the face+audio multimodal LSTM model, you may run this script.
Meanwhile, you can also run tests for the aforementioned three models using the pre-trained models.
Run this script to test the pre-trained face-only model.
Run this script to test the pre-trained audio-only model.
Run this script to test the pre-trained face+audio multimodal LSTM model.
Jimmy SJ. Ren, Yongtao Hu, Yu-Wing Tai, Chuan Wang, Li Xu, Wenxiu Sun, Qiong Yan,
"Look, Listen and Learn - A Multimodal LSTM for Speaker Identification", The 30th AAAI Conference on Artificial Intelligence (AAAI-16).