
Recurrent Neural Network that predicts word-by-word


Word Bot

Word Bot simplifies training on corpus texts on Windows and keeps most database files in an easy-to-access shared folder. It is based primarily on the hard work of rtlee9 using a word-level RNN and pre-trained GloVe word vectors. For more information, see the original project this is forked from. For additional insight into what is occurring in this program, please see this Eight Portions blog post.

Installation

  1. Install Docker Toolbox for Windows.

  2. Create a "/dockerdata" directory to hold the training files and corpus text files.

  3. Open Oracle VirtualBox (installed with Docker Toolbox).

  4. Make sure the "default" virtual machine that Docker installs is turned off.

  5. Go to the "default" machine's Settings | Shared Folders.

  6. Add the "/dockerdata" directory as a share named dockerdata. Set it to auto-mount with full access (a command-line alternative is sketched after this list).

  7. Run the following commands from the Docker Quick Start Terminal:

  • docker-machine ssh default
  • sudo mkdir /dockerdata
  • sudo mount -t vboxsf /dockerdata /dockerdata
  • Type "exit" and hit enter

The second /dockerdata in the mount command above is the local mount point inside the VM; the first refers to the shared folder configured earlier.
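As a command-line alternative to steps 3-6, the shared folder can also be configured with VBoxManage, which is installed alongside VirtualBox. This is only a sketch: the VM name "default" comes from the steps above, while the host path C:\dockerdata is an assumption about where the folder was created on Windows. As in step 4, the VM should be powered off when the shared folder is added.

  # hedged sketch: configure the shared folder from a Windows command prompt
  # "default" is the Docker Toolbox VM; adjust C:\dockerdata to wherever you made the folder
  VBoxManage sharedfolder add "default" --name dockerdata --hostpath "C:\dockerdata" --automount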

  1. From the Docker Quick Start Terminal run: docker pull kboruff/wordbot

  2. After download, run docker run -v '/dockerdata/:/root/wordbot/dockerdata' -ti kboruff/wordbot bash

  3. Run "ls"

If everything went correctly, you should be in the /root/wordbot folder and a /dockerdata folder should be visible (the full path mapping is sketched below). A placeholder file named input.txt should be in the /root directory. To make sure everything is set up properly, we will move it into the dockerdata folder.
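For orientation, the -v flag in the docker run command above bind-mounts the VM's /dockerdata directory into the container, so the shared data travels through roughly this chain (a sketch of the intended mapping, using the paths from the steps above):

  # Windows host             VirtualBox "default" VM          Docker container
  # /dockerdata          ->  /dockerdata                  ->  /root/wordbot/dockerdata
  # (shared folder           (vboxsf mount from step 7)       (bind mount from docker run -v)
  #  named dockerdata)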

  1. Run "mv ./input.txt ~/wordbot/dockerdata"

  2. cd into the dockerdata folder and run "ls" again.

  3. You should see the input.txt file.

  4. On the host system, open the /dockerdata folder you made and you should see the input.txt file in it.

Training

  1. From inside the container (the /root/wordbot folder), run ./train_char.sh or ./train_word.sh

These scripts check whether the pre-trained GloVe vector files have already been downloaded. If not, they download them into the glove subfolder of /dockerdata (a sketch of this check follows).
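As a rough sketch of what that check amounts to, assuming the commonly used glove.6B archive (the exact archive and paths used by the scripts are assumptions here, not confirmed from the scripts themselves):

  # hedged sketch: fetch the GloVe vectors only if they are not already present
  GLOVE_DIR=/root/wordbot/dockerdata/glove
  if [ ! -d "$GLOVE_DIR" ]; then
      mkdir -p "$GLOVE_DIR"
      wget -O "$GLOVE_DIR/glove.6B.zip" http://nlp.stanford.edu/data/glove.6B.zip   # assumed archive
      unzip "$GLOVE_DIR/glove.6B.zip" -d "$GLOVE_DIR"
  fi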

  1. Let it run until it is finished.

  2. Identify the best word-level and character-level models in /cv_char_caps_256_2/ and move them to /dockerdata/. Rename the files to word-rnn-trained.t7 and char-rnn-trained.t7, respectively (see the sketch below).
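For example, the move and rename might look like the following. The checkpoint file names here are placeholders for whichever files you judge best, and the word-level checkpoint may live in a different cv_* directory than the character-level one:

  # hedged sketch: promote the chosen checkpoints to the names used in the sampling step
  mv cv_char_caps_256_2/char_checkpoint_example.t7 /root/wordbot/dockerdata/char-rnn-trained.t7
  mv cv_char_caps_256_2/word_checkpoint_example.t7 /root/wordbot/dockerdata/word-rnn-trained.t7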

Usage: sampling

  1. To sample from the trained models, run python sample.py "I will build a"
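If you want the generated text to be visible on the Windows host as well, the output can be redirected into the shared folder; the file name here is just an example:

  # hedged example: write a sample into the shared volume so the host can read it
  python sample.py "I will build a" > dockerdata/sample_output.txt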

Credits, inspiration and similar projects

This is a fork of Lars Hiller Eidnes' word-rnn, which is based on Andrej Karpathy's char-rnn.