
Word2Vec implementation using the Game of Thrones data-set.

Primary language: Jupyter Notebook. License: MIT.

Throne2Vec

Training a word2vec model on a data-set containing the entire Game of Thrones book collection

This notebook is based on assignment 5 of the Udacity Deep Learning course.
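The Udacity assignment trains its embeddings with a skip-gram model, which learns to predict the words surrounding a center word. The sketch below shows how a token list can be turned into (center, context) training pairs. It assumes the notebook keeps that skip-gram setup, and the helper names `build_vocab` and `skipgram_pairs` are hypothetical, not taken from the notebook. Rare words are mapped to a shared `UNK` id, as is common in word2vec pre-processing.

```python
from collections import Counter


def build_vocab(words, vocab_size):
    """Map the most frequent words to ids; all other words become 'UNK' (id 0)."""
    counts = [("UNK", 0)] + Counter(words).most_common(vocab_size - 1)
    word_to_id = {w: i for i, (w, _) in enumerate(counts)}
    ids = [word_to_id.get(w, 0) for w in words]
    return ids, word_to_id


def skipgram_pairs(ids, window=1):
    """Return (center, context) id pairs for every word within `window` of the center."""
    pairs = []
    for i, center in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, ids[j]))
    return pairs


ids, word_to_id = build_vocab("the north remembers the north".split(), vocab_size=3)
print(ids)                        # [1, 2, 0, 1, 2]  ("remembers" falls outside the vocabulary)
print(skipgram_pairs(ids)[:3])    # [(1, 2), (2, 1), (2, 0)]
```

A real training loop would shuffle these pairs into mini-batches and feed them to the model; larger windows trade training speed for broader context.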

Besides the data-set, this version adds:

  • Text Pre-Processing
  • Finding word analogies using the learned embedding
  • More detailed comments
  • Optimizations
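Word analogies of the form "a is to b as c is to ?" are usually answered with vector arithmetic on the learned embedding: compute vec(b) - vec(a) + vec(c) and look for the nearest word by cosine similarity. A minimal sketch, assuming `embeddings` is a NumPy array with one row per vocabulary word and hypothetical `word_to_id` / `id_to_word` lookups; the notebook's own implementation may differ.

```python
import numpy as np


def analogy(a, b, c, embeddings, word_to_id, id_to_word, top_k=1):
    """Return the top_k words d such that a : b :: c : d."""
    # Normalize rows so that a dot product equals cosine similarity.
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = norm[word_to_id[b]] - norm[word_to_id[a]] + norm[word_to_id[c]]
    query /= np.linalg.norm(query)
    sims = norm @ query
    # Exclude the three input words so they cannot be returned as answers.
    for w in (a, b, c):
        sims[word_to_id[w]] = -np.inf
    best = np.argsort(-sims)[:top_k]
    return [id_to_word[i] for i in best]


# Toy 3-d vectors chosen by hand for illustration, not learned from the corpus.
words = ["man", "woman", "king", "queen", "apple"]
word_to_id = {w: i for i, w in enumerate(words)}
emb = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, -1]], dtype=float)
print(analogy("man", "woman", "king", emb, word_to_id, words))  # ['queen']
```

With embeddings trained on the books, the same call could be used to probe relations between characters, houses, or places.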

This is a Jupyter notebook, so the explanations are included as Markdown cells in the notebook itself. Feel free to play around with it and share comments if you have any.

The GoT corpus file is not included in this repository because of copyright considerations, sorry about that. However, you can create your own data-set from any book (or any text) you like. Just make sure it is packaged as a .zip file containing one or more .txt files.
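If you build your own data-set, the archive can be read with Python's standard `zipfile` module. The sketch below is an illustration, not the notebook's loader: the function name `read_corpus` is hypothetical, and it assumes UTF-8 text, lowercase tokens, and that punctuation and digits can simply be dropped. It concatenates every `.txt` file in the archive (in sorted name order) and skips anything else.

```python
import re
import zipfile


def read_corpus(zip_path):
    """Concatenate every .txt file in a .zip archive into one list of lowercase tokens."""
    words = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if not name.lower().endswith(".txt"):
                continue  # ignore non-text members such as notes or images
            text = zf.read(name).decode("utf-8", errors="ignore")
            # Keep runs of letters and apostrophes; punctuation and digits are dropped.
            words.extend(re.findall(r"[a-z']+", text.lower()))
    return words
```

For example, an archive holding `book1.txt` with "Winter is coming." would yield `['winter', 'is', 'coming']`. Sorting the member names keeps the token order reproducible across runs.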

Dependencies: