Mogrifier LSTM

PyTorch implementation of the paper "Mogrifier LSTM" by Gábor Melis, Tomáš Kočiský, and Phil Blunsom, International Conference on Learning Representations (ICLR), 2020. https://openreview.net/pdf?id=SJe5P6EYvS

Dependencies

  • Compatible with Python 3.6 and PyTorch 1.2.0
  • The necessary packages can be installed through requirements.txt.

Setup

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

We recommend creating a virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Finally, install the required packages by running:

$ pip install -r requirements.txt

Datasets

Data is provided in /data:

  • PTB
  • WikiText-2

Models

Implementations of the particular models can be found in /src/components/:

  • Mogrifier LSTM (see the sketch after this list)
  • Transformer
  • Vanilla LSTM

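The core idea of the Mogrifier LSTM is that, before the standard LSTM update, the input and the previous hidden state alternately gate each other for a fixed number of "mogrification" rounds. The following is a minimal PyTorch sketch of that idea; the class name, argument names, and default round count are illustrative and do not necessarily match the exact API in /src/components/:

import torch
import torch.nn as nn

class MogrifierLSTMCell(nn.Module):
    """Minimal sketch of a Mogrifier LSTM cell (Melis et al., 2020).

    Before the usual LSTM update, the input x and the previous hidden
    state h alternately gate each other for `mogrify_steps` rounds.
    """

    def __init__(self, input_size, hidden_size, mogrify_steps=5):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        self.mogrify_steps = mogrify_steps
        # Q maps h to a gate over x; R maps x to a gate over h
        self.q = nn.Linear(hidden_size, input_size)
        self.r = nn.Linear(input_size, hidden_size)

    def mogrify(self, x, h):
        for i in range(1, self.mogrify_steps + 1):
            if i % 2 == 1:
                # odd rounds rescale the input using the hidden state
                x = 2 * torch.sigmoid(self.q(h)) * x
            else:
                # even rounds rescale the hidden state using the input
                h = 2 * torch.sigmoid(self.r(x)) * h
        return x, h

    def forward(self, x, state):
        h, c = state
        x, h = self.mogrify(x, h)
        # standard LSTM update on the mogrified input and hidden state
        return self.lstm(x, (h, c))
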
Training the model

For training the model with default hyperparameter settings, execute the following command:

python -m src.main -mode train -run_name testrun -dataset <DatasetName> \
-model_type <ARCHITECTURE> -gpu <GPU-ID>
  • run_name: A unique identifier for an experiment; the locations for storing model checkpoints and logs are derived from it.
  • dataset: The dataset to train and validate the model on, chosen from the datasets listed above. Options:
    • ptb
    • wikitext-2
  • model_type: The neural network architecture to use for the experiments. Options:
    • Mogrify: Mogrifier LSTM
    • SAN: Transformer model
    • RNN: LSTM (default)/GRU/RNN
  • gpu: On a multi-GPU machine, the id of the GPU on which to run the training process; on a single-GPU machine, just use 0 for the default GPU. Note that the code currently does not run without a GPU; we will provide CPU support shortly.
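
For example, to train the Mogrifier LSTM on Penn Treebank using the default GPU (the run name here is an arbitrary label of your choosing):

python -m src.main -mode train -run_name mogrifier_ptb -dataset ptb -model_type Mogrify -gpu 0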

Other hyperparameters can be found in the file src/args.py. Some important ones worth noting are given below:

  • pos_encode: Only applicable when model_type is SAN. Adding -pos_encode to the training command described above initializes a transformer that uses absolute positional encodings; without it, the model uses no form of positional encoding (a sketch of such encodings follows this list).
  • hidden_size: Applicable for model_type RNN only; the hidden size used in the network.
  • d_model: Applicable for model_type SAN; the size of the intermediate vectors used in the network, e.g. -d_model 32.
  • heads: Also applicable for SAN only; the number of attention heads to use.
  • depth: The number of layers to initialize the network with.

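For reference, absolute positional encodings of the kind enabled by -pos_encode are commonly implemented as sinusoidal encodings (Vaswani et al., 2017). The sketch below is illustrative only and assumes an even d_model; the repository's actual implementation may differ:

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sketch of sinusoidal absolute positional encodings."""

    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        # geometric progression of frequencies across the even dimensions
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # (1, max_len, d_model), saved with the module but not trained
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position
        return x + self.pe[:, : x.size(1)]
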
Details of other arguments can be found in src/args.py.