- This repository includes a PyTorch implementation of "On Evaluating the Generalization of LSTM Models in Formal Languages".
- Our paper appeared in the Proceedings of the Society for Computation in Linguistics (SCiL) 2019.
The code is written in Python and requires PyTorch along with a few other dependencies. If you would like to run the code locally, first install PyTorch by following the instructions at http://pytorch.org, and then run the following command to install the remaining packages listed in requirements.txt:
```
pip install -r requirements.txt
```
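After installation, a quick smoke test (an illustrative snippet, not part of the repository) confirms that PyTorch is importable:

```python
# Quick sanity check that PyTorch installed correctly.
import torch

print(torch.__version__)        # prints the installed PyTorch version
print(torch.randn(2, 3).shape)  # torch.Size([2, 3])
```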
`main.py` accepts the following command-line arguments (a hypothetical parsing sketch follows the list):

- `exp_type`: The experiment type. The choices are `single`, `distribution`, `window`, and `hidden_units`.
- `distribution`: The distribution regime(s). The choices are `uniform`, `u-shaped`, `left-tailed`, and `right-tailed`.
- `window`: The training length window. It should be a single (or a list of) integer pair(s) in the form `a b`, where a ≤ b.
- `lstm_hunits`: The number of hidden units in the LSTM model. It should be a single (or a list of) integer(s).
- `language`: The language in consideration. The choices are `ab`, `abc`, and `abcd`, representing the languages a^n b^n, a^n b^n c^n, and a^n b^n c^n d^n, respectively.
- `lstm_hlayers`: The number of hidden layers in the LSTM model. It should be a single positive integer.
- `n_trials`: The number of trials. It should be a single positive integer.
- `n_epochs`: The number of epochs per trial. It should be a single positive integer.
- `sample_size`: The number of training samples. It should be a single positive integer.
- `disp_err_n`: The total number of error values in consideration. It should be a single positive integer.
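For reference, the following is a minimal sketch of how these flags might be declared with Python's `argparse`. The flag names match the example commands below, but the defaults and help strings are assumptions, not the actual contents of `main.py`:

```python
# Hypothetical sketch of the argument parser; the real main.py may differ.
import argparse

parser = argparse.ArgumentParser(description="LSTM generalization experiments")
parser.add_argument("--exp_type", required=True,
                    choices=["single", "distribution", "window", "hidden_units"])
parser.add_argument("--language", choices=["ab", "abc", "abcd"], default="abc")
parser.add_argument("--distribution", nargs="+",
                    choices=["uniform", "u-shaped", "left-tailed", "right-tailed"])
parser.add_argument("--window", type=int, nargs="+",
                    help="one or more `a b` pairs, flattened into a list")
parser.add_argument("--lstm_hunits", type=int, nargs="+", default=[3])
parser.add_argument("--lstm_hlayers", type=int, default=1)
parser.add_argument("--n_trials", type=int, default=10)      # assumed default
parser.add_argument("--n_epochs", type=int, default=1)       # assumed default
parser.add_argument("--sample_size", type=int, default=1000)  # assumed default
parser.add_argument("--disp_err_n", type=int, default=5)

args = parser.parse_args()
```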
Suppose we would like to investigate the influence of weight initialization on the inductive capabilities of LSTM models in the task of learning the CSL a^n b^n c^n. We may then run the following command:
```
python main.py --exp_type single --language abc --distribution uniform --window 1 50 --lstm_hunits 3 --disp_err_n 5
```
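For intuition, here is a hedged sketch of how training strings for a^n b^n c^n could be generated with lengths drawn uniformly from the window [1, 50]; the function name is illustrative, not taken from the repository:

```python
import random

def sample_abc(window=(1, 50)):
    """Sample a string a^n b^n c^n with n drawn uniformly from [a, b]."""
    n = random.randint(*window)
    return "a" * n + "b" * n + "c" * n

# e.g. a training set of 1000 strings, mirroring --sample_size
train_set = [sample_abc() for _ in range(1000)]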
Suppose we would like to investigate the influence of various distribution regimes on the inductive capabilities of LSTM models in the task of learning the CSL a^n b^n c^n. We may then run the following command:
```
python main.py --exp_type distribution --language abc --distribution uniform u-shaped left-tailed right-tailed --window 1 50 --lstm_hunits 3 --disp_err_n 5
```
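The four regimes are discrete distributions over sequence lengths. The sketch below approximates them by discretizing Beta distributions onto the training window; the shape parameters are assumptions chosen only to match the described shapes, not values taken from the paper or the code:

```python
import numpy as np

# Illustrative shape parameters (assumptions), chosen to match the
# described shapes: uniform, U-shaped, left-tailed, right-tailed.
BETA_PARAMS = {
    "uniform": (1.0, 1.0),
    "u-shaped": (0.25, 0.25),
    "left-tailed": (1.0, 5.0),   # mass concentrated on short lengths
    "right-tailed": (5.0, 1.0),  # mass concentrated on long lengths
}

def sample_length(regime, window=(1, 50), rng=np.random.default_rng()):
    """Draw a sequence length n in [a, b] under the given regime."""
    a, b = window
    alpha, beta = BETA_PARAMS[regime]
    # Scale a Beta(alpha, beta) draw onto the integer window [a, b].
    return a + int(rng.beta(alpha, beta) * (b - a + 1 - 1e-9))
```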
Suppose we would like to investigate the influence of the training window on the inductive capabilities of LSTM models in the task of learning the CSL a^n b^n c^n. Assuming that we are considering the three training windows [1, 30], [1, 50], and [50, 100], we may then run the following command:
```
python main.py --exp_type window --language abc --distribution uniform --window 1 30 1 50 50 100 --lstm_hunits 3 --disp_err_n 5
```
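The generalization measurement reports the shortest values of n at which a trained network makes an error. Below is a hedged sketch of that evaluation loop, where `model_predicts_correctly` is a hypothetical stand-in for the repository's actual evaluation code:

```python
def first_errors(model_predicts_correctly, max_n=1000, k=5):
    """Return the first k values of n where the model fails on a^n b^n c^n.

    `model_predicts_correctly` is a hypothetical callable: n -> bool.
    The parameter k mirrors --disp_err_n, which controls how many
    error values are reported.
    """
    errors = []
    for n in range(1, max_n + 1):
        if not model_predicts_correctly(n):
            errors.append(n)
            if len(errors) == k:
                break
    return errors
```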
If you would like to cite our work, please use the following BibTeX format:
```
@InProceedings{suzgun2019evaluating,
  title={On Evaluating the Generalization of LSTM Models in Formal Languages},
  author={Suzgun, Mirac and Belinkov, Yonatan and Shieber, Stuart M.},
  booktitle={Proceedings of the Society for Computation in Linguistics (SCiL)},
  pages={277--286},
  year={2019},
  month={January}
}
```
Thanks!
We thank Sebastian Gehrmann of Harvard SEAS for his insightful comments and discussions.