Syntactic parsing, the task of learning grammatical structure, is a backbone of natural language understanding and has been shown to benefit many downstream natural language processing tasks. Toward this pursuit, we introduce two different word representations into the ON-LSTM model and evaluate their grammar induction performance.
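For context, the core mechanism of ON-LSTM is the cumulative softmax ("cumax") activation, which parameterizes the master forget/input gates and imposes an ordering over neurons. A minimal NumPy sketch of that activation (the function name and example input are ours, not from this repo):

```python
import numpy as np

def cumax(x):
    """Cumulative softmax: cumsum(softmax(x)).

    In ON-LSTM this parameterizes the master gates: the output is
    monotonically non-decreasing and ends at 1, so each neuron is
    "more open" than the previous one, inducing a hierarchy.
    """
    e = np.exp(x - np.max(x))  # numerically stable softmax
    sm = e / e.sum()
    return np.cumsum(sm)

g = cumax(np.array([2.0, 1.0, 0.5, -1.0]))
# g is non-decreasing and its last entry is 1.0
```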
To run the code, you will need an environment with the following:
- Python (>3.6)
- PyTorch
- CUDA (strongly recommended)
- NLTK
To run the evaluation script, you will also need the Penn Treebank (PTB) corpus.

Please make sure you have installed the NLTK PTB package and that the PTB corpus is in the expected directory. For more details, please refer to the instructions in the ON-LSTM repo.
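As a quick sanity check before running evaluation, you can verify that the NLTK-formatted PTB directory exists. The default `~/nltk_data` path below is the standard NLTK data location, but it is an assumption; adjust it if your data lives elsewhere:

```python
import os

def ptb_available(nltk_data_dir=None):
    """Return True if the NLTK-formatted PTB corpus directory exists.

    The default ~/nltk_data path is NLTK's standard data location
    (an assumption; pass your own path if it differs).
    """
    if nltk_data_dir is None:
        nltk_data_dir = os.path.join(os.path.expanduser("~"), "nltk_data")
    return os.path.isdir(os.path.join(nltk_data_dir, "corpora", "ptb"))

print(ptb_available())
```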
To run the language modeling training, use the following command:
python main_gpt.py --cuda --mode GPT --learning_rate 1e-6 --lr 10 --batch_size 20 --dropoute 0.0 --dropout 0.45 --dropouth 0.3 --dropouti 0.0 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000
To test the model on the unsupervised parsing task, please use:
python test_phrase_grammar.py --cuda
- Shen, Yikang, et al. "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks." Proceedings of ICLR (2019).
- Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks" https://github.com/yikangshen/Ordered-Neurons
- The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, Google/CMU Transformer-XL. https://github.com/huggingface/pytorch-pretrained-BERT