/joint-lstm-parser

Transition-based joint syntactic dependency parser and semantic role labeler using a stack LSTM RNN architecture.

Primary LanguageC++

lstm-parser

Transition-based joint syntactic dependency parser and semantic role labeler using stack LSTM RNN architecture

Required software

Checking out the project for the first time

The first time you clone the repository, you need to sync the cnn/ submodule.

git submodule init
git submodule update

Build instructions

mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen
make -j2

Train a parsing model

The data must be in the CoNLL 2009 format. For best performance, it is suggested to projectivize the treebank data. As a preprocessing step, first convert the treebank data (in CoNLL 2009 format) into transitions, *.transitions (this is a format usable by the joint parser), using the following commands.

java -jar ../jointOracle.jar -inp train.conll -lemmas true > train.transitions
java -jar ../jointOracle.jar -inp dev.conll > dev.transitions

Note that it is required to set the option ``lemmas'' to true for the training data, so that an auxilliary file, train.conll.pb.lemmas is generated, which saves the lemmas of all the predicate words. The joint parser can now run on the generated files.

parser/lstm-parse -T train.transitions -d dev.transitions -w sskip.100.vectors --propbank_lemmas train.conll.pb.lemmas -g dev.conll -e ../eval09.pl -s dev.predictions.conll --out_model joint.model -t

Link to the word vectors that we used in the ACL 2015 paper for English: sskip.100.vectors. The evaluation script, eval09.pl, is provided by CoNLL 2009. The model is written to the current directory.

Note-1: you can also run it without word embeddings by removing the -w option for both training and parsing.

Note-2: the training process should be stopped when the development Macro F1 does not substantially improve anymore.

Note-3: the default hyperparameters are the same as in the paper for the CoNLL 2009 English model. These can be changed by altering the command-line options provided in the lstm-parse.cc code.

Parse data with your parsing model

The test.conll file must also be in to the CoNLL 2009 data format .

java -jar ../jointOracle.jar -inp test.conll > test.transitions

parser/lstm-parse -T train.transitions -d test.transitions -w sskip.100.vectors --propbank_lemmas train.conll.pb.lemmas -m joint.model -s test.predictions.conll -g test.conll -e ../eval09.pl 

The parser will output a file test.predictions.conll with predicted syntax and SRL dependencies.

Citation

If you make use of this software, please cite the following:

@inproceedings{swayamdipta:2016conll,
author={Swabha Swayamdipta and Miguel Ballesteros and Chris Dyer and Noah A. Smith},
title={Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs},
booktitle={Proc. of CoNLL},
year={2016}
}

Contact

For questions and usage issues, please contact swabha@cs.cmu.edu