Frame-semantic parser for automatically detecting semantic frames and their arguments, with the respective labels, from text. Uses a softmax-margin segmental RNN model.
This README only contains instructions for running the baseline SegRNN model, instructions for other models will be updated soon.
- A C++ compiler supporting the C++11 language standard
- Boost libraries
- Eigen (newer versions strongly recommended)
- CMake
- This runs with the Python wrapper to
DyNet-v2
.
Create a dynet-base/ directory here. Follow the python installation directions to set up DyNet.
First, create a data/ directory here, download FrameNet version 1.x and place it under data/fndata-1.x/. Also create a directory data/neural/fn1.x/ for the data files in CoNLL 2009 format, which will be generated in the steps below. Now, create the train, test and dev splits from the data by rewriting the xml files provided in CoNLL 2009 format, with BIOS tags, for simplicity, by executing the following:
cd src/
python preprocess.py 2> err
The above script writes the train, dev and test files in the required format into the data/ folder.
There is plenty of noise in the data, and all the sentences which could not be converted. The location of these, along with the error is written to the standard error.
To use pretrained GloVe word embeddings, download the GloVe files and place them under data/. Run the preprocessing with an extra argument for the intended GloVe file. This trims the GloVe files to the FrameNet vocabulary, to ease memory requirements. For example, the command
python preprocess.py glove.6B.100d.txt 2> err
creates glove.6B.100d.framevocab.txt under data/.
To run the biLSTM frame identification module, execute:
cd src/
python frameid.py --mode test
To train the vanilla SegRNN model, use
cd src/
python segrnn-argid.py 2> err
To test the trained models, use the option "--mode test
".
For questions and usage issues, please contact swabha@cs.cmu.edu