Frame-semantic parser for automatically detecting FrameNet frames and their frame elements from sentences. The model is based on softmax-margin segmental recurrent neural nets, described in our paper Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold. An example of a frame-semantic parse is shown below.
This project is developed using Python 2.7. Other requirements include the DyNet library, NLTK, and some NLTK data packages, which can be installed with:
pip install dynet
pip install nltk
python -m nltk.downloader averaged_perceptron_tagger wordnet
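Before moving on, it can help to confirm that the installs above actually worked. The sketch below is a hypothetical helper, not part of this repository: it tries to import each module and asks NLTK's `nltk.data.find` for each data package. The resource paths `taggers/averaged_perceptron_tagger` and `corpora/wordnet` are assumptions about where NLTK stores the two packages downloaded above.

```python
from __future__ import print_function

import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def missing_nltk_resources(resources):
    """Return NLTK data resources that are not installed.

    Returns None if NLTK itself cannot be imported.
    """
    try:
        import nltk
    except ImportError:
        return None
    missing = []
    for resource in resources:
        try:
            nltk.data.find(resource)  # raises LookupError if absent
        except LookupError:
            missing.append(resource)
    return missing


if __name__ == "__main__":
    print("missing modules:", missing_modules(["dynet", "nltk"]))
    # Assumed NLTK data paths for the packages downloaded above.
    print("missing NLTK data:", missing_nltk_resources(
        ["taggers/averaged_perceptron_tagger", "corpora/wordnet"]))
```

Empty lists for both checks suggest the environment is ready; `None` for the second check means NLTK itself is not installed.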
This codebase only handles FrameNet data in its original XML format, which we first convert into a more readable format:
- First, create a data/ directory here, download FrameNet version 1.x and place it under data/fndata-1.x/. Also create a directory data/neural/fn1.x/ to hold the data converted to CoNLL 2009 format.
- Convert the data into a format similar to CoNLL 2009, but with BIO tags, by executing:
cd src/
python preprocess.py 2> err
The above script writes the train, dev and test files in the required format into the data/neural/fn1.x/ directory. The annotations are quite noisy: annotations that could not be used, along with the corresponding error messages, are written to standard error, which the command above redirects to the file err.
- [Optional, but highly recommended] If you want to use pretrained GloVe word embeddings, download and extract them under data/. Run the preprocessing with an extra argument for the intended GloVe file:
python preprocess.py glove.6B.100d.txt 2> err
This trims the GloVe file to the FrameNet vocabulary to reduce memory requirements. For example, the command above creates data/glove.6B.100d.framevocab.txt, to be used by our models.
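The conversion step above marks frame elements with BIO tags. As a rough illustration of that encoding only (the exact columns and tag spelling written by preprocess.py are not described here and may differ), the hypothetical helpers below turn labeled token spans into one tag per token and back. Spans are assumed to be 0-based and inclusive, and frame elements are assumed not to overlap.

```python
def spans_to_bio(num_tokens, spans):
    """Convert labeled, inclusive token spans into a BIO tag sequence.

    spans: iterable of (start, end, label), 0-based and inclusive.
    Raises ValueError on out-of-range or overlapping spans.
    """
    tags = ["O"] * num_tokens
    for start, end, label in sorted(spans):
        if not (0 <= start <= end < num_tokens):
            raise ValueError("span out of range: %r" % ((start, end, label),))
        if any(tag != "O" for tag in tags[start:end + 1]):
            raise ValueError("overlapping span: %r" % ((start, end, label),))
        tags[start] = "B-" + label           # first token of the span
        for i in range(start + 1, end + 1):  # remaining tokens
            tags[i] = "I-" + label
    return tags


def bio_to_spans(tags):
    """Inverse of spans_to_bio: recover (start, end, label) spans."""
    spans = []
    start = label = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # span continues
        if label is not None:
            spans.append((start, i - 1, label))
            start = label = None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans
```

For "Bob bought a car" with the Commerce_buy elements Buyer on token 0 and Goods on tokens 2 to 3, `spans_to_bio(4, [(0, 0, "Buyer"), (2, 3, "Goods")])` gives `["B-Buyer", "O", "B-Goods", "I-Goods"]`.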
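The GloVe trimming described above amounts to keeping only the embedding lines whose word occurs in the vocabulary. A minimal sketch, assuming the standard GloVe text layout (`word v1 v2 ... vd` per line) and a vocabulary given as a Python set; how preprocess.py actually builds the FrameNet vocabulary, and whether it lowercases, is an assumption here (GloVe 6B itself is uncased). `trim_embeddings` and `trim_glove_file` are hypothetical names.

```python
import io


def trim_embeddings(glove_lines, vocab, lowercase=True):
    """Yield only the GloVe lines whose word is in vocab.

    glove_lines: iterable of "word v1 v2 ... vd" strings.
    With lowercase=True, vocabulary words are lowercased before lookup.
    """
    if lowercase:
        vocab = set(w.lower() for w in vocab)
    for line in glove_lines:
        word = line.split(" ", 1)[0]  # first field is the word
        if word in vocab:
            yield line


def trim_glove_file(in_path, out_path, vocab):
    """Stream in_path to out_path, keeping only in-vocabulary rows."""
    with io.open(in_path, encoding="utf-8") as fin, \
            io.open(out_path, "w", encoding="utf-8") as fout:
        for line in trim_embeddings(fin, vocab):
            fout.write(line)
```

Streaming line by line keeps memory use flat even for the larger GloVe files, which is the point of trimming in the first place.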
Frame identification is based on a bidirectional LSTM model.
To train the frame identification module, execute:
cd src/
python frameid.py
This saves the best model on validation data in the directory src/tmp/, which will be pointed to by the symbolic link src/model.frameid.1.x. Pre-trained models are coming soon.
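Keeping the best model on validation data behind a stable symbolic link is a common checkpointing pattern. The sketch below shows that general pattern under stated assumptions; it is not the code in frameid.py. `update_best` and the `save_model` callback are hypothetical, higher dev scores are assumed to be better, and the atomic link swap relies on POSIX `rename` semantics.

```python
import os


def update_best(dev_score, best_score, save_model, model_dir, link_path):
    """Save a model and repoint a symlink when the dev score improves.

    save_model: hypothetical callable that writes the model into model_dir.
    Returns the (possibly updated) best score.
    """
    if best_score is not None and dev_score <= best_score:
        return best_score  # no improvement: keep the old model and link
    if not os.path.isdir(model_dir):
        os.makedirs(model_dir)
    save_model(model_dir)
    # Build the new link under a temporary name, then rename it over the
    # old one so readers never see a missing link (atomic on POSIX).
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.abspath(model_dir), tmp_link)
    os.rename(tmp_link, link_path)
    return dev_score
```

After training, `os.path.realpath` on the link path shows which saved model it currently points to.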
To test under the best model in src/model.frameid.1.x, execute:
python frameid.py --mode test > frameid.log
frameid.log will contain an example-wise analysis. The output will be written in CoNLL 2009 format to predicted.1.x.frameid.test.out, and in the frame-elements file format to my.predict.test.frame.elements.
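To inspect the predicted CoNLL 2009-style output programmatically, a small reader is enough. This sketch assumes, as in CoNLL 2009, that each token is one tab-separated line and sentences are separated by blank lines; the meaning of the individual columns is not documented here, so the reader just returns raw string fields. `read_conll_sentences` and `load_conll_file` are hypothetical names.

```python
import io


def read_conll_sentences(lines):
    """Group CoNLL-style lines into sentences of tab-split token rows."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():  # blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:  # file may not end with a blank line
        yield sentence


def load_conll_file(path):
    """Read a whole CoNLL-style file into a list of sentences."""
    with io.open(path, encoding="utf-8") as f:
        return list(read_conll_sentences(f))
```

For example, `len(load_conll_file(path))` counts the sentence blocks in a prediction file, with `path` set to the output file named above after replacing 1.x with your FrameNet version.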
Argument identification is based on a segmental recurrent neural net model, used as a baseline in our paper.
To train an argument identifier, execute:
cd src/
python segrnn-argid.py 2> err
This saves the best model on validation data in the directory src/tmp/, which will be pointed to by the symbolic link src/model.segrnn-argid.1.x. Pre-trained models are coming soon.
To test under the best model in src/model.segrnn-argid.1.x, execute:
python segrnn-argid.py --mode test > argid.log
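A segmental model scores whole labeled segments rather than individual tokens, and the best segmentation can be found with a semi-Markov dynamic program. The sketch below is a generic version of that decoder, not the code in segrnn-argid.py: the neural segment scorer is replaced by a hypothetical `score(i, j, label)` callable over the half-open segment [i, j), and `max_len` is an assumed cap on segment length. In a typical argument-identification setup, a null label would cover tokens that belong to no frame element.

```python
def segmental_viterbi(n, labels, score, max_len):
    """Highest-scoring labeled segmentation of positions 0..n-1.

    score(i, j, label) -> float scores the segment [i, j) with label.
    Every position is covered by exactly one segment of length <= max_len.
    Returns (best_score, [(i, j, label), ...]).
    """
    best = [float("-inf")] * (n + 1)  # best[j]: best score of prefix [0, j)
    back = [None] * (n + 1)           # back[j]: (start, label) of last segment
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for label in labels:
                s = best[i] + score(i, j, label)
                if s > best[j]:
                    best[j] = s
                    back[j] = (i, label)
    # Follow back-pointers from the end to recover the segments.
    segments = []
    j = n
    while j > 0:
        i, label = back[j]
        segments.append((i, j, label))
        j = i
    segments.reverse()
    return best[n], segments
```

The cost is O(n · max_len · |labels|) calls to `score`, which is why segmental models usually bound the segment length.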
For questions and usage issues, please contact swabha@cs.cmu.edu. If you use open-sesame for research, please cite our paper as follows:
@article{swayamdipta:17,
title={{Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold}},
author={Swabha Swayamdipta and Sam Thomson and Chris Dyer and Noah A. Smith},
journal={arXiv preprint arXiv:1706.09528},
year={2017}
}