This repository was used for an experimental evaluation of three natural language parsing systems:
- Stanford Probabilistic Context Free Grammar (PCFG) Parser: unlexicalized PCFG using maximum-likelihood estimates from the Penn treebank for its rule probabilities and the CKY algorithm to select the best parse from the parse forest
- MALT arc-eager parser: system for data-driven dependency-only parsing which implements 9 different deterministic parsing algorithms and 2 machine learning packages. We use the `nivreeager` parsing algorithm alongside the `libsvm` learning algorithm for the transition predictions and its default features
- Stanford Neural Dependency Parser: Also a transition-based parser, but improves on the MALT parser by a) using a different transition system, b) using embedding-based feature representations, c) using a neural network with cube activation function instead of an SVM as a classifier
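To make the "cube activation function" concrete, here is a minimal sketch of a hidden layer that cubes each affine pre-activation, i.e. h = (Wx + b)^3 applied elementwise. The function name and the toy weights are assumptions made for illustration; this is not code from the Stanford parser, which applies the same idea to concatenated word, POS tag, and label embeddings.

```python
def cube_hidden_layer(weights, bias, features):
    """Compute h = (W x + b)^3 elementwise for one input vector."""
    hidden = []
    for row, b in zip(weights, bias):
        # Affine pre-activation for one hidden unit.
        pre_activation = sum(w * x for w, x in zip(row, features)) + b
        # Cube instead of tanh/sigmoid/ReLU.
        hidden.append(pre_activation ** 3)
    return hidden


# Toy values chosen only for illustration.
W = [[1.0, 0.0], [0.5, -0.5]]
b = [0.0, 1.0]
x = [2.0, 4.0]

h = cube_hidden_layer(W, b, x)
print(h)  # [8.0, 0.0]
```

Cubing a sum of inputs expands into products of up to three input terms, which is one way to read the claim that this activation lets the network model interactions between features directly.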
To run the code, please download the following parsers:
- Stanford Core NLP library release 4.3.2 - save the extracted folder in the root of this repository. This contains both the NN and PCFG parser
- MALT Parser version - also save in repository root
- MaltEval - evaluation tool for dependency parsers
The code is separated into multiple pipeline steps, which can be configured in the `config.yml`.
- `src.parse_malt`: Runs the MALT parser on the gold standard sentences and saves the parsed output in `data/parses/malt`
- `src.parse_pcfg`: Runs the PCFG parser on the gold standard sentences and saves the parsed output in `data/parses/pcfg/conllu`
- `src.parse_nn`: Runs the NN parser on the gold standard sentences and saves the parsed output in `data/parses/pcfg/conllu`
- `src.gold_standard`: Reads in the `data/gold_standard` and converts it to CoNLL-U format
- `src.eval`: Evaluates all parsers by dependency relation type and relation length
- `src.treeview`: Launches `MaltEval TreeViewer` (see example below)
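As an illustration of what evaluating "by dependency relation type and relation length" means, the sketch below computes labeled attachment accuracy per group from two CoNLL-U strings. It is not the repository's `src.eval` and does not use MaltEval; all function names are hypothetical, and grouping root tokens under length 0 is an assumption made here.

```python
from collections import defaultdict


def read_conllu(text):
    """Return sentences as lists of (id, head, deprel) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges and empty nodes
            continue
        current.append((int(cols[0]), int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def accuracy_by_group(gold, pred, key):
    """Share of tokens with correct head and label, grouped by key(gold token)."""
    totals, correct = defaultdict(int), defaultdict(int)
    for gold_sent, pred_sent in zip(gold, pred):
        for gold_tok, pred_tok in zip(gold_sent, pred_sent):
            group = key(gold_tok)
            totals[group] += 1
            if gold_tok[1:] == pred_tok[1:]:
                correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


def relation_type(tok):
    return tok[2]


def relation_length(tok):
    # Assumption: root tokens (head 0) are grouped under length 0.
    return 0 if tok[1] == 0 else abs(tok[0] - tok[1])


def make_row(idx, form, head, deprel):
    """Build one 10-column CoNLL-U token line with unused fields as '_'."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


GOLD = "\n".join([make_row(1, "She", 2, "nsubj"),
                  make_row(2, "reads", 0, "root"),
                  make_row(3, "books", 2, "obj")])
PRED = "\n".join([make_row(1, "She", 2, "nsubj"),
                  make_row(2, "reads", 0, "root"),
                  make_row(3, "books", 1, "obj")])  # wrong head

gold, pred = read_conllu(GOLD), read_conllu(PRED)
by_type = accuracy_by_group(gold, pred, relation_type)
by_length = accuracy_by_group(gold, pred, relation_length)
print(by_type)    # {'nsubj': 1.0, 'root': 1.0, 'obj': 0.0}
print(by_length)  # {1: 0.5, 0: 1.0}
```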
On first execution, it's important that all pipeline steps are run. Following this, you can toggle the desired pipeline steps on and off to only run a subset. After the first execution, each pipeline step will run independently of the others.
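The toggling described here can be pictured as filtering an ordered list of steps by boolean flags. The sketch below is only an illustration: the flag names, the step order, and the helper `enabled_steps` are assumptions, and the actual keys in `config.yml` may differ.

```python
# Hypothetical flags mirroring the pipeline step names; the real
# config.yml keys and structure may differ.
config = {
    "gold_standard": True,
    "parse_malt": False,
    "parse_pcfg": True,
    "parse_nn": False,
    "eval": True,
}

# Assumed execution order: gold standard conversion, parsers, evaluation.
PIPELINE_ORDER = ["gold_standard", "parse_malt", "parse_pcfg", "parse_nn", "eval"]


def enabled_steps(config, order):
    """Return the steps switched on in config, preserving pipeline order."""
    return [step for step in order if config.get(step, False)]


steps = enabled_steps(config, PIPELINE_ORDER)
print(steps)  # ['gold_standard', 'parse_pcfg', 'eval']
```

Steps missing from the configuration are treated as disabled in this sketch, so a partial config runs only what it names explicitly.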
After executing the full pipeline, you can review the output of each parser using the `TreeViewer`. To select which parser to view, set the `treeview` flag in the `config.yml`.