ash-parser

This was originally for a class project.

ash-parser implements a Chen and Manning (2014) style neural network dependency parser in Python and TensorFlow. Many elements mimic SyntaxNet.

I analyze SyntaxNet's Architecture here.

A parsing-config file must be created in the model directory before running the parser.

Run training_test.sh for an example of how to train a model. Evaluation during training also works, but there is not yet an API for tagging new input or serving a model.

External dependencies

NumPy
TensorFlow 1.0

Similarities to SyntaxNet

Same embedding system (configurable deep embedding per feature group)
Same optimizer (Momentum with exponential moving average)
Lexicon builder is identical for words, tags, and labels
Map files output by SyntaxNet and AshParser should be identical
Evaluation metric is identical (SyntaxNet's metric corresponds to AshParser's UAS, the Unlabeled Attachment Score)
Feature system is almost identical (except perhaps some very rare corner cases)
Because the architecture is the same, accuracy should be very close to greedy SyntaxNet
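
For readers unfamiliar with the attachment-score metrics, here is a minimal sketch of how UAS and LAS are conventionally computed. It is not taken from the ash-parser source: the function name `attachment_scores` and the `(head, label)` tuple layout are assumptions made for illustration, and the sketch does no punctuation filtering, which may differ from what SyntaxNet or AshParser actually do.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold and predicted are equal-length lists with one (head_index, label)
    tuple per token. This layout is a hypothetical choice for the sketch.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    total = len(gold)
    if total == 0:
        return 0.0, 0.0
    # UAS: the predicted head matches the gold head, label ignored.
    head_hits = sum(1 for (gh, _), (ph, _) in zip(gold, predicted) if gh == ph)
    # LAS: both the head and the dependency label match.
    labeled_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# "She eats fish": every head is right, but one label is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, predicted)
print(f"UAS {uas:.1f}  LAS {las:.1f}")  # UAS 100.0  LAS 66.7
```

LAS can never exceed UAS, since a labeled hit requires the head to be correct as well.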

Differences from SyntaxNet

Arc-eager transition system is also supported
Context file with redundant or boilerplate information is unnecessary
Supports GPU: training phase can complete in minutes
Pure Python 3 implementation; no need for Bazel
LAS (Labeled Attachment Score) prints out during evaluation
Precalculation and caching of feature bags. This makes it easier to train multiple models with the same token features but different hyperparameters
No support for structured (beam) parsing. An LSTM or something simpler and faster is being considered for the future instead. The accuracy loss from this should be in the ballpark of 1-2%.
Feature groups are created automatically for tag, word, and label features, rather than by grouping features together with semicolons in a context file
Only the transition parser is supported, not the POS tagger, morphological analyzer, or tokenizer
ngrams, punctuation_amount, morph tags, and other features are not yet implemented
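
As background for the arc-eager item above, the following is a minimal sketch of the textbook arc-eager transition system applied to a fixed sequence of actions. It is not ash-parser's implementation: the function name `arc_eager_parse`, the action strings, and the data layout are all assumptions made for illustration, and in the real parser the actions would be chosen by the neural network rather than supplied by hand.

```python
def arc_eager_parse(n_tokens, transitions):
    """Apply (action, label) pairs and return {dependent: (head, label)}.

    Tokens are numbered 1..n_tokens; index 0 is the artificial ROOT.
    All names here are hypothetical, chosen for this sketch only.
    """
    stack = [0]                            # starts with ROOT
    buffer = list(range(1, n_tokens + 1))  # tokens not yet processed
    heads = {}
    for action, label in transitions:
        if action == "SHIFT":
            # Move the front of the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # Buffer front becomes the head of the stack top, which is popped.
            dep = stack[-1]
            if dep == 0 or dep in heads:
                raise ValueError("LEFT-ARC needs a headless, non-ROOT stack top")
            heads[dep] = (buffer[0], label)
            stack.pop()
        elif action == "RIGHT-ARC":
            # Stack top becomes the head of the buffer front, which is pushed.
            dep = buffer.pop(0)
            heads[dep] = (stack[-1], label)
            stack.append(dep)
        elif action == "REDUCE":
            # Pop a stack top that has already been given a head.
            if stack[-1] not in heads:
                raise ValueError("REDUCE needs a stack top that has a head")
            stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


# "She eats fish": 1=She, 2=eats, 3=fish
actions = [("SHIFT", None), ("LEFT-ARC", "nsubj"),
           ("RIGHT-ARC", "root"), ("RIGHT-ARC", "obj")]
print(arc_eager_parse(3, actions))
```

Unlike arc-standard, arc-eager attaches a right dependent as soon as its head is on top of the stack, which is why it needs the separate REDUCE action to pop tokens later.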
