/ash-parser

Attempt to clone SyntaxNet using only Python, with GPU support

Primary LanguagePython

ash-parser

This was originally for a class project.

Utilizes a Chen and Manning (2014) style neural network parser in Python and TensorFlow. Many elements mimic SyntaxNet.

I analyze SyntaxNet's Architecture here.

parsing-config file is required to be created in the model directory before execution.

Run training_test.sh for an example of how to train a model. Evaluation during training works as well, but there is no API for tagging new input yet or serving a model.

External dependencies

  • NumPy
  • TensorFlow 1.0

Similarities to SyntaxNet

  • Same embedding system (configurable per-feature group deep embedding)
  • Same optimizer (Momentum with exponential moving average)
  • Lexicon builder is identical for words, tags, and labels
  • Map files output by SyntaxNet and AshParser should be identical
  • Evaluation metric is identical (SyntaxNet's corresponds to AshParser's UAS)
  • Feature system is almost identical (except perhaps some very rare corner cases)
  • Due to same architecture, accuracy should be very close to Greedy SyntaxNet

Differences from SyntaxNet:

  • Arc-Eager transition system also supported
  • Context file with redundant or boilerplate information is unnecessary
  • Supports GPU: training phase can complete in minutes
  • Pure Python3 implementation. No need for bazel
  • LAS (Labeled Attachment Score) prints out during evaluation
  • Precalculation and caching of feature bags. This makes it easier to train multiple models with the same token features but different hyperparameters
  • No support for structured (beam) parsing. Considering LSTM or something simpler and faster instead for the future. Accuracy loss should be in the ballpark of 1-2% due to this.
  • Feature groups are automatically created by groups of tag, word, and label rather than by grouping together with semicolon in a context file
  • Only support for the transition parser, not the POS tagger, morphological analyzer, or tokenizer
  • ngrams, punctuation_amount, morph tags and other features not yet implemented