Trex-Parser: a transition-based parser for unlabeled dependency parsing

Introduction

Trex-Parser is a minimalist dependency parser loosely based on the model described in the 2012 paper by Bohnet and Nivre. Features are one-hot vectors over the words and POS tags of elements in the stack and buffer. At each time step, the parser chooses whichever of the three transitions (LeftArc, RightArc, or Shift) is assigned the highest joint probability by two models: a multiclass Perceptron and an Arc model.
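A minimal sketch of this decision rule, assuming hypothetical perceptron_prob and arc_prob callables and a state object with an extract_features method (none of these names come from the repository):

```python
# A minimal sketch of the joint decision rule, not the repository's actual code.
TRANSITIONS = ("LeftArc", "RightArc", "Shift")

def choose_transition(state, perceptron_prob, arc_prob):
    """Return the transition with the highest joint probability."""
    feats = state.extract_features()            # indices of active features
    scores = {}
    for t in TRANSITIONS:
        # Shift is assigned arc probability 1 by convention (see Models below).
        scores[t] = perceptron_prob(feats, t) * arc_prob(state, t)
    return max(scores, key=scores.get)
```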

The parser is tuned on the English and German datasets of the CoNLL 2006 shared task.

Models

The Arc model assigns a probability to the transitions LeftArc and RightArc conditioned on the POS tags of the top two elements of the stack. In other words, it estimates P(a|h, c): the probability of arc a, given head h and child c. For convenience, the transition Shift is assigned probability 1.
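A sketch of how such an arc model could be estimated by relative frequency from the gold transitions of the training treebank; the class and its observe/prob interface are illustrative, not the repository's actual code:

```python
from collections import defaultdict

class ArcModel:
    """Relative-frequency estimate of P(arc | head POS, child POS) -- a sketch."""

    def __init__(self):
        self.counts = defaultdict(int)   # (head_pos, child_pos, arc) -> count
        self.totals = defaultdict(int)   # (head_pos, child_pos) -> count

    def observe(self, arc, head_pos, child_pos):
        """Record one gold LeftArc/RightArc decision from the training oracle."""
        self.counts[(head_pos, child_pos, arc)] += 1
        self.totals[(head_pos, child_pos)] += 1

    def prob(self, arc, head_pos, child_pos):
        if arc == "Shift":
            return 1.0                   # Shift gets probability 1 by convention
        total = self.totals[(head_pos, child_pos)]
        if total == 0:
            return 0.0                   # POS pair never seen in the training set
        return self.counts[(head_pos, child_pos, arc)] / total
```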

The multiclass Perceptron assigns a probability to each of the three transitions, given the full feature model described below.
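A sketch of such a perceptron over sparse binary features (how the raw scores are turned into probabilities is omitted here); the class and method names are illustrative only:

```python
import numpy as np

class MulticlassPerceptron:
    """Sketch of a multiclass perceptron with one weight vector per transition."""

    def __init__(self, n_features, transitions=("LeftArc", "RightArc", "Shift")):
        self.transitions = transitions
        self.weights = {t: np.zeros(n_features) for t in transitions}

    def scores(self, active_indices):
        # A feature vector is just the list of indices whose value is 1,
        # so the dot product reduces to summing the selected weights.
        return {t: self.weights[t][active_indices].sum() for t in self.transitions}

    def predict(self, active_indices):
        s = self.scores(active_indices)
        return max(s, key=s.get)

    def update(self, active_indices, gold):
        pred = self.predict(active_indices)
        if pred != gold:                       # standard perceptron update
            self.weights[gold][active_indices] += 1.0
            self.weights[pred][active_indices] -= 1.0
```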

Feature Model

Features are one-hot representations of the top ws elements of the stack and buffer, where ws (window size) is a hyper-parameter. For each of these elements we take the word form, lemma and POS tag. The resulting one-hot vectors are concatenated into a high-dimensional feature vector (681,145 and 1,137,613 dimensions for English and German respectively when ws = 12).

However, only three high-dimensional vectors are ever stored: the weight vectors of the Perceptron. Feature vectors are instead represented as the lists of indices of the dimensions whose value is 1. This keeps the parser reasonably fast: average inference time per sentence was 0.253s for German and 0.550s for English (with b1 = 25 and ws = 12).
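A sketch of sparse feature extraction along these lines, assuming tokens expose form, lemma and pos attributes and that feature_index is a dictionary (built during training) mapping feature strings to dimensions of the one-hot space; these names are assumptions, not the repository's API:

```python
def extract_features(stack, buffer, feature_index, ws=12):
    """Return the indices of the dimensions that are 1 -- a sketch, not the repo's code."""
    active = []
    for side, elements in (("s", stack[-ws:]), ("b", buffer[:ws])):
        for position, token in enumerate(elements):
            for attr in ("form", "lemma", "pos"):
                key = f"{side}{position}:{attr}={getattr(token, attr)}"
                idx = feature_index.get(key)
                if idx is not None:            # features unseen in training are dropped
                    active.append(idx)
    return active
```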

Hyper-parameters

  • epochs - number of training epochs
  • ws - window size, the number of top elements of the stack and buffer used for feature extraction
  • b1 - beam size
  • alpha - skip arcs that are assigned a probability below this value (with the default of 0, only arcs for POS pairs seen in the training set are allowed); see the decoding sketch after this list
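
The sketch below illustrates where b1 and alpha might enter beam-search decoding; the State interface (is_final, legal_transitions, features, head_child_pos, apply) and the scoring callables are hypothetical, not the repository's actual API:

```python
def beam_parse(initial_state, perceptron_prob, arc_prob, b1=25, alpha=0.0):
    """Hedged sketch of beam-search decoding with the hyper-parameters above."""
    beam = [(1.0, initial_state)]
    while not all(state.is_final() for _, state in beam):
        candidates = []
        for prob, state in beam:
            if state.is_final():
                candidates.append((prob, state))
                continue
            feats = state.features()
            for t in state.legal_transitions():
                p_arc = arc_prob(t, *state.head_child_pos())
                # alpha: skip arcs whose probability is too low; with the default
                # of 0, arcs for POS pairs unseen in training (probability 0) are skipped.
                if t != "Shift" and p_arc <= alpha:
                    continue
                p = prob * perceptron_prob(feats, t) * p_arc
                candidates.append((p, state.apply(t)))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:b1]                 # b1: keep only the best b1 hypotheses
    return max(beam, key=lambda c: c[0])[1]
```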

TODOs

The model seems to have a bug: it attained a UAS of 0.935 on the English development set, but the unlabeled attachment score on the test set was below 0.75.