An implementation of "Deep Biaffine Attention for Neural Dependency Parsing".
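As a quick reminder of what the biaffine scorer computes, here is a minimal PyTorch sketch of a biaffine layer in the spirit of the paper. The names, shapes, and zero initialization are illustrative, not the exact module used in this repo.

```python
import torch
import torch.nn as nn

class Biaffine(nn.Module):
    """s(x, y) = x^T U y, with optional bias terms appended to x and y."""

    def __init__(self, n_in, n_out=1, bias_x=True, bias_y=True):
        super().__init__()
        self.bias_x = bias_x
        self.bias_y = bias_y
        # weight: [n_out, n_in (+1), n_in (+1)]; zero init is a placeholder
        self.weight = nn.Parameter(torch.zeros(n_out, n_in + bias_x, n_in + bias_y))

    def forward(self, x, y):
        # x: [batch, seq_len, n_in], e.g., dependent representations
        # y: [batch, seq_len, n_in], e.g., head representations
        if self.bias_x:
            x = torch.cat((x, torch.ones_like(x[..., :1])), -1)
        if self.bias_y:
            y = torch.cat((y, torch.ones_like(y[..., :1])), -1)
        # scores: [batch, n_out, seq_len, seq_len]
        return torch.einsum('bxi,oij,byj->boxy', x, self.weight, y)
```

With `n_out=1` this yields the arc scores; a second instance with `n_out` equal to the number of relation labels yields the label scores.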
Details and hyperparameter choices are almost identical to those described in the paper, except that we provide the Eisner algorithm rather than MST to ensure the well-formedness of predicted trees. In practice, projective decoding like Eisner is a sound choice, since the PTB is almost entirely (99.9%) projective.
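For illustration, below is a minimal NumPy sketch of the classic first-order Eisner dynamic program over complete/incomplete spans. It is not the implementation in this repo; `scores[h, m]` is assumed to hold the score of the arc from head `h` to modifier `m`, with index 0 reserved for the pseudo ROOT.

```python
import numpy as np

def eisner(scores):
    """Decode the best projective tree from an arc score matrix.

    scores[h, m] is the score of the arc h -> m; index 0 is the pseudo ROOT.
    Returns a head array with heads[0] = 0. Runs in O(n^3) time.
    """
    N = scores.shape[0]
    # spans [s, t] with direction d: 0 = head on the right, 1 = head on the left
    complete = np.zeros((N, N, 2))
    incomplete = np.zeros((N, N, 2))
    complete_bt = -np.ones((N, N, 2), dtype=int)
    incomplete_bt = -np.ones((N, N, 2), dtype=int)

    for k in range(1, N):
        for s in range(N - k):
            t = s + k
            # build incomplete spans by adding an arc between s and t
            arc_vals = complete[s, s:t, 1] + complete[s + 1:t + 1, t, 0]
            r = s + np.argmax(arc_vals)
            incomplete[s, t, 0] = np.max(arc_vals) + scores[t, s]  # arc t -> s
            incomplete[s, t, 1] = np.max(arc_vals) + scores[s, t]  # arc s -> t
            incomplete_bt[s, t, 0] = incomplete_bt[s, t, 1] = r
            # build complete spans by merging an incomplete span with a complete one
            left_vals = complete[s, s:t, 0] + incomplete[s:t, t, 0]
            complete[s, t, 0] = np.max(left_vals)
            complete_bt[s, t, 0] = s + np.argmax(left_vals)
            right_vals = incomplete[s, s + 1:t + 1, 1] + complete[s + 1:t + 1, t, 1]
            complete[s, t, 1] = np.max(right_vals)
            complete_bt[s, t, 1] = s + 1 + np.argmax(right_vals)

    heads = np.zeros(N, dtype=int)
    _backtrack(incomplete_bt, complete_bt, 0, N - 1, 1, 1, heads)
    return heads

def _backtrack(incomplete_bt, complete_bt, s, t, direction, complete, heads):
    if s == t:
        return
    if complete:
        r = complete_bt[s, t, direction]
        if direction == 0:  # head at t: split into complete + incomplete
            _backtrack(incomplete_bt, complete_bt, s, r, 0, 1, heads)
            _backtrack(incomplete_bt, complete_bt, r, t, 0, 0, heads)
        else:               # head at s: split into incomplete + complete
            _backtrack(incomplete_bt, complete_bt, s, r, 1, 0, heads)
            _backtrack(incomplete_bt, complete_bt, r, t, 1, 1, heads)
    else:
        r = incomplete_bt[s, t, direction]
        if direction == 0:
            heads[s] = t    # left arc: t is the head of s
        else:
            heads[t] = s    # right arc: s is the head of t
        _backtrack(incomplete_bt, complete_bt, s, r, 1, 1, heads)
        _backtrack(incomplete_bt, complete_bt, r + 1, t, 0, 1, heads)
```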
Besides the basic implementations, we also provide other features to replace the POS tags (TAG), i.e., character-level embeddings (CHAR) and BERT.
- `python`: 3.7.0
- `pytorch`: 1.3.0
- `transformers`: 2.1.1
The model is evaluated on the Stanford Dependencies conversion (v3.3.0) of the English Penn Treebank (PTB), with POS tags predicted by the Stanford POS tagger. We follow the conventional data splits:
- Train: 02-21 (39,832 sentences)
- Dev: 22 (1,700 sentences)
- Test: 23 (2,416 sentences)
| FEAT          |  UAS  |  LAS  | Speed (Sents/s) |
| ------------- | ----- | ----- | --------------- |
| TAG           | 95.90 | 94.25 | 1696.22         |
| TAG + Eisner  | 95.93 | 94.28 | 350.46          |
| CHAR          | 95.99 | 94.38 | 1464.59         |
| CHAR + Eisner | 96.02 | 94.41 | 323.73          |
| BERT          | 96.64 | 95.11 | 438.72          |
| BERT + Eisner | 96.65 | 95.12 | 214.68          |
Note that punctuation is ignored in all evaluation metrics for PTB.
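Concretely, punctuation positions are masked out before the metrics are accumulated. A minimal sketch, with all tensor names hypothetical:

```python
import torch

def attachment_scores(arc_preds, rel_preds, arcs, rels, punct_mask):
    """UAS/LAS over flat tensors, skipping punctuation (names are illustrative)."""
    mask = ~punct_mask                      # keep only non-punctuation tokens
    arc_correct = arc_preds[mask] == arcs[mask]
    rel_correct = arc_correct & (rel_preds[mask] == rels[mask])
    return arc_correct.float().mean(), rel_correct.float().mean()
```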
Aside from using consistent hyperparameters, several key points significantly affect the performance:
- Dividing the pretrained embeddings by their standard deviation
- Applying the same dropout mask at every recurrent timestep
- Jointly dropping the word and additional feature representations
For the above reasons, we may have to give up some native modules in PyTorch (e.g., `LSTM` and `Dropout`) and use custom ones instead.
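To make the second point concrete, here is a minimal sketch of a dropout module that samples one mask per sequence and reuses it at every timestep. The name `SharedDropout` and the details are illustrative, not necessarily the repo's exact code:

```python
import torch
import torch.nn as nn

class SharedDropout(nn.Module):
    """Dropout that samples one mask per sequence and reuses it at every
    timestep, unlike nn.Dropout, which resamples at each position."""

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x: [batch_size, seq_len, hidden_size]
        if not self.training or self.p == 0:
            return x
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)  # rescale to keep expectations unchanged

# The first key point is a one-liner: rescale the loaded vectors so their
# values have unit standard deviation (`pretrained` is a loaded FloatTensor):
#     pretrained /= pretrained.std()
# For the third point, word and feature embeddings are dropped as whole
# vectors, with the surviving one scaled up to compensate, so that a token
# rarely loses both views at once.
```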
As shown above, our results outperform those of the official implementation (95.74 UAS and 94.08 LAS). Incorporating character-level features or external embeddings like BERT further improves the performance of the model.
You can start the training, evaluation, and prediction processes with the subcommands registered in `parser.cmds`:
```sh
$ python run.py -h
usage: run.py [-h] {evaluate,predict,train} ...

Create the Biaffine Parser model.

optional arguments:
  -h, --help            show this help message and exit

Commands:
  {evaluate,predict,train}
    evaluate            Evaluate the specified model and dataset.
    predict             Use a trained model to make predictions.
    train               Train a model.
```
Before triggering the subcommands, please make sure that the data files are in CoNLL-X format. If some fields are missing, you can use underscores as placeholders. Below are some examples:
```sh
$ python run.py train -p -d=0 -f=exp/ptb.char --feat=char \
      --ftrain=data/ptb/train.conllx \
      --fdev=data/ptb/dev.conllx \
      --ftest=data/ptb/test.conllx \
      --fembed=data/glove.6B.100d.txt \
      --unk=unk
$ python run.py evaluate -d=0 -f=exp/ptb.char --feat=char --tree \
      --fdata=data/ptb/test.conllx
```
```sh
$ cat data/naive.conllx
1       Too         _       _       _       _       _       _       _       _
2       young       _       _       _       _       _       _       _       _
3       too         _       _       _       _       _       _       _       _
4       simple      _       _       _       _       _       _       _       _
5       ,           _       _       _       _       _       _       _       _
6       sometimes   _       _       _       _       _       _       _       _
7       naive       _       _       _       _       _       _       _       _
8       .           _       _       _       _       _       _       _       _
$ python run.py predict -d=0 -f=exp/ptb.char --feat=char --tree \
      --fdata=data/naive.conllx \
      --fpred=naive.conllx
$ cat naive.conllx
1       Too         _       _       _       _       2       advmod  _       _
2       young       _       _       _       _       0       root    _       _
3       too         _       _       _       _       4       advmod  _       _
4       simple      _       _       _       _       2       dep     _       _
5       ,           _       _       _       _       2       punct   _       _
6       sometimes   _       _       _       _       7       advmod  _       _
7       naive       _       _       _       _       2       dep     _       _
8       .           _       _       _       _       2       punct   _       _
```
All the optional arguments of the subcommands are as follows:
```sh
$ python run.py train -h
usage: run.py train [-h] [--buckets BUCKETS] [--punct] [--ftrain FTRAIN]
                    [--fdev FDEV] [--ftest FTEST] [--fembed FEMBED]
                    [--unk UNK] [--conf CONF] [--file FILE] [--preprocess]
                    [--device DEVICE] [--seed SEED] [--threads THREADS]
                    [--tree] [--feat {tag,char,bert}]

optional arguments:
  -h, --help            show this help message and exit
  --buckets BUCKETS     max num of buckets to use
  --punct               whether to include punctuation
  --ftrain FTRAIN       path to train file
  --fdev FDEV           path to dev file
  --ftest FTEST         path to test file
  --fembed FEMBED       path to pretrained embeddings
  --unk UNK             unk token in pretrained embeddings
  --conf CONF, -c CONF  path to config file
  --file FILE, -f FILE  path to saved files
  --preprocess, -p      whether to preprocess the data first
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --tree                whether to ensure well-formedness
  --feat {tag,char,bert}
                        choices of additional features
```
```sh
$ python run.py evaluate -h
usage: run.py evaluate [-h] [--batch-size BATCH_SIZE] [--buckets BUCKETS]
                       [--punct] [--fdata FDATA] [--conf CONF] [--file FILE]
                       [--preprocess] [--device DEVICE] [--seed SEED]
                       [--threads THREADS] [--tree] [--feat {tag,char,bert}]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size
  --buckets BUCKETS     max num of buckets to use
  --punct               whether to include punctuation
  --fdata FDATA         path to dataset
  --conf CONF, -c CONF  path to config file
  --file FILE, -f FILE  path to saved files
  --preprocess, -p      whether to preprocess the data first
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --tree                whether to ensure well-formedness
  --feat {tag,char,bert}
                        choices of additional features
```
```sh
$ python run.py predict -h
usage: run.py predict [-h] [--batch-size BATCH_SIZE] [--fdata FDATA]
                      [--fpred FPRED] [--conf CONF] [--file FILE]
                      [--preprocess] [--device DEVICE] [--seed SEED]
                      [--threads THREADS] [--tree] [--feat {tag,char,bert}]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size
  --fdata FDATA         path to dataset
  --fpred FPRED         path to predicted result
  --conf CONF, -c CONF  path to config file
  --file FILE, -f FILE  path to saved files
  --preprocess, -p      whether to preprocess the data first
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --tree                whether to ensure well-formedness
  --feat {tag,char,bert}
                        choices of additional features
```