This is an implementation of the paper:

> Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks. Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen. ICLR 2017.

You might also want to refer to:

> Multi-Task Cross-Lingual Sequence Tagging from Scratch. Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen. Preprint, 2016.
Dependencies: Python 2.7 with Lasagne and Theano.
Install Lasagne and Theano following the instructions here: https://github.com/Lasagne/Lasagne#installation
Some of the datasets are publicly available and can be downloaded from our server:

```
wget http://kimi.ml.cmu.edu/transfer/data.tar.gz
tar -xvzf data.tar.gz
```

The above commands download the Genia and Twitter datasets, along with the Senna embeddings and an English gazetteer.
The other datasets require an LDC license; please contact your institution for access to the datasets below.

Get the CoNLL 2000 chunking dataset with an LDC license, and organize the files with the following structure:

```
transfer/chunking/train.txt
transfer/chunking/test.txt
```
Get the Penn Treebank (PTB) POS tagging dataset with an LDC license, and organize the files with the following structure:

```
transfer/pos_tree/dev.txt
transfer/pos_tree/test.txt
transfer/pos_tree/train.txt
```
Get the CoNLL 2002 Spanish NER dataset with an LDC license, and organize the files with the following structure:

```
transfer/span/esp.testa
transfer/span/esp.testb
transfer/span/esp.train
```
Get the CoNLL 2003 English NER dataset with an LDC license, and organize the files with the following structure:

```
transfer/eng.testa.old
transfer/eng.testb.old
transfer/eng.train
```
For each dataset, we first concatenate the training set and the dev set (training set always first), and then use the following function (in `sample.py`) to sample the list of indices used for training:
```python
import numpy as np

def create_sample_index(rate, len):
    np.random.seed(13)
    return np.random.choice(len, int(rate * len))
```
where `rate` is the labeling rate and `len` is the number of instances (training + dev). The function returns a NumPy array of indices; instances not in the list are discarded during training.
You can use the above function to reproduce our data splits when comparing different models.
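For illustration, here is a self-contained sketch of reproducing a split. The dataset size of 1000 and the labeling rate 0.1 are made up for the example, and the second parameter is renamed from `len` to `n` to avoid shadowing Python's built-in:

```python
import numpy as np

# Same sampling logic as in sample.py, with `len` renamed to `n`.
def create_sample_index(rate, n):
    # Fixed seed, so every run produces the same split.
    np.random.seed(13)
    # np.random.choice samples with replacement by default,
    # so the returned indices may contain duplicates.
    return np.random.choice(n, int(rate * n))

# Hypothetical example: 1000 train+dev instances at a 10% labeling rate.
idx = create_sample_index(0.1, 1000)
print(len(idx))                              # 100
print(idx.min() >= 0 and idx.max() < 1000)   # True

# The fixed seed makes the split reproducible across calls.
assert (idx == create_sample_index(0.1, 1000)).all()
```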
The transfer learning scripts are `joint.py` and `lang.joint.py`: `joint.py` is used for transfer learning within one language, and `lang.joint.py` is used for cross-lingual transfer learning.

`joint.py` accepts the following input format:

```
python2 joint.py --tasks <target_task_name> <source_task_name> --labeling_rates <labeling_rate_for_target_task> <labeling_rate_for_source_task> [--very_top_joint]
```
where the task names come from the list `[genia, pos, ner, chunking, ner_span, twitter_ner, twitter_pos]` and the labeling rates are floats. The flag `--very_top_joint` indicates whether to also share the parameters of the CRF layer.
Below are examples of the transfer learning settings used in our paper (Fig. 2):
```
# transfer from PTB to Genia
python2 joint.py --tasks genia pos --labeling_rates <labeling_rate> 1.0 --very_top_joint
# transfer from CoNLL 2003 NER to Genia
python2 joint.py --tasks genia ner --labeling_rates <labeling_rate> 1.0
# transfer from Spanish NER to Genia
python2 lang.joint.py --tasks genia ner_span --labeling_rates <labeling_rate> 1.0
# transfer from PTB to Twitter POS tagging
python2 joint.py --tasks twitter_pos pos --labeling_rates <labeling_rate> 1.0
# transfer from CoNLL 2003 to Twitter NER
python2 joint.py --tasks twitter_ner ner --labeling_rates <labeling_rate> 1.0
# transfer from CoNLL 2003 NER to PTB POS tagging
python2 joint.py --tasks pos ner --labeling_rates <labeling_rate> 1.0
# transfer from PTB POS tagging to CoNLL 2000 chunking
python2 joint.py --tasks chunking pos --labeling_rates <labeling_rate> 1.0
# transfer from PTB POS tagging to CoNLL 2003 NER
python2 joint.py --tasks ner pos --labeling_rates <labeling_rate> 1.0
# transfer from CoNLL 2003 English NER to Spanish NER
python2 lang.joint.py --tasks ner_span ner --labeling_rates <labeling_rate> 1.0
# transfer from Spanish NER to CoNLL 2003 English NER
python2 lang.joint.py --tasks ner ner_span --labeling_rates <labeling_rate> 1.0
```
Below are the results we obtained for each of the above settings:

Target | Source | Labeling Rate | With Transfer | Without Transfer |
---|---|---|---|---|
genia | PTB | 0.0 | 0.840899499608 | N/A |
genia | PTB | 0.001 | 0.916581258415 | 0.832640019292 |
genia | PTB | 0.01 | 0.963083539318 | 0.935592130383 |
genia | PTB | 0.1 | 0.981953738872 | 0.978035007335 |
genia | PTB | 1.0 | 0.990092642833 | 0.990655332489 |
genia | Eng NER | 0.001 | 0.87471269687 | 0.832640019292 |
genia | Eng NER | 0.01 | 0.941942485079 | 0.935592130383 |
genia | Eng NER | 0.1 | 0.979944132956 | 0.978035007335 |
genia | Eng NER | 1.0 | 0.989951970419 | 0.990655332489 |
genia | Span NER | 0.001 | 0.843853620305 | 0.832640019292 |
genia | Span NER | 0.01 | 0.93111070919 | 0.935592130383 |
genia | Span NER | 0.1 | 0.978718273347 | 0.978035007335 |
genia | Span NER | 1.0 | 0.989550049235 | 0.990655332489 |
PTB | Eng NER | 0.001 | 0.87471269687 | 0.841578354698 |
PTB | Eng NER | 0.01 | 0.949326669443 | 0.942871025961 |
PTB | Eng NER | 0.1 | 0.967891464976 | 0.965916979037 |
PTB | Eng NER | 1.0 | 0.974470513829 | 0.975334351428 |
Eng NER | PTB | 0.001 | 0.346473029046 | 0.335092085615 |
Eng NER | PTB | 0.01 | 0.749249658936 | 0.686385971674 |
Eng NER | PTB | 0.1 | 0.870218090812 | 0.86219588832 |
Eng NER | PTB | 1.0 | 0.91264717787 | 0.91208817241 |
Chunking | PTB | 0.001 | 0.622235477654 | 0.58375524895 |
Chunking | PTB | 0.01 | 0.867262565155 | 0.834900974403 |
Chunking | PTB | 0.1 | 0.927242176013 | 0.90649356106 |
Chunking | PTB | 1.0 | 0.953936031606 | 0.945709723506 |
Eng NER | Span NER | 0.001 | 0.346253229974 | 0.335092085615 |
Eng NER | Span NER | 0.01 | 0.726148735929 | 0.686385971674 |
Eng NER | Span NER | 0.1 | 0.865126276196 | 0.86219588832 |
Eng NER | Span NER | 1.0 | 0.912161558395 | 0.91208817241 |
Span NER | Eng NER | 0.001 | 0.164485165794 | 0.115025161754 |
Span NER | Eng NER | 0.01 | 0.604273247066 | 0.598373003917 |
Span NER | Eng NER | 0.1 | 0.765227337718 | 0.745397008055 |
Span NER | Eng NER | 1.0 | 0.848126232742 | 0.846034214619 |
Twitter POS | PTB | 0.001 | 0.020282728949 | 0.00860479409957 |
Twitter POS | PTB | 0.01 | 0.646588813768 | 0.503380454825 |
Twitter POS | PTB | 0.1 | 0.836508912108 | 0.748002458513 |
Twitter POS | PTB | 1.0 | 0.907191149355 | 0.893054701905 |
Twitter NER | Eng NER | 0.001 | 0.0137931034483 | 0.00950118764846 |
Twitter NER | Eng NER | 0.01 | 0.24154589372 | 0.0963855421687 |
Twitter NER | Eng NER | 0.1 | 0.432432432432 | 0.346534653465 |
Twitter NER | Eng NER | 1.0 | 0.6473029045 | 0.63829787234 |