Supersense Sequence Labelling

About

Many models specialise in the NER (Named Entity Recognition) task, and most are optimised for the CoNLL-2003 shared task on NER. If we consider SuperSeq (Supersense Sequence) Labelling as an extension of NER, the models that achieve a 90%+ F1 score on the CoNLL-2003 data fail to perform at a comparable level. The project evaluates the NER SOTA models on the SuperSeq Labelling task and investigates which additional features need to be captured to extend NER to this problem.

The project was carried out under the guidance of Prof. Oier at UPV-EHU in May-June 2019.

Data for Training and Evaluation

TODO: Edit the data here, with links

Models Investigated

Perceptron Model - This model is considered both the baseline and the SOTA for the SuperSeq Labelling task. As described in the paper [1], the authors used a set of hand-refined features to build a perceptron tagger for the data. Running this tagger gives an F1 score of 69.46 on the test data.
Model 1 (word level bi-LSTM + CRF) - The model proposed in the paper [2] by Huang et al. The implementation was borrowed from a GitHub repository¹ and then tuned on the data.
Model 2 (word level bi-LSTM + character level bi-LSTM + CRF) - The model proposed in the paper [3] by Lample et al. The implementation was borrowed from a GitHub repository¹ and then tuned on the data.
Model 3 (convNet on characters + word level bi-LSTM + CRF) - The model proposed in the paper [4] by Ma and Hovy. The implementation was borrowed from a GitHub repository¹ and then tuned on the data.
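
As a point of reference for the F1 scores quoted in this README: the source does not say which evaluation script was used, so the following is only a minimal sketch, assuming CoNLL-style exact-match span scoring over IOB2 tags. The helper names `iob_spans` and `span_f1` are hypothetical and not part of the project.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, label)


def iob_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from an IOB2 tag sequence (assumed format)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A stray I- tag with no open span is treated leniently as a span start.
        if prefix == "B" or (prefix == "I" and start is None):
            start, label = i, name
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (start, end, label) spans."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = iob_spans(g_tags), iob_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only; the sentence and labels are made up for this sketch.
gold = [["B-noun.person", "I-noun.person", "O", "B-verb.motion"]]
pred = [["B-noun.person", "I-noun.person", "O", "B-verb.stative"]]
score = span_f1(gold, pred)
```

Under this scoring, a span counts as correct only when both its boundaries and its supersense label match, so the mislabelled verb in the example costs both precision and recall.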

Statistics

Table 1: Training using train+dev data, tuning and results on test data

Model Name	Embeddings used	Data Format	F1 Score
Perceptron	-	IOB	69.46
Model 1	GloVe	IOBES	67.68
Model 2	GloVe	IOBES	67.48
Model 3	GloVe	IOBES	66.73
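
The Data Format column distinguishes IOB from IOBES tagging. The README does not include the conversion step, so the following is a minimal sketch, assuming the IOB data is in the IOB2 variant (every span starts with `B-`). The function name `iob_to_iobes` is hypothetical.

```python
from typing import List


def iob_to_iobes(tags: List[str]) -> List[str]:
    """Convert IOB2 tags to IOBES.

    Single-token spans become S-, and the last token of a longer span becomes E-.
    """
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, _, label = tag.partition("-")
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        span_continues = nxt == "I-" + label
        if prefix == "B":
            out.append(tag if span_continues else "S-" + label)
        else:  # prefix == "I"
            out.append(tag if span_continues else "E-" + label)
    return out


# Illustrative tags only; the labels are made up for this sketch.
converted = iob_to_iobes(["B-noun.person", "I-noun.person", "O", "B-verb.motion"])
```

IOBES marks span ends and single-token spans explicitly, which gives the CRF layer more specific transition constraints to learn than plain IOB.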

Additional Models Evaluation

TODO: Add links

The following embeddings were used to tune the models, with the associated hyperparameters listed in the next section:

ELMo Embeddings
Flair Embeddings
BERT Embeddings
Stack1 - ELMo + Flair
Stack2 - ELMo + BERT
Stack3 - Flair + BERT
Stack4 - ELMo + Flair + BERT

For each of the embeddings mentioned above, a model combining that embedding with character-based embeddings was also tried.

Hyperparameters Tuned

Table 2: Hyperparameters tuned for each embedding setup

Parameter	Search Type	Limits
Hidden Layers	Random Integers	0 - 400
RNN Layers	Choice	1, 2
Dropout	Uniform	0 - 0.5
Learning Rate	Choice	0.05, 0.10, 0.15, 0.20
Mini Batch Size	Choice	16, 32
use_CRF	Choice	True, False
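
The search space in Table 2 can be sketched as one random draw per trial. The README does not name the tuning tool, so this uses only the Python standard library; the function name `sample_config`, the dictionary keys, and the inclusive treatment of the range endpoints are assumptions.

```python
import random
from typing import Any, Dict


def sample_config(rng: random.Random) -> Dict[str, Any]:
    """Draw one hyperparameter configuration from the ranges in Table 2."""
    return {
        # "Hidden Layers" as named in the table; endpoints assumed inclusive.
        "hidden_layers": rng.randint(0, 400),
        "rnn_layers": rng.choice([1, 2]),
        "dropout": rng.uniform(0.0, 0.5),
        "learning_rate": rng.choice([0.05, 0.10, 0.15, 0.20]),
        "mini_batch_size": rng.choice([16, 32]),
        "use_crf": rng.choice([True, False]),
    }


# A fixed seed makes the sequence of sampled trials reproducible.
rng = random.Random(0)
trials = [sample_config(rng) for _ in range(5)]
```

Each sampled configuration would then be used to train a model and scored on the tuning split, keeping the configuration with the best F1.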

Statistics

Table 3: Training using train data, tuning on dev data, and results for test data

Embedding Type	F1 Score	Tuned HyperParameters
ELMo	?	?
Flair	?	?
BERT	?	?
Stack1	?	?
Stack2	?	?
Stack3	?	?
Stack4	?	?

Table 4: Training using train data, tuning and results for test data

Embedding Type	F1 Score	Tuned HyperParameters
ELMo	?	?
Flair	?	?
BERT	?	?
Stack1	?	?
Stack2	?	?
Stack3	?	?
Stack4	?	?

References

[1]: Ciaramita, M., & Altun, Y. (2006). Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger, p. 594. https://doi.org/10.3115/1610075.1610158
[2]: Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF Models for Sequence Tagging. Retrieved from http://arxiv.org/abs/1508.01991
[3]: Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural Architectures for Named Entity Recognition. Retrieved from http://arxiv.org/abs/1603.01360
[4]: Ma, X., & Hovy, E. (2016). End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. Retrieved from http://arxiv.org/abs/1603.01354

Footnotes

¹ GitHub repository by Guillaume Genthial
