Many models specialise in the NER (Named Entity Recognition) task, and they are generally optimized for the CoNLL-2003 shared task on NER. If we treat SuperSeq (Supersense Sequence) Labelling as an extension of NER, the models that achieve a 90%+ F1 score on the CoNLL-2003 data do not reach comparable scores on it. This project evaluates SOTA NER models on the SuperSeq Labelling task and investigates which additional features need to be captured in order to extend NER to this problem.
The project was carried out under the guidance of Prof. Oier at UPV-EHU in May-June 2019.
TODO: Edit the data here, with links
- Perceptron Model - This model serves as both the baseline and the SOTA for the SuperSeq Labelling task. As described in the paper [1], the authors used a set of hand-refined features to build a perceptron tagger for the data. Running that tagger gives an F1 score of 69.46 on the test data.
- Model 1 (word-level bi-LSTM + CRF) - The model proposed in the paper [2] by Huang et al. The implementation was borrowed from a GitHub repository and then tuned on the data.
- Model 2 (word-level bi-LSTM + character-level bi-LSTM + CRF) - The model proposed in the paper [3] by Lample et al. The implementation was borrowed from a GitHub repository and then tuned on the data.
- Model 3 (ConvNet on characters + word-level bi-LSTM + CRF) - The model proposed in the paper [4] by Ma and Hovy. The implementation was borrowed from a GitHub repository and then tuned on the data.
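The text does not define how the F1 scores for these taggers are computed. Sequence labelling work in the CoNLL tradition usually reports span-level F1, where a predicted span counts as correct only if both its boundaries and its label match a gold span. The following is a minimal Python sketch under that assumption, with IOB2-style tags. The function names and the sample supersense tags are illustrative and not taken from the project.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) spans in an IOB2 tag sequence.

    `end` is exclusive. An I- tag without a matching open span is treated
    leniently as the start of a new span (an assumption of this sketch).
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_sentences, pred_sentences):
    """Micro-averaged span-level F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-noun.person", "I-noun.person", "O", "B-verb.motion"]]
pred = [["B-noun.person", "I-noun.person", "O", "B-verb.stative"]]
print(span_f1(gold, pred))  # one of two spans matches: P = R = F1 = 0.5
```

Scores produced this way are fractions in [0, 1]; the tables in this document report them as percentages.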
Table 1: Training on train+dev data; tuning and results on test data
Model Name | Embeddings used | Data Format | F1 Score |
---|---|---|---|
Perceptron | - | IOB | 69.46 |
Model 1 | GloVe | IOBES | 67.68 |
Model 2 | GloVe | IOBES | 67.48 |
Model 3 | GloVe | IOBES | 66.73 |
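The "Data Format" column distinguishes IOB from IOBES tagging. IOBES adds an explicit end tag (`E-`) and a single-token tag (`S-`) to the begin/inside/outside scheme. The sketch below shows one way such a conversion could look in Python, assuming the input is in IOB2 form (every span starts with `B-`). The function name and the sample tags are illustrative, and the conversion code actually used in the project is not shown in this document.

```python
def iob2_to_iobes(tags):
    """Convert one sentence of IOB2 tags to IOBES tags."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # The current span continues only if the next tag is I- with the same label.
        continues = nxt == "I-" + label
        if prefix == "B":
            out.append(("B-" if continues else "S-") + label)
        elif prefix == "I":
            out.append(("I-" if continues else "E-") + label)
        else:
            raise ValueError(f"unexpected tag: {tag!r}")
    return out


tags = ["B-noun.person", "I-noun.person", "O", "B-verb.motion", "O", "B-noun.location"]
print(iob2_to_iobes(tags))
# ['B-noun.person', 'E-noun.person', 'O', 'S-verb.motion', 'O', 'S-noun.location']
```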
TODO: Add links
The following embeddings were used to tune the models, with the associated hyperparameters listed in the next section:
- ELMo Embeddings
- Flair Embeddings
- BERT Embeddings
- Stack1 - ELMo + Flair
- Stack2 - ELMo + BERT
- Stack3 - Flair + BERT
- Stack4 - ELMo + Flair + BERT
For each of the embeddings listed above, a variant that adds character-based embeddings was also tried.
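The experiment grid described above can be enumerated mechanically: three base embeddings, every stack of two or three of them, and each of those with and without character-based embeddings. The following Python sketch only builds that list of configurations. The dictionary keys are hypothetical names chosen for illustration, and no embedding library is loaded.

```python
from itertools import combinations

BASE_EMBEDDINGS = ["ELMo", "Flair", "BERT"]


def embedding_configs(base=BASE_EMBEDDINGS):
    """List every single embedding and stack, with and without character embeddings."""
    configs = []
    for size in range(1, len(base) + 1):
        for combo in combinations(base, size):
            for with_chars in (False, True):
                configs.append({"embeddings": combo, "char_embeddings": with_chars})
    return configs


configs = embedding_configs()
print(len(configs))  # 7 embedding choices x 2 character settings = 14
for cfg in configs[:4]:
    print(cfg)
```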
Table 2: Hyperparameters tuned for each embedding configuration
Parameter | Search Type | Limits |
---|---|---|
Hidden Size | Random Integers | 0 - 400 |
RNN Layers | Choice | 1, 2 |
Dropout | Uniform | 0 - 0.5 |
Learning Rate | Choice | 0.05, 0.10, 0.15, 0.20 |
Mini Batch Size | Choice | 16, 32 |
use_CRF | Choice | True, False |
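To make the search space in Table 2 concrete, the sketch below samples configurations from it with Python's standard `random` module and keeps the best one. This document does not name the search tool that was actually used, so plain random search is swapped in here as an assumption. The first row of the table is read as the hidden layer width, the key names are hypothetical, and `evaluate` is a stand-in for training a tagger and returning its dev-set F1.

```python
import random


def sample_config(rng):
    """Draw one configuration from the search space listed in Table 2."""
    return {
        "hidden_size": rng.randint(0, 400),  # random integer, limits as listed
        "rnn_layers": rng.choice([1, 2]),
        "dropout": rng.uniform(0.0, 0.5),
        "learning_rate": rng.choice([0.05, 0.10, 0.15, 0.20]),
        "mini_batch_size": rng.choice([16, 32]),
        "use_crf": rng.choice([True, False]),
    }


def random_search(evaluate, n_trials=20, seed=0):
    """Return the (config, score) pair with the highest score over n_trials samples."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_evaluate(config):
    """Placeholder objective; a real run would train and score a tagger here."""
    return -abs(config["dropout"] - 0.25)


best_config, best_score = random_search(dummy_evaluate, n_trials=50)
print(best_config, best_score)
```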
Table 3: Training on train data, tuning on dev data, and results on test data
Embedding Type | F1 Score | Tuned Hyperparameters |
---|---|---|
ELMo | ? | ? |
Flair | ? | ? |
BERT | ? | ? |
Stack1 | ? | ? |
Stack2 | ? | ? |
Stack3 | ? | ? |
Stack4 | ? | ? |
Table 4: Training on train data; tuning and results on test data
Embedding Type | F1 Score | Tuned Hyperparameters |
---|---|---|
ELMo | ? | ? |
Flair | ? | ? |
BERT | ? | ? |
Stack1 | ? | ? |
Stack2 | ? | ? |
Stack3 | ? | ? |
Stack4 | ? | ? |
[1]: Ciaramita, M., & Altun, Y. (2006). Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In Proceedings of EMNLP 2006, p. 594.
https://doi.org/10.3115/1610075.1610158
[2]: Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF Models for Sequence Tagging.
Retrieved from http://arxiv.org/abs/1508.01991
[3]: Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural Architectures for Named Entity Recognition.
Retrieved from http://arxiv.org/abs/1603.01360
[4]: Ma, X., & Hovy, E. (2016). End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF.
Retrieved from http://arxiv.org/abs/1603.01354