SemEval-2017 Task 4 is a text sentiment classification task: Given a message, classify whether the message is of positive, negative, or neutral sentiment.
# install the environment
conda create -n allennlp python=3.6
source activate allennlp
pip install -r requirements.txt
# run the program
chmod +x run.sh && ./run.sh
In the table, the first two models are the SOTA models for this task (ensemble and single)(as far as I know). I use AllenNLP to implement the other models below.
I found that GloVe is pretty for single embedding compared to the other embeddings. The model GloVe + Bi-LSTM + attention
is similar to the model Deep Bi-LSTM+attention
, and the results are similar. The model GloVe + ELMo + BCN
achieves 0.682 F1 score with single model which is also compatible to the model LSTMs+CNNs ensemble with multiple conv. ops
.
Models | F1-score |
---|---|
LSTMs+CNNs ensemble with multiple conv. ops (Ensemble) | 0.685 |
Deep Bi-LSTM+attention (Single model) | 0.677 |
GloVe(No Grad) + 2 BiLSTM | 0.629 |
GloVe(With Grad) + 2 BiLSTM | 0.642 |
ELMo(No Grad) + 2 BiLSTM | 0.614 |
ELMo(With Grad) + 2 BiLSTM | 0.609 |
BERT(No Grad) + 2 BiLSTM | 0.599 |
BERT(With Grad) + 2 BiLSTM | 0.480 |
GloVe + Bi-LSTM + attention | 0.665 |
GloVe + ELMo + BCN | 0.682 |
- Rosenthal S, Farra N, Nakov P. SemEval-2017 task 4: Sentiment analysis in Twitter[C]//Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017). 2017: 502-518.
- Baziotis C, Pelekis N, Doulkeridis C. Datastories at semeval-2017 task 4: Deep lstm with attention for message-level and topic-based sentiment analysis[C]//Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017). 2017: 747-754.
- Cliche M. BB_twtr at SemEval-2017 task 4: twitter sentiment analysis with CNNs and LSTMs[J]. arXiv preprint arXiv:1704.06125, 2017.