SentimentAnalyserLVTwitter

Scripts for training models and predicting the sentiment of Latvian tweets. Accompanies the paper "Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets" (2020).


Latvian Twitter Sentiment

Repository for the paper "Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets". It performs 3-class (positive, negative, and neutral) classification of Latvian tweets. The model is trained on tweets from the political domain.

Data

  • Latvian Tweet Corpus. The Twitter data cannot be shared directly because of Twitter's terms of service; please refer to https://github.com/pmarcis/latvian-tweet-corpus for the data.
  • Format: CSV with a label column (0 - neutral; 1 - positive; 2 - negative) and a text column:
label,text
1,"@maljorka Hehe, man tad labāk garšo bez nekā, nevis šādi. :D Ai, gaumes ir tik atšķirīgas."
0,@IngaStirna Ābolu šarlote.
2,"Šodien bijām pie vecmāmiņas (malka + jāaizved lietas). Es gaidīju, ka paliks labāk, un man viņas pietrūks mazāk, bet mēs ar viņu varējām sarunāties tikai caur logu un pusdienu vietā apēdām bulciņas mašīnā. Nav ok."
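The format above can be read with Python's standard csv module. The following is a minimal sketch, not the repository's own loading code: the function name read_labelled_tweets and the LABEL_NAMES mapping are hypothetical names introduced here for illustration, and the in-memory sample reuses two rows from the example above.

```python
import csv
import io

# Label ids as documented in the data format above.
LABEL_NAMES = {0: "neutral", 1: "positive", 2: "negative"}


def read_labelled_tweets(handle):
    """Yield (label_id, label_name, text) from a CSV stream with a label,text header.

    Hypothetical helper for illustration; not part of the repository.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        label_id = int(row["label"])
        if label_id not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {label_id}")
        yield label_id, LABEL_NAMES[label_id], row["text"]


# In-memory sample in the documented format. Quoted fields may contain commas.
SAMPLE = (
    "label,text\n"
    '1,"@maljorka Hehe, man tad labāk garšo bez nekā, nevis šādi. :D Ai, gaumes ir tik atšķirīgas."\n'
    "0,@IngaStirna Ābolu šarlote.\n"
)

rows = list(read_labelled_tweets(io.StringIO(SAMPLE)))
for label_id, label_name, text in rows:
    print(label_id, label_name, text)
```

To read a file on disk instead, the same function can be given a handle opened with open(path, newline="", encoding="utf-8"), where path is wherever the corpus CSV was saved.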

How to run

Performance Metrics

The model reaches an accuracy of around 76% on the time-balanced dataset.
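Accuracy here is the fraction of tweets whose predicted label matches the gold label. The sketch below shows that computation on made-up label lists; the values are purely illustrative and are not results from the paper, and the function name accuracy is a hypothetical helper rather than the repository's evaluation code.

```python
def accuracy(gold, predicted):
    """Return the fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy labels using the documented ids (0 - neutral; 1 - positive; 2 - negative).
gold = [0, 1, 2, 2]
predicted = [0, 1, 2, 0]
score = accuracy(gold, predicted)  # 3 of 4 labels match
print(f"accuracy: {score:.2f}")
```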

Publication

Gaurish Thakkar and Mārcis Pinnis. (2020). Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets. In Human Language Technologies – The Baltic Perspective - Proceedings of the Ninth International Conference Baltic HLT 2020. 55-61. IOS Press.

BibTeX

@inproceedings{thakkar2020sentiment,
  address = {Kaunas, Lithuania},
  author = {Thakkar, Gaurish and Pinnis, M\=arcis},
  booktitle = {Human Language Technologies – The Baltic Perspective - Proceedings of the Ninth International Conference Baltic HLT 2020},
  doi = {10.3233/FAIA200602},
  pages = {55--61},
  publisher = {IOS Press},
  title = {{Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets}},
  year = {2020}
}

Acknowledgement

The project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 812997.

This work was done as part of an internship at TILDE.

License

This work is MIT licensed. See the LICENSE file for full details.