LUPJE

LUPJE: A sentiment analysis lexicon based on Dutch Twitter messages, much like AFINN-165 [1].

Abstract:

While there are many resources for sentiment analysis of English messages, there is relatively little material for the Dutch language. In this paper a new word list resource for sentiment analysis of Dutch messages is proposed. This sentiment lexicon is specifically designed to handle microblog messages such as tweets, and scores words based on emotional valence. The resource was generated by collecting words from a corpus of 5648 Dutch tweets, which were then annotated manually. The resulting list contains 2192 words scored on a Likert scale from 1 to 5 for positive words and from -1 to -5 for negative words, and includes many words not found in previous lists. The word list was evaluated on a small collection of Dutch tweets and is capable of detecting positive or negative sentiment. The sentiment lexicon is particularly applicable to sentiment analysis of Dutch-language microblog messages on social media.

This was originally constructed as a term project for the course 'Text and Multimedia Mining' at Radboud University in 2017-2018. I had planned to package it as a Python and/or npm library (particularly because there was a lack of NLP tools in JavaScript at the time), but never got around to it. Constructing the lexicon took a lot of time, as the words were annotated manually using the same process Finn Årup Nielsen used.

However, over the past few years I have come back to it several times whenever I needed to do some quick sentiment analysis on Dutch text. Because the lexicon was built from contemporary microblogging messages (Twitter), I have found it well suited to sentiment tasks on customer support conversations and other short messages.

I am releasing the list and the term paper under the MIT license so that others can use them. The list can easily be imported into existing libraries that use the same method.
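As a rough illustration of the AFINN-style method, the sketch below loads a word list and scores a message by summing the valence of every known word. It assumes the list is stored as tab-separated `word<TAB>score` lines, as AFINN is; the actual file name and layout of the LUPJE list may differ. The functions `load_lexicon` and `score_text` are hypothetical helpers, and the three sample entries are made up for the example rather than taken from the lexicon.

```python
import io
import re

# Simple word tokenizer; real tweets may need handling for emoji, hashtags, etc.
TOKEN_RE = re.compile(r"\w+")


def load_lexicon(lines):
    """Parse AFINN-style 'word<TAB>score' lines into a dict (assumed format)."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, score = line.rsplit("\t", 1)
        lexicon[word.lower()] = int(score)
    return lexicon


def score_text(text, lexicon):
    """Sum the valence of every token found in the lexicon; unknown words count as 0."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


# Hypothetical entries for illustration only, not copied from the LUPJE list.
sample_file = io.StringIO("geweldig\t4\nleuk\t3\nslecht\t-3\n")
lexicon = load_lexicon(sample_file)

positive = score_text("Wat een geweldig en leuk concert!", lexicon)
negative = score_text("Slecht geregeld, erg slecht.", lexicon)
print(positive, negative)
```

To use the real list, `sample_file` would be replaced by an open file handle, for example `open(path, encoding="utf-8")`, where `path` points at wherever the list is stored. A score above zero suggests positive sentiment and a score below zero negative sentiment.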

For the original AFINN paper see:

[1] Finn Årup Nielsen. "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs". In: Matthew Rowe, Milan Stankovic, Aba-Sah Dadzie, Mariann Hardey (editors), Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages. CEUR Workshop Proceedings, volume 718, pages 93-98. May 2011.