WladimirSidorenko/PotTS

Training for sentiment analysis

gghidiu opened this issue · 3 comments

Hi there,

Please excuse my naive question. Is it possible to use this corpus to train an algorithm for sentiment analysis? Can I somehow extract a list containing the 7992 tweets annotated with a label representing their sentiment polarity?

I am not familiar with Java, and the MMAX2 tool fails whenever I try to load any of the .mmax files.

Thank you in advance!

Hi,

Are you interested in message-level polarity of tweets or do you want to have text spans of targeted sentiments and know their polarity?

I need the message-level polarity of the tweets.

I see. Unfortunately, this corpus does not provide explicit message-level polarity labels, but for my experiments I induced them from the presence of annotated targeted sentiments and polar terms (emotional expressions). In particular, I assigned the positive class to messages that contained positive opinions w.r.t. some target or positive polar terms, and the negative class to tweets with negative sentiments and polar terms. You can find these approximate annotations in TSV format here:
https://github.com/WladimirSidorenko/CGSA/tree/master/data/PotTS
(the preprocessed folder contains normalized tweets; the not-preprocessed folder contains the non-normalized ones)
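The labeling heuristic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code actually used for the CGSA data: the input is assumed to be two hypothetical flat lists of polarity strings ("positive"/"negative") already extracted from a tweet's annotations. The thread also does not say what happens to tweets with no polar evidence or with both polarities, so returning "neutral" and "mixed" for those cases is an assumption made here.

```python
from typing import Iterable

POSITIVE = "positive"
NEGATIVE = "negative"
NEUTRAL = "neutral"  # assumption: label for tweets without any polar evidence
MIXED = "mixed"      # assumption: flag for conflicting evidence


def induce_message_polarity(
    sentiment_polarities: Iterable[str],
    polar_term_polarities: Iterable[str],
) -> str:
    """Approximate a message-level label from targeted sentiments and polar terms.

    Both arguments are hypothetical lists of polarity strings taken from
    the annotated sentiments and polar terms (emotional expressions) of one tweet.
    """
    evidence = set(sentiment_polarities) | set(polar_term_polarities)
    has_pos = POSITIVE in evidence
    has_neg = NEGATIVE in evidence
    if has_pos and not has_neg:
        return POSITIVE
    if has_neg and not has_pos:
        return NEGATIVE
    if not has_pos and not has_neg:
        return NEUTRAL
    # Both polarities are present: how such conflicts were resolved is not
    # stated in this thread, so they are flagged for the caller to drop or relabel.
    return MIXED
```

For example, a tweet with one positive targeted sentiment and no polar terms would be labeled `induce_message_polarity(["positive"], [])`, which returns "positive".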
Alternatively, you can also look at the SB10k and GermEval-2017 corpora, which have been explicitly labeled with message-level polarities and which I have converted to the same format. You can find them here:
https://github.com/WladimirSidorenko/CGSA/tree/master/data/SB10k
and here:
https://github.com/WladimirSidorenko/CGSA/tree/master/data/GermEval-2017
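Since all three corpora are distributed in the same TSV format, a single reader can load any of them. The sketch below uses only Python's standard `csv` module. The column layout is an assumption: `label_col` and `text_col` are hypothetical defaults and should be checked against the actual files in the CGSA repository before use. `csv.QUOTE_NONE` is chosen so that quotation marks inside tweets are kept as literal text.

```python
import csv
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def read_labeled_tweets(
    stream: TextIO, label_col: int = 1, text_col: int = -1
) -> Iterator[Tuple[str, str]]:
    """Yield (label, text) pairs from a tab-separated stream.

    label_col and text_col are assumed positions; verify them against
    the real column layout of the corpus files.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        yield row[label_col], row[text_col]


if __name__ == "__main__":
    # Made-up two-line demo input in the assumed "id<TAB>label<TAB>text" layout.
    demo = io.StringIO("1\tpositive\tSuper Tag!\n2\tnegative\tSo ein Mist\n")
    print(Counter(label for label, _ in read_labeled_tweets(demo)))
```

To inspect a real file, open it with `encoding="utf-8"` and pass the file object to `read_labeled_tweets`, adjusting the column indices if the layout differs.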

My experiments with PotTS and SB10k are described in Chapter V of this thesis:
https://github.com/WladimirSidorenko/Dissertation/blob/master/release/0.1.0/sidarenka_thesis.pdf

Good luck.