Let's turn multilingual!

Question

diegopaucarv opened this issue 2 years ago · 1 comments

Okay, so, I have decided that I want to use this amazing, IMPRESSIVE work of art for predicting sentiment in Spanish. I noticed that the SentimentAnalysis tool runs on spaCy and uses an English news training dataset. I just can't find the folder where spaCy or the dataset is loaded. Is changing these parameters enough? Any further ideas?

EDIT: I forked the code, edited dataset.py, and added a simple way to request a different spaCy model through the caller's input. I changed the order in FXBaseModel to require RoBERTa before the XLNet model. I plan to make new training datasets (but I just don't know how yet).
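
For anyone trying the same thing, here is a minimal sketch of what such a model switch could look like. It is not the code from the fork: the function names `resolve_spacy_model` and `load_pipeline` and the `SPACY_MODELS` mapping are hypothetical, and the sketch assumes the listed spaCy pipelines have been downloaded separately (for example with `python -m spacy download es_core_news_sm`).

```python
from typing import Optional

# Hypothetical mapping from a language code to a spaCy pipeline name.
# The pipeline names are published spaCy models; the mapping itself is
# an assumption made for this sketch.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "nb": "nb_core_news_sm",
}


def resolve_spacy_model(language: str, override: Optional[str] = None) -> str:
    """Return the spaCy pipeline name for a language code.

    An explicit override wins, so the caller can pass any installed model.
    """
    if override:
        return override
    try:
        return SPACY_MODELS[language.lower()]
    except KeyError:
        raise ValueError(f"No spaCy model configured for language {language!r}")


def load_pipeline(language: str, override: Optional[str] = None):
    """Load the spaCy pipeline chosen by resolve_spacy_model."""
    # Imported lazily so the name resolution works without spaCy installed.
    import spacy

    return spacy.load(resolve_spacy_model(language, override))
```

Calling `load_pipeline("es")` would then load the Spanish pipeline, while `load_pipeline("es", override="es_core_news_md")` would let the caller pick a larger model. Note that swapping the spaCy model only changes the preprocessing; the sentiment classifier itself is still trained on English news data.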

Answer 1 · 2022-10-28T11:11:29.000Z

I too want to see if I can adapt it, in my case to Norwegian. Here are my thoughts:
spaCy is used together with neuralcoref. I once tried to adapt that framework to Norwegian but gave up. Maybe you can find an OK coreference resolver for Spanish outside of neuralcoref. Or maybe you will have more luck than I did adapting it to another language.
For the dataset, I saw that the recent SemEval task on SSA, which includes TSC (or TSA, Targeted Sentiment Analysis, as I have been calling it), also has Spanish data (OpeNER; Agerri et al., 2013).
If you have TSC data annotated with individual in-sentence sentiment targets, and you put a layer of coreference resolution on top, then you are getting closer to a solution for the sentiment a document conveys towards an entity. Two problems with previous TSC annotations mentioned in the NewsMTSC paper are indirect sentiment and choice of words. I think indirect sentiment would be annotated OK, while choice of words is probably mostly overlooked. These are just thoughts, based on working with TSC annotations in general.
I recently studied the difference between sentence-based TSA and document-level sentiment towards entities for Norwegian professional reviews.
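
To make the idea of stacking coreference resolution on top of sentence-level TSC concrete, here is a rough sketch. It assumes you already have per-sentence target predictions and coreference clusters from some other tools; the `TargetSentiment` class, the `document_sentiment` function, and the net-count aggregation rule are all hypothetical choices for illustration, not something taken from NewsMTSC or neuralcoref.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class TargetSentiment:
    """One sentence-level TSC prediction (hypothetical structure)."""
    sentence_id: int
    mention: str   # surface form of the target in that sentence
    polarity: str  # "positive", "neutral" or "negative"


def document_sentiment(
    predictions: Iterable[TargetSentiment],
    coref_clusters: Dict[str, Set[Tuple[int, str]]],
) -> Dict[str, str]:
    """Aggregate sentence-level polarities per entity.

    coref_clusters maps an entity name to the (sentence_id, mention)
    pairs that a coreference resolver linked to that entity.
    """
    # Invert the clusters so each mention points to its entity.
    mention_to_entity = {
        key: entity
        for entity, mentions in coref_clusters.items()
        for key in mentions
    }

    counts = defaultdict(Counter)
    for pred in predictions:
        entity = mention_to_entity.get((pred.sentence_id, pred.mention))
        if entity is not None:  # skip targets the resolver did not link
            counts[entity][pred.polarity] += 1

    # Assumed aggregation rule: positive count minus negative count.
    result = {}
    for entity, c in counts.items():
        score = c["positive"] - c["negative"]
        if score > 0:
            result[entity] = "positive"
        elif score < 0:
            result[entity] = "negative"
        else:
            result[entity] = "neutral"
    return result
```

A simple count like this ignores where in the document a mention occurs and how strong each sentence-level prediction is, so it is only a starting point for comparing sentence-based and document-level results.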