Tweet sentiment analysis in French

Question

Opened this issue 5 years ago · 7 comments

Deleted user commented 5 years ago

Hi,

Hope you are all well!

Is it possible to use FlauBERT for sentiment analysis of tweets written in French? If so, how can we do that?

Vive la France ! :-)

Cheers,
X

Answer 1 · 2020-03-28T14:52:01.000Z

Hello,

Yes, it is possible. For instance, in FLUE (French Language Understanding Evaluation) we use the Cross-Lingual Sentiment (CLS) dataset (Prettenhofer and Stein, 2010). We obtain the best results on the French part with FlauBERT-large.

See the paper here: https://arxiv.org/pdf/1912.05372.pdf (Sections 4.1 and 5.1)

Hope it helps.

Answer 2 · 2020-03-28T14:53:41.000Z

Of course, a similar method can be used for other kinds of sentiment analysis.

Answer 3 · 2020-03-28T16:31:52.000Z

Thanks for your reply. Is there a repository available that I can test with?

Answer 4 · 2020-03-28T16:35:06.000Z

The FLUE part of this GitHub repository should help you.

https://github.com/getalp/Flaubert/tree/master/flue

Please give us feedback about your work.

Answer 5 · 2020-03-28T16:48:21.000Z

Hi again,

To be honest, I don't really know how to go about it, as I'm not very specialized in NLP.

I can do the scraping with https://github.com/twintproject/twint and the API.

I added you on Twitter, but I can't send you a direct message (I'd rather not clutter the issue with my questions). My Twitter is https://twitter.com/lucmichalski

In any case, thanks for the clarifications.

Cheers,
Luc

Answer 6 · 2020-03-28T16:52:42.000Z

OK. If you want to understand this better, I recommend starting with a HuggingFace tutorial; everything is explained here: https://github.com/huggingface/transformers FlauBERT is integrated into that library, so what is explained for English should not be too hard to adapt to French. I'll add you on Twitter.

Answer 7 · 2020-03-28T21:35:25.000Z

Hi @lucmichalski,

If you want to fine-tune FlauBERT for a sentiment analysis task, you can build on the following section of FLUE. There are only a few things to modify if you fine-tune on another task:

Data processor: use the existing data processors provided by HuggingFace's Transformers if your task has the same number of labels, or add new data processors depending on your needs.
Prepare data: the input to the DataProcessor class should be in .tsv format, so prepare your text accordingly. Check the _create_examples function of the corresponding class to make sure each column of the .tsv file holds the right content.
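
As a rough illustration of the data-preparation step, here is a minimal sketch that writes labelled tweets to a .tsv file and reads them back the way a data processor might. The column layout (label first, then text), the file name train.tsv, the example tweets, and the helper names clean, write_tsv and read_examples are all assumptions made for this example, not part of FLUE or Transformers. Check the _create_examples function of the processor you actually use for the real column order.

```python
import os
import tempfile

# Hypothetical labelled tweets as (label, text); "1" = positive, "0" = negative.
TWEETS = [
    ("1", "J'adore ce film, il est génial !"),
    ("0", "Service client horrible,\tje ne reviendrai pas.\nJamais."),
]


def clean(text):
    """Collapse tabs, newlines and repeated spaces so a tweet fits on one row."""
    return " ".join(text.split())


def write_tsv(rows, path):
    """Write (label, text) pairs as tab-separated lines (assumed column order)."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for label, text in rows:
            f.write(f"{label}\t{clean(text)}\n")


def read_examples(path, set_type="train"):
    """Roughly what a processor's _create_examples might do: one record per row."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            label, text = line.rstrip("\n").split("\t", 1)
            examples.append(
                {"guid": f"{set_type}-{i}", "text_a": text, "label": label}
            )
    return examples


# Round trip in a temporary directory.
tsv_path = os.path.join(tempfile.mkdtemp(), "train.tsv")
write_tsv(TWEETS, tsv_path)
for example in read_examples(tsv_path):
    print(example)
```

Stripping tabs and newlines before writing matters for scraped tweets, since a stray tab would otherwise shift the columns that the processor expects.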

Once the data is in the right format, you can use the run_glue.py script for fine-tuning. The run_glue.py script available in HuggingFace's Transformers currently does not run testing or save the best validation model after fine-tuning, so we modified it slightly to output the test results and save the best validation model (as well as the last checkpoint, in case you want to resume training later). You can check it out here.
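
To make the "best validation model plus last checkpoint" idea concrete, here is a library-free sketch of the bookkeeping involved. It is not the modified run_glue.py: the function train_with_checkpoints, its evaluate and save callbacks, and the validation scores are all hypothetical placeholders for real training code.

```python
def train_with_checkpoints(num_epochs, evaluate, save):
    """Toy training loop that keeps a 'last' and a 'best' checkpoint.

    evaluate(epoch) returns a validation score (higher is better).
    save(name, epoch) persists the current model under the given name.
    Both callables stand in for real training and serialization code.
    """
    best_score = float("-inf")
    best_epoch = None
    for epoch in range(num_epochs):
        # One epoch of actual training would happen here.
        score = evaluate(epoch)
        save("last", epoch)  # always overwritten, so training can be resumed
        if score > best_score:
            best_score, best_epoch = score, epoch
            save("best", epoch)  # overwritten only when validation improves
    return best_epoch, best_score


# Toy usage with made-up validation accuracies (not real results).
scores = [0.71, 0.78, 0.76, 0.77]
saved = {}
best_epoch, best_score = train_with_checkpoints(
    len(scores),
    evaluate=lambda epoch: scores[epoch],
    save=lambda name, epoch: saved.__setitem__(name, epoch),
)
print(best_epoch, best_score, saved)
```

In a real run, the model evaluated on the test set would be the one stored under "best", while "last" only serves to resume an interrupted training.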