Tweet sentiment analysis in French
Hi,
Hope you are all well!
Is it possible to use FlauBERT for sentiment analysis of tweets written in French? If so, how can we do that?
Vive la France ! :-)
Cheers,
X
Hello,
Yes, it is possible. For instance, in FLUE (French Language Understanding Evaluation) we use the Cross-Lingual Sentiment dataset (Prettenhofer and Stein, 2010). We obtain the best results on the French part with FlauBERT-large.
See the article here https://arxiv.org/pdf/1912.05372.pdf (sections 4.1 and 5.1)
Hope it helps.
Of course, a similar method can be used for other kinds of sentiment analysis.
Thanks for your reply. Is there a repository available to test this?
The FLUE part of this GitHub repository should help you:
https://github.com/getalp/Flaubert/tree/master/flue
Please give us feedback about your work.
Hi again,
I'll be honest, I don't really know how to do this, as I'm not much of an NLP specialist.
I can do the scraping with https://github.com/twintproject/twint and the API.
I added you on Twitter, but I can't send you a direct message, which I wanted to do to avoid polluting the issue with my questions. My Twitter is https://twitter.com/lucmichalski
In any case, thank you for the details.
Cheers,
Luc
OK. If you want to understand this better, I'd suggest starting with a Hugging Face tutorial; everything is explained here: https://github.com/huggingface/transformers FlauBERT is integrated into that library, so what is explained for English should not be too hard to adapt to French. I'll add you on Twitter.
Hi @lucmichalski,
If you want to fine-tune FlauBERT for a sentiment analysis task, you can base your work on the following section of FLUE. There are only a few things to modify if you fine-tune on another task:
- Data processor: use the existing data processors provided by HuggingFace's Transformers if your task has the same number of labels, or add new data processors depending on your needs.
- Prepare data: the input data to the DataProcessor class should be in .tsv format, so you should prepare your text in that format accordingly. You may want to check the _create_examples function within the corresponding class to get the content of each column of the .tsv file right.
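As a rough illustration of the data preparation step, here is a minimal standard-library sketch. The column layout (label first, text second), the InputExample class, the helper names, and the two sample tweets are all assumptions made up for this example; the layout your processor's _create_examples actually expects may differ, so check it before reusing this.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InputExample:
    """One labelled text (hypothetical stand-in for a processor's example type)."""
    guid: str
    text_a: str
    label: str


def clean(text: str) -> str:
    # Tabs and newlines would break the one-example-per-line .tsv layout,
    # so collapse every run of whitespace into a single space.
    return " ".join(text.split())


def write_tsv(path: str, rows: List[Tuple[str, str]]) -> None:
    """Write (label, text) pairs, one per line, separated by a tab."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in rows:
            f.write(f"{label}\t{clean(text)}\n")


def create_examples(path: str, set_type: str) -> List[InputExample]:
    """Read the file back, ASSUMING column 0 is the label and column 1 the text."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            label, text = line.rstrip("\n").split("\t", 1)
            examples.append(InputExample(f"{set_type}-{i}", text, label))
    return examples


# Made-up sample tweets: "1" = positive, "0" = negative (assumed label scheme).
tweets = [
    ("1", "J'adore ce film,\tvraiment génial !"),
    ("0", "Service très décevant,\nje ne reviendrai pas."),
]
tsv_path = os.path.join(tempfile.mkdtemp(), "train.tsv")
write_tsv(tsv_path, tweets)
examples = create_examples(tsv_path, "train")
print(examples[0])
```

Writing the lines by hand rather than through a CSV writer avoids quoting surprises with the apostrophes and quotation marks that are common in French tweets.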
After getting the data in the right format, you can use the script run_glue.py for fine-tuning. The run_glue.py script available in HuggingFace's Transformers currently does not include testing or saving the best validation model after fine-tuning, so we modified it slightly to output the test result and save the best validation model (and the last checkpoint as well, in case you want to resume training later). You can check it out here.
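The idea of keeping the best validation model can be sketched in a few lines. This is not the modified script itself, only an illustration of the selection logic under the assumption that one validation score is computed per epoch and that higher is better; the function name and the scores are placeholders invented for the example.

```python
from typing import List, Tuple


def track_best(val_scores: List[float]) -> Tuple[int, float, List[int]]:
    """Return (best_epoch, best_score, epochs at which a 'best' checkpoint would be saved).

    A real training loop would save the model weights at each of those epochs,
    overwriting the previous 'best' checkpoint, and separately keep the last
    checkpoint so that training can be resumed.
    """
    best_epoch, best_score = -1, float("-inf")
    saved_at = []
    for epoch, score in enumerate(val_scores):
        if score > best_score:  # strictly better than anything seen so far
            best_epoch, best_score = epoch, score
            saved_at.append(epoch)
    return best_epoch, best_score, saved_at


# Placeholder validation accuracies, one per epoch (not real results).
scores = [0.81, 0.86, 0.84, 0.88, 0.87]
best_epoch, best_score, saved_at = track_best(scores)
print(best_epoch, best_score, saved_at)
```

The test set would then be evaluated once, using the checkpoint from best_epoch rather than the final one.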