/RestMex23_NLP


REST-MEX 2023 NLP Competition

Data and files used in the REST-MEX 2023 NLP competition, for both the Clustering and Sentiment Analysis tasks. Competition site: https://sites.google.com/cimat.mx/rest-mex2023

Papers

https://ceur-ws.org/Vol-3496/restmex-paper2.pdf

https://ceur-ws.org/Vol-3496/restmex-paper12.pdf

Sentiment analysis task

Sentiment analysis of tourist texts has gained relevance in the last decade; however, the most significant research efforts have focused on English. Although some studies have addressed Spanish, few cover the varieties of Spanish spoken outside Spain. These approaches are typically applied to collections taken from social networks, such as tweets, so tourist texts have not been directly addressed. For this task, 250,000 reviews are provided.

The submission for this task placed 2nd in the ranking.

Clustering task

This is the first time a Thematic Unsupervised Classification task has been proposed within the Rest-Mex framework. For this task, 100,000 news items were collected on 4 different tourism-related topics. Given all the collected texts, a system must generate 4 groups. The system whose grouping is most similar to the ideal classification (Gold Standard) obtains the highest score. All data was obtained from Google News. News items published over the last two years on the 4 tourism themes (which are not revealed, for competition reasons) were carefully downloaded and tagged.
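Because cluster ids are arbitrary, comparing a grouping against a Gold Standard requires a measure that ignores how the groups are numbered. The sketch below illustrates one such measure, best-match accuracy over all relabelings of the 4 groups. It is only an illustration: the metric actually used by the organizers is not stated here, and the function name, the integer labels 0..3, and the toy data are all assumptions.

```python
from collections import Counter
from itertools import permutations


def best_match_accuracy(predicted, gold, n_clusters=4):
    """Fraction of items that agree with the gold labels under the best
    one-to-one mapping of cluster ids to gold classes.

    Assumes both label lists use integers in range(n_clusters).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no items to evaluate")

    # Count how often each (cluster id, gold class) pair occurs.
    pair_counts = Counter(zip(predicted, gold))
    ids = range(n_clusters)

    # Try every relabeling (4! = 24 for 4 groups) and keep the best one.
    best = max(
        sum(pair_counts[(c, mapping[c])] for c in ids)
        for mapping in permutations(ids)
    )
    return best / len(gold)


# Toy data (hypothetical): 8 items, 4 hidden topics.
gold = [0, 0, 1, 1, 2, 2, 3, 3]
predicted = [2, 2, 0, 0, 3, 3, 1, 0]  # same grouping, different ids, one error
score = best_match_accuracy(predicted, gold)
print(score)
```

Exhaustive search over relabelings is only practical for a small number of groups such as the 4 used in this task; with many clusters, an assignment algorithm or a pair-counting measure would be a more reasonable choice.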

The submission for this task placed 1st in the ranking.