- Besombes Romain romain.besombes@hec.edu
- El Idrissi Mokdad Badr badr.el-idrissi-mokdad@hec.edu
- Lazraq Abderrahmane abderrahmane.lazraq@hec.edu
- Majjad Ismaïl ismail.majjad@hec.edu
In this project we apply NLP pipelines to the Arabic language, performing sentiment analysis and topic extraction on Arabic tweets. NLP on Arabic poses its own difficulties, such as detecting stopwords and reconciling the multiple dialects.
The ultimate goal is to gain insights from political tweets in Arab countries and compare them with financial or economic indicators (markets, currencies) to see whether a historical crisis such as the Arab Spring could have been partially predicted.
- Tweets from https://www.kaggle.com/kazanova/sentiment140 translated to Arabic using the Google Translate API.
- ArSentD-LEV: A Multi-Topic Corpus for Target-based Sentiment Analysis in Arabic Levantine Tweets https://arxiv.org/abs/1906.01830
- Sentiment Analysis in Arabic tweets https://www.researchgate.net/publication/271550479_Sentiment_Analysis_in_Arabic_tweets
- AraBERT https://github.com/aub-mind/arabert
- Exploring available Arabic tweet datasets and merging them into a single large dataset labeled by sentiment
- Data preprocessing
- Word Embedding
- Topic extraction
- Sentiment Analysis
- Insights into people's emotions and viewpoints on a variety of products or political decisions
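
The data preprocessing step listed above could look roughly like the sketch below. It is not the project's actual pipeline, only a minimal illustration using Python's standard `re` module. The function names `normalize` and `tokens`, the choice of normalization rules, and the tiny stopword list are all assumptions made for the example; a real pipeline would use a much fuller stopword list and would need dialect-specific handling.

```python
import re

# Tashkeel (diacritics, U+064B..U+0652) and tatweel/kashida (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# URLs and @mentions carry no sentiment-bearing Arabic text.
URLS_MENTIONS = re.compile(r"https?://\S+|www\.\S+|@\w+")
# Alef with hamza above, hamza below, or madda.
ALEF_VARIANTS = re.compile(r"[\u0623\u0625\u0622]")
# Anything that is neither a basic Arabic letter nor whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
# Three or more repeats of one character (elongation used for emphasis).
REPEATS = re.compile(r"(.)\1{2,}")


def normalize(text: str) -> str:
    """Return a cleaned, normalized version of an Arabic tweet."""
    text = URLS_MENTIONS.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)    # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")     # alef maqsura -> ya
    text = NON_ARABIC.sub(" ", text)            # drops punctuation, digits, Latin, emoji
    text = REPEATS.sub(r"\1", text)             # heuristic: collapse elongation
    return " ".join(text.split())


# Illustrative stopwords only (assumption, not a complete list).
# They are normalized too, so they match the normalized tweet text.
STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}


def tokens(text: str) -> list[str]:
    """Normalize a tweet, split on whitespace, and drop stopwords."""
    return [t for t in normalize(text).split() if t not in STOPWORDS]


if __name__ == "__main__":
    print(tokens("@user أُحِبُّ هذا من https://t.co/abc جمييييل!!!"))
```

Normalizing the stopword list with the same function as the tweets keeps the two consistent: without it, a stopword written with a hamza or alef maqsura would no longer match its normalized form in the text. Collapsing repeated characters is a heuristic and can damage words that legitimately contain a doubled letter next to an elongation.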