A list of resources for sentiment analysis in general, including resources for the Arabic language. The list is continuously updated.
- the oldest notable work in sentiment classification
- Turney uses two seed words, “excellent” and “poor”, and pointwise mutual information (PMI) to do unsupervised sentiment classification
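
The PMI idea above can be sketched in a few lines. This is a minimal illustration, not Turney's actual setup: the toy `REVIEWS` corpus, the `hits` helper, and the use of whole-review co-occurrence (instead of search-engine hit counts with a proximity operator) are all assumptions made for the example. The score is the semantic orientation SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

```python
import math

# Toy corpus (hypothetical; stands in for search-engine hit counts).
REVIEWS = [
    "excellent camera with low fees",
    "excellent service and low fees",
    "poor support and unethical practices",
    "poor screen and unethical practices",
    "excellent value",
    "poor value",
]


def hits(*terms):
    """Number of reviews that contain every given term (substring match)."""
    return sum(all(term in review for term in terms) for review in REVIEWS)


def semantic_orientation(phrase, pos="excellent", neg="poor", smooth=0.01):
    """SO(phrase) = PMI(phrase, pos) - PMI(phrase, neg), in bits.

    A small constant is added to the co-occurrence counts so that a phrase
    never seen with one of the seed words does not cause a division by zero.
    """
    numerator = (hits(phrase, pos) + smooth) * hits(neg)
    denominator = (hits(phrase, neg) + smooth) * hits(pos)
    return math.log2(numerator / denominator)


for phrase in ("low fees", "unethical practices", "value"):
    print(phrase, round(semantic_orientation(phrase), 2))
```

A positive score suggests the phrase leans toward "excellent", a negative score toward "poor". A review could then be labeled by the sign of the average score of its phrases.
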
- published a movie review dataset that is still widely used
- used machine learning classifiers with 3-fold cross-validation
- features were basic bag of words (unigrams and/or bigrams): word presence, word frequency (no tf-idf)
- other additional features : top unigrams, adjectives, position
- compared results against a baseline that uses features manually selected by two human annotators
- accuracy of baseline (manually selected features): ~60-70%
- accuracy of ML classifiers: ~80-83%
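
The setup described above (unigram presence features, a standard classifier, 3-fold cross-validation) can be sketched with the standard library alone. Everything concrete here is an assumption for illustration: the tiny `DATA` set is hand-made and is not the movie review dataset, and naive Bayes with add-one smoothing is just one plausible choice of classifier.

```python
import math
from collections import Counter

# Tiny hand-made dataset (hypothetical; not the published movie review data).
DATA = [
    ("a great and moving film", "pos"),
    ("great acting and a great script", "pos"),
    ("wonderful moving story", "pos"),
    ("a wonderful film with great acting", "pos"),
    ("moving script and wonderful acting", "pos"),
    ("great story", "pos"),
    ("a dull and boring film", "neg"),
    ("boring acting and a dull script", "neg"),
    ("awful boring story", "neg"),
    ("an awful film with dull acting", "neg"),
    ("boring script and awful acting", "neg"),
    ("dull story", "neg"),
]


def features(text):
    """Unigram presence features: the set of distinct tokens (no counts, no tf-idf)."""
    return set(text.split())


def train(examples):
    """Collect per-class document counts and per-class word presence counts."""
    doc_counts, word_counts, vocab = Counter(), {}, set()
    for text, label in examples:
        feats = features(text)
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(feats)
        vocab |= feats
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest naive Bayes log-score (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in features(text):
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(data, k=3):
    """k-fold cross-validation; returns one accuracy per held-out fold."""
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, held_out in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(training)
        correct = sum(predict(model, text) == label for text, label in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


accuracies = cross_validate(DATA, k=3)
print(accuracies, sum(accuracies) / len(accuracies))
```

Splitting with `data[i::k]` keeps each fold balanced here only because the toy data lists all positive examples before the negative ones in equal numbers; real data would need shuffling or stratification.
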
- very good categorized literature review of the features, techniques, and application domains commonly used in sentiment analysis
- uses the Entropy Weighted Genetic Algorithm (EWGA) to do feature selection among each of the previous techniques
- overview of the main issues and obstacles in Arabic social media sentiment analysis
- semi-automatic generation of a ~4k-entry Egyptian dialect sentiment lexicon (link available in the paper) using conjunctions
- Evaluation of Generated Lexicon
- Main usefulness: good guidelines for annotating sentiment datasets
- multi-genre annotated corpus of Modern Standard Arabic for SA
- built from different resources including Penn Arabic Treebank, Wikipedia Talk Pages and Web forums
- manually annotated :
- with Guidelines or simple Guidelines
- with Trained Annotators / Crowdsourcing
- elaborates on the importance of guidelines and annotator training for producing dependable annotations
- dataset not publicly available
- Bing Liu: Sentiment Analysis and Opinion Mining. The book is a thorough literature review of the various issues in sentiment analysis. It is a good way to get the big picture of the field and to find related work on any specific issue.
- [NLP - Stanford, Dan Jurafsky & Christopher Manning](https://www.coursera.org/course/nlp)
- Pang & Lee movie reviews:
- SNAP - Web data: Amazon reviews
- size : 34,686,770 Reviews
- homepage
- Direct links
- Paper
- LABR: large-scale Arabic book reviews dataset
- Large Arabic Resources for Arabic Sentiment Analysis:
- Datasets of 33K reviews in the Movies, Hotels, Restaurants, and Products domains
- Generated Lexicon of 2K entries in domains of Books, Movies, Hotels, Restaurants and Products
- Benchmarking of standard machine learning techniques and feature representation methods over the datasets
- Available Code for Experiments as a starter kit for baseline classifiers
- homepage
- Paper : ElSahar, Hady, and Samhaa R. El-Beltagy. "Building Large Arabic Multi-Domain Resources For Sentiment Analysis." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2015.
- Unweighted Opinion Mining Lexicon
- An Arabic sentiment lexicon consisting of 4392 entries, mostly of Egyptian dialect (the file is .csv, Unicode). Compound entries (idioms and expressions) are unstemmed. Other entries are prefix-stemmed but not suffix-stemmed.
- Download
- Paper : El-Beltagy, Samhaa R., and Ahmed Ali. "Open issues in the sentiment analysis of arabic social media: A case study." Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 2013.
- Arabic Slang Lexicon for Twitter Sentiment Analysis:
- Lexicon of ~400 terms built automatically by matching tweets to lexico-syntactic patterns
- Paper : ElSahar, Hady, and Samhaa R. El-Beltagy. "A fully automated approach for arabic slang lexicon extraction from microblogs." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2014. 79-91.
- Download
The Natural Language Processing Dictionary: a glossary with definitions of a wide range of terms used in natural language processing. Very useful when reading papers.
- Chris Manning: Deep Learning without Magic, part 1. Main points of interest:
- representing each word by a feature vector built from the words in its context
- using deep neural networks to adapt those feature weights
- using those adapted feature vectors in multiple NLP classification problems
- the old way: A. L. Maas et al., Learning Word Vectors for Sentiment Analysis
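
The first point above, representing a word by the words in its context, can be illustrated with plain co-occurrence counts. This is a count-based stand-in only: nothing here is learned by a neural network, and the `SENTENCES` corpus, the window size, and the helper names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical).
SENTENCES = [
    "the movie was great",
    "the film was great",
    "the movie was awful",
    "the film was awful",
    "i ate an apple",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(SENTENCES)
print(cosine(vectors["movie"], vectors["film"]))   # words with shared contexts
print(cosine(vectors["movie"], vectors["apple"]))  # words with no shared context
```

Words that appear in similar contexts ("movie", "film") end up with similar vectors. The deep learning approach in the talk goes further by adapting such vectors during training, which this sketch does not attempt.
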
Feel free to send a pull request with any updates you think would be good to add.