Work on Scarce Resource Languages

Scarce-Resource-Language

Sentiment analysis is an important area of NLP with a comprehensive and growing literature. Most published work concentrates on English, however, and very little has been done for scarce resource languages (SRL). SRLs are languages for which NLP tools and resources (such as POS taggers, lemmatizers, and annotated corpora) are limited and still under development.

Nepali is an SRL. It is an Indo-Aryan language of the Eastern Pahari subgroup, spoken by approximately 45 million people around the globe (Yadava et al., 2008), and it is an official language of Nepal. Compared to English, Nepali has free word order, which increases the complexity of handling user-generated content. The scarcity of linguistic resources for Nepali creates further challenges, ranging from the creation and collection to the generation of lexical resources and datasets. This lack of NLP resources also makes it harder to apply existing classification methods to Nepali text.

In this work, we propose a lexicon-based approach for sentiment analysis of tweets written in Nepali. We also investigate the most popular conventional machine learning (ML) models for sentiment analysis: Multinomial Naïve Bayes (NB), Decision Tree, Support Vector Machine (SVM), and Logistic Regression. These ML classifiers serve as baselines for the lexicon-based approach.
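As a rough illustration of what a lexicon-based classifier does, the sketch below sums per-word polarity scores and maps the total to a label. It is not the method from the paper: the function name `lexicon_sentiment`, the `TOY_LEXICON` entries and their scores, and the whitespace tokenization are all assumptions made for this example.

```python
# Hypothetical toy lexicon: the words and scores are illustrative assumptions,
# not entries from the lexical resource built in this work.
TOY_LEXICON = {
    "राम्रो": 1.0,    # "good"
    "खुसी": 1.0,     # "happy"
    "नराम्रो": -1.0,  # "bad"
    "दुःखी": -1.0,    # "sad"
}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Label text by summing the polarity scores of its known tokens."""
    # Whitespace tokenization only; real tweets would need more cleaning
    # (hashtags, mentions, URLs, punctuation, inflected word forms).
    tokens = text.split()
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(lexicon_sentiment("यो फिल्म राम्रो छ"))
```

A sketch this simple ignores negation and word order. It also treats words missing from the lexicon as neutral, which is one reason the coverage of the lexical resource matters so much for a scarce resource language.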

Furthermore, we have implemented deep learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and a hybrid CNN-LSTM model.

We observe that the lexicon-based approach outperforms the conventional machine learning models, and that the deep learning models perform better than both the conventional machine learning models and the lexicon-based approach.

Cite

Rajesh Piryani, Bhawna Piryani, Vivek Kumar Singh, David Pinto, “Sentiment Analysis in Nepali: Exploring Machine Learning and Lexicon-based Approaches”. Published in Journal of Intelligent and Fuzzy Systems, IOS Press. (Impact Factor: 1.637)