
This is my entry for the Analytics Vidhya hackathon 'IDENTIFY THE SENTIMENTS'.


IdentifyTheSentiments

The problem statement for this sentiment analysis project is described at https://datahack.analyticsvidhya.com/contest/linguipedia-codefest-natural-language-processing-1/. To solve the problem, I applied several NLP techniques, all of which can be found in the .ipynb file.

First, I extracted new features for each tweet: the total number of words, characters and stop words. I then removed URLs, stop words, the ten most frequent words and the ten least frequent words from each tweet, and converted the remaining words to their base forms.

After these preprocessing and feature extraction steps, I converted the tweets into TF-IDF features, with the maximum number of features set to 10,000. I concatenated the previously extracted features with these TF-IDF features and fed them to a multinomial Naive Bayes classifier to predict the labels of the test cases, which gave an accuracy score of 0.8857 on this binary classification problem. Finally, I wrote the model's predicted labels (0 or 1) to a .csv file. Here, 0 denotes a positive review and 1 a negative one.
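The pipeline described above can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the notebook's actual code: the function names `tokenize`, `count_features`, `frequency_outliers` and `fit_predict` are hypothetical, stop words come from scikit-learn's `ENGLISH_STOP_WORDS` (the notebook may use a different list), the frequent/rare word list is assumed to be learned on the training tweets only, and the base-form (lemmatization) step is left out because it needs extra NLTK data.

```python
import re
from collections import Counter

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical helpers: a sketch of the described steps, not the notebook's code.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
WORD_RE = re.compile(r"[a-z0-9']+")


def count_features(tweets):
    """Extra features per raw tweet: word count, character count, stop-word count."""
    rows = []
    for t in tweets:
        words = WORD_RE.findall(t.lower())
        rows.append([len(t.split()), len(t), sum(w in ENGLISH_STOP_WORDS for w in words)])
    return np.array(rows, dtype=float)


def tokenize(tweet):
    """Lowercase, remove URLs, split into words and drop stop words."""
    text = URL_RE.sub(" ", tweet.lower())
    return [w for w in WORD_RE.findall(text) if w not in ENGLISH_STOP_WORDS]


def frequency_outliers(tokenized, n_common=10, n_rare=10):
    """Return the n_common most frequent and n_rare least frequent words."""
    ranked = [w for w, _ in Counter(w for toks in tokenized for w in toks).most_common()]
    # max(..., 0) keeps n_rare=0 from selecting the whole list via ranked[-0:]
    return set(ranked[:n_common]) | set(ranked[max(len(ranked) - n_rare, 0):])


def fit_predict(train_tweets, train_labels, test_tweets,
                max_features=10000, n_common=10, n_rare=10):
    """TF-IDF (capped at max_features) + count features -> multinomial Naive Bayes."""
    train_tok = [tokenize(t) for t in train_tweets]
    test_tok = [tokenize(t) for t in test_tweets]
    # Assumption: the frequent/rare word list is learned on the training tweets only.
    drop = frequency_outliers(train_tok, n_common, n_rare)
    # Base-form conversion (lemmatization) would go here; omitted in this sketch.

    def to_text(tok_lists):
        return [" ".join(w for w in toks if w not in drop) for toks in tok_lists]

    vec = TfidfVectorizer(max_features=max_features)
    X_train = hstack([vec.fit_transform(to_text(train_tok)),
                      csr_matrix(count_features(train_tweets))]).tocsr()
    X_test = hstack([vec.transform(to_text(test_tok)),
                     csr_matrix(count_features(test_tweets))]).tocsr()
    # All columns are non-negative, as MultinomialNB requires.
    model = MultinomialNB().fit(X_train, train_labels)
    return model.predict(X_test)  # 0 = positive, 1 = negative
```

On a toy corpus of only a few tweets, call `fit_predict(..., n_common=0, n_rare=0)`, since dropping ten frequent and ten rare words would otherwise leave an empty vocabulary. Note also that the raw counts (especially the character count) sit on a much larger scale than TF-IDF weights, so in this sketch they can dominate the Naive Bayes likelihoods; scaling them is one possible refinement.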