Minor-Project---Sentiment-Analysis

A web-based application that lets users gauge the public's reaction and sentiment on a trending topic by analyzing the tweets and posts people publish on their Twitter or LinkedIn accounts.

The methodology followed in this project is:

  1. Data extraction using social media crawlers: With the help of pre-made social media crawlers, we extract data by searching for keywords related to the trending topics.

  2. Pre-processing of the extracted data: After extraction, we preprocess the data. Some of the steps are: (i) Lowercasing: converting all words to lower case so that variants such as "The" and "the" are treated as the same term, which keeps the vocabulary (and therefore the feature space) smaller. (ii) Stop-word removal: "a", "an", "and" and "the" are examples of stop words that appear in almost every document. They do not help distinguish one document from another, so they are usually removed.

  3. Stemming: The process of reducing a word to its root form, typically by stripping suffixes. To build a robust model, it helps to normalize the text so that inflected variants of a word (for example "playing" and "played") are mapped to one base form.

  4. Lemmatization and tokenization of the extracted text data: Lemmatization has the same goal as stemming but is usually more accurate, because it looks up the base form (lemma) of a word in a dictionary-like resource instead of simply cutting off suffixes. Tokenization is one of the most important preprocessing steps in NLP. It breaks a stream of text into words, terms, sentences, symbols or other meaningful elements called tokens. Tokenization turns an unstructured string into a sequence of discrete units, which can then be converted into a numerical representation suitable for machine learning.

  5. Clustering and classification of unstructured data such as images and emojis: After the preprocessing steps, we apply clustering and classification methods to uncover the internal structure of the data and to improve the efficiency of the model.

  6. Analysis of the processed data using different algorithms.

  7. Visual representation of the analysis using different plots, graphs and charts.
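
The preprocessing in steps 2 to 4 can be illustrated with a minimal Python sketch. It is not the project's actual code: it uses only the standard library, the stop-word list `STOP_WORDS` and the suffix list `SUFFIXES` are small hypothetical examples, and `naive_stem` is a toy suffix stripper rather than a real stemming algorithm. Lemmatization is left out because it needs a dictionary resource.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real project would
# likely use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}

# Toy suffix rules for illustration only; this is NOT a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    tokens = remove_stop_words(tokenize(text))
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("The fans are LOVING the new phones"))
    # prints ['fan', 'lov', 'new', 'phon']
```

Stems such as "lov" and "phon" are not dictionary words, which is the trade-off that lemmatization in step 4 is meant to address.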