Model Building Notes:
Tools Used: Python programming using Anaconda IDE
Python Packages: Os, Pandas, Sklearn, Numpy, Matlplotlib, Seaborn, NLTK, Statsmodels.api, Pylab, SpaCy, Gensim, PyLDAvis
Key assumption is to reflect the overall sentiment using Information extraction on reviews dictionaries. We are not attempting at single product aspect entity relations (Single product feature extraction, given the time constraint and the requirement to build a large pre-trained data set)
Models not attempted as part of solutions, considering them not required as part of our problem case: Sentiment Classifier, Ratings based analysis - Classifying products based on ratings, calculating mean rating scores of products
Data Modelling Approach -Assumptions and comments mentioned in Ipython NoteBook Data-set source link: https://www.kaggle.com/datafiniti/consumer-reviews-of-amazon-products
Problem Statement: Understanding insights from Customer Reviews - identify key product terms in the sentiments (Topic Modelling) - do sentiment analysis (classify sentiments - for future use cases) - understand product wise sentiments (Data Modeller Understanding of the problem)
Different types of NLP analysis which can be employed for our use: Given that we are dealing with Reviews, (Textual Data - Applications of NLP Techniques), following sub analysis -
- Entity Recognition (Named Entity Recognition), 2. Identifying Emerging Trends 3.Sentiment Analysis 4. Text Summarization 5. Topic Modelling and solutions deduced can be applied as a universal attempt for exploring the data.
We will restrict ourselves to Sentiment Analysis & Topic Modelling and explore advanced enhancements at later when we deploy real time sentiment monitors - probably by streaming data analysis using Databricks/ Apache Spark
Model - 1 :
Problem statement Scope and Topic Modelling: By the nature of unstructured Textual data, we want to derive insights about sentiments of consumers in terms of product features - authenticity of the product, quality of the product and from consumer standpoint, our support engine should be able to redflag fake reviews. Topic Modelling is a process to automatically identify topics present in a text object and to derive hidden patterns exhibited by a text corpus, thus assisting in better decision making.
Model -2: Aspect Based Sentiment Analysis also known as Feature Based Sentiment Analysis is a technique to find out various features, attributes, or aspects from a given text and their respective sentiments.
Refer to Model-1 for Topic Modelling and LDA Analysis Refer to Model-2 for Aspect Based Sentiment Analysis
Please download Modelling Notes Document for Problem Approach - output previews and analysis
Additional Model - Sentiment Classifier:
Sentiment analysis is the process of detecting positive or negative sentiment in text. We will build classification algorithms to detect the text
We built Logistic regression and Random forests for sentiment classification as positive or negative reviews
Additional Coding references - websites index: Github Stackexchange Paperswithcode Medium and towardsdatascience Analytics Vidhya