NLP-Embedding-models-on-product-review

This project applies traditional models and embedding models to the problem of predicting product ratings. Customer reviews are classified according to their ratings (1-5).

Data

The dataset used in this experiment is the Amazon reviews dataset from Kaggle. It contains 3,999,913 Amazon reviews and is derived from the Stanford Network Analysis Project (SNAP) collection, which covers 6,643,669 users and 2,441,053 products.

Origin

The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18 years and include ~35 million reviews up to March 2013. Each review includes product and user information, a rating, and a plaintext review. For more information, please refer to the following paper: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.

Description

The Amazon reviews polarity dataset is constructed by taking review scores 1 and 2 as negative, and 4 and 5 as positive. Samples with score 3 are ignored. In the dataset, class 1 is negative and class 2 is positive. Each class has about 2,000,000 samples, spread across train.csv and test.csv from Kaggle. You need to merge these two files into a single dataset, amazon_reviews.csv; the project then automatically splits it into train, dev, and test sets.
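The merge step can be sketched with the Python standard library alone. This is a minimal illustration, not the project's own script: the helper name `merge_csv` is hypothetical, and it assumes that train.csv and test.csv have no header row and share the same column layout, so rows can simply be concatenated.

```python
import csv
from pathlib import Path


def merge_csv(input_paths, output_path):
    """Concatenate several header-less CSV files into one file.

    Assumes every input file has the same columns and no header row.
    Returns the number of rows written.
    """
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.reader(in_file):
                    if not row:  # skip blank lines
                        continue
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    # File names taken from the description above; run from the data folder.
    paths = [Path("train.csv"), Path("test.csv")]
    if all(p.exists() for p in paths):
        n = merge_csv(paths, "amazon_reviews.csv")
        print(f"wrote {n} rows to amazon_reviews.csv")
```

If the downloaded files do contain a header row, it would need to be written once and skipped in the remaining inputs; the sketch does not handle that case.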

Example

image