
Analysing and classifying movie reviews from IMDB as positive or negative.

Primary language: Python

movie_review_sentiment_analysis

classifier.py: Naive Bayes algorithm, file reading and writing, data processing
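As a rough illustration of the kind of Naive Bayes classification that classifier.py is described as performing, here is a minimal sketch. It is not the code from this repository: the function names `train` and `classify`, the whitespace tokenisation, and the add-one (Laplace) smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


def train(reviews, labels, vocabulary):
    """Count word occurrences per class, restricted to the given vocabulary.

    Hypothetical helper: the real classifier.py may organise this differently.
    """
    word_counts = {label: Counter() for label in sorted(set(labels))}
    class_counts = Counter(labels)
    for text, label in zip(reviews, labels):
        for word in text.lower().split():
            if word in vocabulary:
                word_counts[label][word] += 1
    return word_counts, class_counts


def classify(text, word_counts, class_counts, vocabulary):
    """Return the label with the highest log-posterior under Naive Bayes."""
    total_reviews = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: fraction of training reviews carrying this label.
        score = math.log(class_counts[label] / total_reviews)
        total_words = sum(counts.values())
        for word in text.lower().split():
            if word in vocabulary:
                # Add-one smoothing (an assumption) avoids log(0) for words
                # never seen with this label.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Words outside the vocabulary are simply ignored, which is one plausible reason for keeping several vocabulary sizes and comparing the resulting classifications.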

data_frame.csv: 5490 most frequent words and their number of occurrences
data_frame_.csv: 10000 most frequent words and their number of occurrences
data_frame__.csv: 12000 most frequent words and their number of occurrences

removed_words.csv: less frequent words left out of data_frame.csv (22352 words)
removed_words_.csv: less frequent words left out of data_frame_.csv (17842 words)
removed_words__.csv: less frequent words left out of data_frame__.csv (15842 words)
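Each data frame size plus its removed-word count gives the same total (5490 + 22352 = 10000 + 17842 = 12000 + 15842 = 27842), which suggests a single vocabulary split at different cut-offs. A minimal sketch of such a split follows. The function name `split_vocabulary` and the whitespace tokenisation are assumptions for illustration, not the repository's actual code.

```python
from collections import Counter


def split_vocabulary(reviews, size):
    """Split all distinct words into the `size` most frequent and the rest.

    Hypothetical helper: returns two dicts mapping word -> occurrence count.
    """
    counts = Counter(word for text in reviews for word in text.lower().split())
    ranked = counts.most_common()  # (word, count) pairs, most frequent first
    return dict(ranked[:size]), dict(ranked[size:])


# Tiny usage example with made-up reviews.
kept, removed = split_vocabulary(["good good film", "bad film", "good plot"], 2)
```

Here `kept` plays the role of a data_frame file and `removed` the role of the matching removed_words file.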

test_set.csv: 600 reviews to be classified

test_set_classified.csv: the 600 test reviews, classified using the 5490-word data frame
test_set_classified_.csv: the 600 test reviews, classified using the 10000-word data frame
test_set_classified__.csv: the 600 test reviews, classified using the 12000-word data frame

training_set.csv: 1400 reviews that were used to create the model