Machine-Learning-to-classify-IMBD-critics

Machine learning project to classify text reviews from IMDb, using binary and multi-class classification.

The data set for this practical work contains 40,000 text documents, each rated on a scale from 1 to 10; neutral reviews (scores of 5 and 6) were excluded.
To apply machine learning techniques such as classification and clustering to text data, each document must first be represented as a numerical vector. The Bag of Words model and the tf-idf method were used for this.
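As a rough illustration of the two representations, the sketch below builds Bag of Words count vectors and tf-idf weights using only the Python standard library. The function names (`tokenize`, `bag_of_words`, `tf_idf`), the sample sentences, and the plain `log(N / df)` weighting are assumptions made for this example; the project itself may rely on a library and on a smoothed tf-idf variant.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [t for t in tokens if t]

def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors

def tf_idf(docs):
    """Weight each count by term frequency times inverse document frequency.

    Assumption: tf = count / document length, idf = log(N / df).
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in counts:
        length = sum(row) or 1  # guard against empty documents
        vectors.append([(c / length) * w for c, w in zip(row, idf)])
    return vocab, vectors

# Hypothetical sample reviews, not taken from the data set.
docs = ["A great movie, great cast", "A dull movie", "Great soundtrack"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

With tf-idf, a word that appears in only one document (such as "dull" here) is weighted more heavily than a word shared by several documents, which is what makes the vectors useful for telling reviews apart.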
Three main tasks were carried out on the text documents:
1. Determine whether a review is positive or negative (binary classification process).
2. Predict the score of a review, a value in the range 1-4 or 7-10 (multi-class classification process).
3. Find groups of documents that address similar areas or themes, based on the words they have in common (clustering process).
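The two classification tasks differ only in the target derived from each review's score. The sketch below shows one way those targets could be built. It assumes that scores 1-4 count as negative and 7-10 as positive, which the exclusion of scores 5 and 6 suggests but the description does not state outright; the names `binary_label`, `multiclass_index`, and `SCORE_CLASSES` are hypothetical.

```python
# The eight scores that remain once neutral reviews (5 and 6) are excluded.
SCORE_CLASSES = [1, 2, 3, 4, 7, 8, 9, 10]

def binary_label(score):
    """Task 1 target: 'negative' for 1-4, 'positive' for 7-10 (assumed split)."""
    if score not in SCORE_CLASSES:
        raise ValueError(f"score {score} is not part of the data set")
    return "positive" if score >= 7 else "negative"

def multiclass_index(score):
    """Task 2 target: position of the score among the eight possible classes."""
    if score not in SCORE_CLASSES:
        raise ValueError(f"score {score} is not part of the data set")
    return SCORE_CLASSES.index(score)

# Hypothetical scores, used only to show the mapping.
scores = [1, 4, 7, 10]
binary_targets = [binary_label(s) for s in scores]
multiclass_targets = [multiclass_index(s) for s in scores]
```

The clustering task needs no target at all: it works directly on the document vectors and groups reviews whose vectors are close to each other.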