/Machine-Learning-to-classify-IMBD-critics

Machine learning project to classify IMDb text reviews, with binary and multi-class classification

Primary Language: Jupyter Notebook

This practical work solves a set of tasks on a dataset of IMDb movie review texts, using the sklearn library.

  • The data contains 40,000 text documents, each rated on a scale from 1 to 10. Neutral reviews, with scores of 5 and 6, were excluded.
  • To apply machine learning techniques such as classification and clustering to text data, each document must be represented by a numerical vector. For this, the Bag of Words model and the tf-idf method were used.
  • Based on the text documents, three main tasks were carried out:
    1. Determine whether the review is positive or negative (binary classification).
    2. Predict the score of the review, a value in the range 1-4 or 7-10 (multi-class classification).
    3. Find, through words shared between texts, groups of documents that address similar areas or themes (clustering).
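
The vectorization step and the three tasks above can be sketched with sklearn roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy reviews and scores are hypothetical stand-ins for the 40,000 documents, and `LogisticRegression` and `KMeans` are assumed choices, since the description does not say which classifier or clustering algorithm was used.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the review texts and their scores.
# Scores 5 and 6 never appear, mirroring the excluded neutral reviews.
reviews = [
    "terrible plot and awful acting",
    "boring film with a terrible script",
    "awful pacing, boring and bad",
    "bad acting and a bad script",
    "great film with wonderful acting",
    "wonderful story and a great cast",
    "brilliant script, great and moving",
    "moving story with brilliant acting",
]
scores = [1, 2, 3, 4, 7, 8, 9, 10]

# Bag of Words: one row per document, one column per word, raw counts.
bow = CountVectorizer()
X_bow = bow.fit_transform(reviews)

# tf-idf: the same counts reweighted by inverse document frequency.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(reviews)

# Task 1: binary labels, 0 = negative (scores 1-4), 1 = positive (scores 7-10).
y_binary = [int(s >= 7) for s in scores]
binary_clf = LogisticRegression(max_iter=1000).fit(X, y_binary)

# Task 2: multi-class labels are the scores themselves (eight classes).
# The classifier here is an assumption, not the notebook's confirmed model.
multi_clf = LogisticRegression(max_iter=1000).fit(X, scores)

# Task 3: group documents by their tf-idf vectors, without using the scores.
# KMeans and n_clusters=2 are assumptions made for this sketch.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_

# Score an unseen review with the vectorizer fitted above.
new_review = tfidf.transform(["a wonderful and moving film"])
binary_prediction = binary_clf.predict(new_review)[0]
score_prediction = multi_clf.predict(new_review)[0]
```

Note that new text is passed through `tfidf.transform`, not `fit_transform`, so it is mapped onto the vocabulary learned from the training documents. On the real data, a held-out test split would be needed to evaluate either classifier.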