Movies Classification

Project made for the MLBD (Machine Learning on Big Data) course of the MSE Master's program.

Goal

Use unsupervised learning to classify movies into groups of similar films, without labeled data. We used Kohonen Self-Organizing Maps (SOM) for this.
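To illustrate the idea, here is a minimal Self-Organizing Map sketch written directly with numpy. It is not the project's code, which relies on the kohonen library. The function names (train_som, best_matching_unit), the linear decay schedules, and the default parameters are assumptions chosen for the example.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM.

    Returns a weight grid of shape (rows, cols, n_features).
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples, n_features = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0

    weights = rng.random((rows, cols, n_features))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )

    n_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n_samples):
            x = data[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood radius shrinks
            # Best matching unit: the unit whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood around the BMU, measured on the grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull every unit toward x, weighted by its neighborhood value.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def best_matching_unit(weights, x):
    """Return the (row, col) grid position of the unit closest to x."""
    dists = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

After training, each movie vector can be mapped to its best matching unit, and movies that land on the same or neighboring units are treated as similar.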

Data

The synopses of various movies were fetched from the IMDb website. The classification is based on the occurrences of the relevant words in each synopsis.
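As a sketch of how word occurrences can be turned into numeric vectors, the example below builds a TF-IDF matrix from a list of synopses. The tiny STOPWORDS set stands in for the nltk stopword list, and the weighting variant (term frequency normalized by document length, idf = log(N / df)) is an assumption; the project may use a different formula. The function names are hypothetical.

```python
import re
from collections import Counter

import numpy as np

# Tiny stand-in for a real stopword list (assumption for this example).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "his", "her"}

def tokenize(text):
    """Lowercase the text, extract alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_matrix(synopses):
    """Return (vocabulary, matrix) with one row per synopsis, one column per word."""
    docs = [Counter(tokenize(s)) for s in synopses]
    vocab = sorted(set().union(*docs))
    index = {w: j for j, w in enumerate(vocab)}

    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for word, count in doc.items():
            counts[i, index[word]] = count

    # Term frequency: raw counts normalized by the number of kept words.
    lengths = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(lengths, 1)
    # Inverse document frequency: words present in every synopsis get weight 0.
    df = (counts > 0).sum(axis=0)
    idf = np.log(len(docs) / df)
    return vocab, tf * idf

synopses = [
    "A detective hunts a killer in the city.",
    "A detective falls in love in the city.",
    "Robots invade the city.",
]
vocab, matrix = tfidf_matrix(synopses)
```

Each row of the matrix can then serve as the input vector for one movie. A word that appears in every synopsis carries no weight under this variant, since it does not help tell movies apart.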

Python libraries used

  • requests : for HTTP requests
  • BeautifulSoup : for HTML parsing
  • json : to store and format data
  • numpy : for matrix computations
  • nltk : to remove the stopwords
  • re : to extract words from the texts
  • kohonen : unsupervised learning algorithms
  • hcluster : for hierarchical clustering and visualization
  • matplotlib : for visualization
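
For the visualization step, a common way to display a trained SOM is a U-matrix, which shows for each unit how far its weight vector is from those of its grid neighbors. The sketch below computes one with numpy; the function name u_matrix, the 4-connected neighborhood, and the expected input shape (rows, cols, n_features) are assumptions, not the project's actual code.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each unit's weights to its 4-connected grid neighbors."""
    weights = np.asarray(weights, dtype=float)
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                # Skip neighbors that fall outside the grid.
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
            umat[r, c] = np.mean(dists) if dists else 0.0
    return umat

# Toy grid: the two left columns share one prototype, the right column another.
weights = np.zeros((2, 3, 2))
weights[:, 2, :] = 1.0
umat = u_matrix(weights)
```

The resulting array can be displayed with matplotlib, for instance with plt.imshow(umat, cmap="gray"). High values mark boundaries between groups of similar units, low values mark the inside of a cluster.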

Keywords

Self-Organizing Maps, SOM, Kohonen, U-matrix, machine learning, TF-IDF, Python, clustering, classification, neural network.

Authors

  • Jacky Casas
  • Simone Cogno