Project made for the MLBD (Machine Learning on Big Data) lecture of the MSE Master.
Use unsupervised learning algorithms to classify movies. We used Self-Organizing Maps (Kohonen) to do this.
The sysnopsis of different movies were fetched from the IMDb website. The classification is based on the occurrence of the relevant words in each synopsis.
- requests : for HTTP requests
- BeautifulSoup : for HTML parsing
- json : to store and format datas
- numpy : for matrices computation
- nltk : to remove the stopwords
- re : to extract words from the texts
- kohonen : unsupervised learning algorithms
- hcluster : for hierarchical clustering and visualization
- matplotlib : for visualization
Self-Organizing Maps, SOM, Kohonen, U-matrix, machine learning, TF-IDF, Python, clustering, classification, neural network.
- Jacky Casas
- Simone Cogno