/Film-Similarity-NLP-with-KMeans-Hierarchical-Clustering

Used NLP techniques (tokenization, stemming, and TF-IDF vectorization) and clustering algorithms (K-means and hierarchical clustering) to mine the similarities between films based on their plot summaries from IMDb and Wikipedia. The dataset contains the titles of the top 100 movies on IMDb.
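The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the four toy plots, the `tokenize_and_stem` helper, and the choice of two clusters are all assumptions made for the example, and stemming falls back to a no-op if NLTK is not installed.

```python
import re

from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

try:
    # Snowball stemmer from NLTK (needs no extra data downloads).
    from nltk.stem.snowball import SnowballStemmer
    _stem = SnowballStemmer("english").stem
except ImportError:
    # Assumption for this sketch: skip stemming when NLTK is unavailable.
    _stem = lambda token: token

# Hypothetical toy plots standing in for the IMDb/Wikipedia summaries.
plots = {
    "Space Film A": "Astronauts travel through space on a ship to explore a distant planet.",
    "Space Film B": "A space crew on a ship explores a distant planet and its strange aliens.",
    "Crime Film A": "A detective investigates a murder and hunts the killer across the city.",
    "Crime Film B": "A detective and a killer play a deadly game after a murder in the city.",
}


def tokenize_and_stem(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [_stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]


titles = list(plots)

# TF-IDF vectorization using the custom tokenizer.
vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem, token_pattern=None)
tfidf = vectorizer.fit_transform(list(plots.values()))

# Pairwise cosine similarity between plot vectors.
similarity = cosine_similarity(tfidf)

# K-means clustering (k=2 is an assumption suited to the toy data).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)

# Hierarchical (Ward) clustering, cut into the same number of clusters.
tree = linkage(tfidf.toarray(), method="ward")
hier_labels = fcluster(tree, t=2, criterion="maxclust")

for title, k_label, h_label in zip(titles, kmeans_labels, hier_labels):
    print(f"{title}: kmeans={k_label}, hierarchical={h_label}")
```

On real data, the number of clusters and the linkage method would need tuning; the linkage matrix in `tree` can also be passed to `scipy.cluster.hierarchy.dendrogram` to visualize the hierarchy.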

Primary Language: Python
