DC-MoviesSimilarity


Contents

  1. Import and inspect the dataset
  2. Combine Wikipedia and IMDb plot summaries
  3. Tokenization
  4. Stemming
  5. Combine tokenization and stemming
  6. Create a TfidfVectorizer
  7. Fit and transform the plot summaries with the TfidfVectorizer
  8. Import KMeans and create clusters
  9. Calculate similarity distance
  10. Import Matplotlib, linkage, and dendrogram
  11. Compute the hierarchical merges (linkage) and plot the dendrogram
  12. Which movies are most similar?
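Steps 2 to 5 (combining the two plot sources, tokenizing, and stemming) can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `combine_plots`, `tokenize`, `crude_stem` and `tokenize_and_stem` are hypothetical names, and `crude_stem` is a deliberately crude suffix stripper standing in for a real rule-based stemmer such as NLTK's `SnowballStemmer`.

```python
import re

# Hypothetical, deliberately crude suffix list; real stemmers (Porter, Snowball)
# apply ordered rule sets with conditions on the remaining stem.
SUFFIXES = ("ing", "ed", "es", "s")


def combine_plots(wiki_plot: str, imdb_plot: str) -> str:
    # Join the two summaries with a newline so text from each source stays separate
    return wiki_plot + "\n" + imdb_plot


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters only, dropping digits and punctuation
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word: str) -> str:
    # Strip the first matching suffix, but only if at least 3 characters remain
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize_and_stem(text: str) -> list[str]:
    # Tokenize first, then stem every token
    return [crude_stem(token) for token in tokenize(text)]
```

For example, `tokenize_and_stem("Running soldiers")` returns `['runn', 'soldier']`; a Snowball stemmer would reduce the first token to `run`, which is why a real stemmer is preferable in practice.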
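Steps 6 and 7 turn each tokenized summary into a weighted vector. The sketch below reimplements in plain Python the weighting that scikit-learn's `TfidfVectorizer` applies with its default settings: raw term counts, smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 normalization of each row. The function name `tfidf_matrix` is hypothetical, and the project's own vectorizer may use different parameters (for example a custom tokenizer).

```python
import math
from collections import Counter


def tfidf_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return (vocab, rows) where rows[i][j] is the tf-idf weight of vocab[j] in docs[i]."""
    n = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents containing each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, matching TfidfVectorizer's default smooth_idf=True
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)  # raw term counts serve as tf
        row = [counts[t] * idf[t] for t in vocab]
        # L2-normalize each row (TfidfVectorizer's default norm="l2")
        norm = math.sqrt(sum(w * w for w in row))
        rows.append([w / norm for w in row] if norm else row)
    return vocab, rows
```

Rare terms receive higher idf, so in `tfidf_matrix([["war", "soldier"], ["war", "love"]])` the word "soldier" outweighs "war" in the first row.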
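Step 8 groups the movie vectors with k-means. As a stand-in for scikit-learn's `KMeans` (which defaults to k-means++ initialization), here is a plain Lloyd's-algorithm sketch with random initialization. `kmeans`, `_sq_dist` and their parameters are hypothetical, and the number of clusters the project uses is not stated here.

```python
import random


def _sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points (plain random init, not k-means++)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because the result depends on the starting centroids, library implementations typically run several initializations and keep the one with the lowest within-cluster sum of squares.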
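Steps 9 and 12 rest on a pairwise distance between plot vectors. A common choice, assumed here, is cosine distance (1 minus cosine similarity), which is what `1 - sklearn.metrics.pairwise.cosine_similarity(X)` computes for a tf-idf matrix `X`. The helpers below (hypothetical names) build that distance matrix in plain Python and return the closest pair of distinct movies.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_distance(rows):
    # Distance matrix: 1 - cosine similarity, so vectors pointing the same way get 0
    return [[1 - cosine_similarity(u, v) for v in rows] for u in rows]


def most_similar_pair(titles, dist):
    # Scan only the upper triangle (i < j) so a movie is never paired with itself
    n = len(titles)
    _, i, j = min((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    return titles[i], titles[j]
```

Since L2-normalized tf-idf rows have unit length, their cosine similarity is simply the dot product of the two rows.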
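Steps 10 and 11 build a hierarchy by repeatedly merging the two closest clusters. In the project this is presumably done with SciPy's `scipy.cluster.hierarchy.linkage` and drawn with `dendrogram`; the naive sketch below shows what such a merge table contains, using complete linkage as an assumed method. Each output row follows the layout of SciPy's linkage matrix (the two merged cluster ids, the merge distance, and the size of the new cluster), with new clusters numbered n, n+1, and so on. `complete_linkage` is a hypothetical name.

```python
def complete_linkage(dist):
    """Naive O(n^3) agglomerative clustering on a precomputed square distance matrix.

    Returns merges as [id_a, id_b, distance, size] rows; the k-th merge creates
    cluster id n + k.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member observation indices
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Complete linkage: cluster distance is the largest member-to-member distance
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append([a, b, d, len(members)])
        next_id += 1
    return merges
```

One caveat when using SciPy instead: `linkage` treats a 2-D array as raw observation vectors, so a precomputed square distance matrix should first be converted to condensed form with `scipy.spatial.distance.squareform` (passing `checks=False` tolerates tiny non-zero diagonal values from floating-point error) before the resulting matrix is handed to `dendrogram` for plotting with Matplotlib.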