Television Show Recommender System

Executive Summary

I analyzed the transcripts of 117,937 television episodes from 4,667 different television shows using Latent Dirichlet Allocation to find clusters of common language across shows, and then used those similarities to build a content-based recommender for television shows.
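To make the recommendation step concrete, here is a minimal sketch of how per-show topic weights could be turned into a ranked list of similar shows. It assumes each show has already been reduced to a vector of topic weights (for example, the output of an LDA model). The function names, the show titles, the numbers, and the choice of cosine similarity are all illustrative assumptions; the code in final_code may measure similarity differently.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def recommend(show, topic_weights, top_n=3):
    """Return up to top_n (title, score) pairs most similar to `show`, best first."""
    target = topic_weights[show]
    scores = [
        (other, cosine_similarity(target, weights))
        for other, weights in topic_weights.items()
        if other != show  # never recommend the show itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical topic distributions over 3 topics (made-up numbers, not project data).
example_weights = {
    "Show A": [0.80, 0.15, 0.05],
    "Show B": [0.75, 0.20, 0.05],
    "Show C": [0.05, 0.10, 0.85],
    "Show D": [0.10, 0.80, 0.10],
}

if __name__ == "__main__":
    for title, score in recommend("Show A", example_weights, top_n=2):
        print(f"{title}: {score:.3f}")
```

With these made-up weights, "Show B" ranks first for "Show A" because both put most of their weight on the same topic. Other measures suited to probability distributions, such as Hellinger distance, could be swapped in for cosine similarity without changing the rest of the sketch.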

System Requirements

Python==3.7.3
gensim==3.8.1
Flask==1.1.1
nltk==3.4.5
pandas==0.25.2
matplotlib==3.1.1
numpy==1.17.2
spacy==2.2.1
spacy-langdetect==0.1.2
beautifulsoup4==4.8.0

For Google Cloud Virtual Instance:

Requires a virtual machine with at least 104 GB of RAM
google-api-core==1.14.3
google-auth==1.7.1
google-auth-oauthlib==0.4.1
google-cloud==0.34.0
google-cloud-core==1.0.3
google-cloud-storage==1.23.0
google-pasta==0.1.8
google-resumable-media==0.5.0

How to Use this Repository

All final production code is in the final_code folder; the development_code folder contains code written during the project that was not used to produce the final result. The notebooks and Python scripts are listed in chronological order. The final data is not posted because of its size (2.6 GB), but please contact me if you would like a copy!
