- This was the final project during my 2nd semester at Harvard University
- This repo is lacking docstrings, unit tests, design patterns, etc.
- While I believe many things in this repo could be improved (and I currently have no plans to do so), I am choosing to keep it up because I feel it helps illustrate how my coding style has changed over the years.
- data directory - all the data we collected by scraping, as well as the plot transformation arrays
- submissions directory - project milestone and final submission documents
- .gitignore - self-explanatory
- eda_and_plot_transformations.ipynb - milestone 3 EDA plots and basic analysis, as well as bag-of-words TF-IDF, word2vec, and doc2vec plot transformations for modeling
- future_work_LDA.ipynb - notebook to explore possibilities of LDA clustering on our dataset
- modeling_analysis.ipynb - primary modeling and analysis notebook
- movie_scraper_and_prep.ipynb - data collection and initial prep
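
For readers unfamiliar with the bag-of-words TF-IDF transformation mentioned above, here is a minimal, dependency-free sketch of the idea. It is not the code from the notebook, which may well rely on a library implementation and a different weighting formula. The function name `tfidf_vectors` and the sample plot strings are hypothetical, and the sketch assumes plain whitespace tokenization and unsmoothed IDF.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) for a list of plot strings.

    Hypothetical helper for illustration only: lowercases, splits on
    whitespace, and uses tf = count / doc_length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of plots that contain each term.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))

    # Unsmoothed inverse document frequency (an assumption of this sketch).
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [counts[term] / total * idf[term] if total else 0.0 for term in vocabulary]
        )
    return vocabulary, vectors


# Made-up plot snippets, not taken from the scraped dataset.
plots = [
    "a detective hunts a killer",
    "a robot falls in love",
    "a detective falls in love",
]
vocab, vectors = tfidf_vectors(plots)
```

Each plot becomes a fixed-length numeric vector with one entry per vocabulary term. Terms that appear in every plot (here, "a") get a weight of zero, while terms unique to one plot get the highest weights, which is what makes these vectors usable as model features.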
All data descriptions, EDA, modeling, and analysis are described in detail in the final report found in the submissions directory.
- Andrew Lund
- Nicholas Morgan
- Amay Umradia
- Charles Webb
The PDF of our final report can be found in the submissions folder. Each individual Jupyter notebook is also well documented. The manipulated dataframes and arrays are all saved in the data folder, which allows each notebook to be run independently of the others. If you wish to follow along in the order of our report, go through the notebooks in the following order:
- movie_scraper_and_prep.ipynb
- eda_and_plot_transformations.ipynb
- modeling_analysis.ipynb
- future_work_LDA.ipynb