/w08-w09-imdb-data

Using IMDb data to illustrate a few new pandas and grammar-of-graphics concepts


LSE DS105A (2023/24) - Week 08 & Week 09

pandas groupby-apply practices + data reshaping + intro to databases

Image Created with Stable Diffusion



This repository contains the code and data used in Week 08 of the 2023/24 edition of DS105A.

📚 PREPARATION

If you want to replicate the analysis in this notebook, you will need to:

  1. Clone this repository to your computer.

  2. Add it to your VS Code workspace.

  3. Go to the IMDb Non-Commercial Datasets page, download all the tsv.gz files, and place them in the data/raw/ folder. This folder is gitignored because we don't want to push large data files to GitHub!

  4. Run:

    pip install -r requirements.txt

  5. Open the notebook and run the cells!
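
Before opening the notebook, it can help to confirm that the downloads from step 3 landed in the right place. The snippet below is a minimal sketch and not part of the course material: the function name `summarise_raw_files` is hypothetical, and it assumes the files sit directly under data/raw/ and are tab-separated with a header row. It uses only the Python standard library, so it runs even before the requirements are installed.

```python
import csv
import gzip
from pathlib import Path


def summarise_raw_files(raw_dir="data/raw"):
    """Map each *.tsv.gz file name in raw_dir to its list of column names.

    Hypothetical helper for illustration; the default path mirrors the
    data/raw/ folder mentioned in the preparation steps.
    """
    summary = {}
    for path in sorted(Path(raw_dir).glob("*.tsv.gz")):
        # Text mode decompresses on the fly; only the header line is read,
        # so this stays fast even for large files.
        with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            summary[path.name] = next(reader, [])
    return summary


if __name__ == "__main__":
    found = summarise_raw_files()
    if not found:
        print("No tsv.gz files found under data/raw/")
    for name, columns in found.items():
        print(f"{name}: {columns}")
```

If the script reports no files, revisit step 3 before running the notebook cells.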