/disney-data-science-tasks

Creation of a Disney Movie Dataset & Analysis using Python


Disney Dataset Creation & Analysis

In this video we walk through a series of data science tasks to create a dataset on Disney movies and analyze it with Python, using BeautifulSoup, requests, and several other libraries along the way.

Setup

To access all of the files, I recommend that you fork this repo and then clone it locally. Instructions on how to do this can be found here: https://help.github.com/en/github/getting-started-with-github/fork-a-repo

The other option is to click the green "Clone or download" button and then click "Download ZIP". You should then extract all of the files to the location where you want to edit your code.

Installing Jupyter Notebook: https://jupyter.readthedocs.io/en/latest/install.html

Background Information

This repo goes along with my video "Solving real-world data science tasks with Python BeautifulSoup!"

In this video we scrape Wikipedia pages to create a dataset on Disney movies.
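The scraping step can be sketched as follows. This is a minimal, illustrative example, not the exact code from the video: it parses a small inline HTML snippet shaped like a Wikipedia movie infobox (in the real workflow you would fetch the live page with `requests.get(url).content` and pass that to BeautifulSoup instead).

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a Wikipedia movie infobox; the real workflow
# downloads the page HTML with the requests library first.
html = """
<table class="infobox vevent">
  <tr><th class="infobox-above">Toy Story</th></tr>
  <tr><th>Directed by</th><td>John Lasseter</td></tr>
  <tr><th>Running time</th><td>81 minutes</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
info_box = soup.find("table", class_="infobox")

# Collect each header/value row of the infobox into a dictionary.
movie = {}
for row in info_box.find_all("tr"):
    header = row.find("th")
    value = row.find("td")
    if header and value:
        movie[header.get_text(" ", strip=True)] = value.get_text(" ", strip=True)

print(movie)  # {'Directed by': 'John Lasseter', 'Running time': '81 minutes'}
```

Looping this over every film link on Wikipedia's "List of Walt Disney Pictures films" page yields one dictionary per movie, which together form the raw dataset.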

The video is formatted with tasks for you to try to solve on your own throughout. For the best learning experience, at each task you should pause the video, try the task on your own, and then resume when you want to see how I would solve it.

We cover a wide range of Python & data science topics in this video. They include:

  • Web scraping with BeautifulSoup
  • Cleaning data
  • Testing code with pytest
  • Pattern matching with regular expressions (the re library)
  • Working with dates (the datetime library)
  • Saving & loading data with the pickle library
  • Accessing data from an API with the requests library

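To give a flavor of the cleaning, regex, and datetime topics, here is a hedged sketch using only the standard library. The function names are illustrative, not the exact ones used in the video, but the idea is the same: normalize scraped strings like "81 minutes" and "November 22, 1995" into an integer and a datetime object.

```python
import re
from datetime import datetime

def minutes_to_int(running_time):
    """Pull the leading number out of a string like '81 minutes'."""
    if running_time is None:
        return None
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None

def clean_date(date_str):
    """Parse a 'Month D, YYYY' release date into a datetime object."""
    try:
        return datetime.strptime(date_str.strip(), "%B %d, %Y")
    except (ValueError, AttributeError):
        return None

print(minutes_to_int("81 minutes"))     # 81
print(clean_date("November 22, 1995"))  # 1995-11-22 00:00:00
```

Returning `None` for unparseable values keeps the cleaning loop running over the whole dataset instead of crashing on one malformed page, and those edge cases are exactly what the pytest tests in the video guard against.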
To see the steps used to create the dataset, check out dataset-creation.ipynb.
In a future video, we will analyze the dataset in dataset-analysis.ipynb.

Save/Load the Datasets

  • If you want to jump into a specific task, feel free to use the dataset checkpoints.
  • To load these files, you can look at the functions found in this file.
  • If you just want to do analysis on the final dataset, check out this folder.
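Saving and loading a checkpoint with pickle looks roughly like this. The helper names and the filename here are placeholders for illustration; use whichever checkpoint file you want to resume from.

```python
import pickle

def save_data(path, data):
    # Write any picklable object (here, a list of movie dicts) to disk.
    with open(path, "wb") as f:
        pickle.dump(data, f)

def load_data(path):
    # Read a previously saved checkpoint back into memory.
    with open(path, "rb") as f:
        return pickle.load(f)

movies = [{"title": "Toy Story", "Running time (int)": 81}]
save_data("disney_movie_data.pickle", movies)
print(load_data("disney_movie_data.pickle"))  # round-trips to the same list
```

Because pickle preserves Python objects exactly (including datetime values), reloading a checkpoint lets you pick up at any task without re-scraping Wikipedia.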