Web Scraping Project: Movies and TV Shows Data Analysis

Introduction

This project scrapes movie and TV show data from JustWatch using the Python libraries Requests and Beautiful Soup. For each title, the scraped data includes the title, release year, IMDb rating, genres, runtime, age rating, production details, and streaming information. The primary objective is to analyze the scraped data and draw insights from it: the most frequent genres, the average IMDb ratings, and the number of streaming services carrying each title.
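
As a rough illustration of the parsing step, the sketch below extracts a few of these fields from a small HTML fragment with Beautiful Soup. The markup, the CSS class names, and the `parse_title_card` helper are all hypothetical: JustWatch's real pages are structured differently and change over time, so the selectors would need to be adapted to the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not JustWatch's actual HTML.
# In practice the HTML would come from something like
# requests.get(url, timeout=10).text
SAMPLE_HTML = """
<div class="title-card">
  <h1 class="title">Example Movie</h1>
  <span class="year">(2021)</span>
  <span class="imdb-score">7.4 (12k)</span>
  <span class="genres">Drama, Thriller</span>
</div>
"""


def parse_title_card(html: str) -> dict:
    """Extract title, release year, IMDb rating, and genres from one page."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector: str):
        # Return the stripped text of the first match, or None if absent.
        node = soup.select_one(selector)
        return node.get_text(strip=True) if node else None

    year_text = text_of(".year")
    rating_text = text_of(".imdb-score")
    genres_text = text_of(".genres")

    return {
        "title": text_of("h1.title"),
        # "(2021)" -> 2021
        "release_year": int(year_text.strip("()")) if year_text else None,
        # "7.4 (12k)" -> 7.4 (keep only the score, drop the vote count)
        "imdb_rating": float(rating_text.split()[0]) if rating_text else None,
        # "Drama, Thriller" -> ["Drama", "Thriller"]
        "genres": [g.strip() for g in genres_text.split(",")] if genres_text else [],
    }


record = parse_title_card(SAMPLE_HTML)
print(record)
```

Returning `None` (or an empty list) for missing fields, rather than raising, keeps a single incomplete page from aborting a long scraping run; the gaps can then be handled during data cleaning.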

Dataset Overview

The dataset comprises two main dataframes: one for movies and another for TV shows. Each dataframe includes the following columns:

  • Title
  • Release Year
  • IMDb Rating
  • Genres
  • Runtime
  • Age Rating
  • Production Details
  • Streaming Details

Data Wrangling

  1. Data Acquisition: Fetched movie and TV show pages from JustWatch with Requests and parsed them with Beautiful Soup.
  2. Data Cleaning:
    • Converted data types to appropriate formats.
    • Handled null values by imputation or removal.
    • Standardized column names for consistency.
  3. Exploratory Data Analysis:
    • Calculated descriptive statistics such as the mean IMDb rating.
    • Identified top genres based on frequency.
    • Counted streaming services for each movie and TV show.
    • Visualized the top genres and the most common streaming services using word clouds.
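
The cleaning and analysis steps above can be sketched with pandas roughly as follows. This is a minimal example under stated assumptions, not the notebooks' actual code: the rows are made up, genres and streaming services are assumed to be stored as comma-separated strings, and the helper names (`clean`, `top_genres`, `streaming_counts`) are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for scraped records; not real project data.
raw = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Release Year": ["2019", "2021", "2020", None],
    "IMDb Rating": ["7.5", None, "6.5", "8.0"],
    "Genres": ["Drama, Crime", "Comedy", "Drama", "Comedy, Drama"],
    "Streaming Details": ["Netflix, Hulu", "Netflix", None, "Prime Video"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Standardize column names: lowercase with underscores.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    # Convert types; values that cannot be parsed become missing.
    out["release_year"] = pd.to_numeric(out["release_year"], errors="coerce").astype("Int64")
    out["imdb_rating"] = pd.to_numeric(out["imdb_rating"], errors="coerce")
    # Removal: drop rows that have no release year.
    out = out.dropna(subset=["release_year"]).copy()
    # Imputation: fill missing ratings with the mean of the remaining ratings.
    out["imdb_rating"] = out["imdb_rating"].fillna(out["imdb_rating"].mean())
    return out.reset_index(drop=True)


def top_genres(df: pd.DataFrame, n: int = 3) -> pd.Series:
    # One row per (title, genre) pair, then count how often each genre occurs.
    genres = df["genres"].str.split(",").explode().str.strip()
    return genres.value_counts().head(n)


def streaming_counts(df: pd.DataFrame) -> pd.Series:
    # Number of streaming services per title; missing details count as 0.
    return df["streaming_details"].fillna("").apply(
        lambda s: len([part for part in s.split(",") if part.strip()])
    )


df = clean(raw)
print("Mean IMDb rating:", df["imdb_rating"].mean())
print(top_genres(df))
print(streaming_counts(df).tolist())
```

The genre frequencies returned by `top_genres` are in a form that could be converted to a dictionary and passed to `WordCloud.generate_from_frequencies` to draw the word clouds. Mean imputation is only one option here; whether to impute or drop a missing rating depends on how much data is missing and how the ratings are used afterwards.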

Findings

The analysis reports the following:

  • The mean IMDb rating for movies and for TV shows.
  • The most frequent genres across movies and TV shows.
  • The number of streaming services carrying each movie and TV show.
  • Insights into which genres and streaming platforms are most popular.

Future Steps

  • Explore additional sources for data enrichment.
  • Perform sentiment analysis on user reviews to gauge audience reception.
  • Develop predictive models to forecast IMDb ratings or recommend content.
  • Enhance visualization techniques for more insightful analysis.

Repository Structure

  • Notebooks: Jupyter notebooks containing the web scraping code and data wrangling process.
  • Data: Folder containing the scraped datasets in CSV format.
  • Visualizations: Visualizations generated during exploratory data analysis.
  • README.md: Overview of the project and instructions for replicating the analysis.

References

  • JustWatch: Website
  • Python Libraries: Beautiful Soup, Requests, Pandas, Matplotlib, WordCloud

Feel free to explore the repository and provide feedback or suggestions for improvement. Thank you for your interest in this project!