melindamalone/Movies-ETL

The Movies Extract-Transform-Load (ETL) Analysis repo contains movie data extracted from Wikipedia and Kaggle in CSV and JSON formats. In the transform step the datasets were cleaned and merged, and the cleaned data was then loaded into a movie_data SQL database. Regular expressions (regex), which match strings of characters against search patterns, played a critical role in cleaning the box office, budget, release date, and running time data. Lambda functions, also called anonymous functions, were used in the transform phase as well.


Movies Extract-Transform-Load Analysis

Overview of the Movies Extract-Transform-Load Analysis

Purpose

The purpose of Module Eight and the Movies Extract-Transform-Load (ETL) Analysis is to learn how to use the ETL process to create data pipelines. The process ran in three steps. First, movie data was extracted from Wikipedia and Kaggle in CSV and JSON files. Next, the datasets were transformed by cleaning and merging them. Finally, the cleaned datasets were loaded into a movie_data SQL database. During the transform step, regular expressions (regex) were used to match strings of characters against search patterns, which played a critical role in cleaning the box office, budget, release date, and running time data. Lambda functions, also called anonymous functions, were introduced in this module as well and were used in the transform phase.
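To illustrate how regex and a lambda function can work together in the transform step, here is a minimal sketch that parses box office strings into numbers. It is not the notebook's actual code: the patterns `FORM_ONE` and `FORM_TWO`, the helper `parse_dollars`, and the sample values are all hypothetical, and only two common formats are assumed ("$123.4 million" and "$4,212,828").

```python
import re

# Hypothetical patterns (assumptions, not taken from the notebook):
# FORM_ONE matches amounts like "$123.4 million" or "$1.2 billion".
FORM_ONE = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)
# FORM_TWO matches amounts like "$4,212,828" (comma-separated thousands).
FORM_TWO = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+)")


def parse_dollars(text):
    """Return the dollar amount in text as a float, or None if nothing matches."""
    if not isinstance(text, str):
        return None
    match = FORM_ONE.search(text)
    if match:
        scale = 1e6 if match.group(2).lower() == "million" else 1e9
        return float(match.group(1)) * scale
    match = FORM_TWO.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return None


# Made-up sample values; scraped fields are sometimes lists of fragments.
raw_box_office = ["$123.4 million", ["$1.2", "billion"], "$4,212,828", "unknown"]

# The lambda joins list values into one string before the regex is applied.
cleaned = list(
    map(
        lambda value: parse_dollars(
            " ".join(value) if isinstance(value, list) else value
        ),
        raw_box_office,
    )
)
```

Values that match neither pattern come back as `None`, so they can be inspected separately rather than silently converted. The same idea extends to budget, release date, and running time fields with their own patterns.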