imdb_data_engineering

In this project, I built and orchestrated a data pipeline to analyze IMDb movie data.

The data pipeline was created using the following tools:

  1. Data ingestion: Web scraping from IMDb using Python
  2. Data storage: Google BigQuery
  3. Data analysis: dbt
  4. Data visualization: Power BI
  5. Data orchestration: Apache Airflow
  6. Container deployment: Docker
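
To make the ingestion and storage steps above more concrete, here is a minimal sketch of how scraped HTML could be turned into rows ready for loading into BigQuery. It is an illustration only, not the project's actual scraper: the HTML structure, the class names `title` and `rating`, and the helper names `MovieRowParser`, `parse_movies`, and `to_ndjson` are all assumptions, and the sample markup does not reflect how IMDb pages are really laid out. It uses only the Python standard library and emits newline-delimited JSON, which is one of the formats BigQuery load jobs accept.

```python
from html.parser import HTMLParser
import json


class MovieRowParser(HTMLParser):
    """Collects title/rating pairs from a hypothetical movie-list markup."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished records
        self._field = None   # field currently being read, if any
        self._current = {}   # record under construction

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        # Assumption: each value sits in <span class="title"> or <span class="rating">.
        if tag == "span" and css_class in ("title", "rating"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # Assumption: one <li> element per movie.
            self.rows.append(self._current)
            self._current = {}


def parse_movies(html):
    """Return a list of {"title": str, "rating": float} dicts."""
    parser = MovieRowParser()
    parser.feed(html)
    parser.close()
    return [{"title": r["title"], "rating": float(r["rating"])} for r in parser.rows]


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(r) for r in rows)


# Made-up sample input, not real IMDb data or markup.
SAMPLE_HTML = """
<ul>
  <li><span class="title">Example Movie A</span><span class="rating">8.5</span></li>
  <li><span class="title">Example Movie B</span><span class="rating">7.9</span></li>
</ul>
"""

if __name__ == "__main__":
    print(to_ndjson(parse_movies(SAMPLE_HTML)))
```

In a real run, the HTML would come from an HTTP request and the resulting file would be handed to a BigQuery load job; both steps are left out here to keep the sketch self-contained.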

Project Workflow

[Figure: project workflow diagram]

End Results

1. Data Pipelines in Apache Airflow

[Screenshot: DAGs view in Airflow]
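
The core idea behind orchestrating the pipeline is that each task runs only after the tasks it depends on have finished. The sketch below illustrates that ordering with Python's standard-library `graphlib` (Python 3.9+) rather than Airflow itself. The task names and the dependency chain are assumptions inferred from the tool list, not the actual DAG definitions used in this project.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
PIPELINE = {
    "scrape_imdb": set(),
    "load_to_bigquery": {"scrape_imdb"},
    "run_dbt_models": {"load_to_bigquery"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print(task)
```

An Airflow DAG expresses the same relationships between its tasks and adds scheduling, retries, and monitoring on top.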

2. Tables and Views in BigQuery

[Screenshot: BigQuery console for imdb in Google Cloud]

3. Power BI Reports

[Screenshots: three pages of the imdb_reports Power BI report]

Medium Article

https://medium.com/@bdadon50/data-engineering-project-imdb-movie-analysis-3f79de2f4ce7