/Data-Warehousing-and-Business-Intelligence-for-IMDB

Designed Multi Star schema with 10 Fact & 33 Dimension tables & developed Data Integration Pipeline by ETL workflow to load all tables (380M+ rows) from multiple sources such as CSV, MySQL, MSSQL & PostgreSQL by using Talend. Implemented Data Profiling, Error Handling, Load Statistics, Cleansing and Performance Tuning to address data quality gaps & maintained referential integrity using Alteryx & Talend. Executed SQL scripts & created interactive visualization dashboards on PowerBI & Tableau for analyzing the data to draw better insights on sales & customer segmentation

Primary LanguageTSQLMIT LicenseMIT

IMDB_Database_Warehousing_and_Business_Intelligence

Objective:

The aim of the project is to analyse the movies data from multiple sources such as IMDB MoviesLens, The Numbers and BoxOffice Mojo.com based on movies/cast/box office revenues, movie brands and franchises and perform ETL processes using Talend & Alteryx.

Description:

  • Designed Multi Star schema with 10 Fact & 33 Dimension tables & developed Data Integration Pipeline by ETL workflow to load all tables (380M+ rows) from multiple sources such as CSV, MySQL, MSSQL & PostgreSQL by using Talend
  • Implemented Data Profiling, Error Handling, Load Statistics, Cleansing and Performance Tuning to address data quality gaps & maintained referential integrity using Alteryx & Talend
  • Executed SQL scripts & created interactive visualization dashboards on PowerBI & Tableau for analyzing the data to draw better insights on sales & customer segmentation

The key topic areas are:

  • Data Architecture
  • Dimensional Data Modeling
  • Data Integration & ETL (Extract, Transform & Load)
  • Data Engineering & Data Preparation
  • Business Intelligence & Data Analytics Design
Tools Used:
Talend, Alteryx, SQL Server, MySQL, PostreSQL, PowerBI, Tableau, ER Studio

Instructor: Rick Sherman

Reference Book: Business Intelligence Guidebook: From Data Integration to Analytics

Dataset Links:

https://datasets.imdbws.com/

https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab

https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab

https://grouplens.org/datasets/movielens/25m/

https://www.the-numbers.com/movies/franchises

https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary

https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office

References:

https://elearning.tableau.com/

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/8RlpZdAdKhP0IaMHXRV7yw

https://www.talend.com/

https://grouplens.org/datasets/movielens/

Talend Jobs:



IMDB_Staging - SQL Server:

IMDB_Integration - SQL Server:

IMDB_BI_Movies - SQL Server:

IMDB_BI_People - PostgreSQL:

IMDB_BI_TV - MySQL:



Alteryx Workflows:



IMDB_PostgreSQL:

IMDB - BI Movies:

• Worldwide Lifetime Gross based on Running Minutes

• Top 100 Movies by Genre, Rating & Running Time

• Top 25 Movies Based on Domestic and Foreign Revenue

• Top 25 Movies based on Number of Votes and Title Type

• Average Rating by Genre and Years

• Top 25 Title based on IMDb rating and Title Type

• People Involved with Movies

• Movies Gross

• Movies Selected Info

IMDB - BI TV:

• 100 Popular TV SHows based on User Votes

• People Detail with TV

• TV Titles based on Title Type and Avg Rating

• Ratings per Year

• Top 25 TV Title based on Rating and Title Type

• TV Series With Episode Number and Season Number

IMDB - BI People:

• People Overview

• Title Overview

• Title and People

• Director Overview