- Set up an AWS Redshift cluster
- Enter your AWS Redshift credentials in dwh.cfg (a sketch of how the scripts might read this file follows the commands below)
- Run the following commands in a Python interpreter:
    import create_tables as ct
    ct.main()
    import etl as e
    e.main()
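For reference, here is a minimal sketch of reading dwh.cfg with Python's configparser. The section and key names ([CLUSTER], HOST, DB_NAME, and so on) are assumptions for illustration, not the project's confirmed layout; match them to whatever create_tables.py and etl.py actually expect.

```python
import configparser

# Minimal sketch: read the Redshift credentials out of dwh.cfg.
# The [CLUSTER] section and its keys are assumed names, not confirmed ones.
config = configparser.ConfigParser()
config.read("dwh.cfg")

host = config["CLUSTER"]["HOST"]            # Redshift cluster endpoint
db_name = config["CLUSTER"]["DB_NAME"]      # database name
user = config["CLUSTER"]["DB_USER"]         # database user
password = config["CLUSTER"]["DB_PASSWORD"]
port = config["CLUSTER"]["DB_PORT"]         # Redshift's default port is 5439

print(f"Will connect to {host}:{port}/{db_name} as {user}")
```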
- This database contains analytical information about Sparkify users' song plays, represented as a fact table in a star schema. It also contains dimension tables for user information, songs, artists, and the timestamps of the songplay records.
- Sparkify often wants to know where its users are located and which songs interest them, so that it can procure better song content from music providers.
- Sparkify also wants to know the popular songs and artists in each area so that it can recommend songs to users in similar locations.
- The data is modeled using a star schema. Song play data is represented as a fact table, while user information, songs, artists, and the timestamps of the songplay records are represented as dimension tables.
- The ETL pipeline extracts JSON files from S3 and loads them into staging tables in Redshift. It then runs SQL transformations that load the fact and dimension tables of the star schema in Redshift (both phases are sketched below).
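As a sketch of those two phases, the code below shows how etl.py might drive them with psycopg2. The query lists copy_table_queries and insert_table_queries are assumed to be defined in sql_queries.py, and the [CLUSTER] config keys follow the assumed layout sketched earlier.

```python
import configparser

import psycopg2

# Assumed names: copy_table_queries holds the COPY-from-S3 statements for the
# staging tables; insert_table_queries holds the INSERT ... SELECT statements
# that populate the star schema tables.
from sql_queries import copy_table_queries, insert_table_queries


def load_staging_tables(cur, conn):
    """Phase 1: COPY the raw JSON from S3 into the Redshift staging tables."""
    for query in copy_table_queries:
        cur.execute(query)
        conn.commit()


def insert_tables(cur, conn):
    """Phase 2: transform staging rows into the fact and dimension tables."""
    for query in insert_table_queries:
        cur.execute(query)
        conn.commit()


def main():
    config = configparser.ConfigParser()
    config.read("dwh.cfg")
    conn = psycopg2.connect(
        host=config["CLUSTER"]["HOST"],
        dbname=config["CLUSTER"]["DB_NAME"],
        user=config["CLUSTER"]["DB_USER"],
        password=config["CLUSTER"]["DB_PASSWORD"],
        port=config["CLUSTER"]["DB_PORT"],
    )
    cur = conn.cursor()
    load_staging_tables(cur, conn)
    insert_tables(cur, conn)
    conn.close()
```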
- create_tables.py: Contains the logic to create the tables from the query-string constants in sql_queries.py.
- dwh.cfg: Configuration file that holds your AWS Redshift credentials (see the setup steps above).
- etl.py: The main ETL file. Run its main() method to start the ETL process; create_tables.main() must be run first.
- README.md: The file you are reading.
- sql_queries.py: Python file containing the SQL query strings for table creation, inserts, selects, and unloads (illustrative excerpts follow).
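To make the schema concrete, here are hedged excerpts in the style of sql_queries.py. The table and column names are assumptions inferred from the schema description above (a songplays fact table plus users, songs, artists, and time dimensions), not the project's exact definitions.

```python
# Illustrative query strings; all table/column names below are assumed.

# Fact table: one row per song play event.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id INT IDENTITY(0,1) PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT       NOT NULL,
    song_id     VARCHAR,
    artist_id   VARCHAR,
    location    VARCHAR
);
"""

# Example dimension table: user information.
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    level      VARCHAR
);
"""

# Example transformation: populate a dimension from an assumed staging table.
user_table_insert = """
INSERT INTO users (user_id, first_name, last_name, level)
SELECT DISTINCT user_id, first_name, last_name, level
FROM staging_events
WHERE user_id IS NOT NULL;
"""
```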