Sparkify is a music streaming startup. As the company grows, it will have to handle increasingly large volumes of data, and keeping that data in raw JSON files would eventually become a problem. Beyond handling big data, Sparkify also needs to analyze its data to make decisions that speed up its business. To meet both goals, Sparkify should store its data in tabular form using PostgreSQL. This project provides two programs that create the tables and run the ETL process automatically.
There are five tables: the song, user, artist, time, and songplay tables.
Table that stores song information.
Column schema:
- song_id varchar
- title varchar
- artist_id varchar
- year int
- duration float
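For reference, here is a minimal sketch of the matching DDL, written as a Python query string in the style the project's scripts could use; the table name `songs` and the `song_id` primary key are assumptions, not the project's exact statement.

```python
# Sketch of the song table DDL; table name and primary key are assumptions.
song_table_create = """
CREATE TABLE IF NOT EXISTS songs (
    song_id   varchar PRIMARY KEY,
    title     varchar,
    artist_id varchar,
    year      int,
    duration  float
);
"""
```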
Table that stores user information.
Column schema:
- user_id int
- first_name varchar
- last_name varchar
- gender varchar
- level varchar
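A sketch of the corresponding DDL; the table name `users` and the `user_id` primary key are assumptions.

```python
# Sketch of the user table DDL; table name and primary key are assumptions.
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    int PRIMARY KEY,
    first_name varchar,
    last_name  varchar,
    gender     varchar,
    level      varchar
);
"""
```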
Table that stores information about the artist of a particular song.
Column schema:
- artist_id varchar
- name varchar
- location text
- latitude float
- longitude float
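A sketch of the corresponding DDL; the table name `artists` and the `artist_id` primary key are assumptions.

```python
# Sketch of the artist table DDL; table name and primary key are assumptions.
artist_table_create = """
CREATE TABLE IF NOT EXISTS artists (
    artist_id varchar PRIMARY KEY,
    name      varchar,
    location  text,
    latitude  float,
    longitude float
);
"""
```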
Table that stores the timestamp of each song streamed by a user, broken down into time units.
Column schema:
- start_time bigint
- hour int
- day int
- week int
- month int
- year int
- weekday int
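A sketch of the corresponding DDL; the table name `time` and the `start_time` primary key are assumptions.

```python
# Sketch of the time table DDL; table name and primary key are assumptions.
time_table_create = """
CREATE TABLE IF NOT EXISTS time (
    start_time bigint PRIMARY KEY,
    hour       int,
    day        int,
    week       int,
    month      int,
    year       int,
    weekday    int
);
"""
```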
Fact table built by joining the other tables. This table is for analytical purposes.
Column schema:
- songplay_id int
- start_time bigint
- user_id int
- level varchar
- song_id varchar
- artist_id varchar
- session_id int
- location text
- user_agent text
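A sketch of the corresponding DDL; the table name `songplays`, the `songplay_id` primary key, and the foreign-key references (which express the joins described above) are assumptions.

```python
# Sketch of the songplay fact table DDL; the table name, primary key,
# and foreign-key references are assumptions.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id int PRIMARY KEY,
    start_time  bigint REFERENCES time (start_time),
    user_id     int REFERENCES users (user_id),
    level       varchar,
    song_id     varchar REFERENCES songs (song_id),
    artist_id   varchar REFERENCES artists (artist_id),
    session_id  int,
    location    text,
    user_agent  text
);
"""
```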
The pipeline of this project performs the following steps (a code sketch follows the list):
- Transform the song, artist, and user data from JSON into tabular format
- Load the song, artist, and user tables into the Postgres database
- Transform the log data into the time table and load it into the Postgres database
- Transform the log data into the songplay table, joining it with the song, artist, and user tables
- Load the songplay table into Postgres
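Below is a minimal sketch of the first pipeline step (loading song and artist data); the file layout under `data/song_data`, the JSON field names, and the connection details are assumptions, not the project's exact code. The log-processing steps would follow the same pattern.

```python
# Sketch of the song/artist ETL step; paths, JSON field names, and
# connection details are assumptions.
import glob
import json

import psycopg2


def process_song_file(cur, filepath):
    """Load one song JSON file into the songs and artists tables."""
    with open(filepath) as f:
        song = json.load(f)
    cur.execute(
        "INSERT INTO songs (song_id, title, artist_id, year, duration) "
        "VALUES (%s, %s, %s, %s, %s) ON CONFLICT (song_id) DO NOTHING;",
        (song["song_id"], song["title"], song["artist_id"],
         song["year"], song["duration"]),
    )
    cur.execute(
        "INSERT INTO artists (artist_id, name, location, latitude, longitude) "
        "VALUES (%s, %s, %s, %s, %s) ON CONFLICT (artist_id) DO NOTHING;",
        (song["artist_id"], song["artist_name"], song["artist_location"],
         song["artist_latitude"], song["artist_longitude"]),
    )


def main():
    conn = psycopg2.connect(
        "host=127.0.0.1 dbname=sparkifydb user=student password=student"
    )
    cur = conn.cursor()
    for path in glob.glob("data/song_data/**/*.json", recursive=True):
        process_song_file(cur, path)
    conn.commit()
    conn.close()


if __name__ == "__main__":
    main()
```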
To create the tables and then run the ETL process, execute:

    python create_tables.py
    python etl.py