## PROJECT 1 - Data Modeling with Postgres

This project was created for a new startup called Sparkify.

The purpose of the project is to understand what songs users are listening to by analyzing the song and user-activity data from the app.

Milestones of this project:

  • Build an ETL pipeline using Python
  • Define fact and dimension tables for a star schema
  • Create an ETL pipeline to transfer data from files to tables

Database Tables:

  • songs: dimension table describing the songs available in the app
  • artists: dimension table describing the artists of those songs
  • users: dimension table describing the app's users
  • time: dimension table of the timestamps of user activity
  • songplays: fact table recording user listening activity
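
For illustration, here is a sketch of how the fact and dimension tables might be declared, in the style of sql_queries.py. The column names and types below are assumptions, not the project's exact definitions; see sql_queries.py for the real ones.

    # Hypothetical DDL sketch in the style of sql_queries.py.
    # Columns and types are illustrative assumptions only.

    songplay_table_create = """
        CREATE TABLE IF NOT EXISTS songplays (
            songplay_id SERIAL PRIMARY KEY,
            start_time  TIMESTAMP NOT NULL,  -- references time
            user_id     INT NOT NULL,        -- references users
            song_id     VARCHAR,             -- references songs
            artist_id   VARCHAR,             -- references artists
            level       VARCHAR,
            session_id  INT,
            location    VARCHAR,
            user_agent  VARCHAR
        );
    """

    user_table_create = """
        CREATE TABLE IF NOT EXISTS users (
            user_id    INT PRIMARY KEY,
            first_name VARCHAR,
            last_name  VARCHAR,
            gender     VARCHAR,
            level      VARCHAR
        );
    """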

Procedure:

Set up:

  1. Install PostgreSQL and the psycopg2-binary Python package
  2. Make sure Postgres is reachable at the host name and port your scripts connect to (a minimal connection check is sketched below)
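
A minimal connection check might look like the following sketch; the host, port, database name, and credentials are placeholders for your local setup:

    # Minimal connection sketch, assuming a local Postgres instance.
    # Host, port, database name, and credentials are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="127.0.0.1",
        port=5432,
        dbname="studentdb",
        user="student",
        password="student",
    )
    conn.set_session(autocommit=True)
    cur = conn.cursor()
    cur.execute("SELECT version();")
    print(cur.fetchone())
    conn.close()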

Process:

  1. Run create_tables.py to create the schema.
    1. This script uses the DROP and CREATE statements defined in sql_queries.py to reset the tables.
  2. Run etl.py to populate the tables.
    1. This script uses the INSERT statements defined in sql_queries.py to load the data from the files.
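
At a high level, etl.py walks the data directories, reads each JSON file with pandas, and inserts rows using the queries from sql_queries.py. Below is a simplified sketch; the file paths, database name, credentials, and the insert statement are assumptions rather than the project's exact code.

    # Simplified ETL sketch; the real etl.py processes both song_data and
    # log_data and uses the INSERT statements defined in sql_queries.py.
    import glob
    import pandas as pd
    import psycopg2

    conn = psycopg2.connect(host="127.0.0.1", dbname="sparkifydb",
                            user="student", password="student")
    cur = conn.cursor()

    # Hypothetical insert; the project keeps its INSERT statements in sql_queries.py.
    user_table_insert = """
        INSERT INTO users (user_id, first_name, last_name, gender, level)
        VALUES (%s, %s, %s, %s, %s)
        ON CONFLICT (user_id) DO UPDATE SET level = EXCLUDED.level;
    """

    # Path is illustrative; adjust it to where the log files live in the repo.
    for path in glob.glob("Data/log_data/**/*.json", recursive=True):
        df = pd.read_json(path, lines=True)   # log files are line-delimited JSON
        df = df[df["page"] == "NextSong"]     # keep only song-play events
        users = df[["userId", "firstName", "lastName", "gender", "level"]]
        for row in users.itertuples(index=False):
            cur.execute(user_table_insert, list(row))

    conn.commit()
    conn.close()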

Files in this Repo:

  • Data
    • log_data
    • song_data
  • README.md
  • create_tables.py
  • etl.ipynb
  • etl.py
  • sql_queries.py
  • test.ipynb

Example Query:

    SELECT COUNT(*) AS agent_count, user_agent
    FROM songplays
    GROUP BY user_agent;
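
One way to run this query outside the notebooks is with a short psycopg2 script; the connection parameters below are placeholders:

    # Run the example query and print one row per user agent.
    import psycopg2

    query = """
        SELECT COUNT(*) AS agent_count, user_agent
        FROM songplays
        GROUP BY user_agent;
    """

    conn = psycopg2.connect(host="127.0.0.1", dbname="sparkifydb",
                            user="student", password="student")
    cur = conn.cursor()
    cur.execute(query)
    for agent_count, user_agent in cur.fetchall():
        print(agent_count, user_agent)
    conn.close()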