This project aims to create a database that enables dimensional analysis of music played on a music streaming platform. This platform belongs to a fictional starup, Sparkify, whose data comes from log files and datasets of songs.
To provide this analytical data a dimensional modeling was created in a PostgreSQL database. This modeling will be powered by an ETL process written in Python 3.6 that can be run on demand or on a scheduled basis.
With this database we can obtain answers to several questions, among them:
- Visions on the volumetry of songs listened to in different temporalities
- Frequency of use of the platform by state
- Most Popular Artists and Songs
- Run create_tables.py from terminal.
- Run etl.py from terminal.
An interactive test of the ETL process load can be done through the notebook Jupyter test.ipynb