This project demonstrates how to build a data mart for a music streaming app and persist data to it via an ETL pipeline that extracts information from JSON logs and loads it into a PostgreSQL database. The JSON records are processed with the pandas library.
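As a rough illustration of the pandas step, the ETL reads newline-delimited JSON log records into a DataFrame and filters them before loading. A minimal sketch follows; the field names (`page`, `song`, etc.) are illustrative placeholders, not necessarily the project's actual log schema:

```python
import io

import pandas as pd

# Two newline-delimited JSON records standing in for the app's log files
# (field names here are hypothetical examples).
raw = (
    '{"ts": 1541105830796, "userId": "39", "level": "free", "page": "NextSong", "song": "A"}\n'
    '{"ts": 1541106106796, "userId": "8", "level": "free", "page": "Home", "song": null}\n'
)

# lines=True tells pandas to parse one JSON object per line.
df = pd.read_json(io.StringIO(raw), lines=True)

# Keep only song-play events before inserting rows into the database.
songplays = df[df["page"] == "NextSong"]
print(len(songplays))  # 1
```

In the project itself the same pattern is applied to the JSON files on disk (e.g. via `pd.read_json(path, lines=True)`).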
To test the project locally:
- Install Jupyter Lab
- Install PostgreSQL
- Install Python 3
- Install the Python pandas library
- Check out the code
- Navigate to the root of the project
- Run the command
  ```bash
  bash run_create_etl.sh
  ```
  which will create the database and populate it
- Start the notebook by running the command
  ```bash
  jupyter notebook
  ```
  which will launch it in the browser
- Open `test.ipynb` to run queries and view the data
├── README.md          # This file.
├── create_tables.py   # Python script with all methods necessary to recreate the data mart.
├── etl.ipynb          # Jupyter notebook describing and executing all tasks related to the extraction, transformation, and loading of the data.
├── run_create_etl.sh  # Bash shell script that creates the schema and persists the data into it.
├── sql_queries.py     # Python script defining the data mart schema and prepared/reusable queries.
└── test.ipynb         # Jupyter notebook for verifying that the database contains data.
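To give a feel for how `sql_queries.py` and `create_tables.py` divide the work, a common pattern is to keep the SQL as module-level strings that the scripts pass to a database cursor. The sketch below is a hypothetical example of that pattern; the table and column names are illustrative, not the project's actual schema:

```python
# Hypothetical sketch of the sql_queries.py pattern: DDL and prepared
# queries live as plain strings, with %s placeholders filled in later.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT NOT NULL,
    level       VARCHAR
);
"""

songplay_table_insert = """
INSERT INTO songplays (start_time, user_id, level)
VALUES (%s, %s, %s);
"""

# create_tables.py / etl.ipynb would then execute these, e.g.:
# cur.execute(songplay_table_create)
# cur.execute(songplay_table_insert, (start_time, user_id, level))
print("%s" in songplay_table_insert)  # True
```

Keeping the SQL separate from the execution logic makes the schema easy to review and the queries reusable across the creation script and the ETL notebook.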
Do not hesitate to submit a pull request.