This repo presents a self-authored project from a Spotify data pipeline. The idea is to create an Airflow with Docker instance, extract raw data from the Spotify API using a personal account, get data that makes sense using in-code transforms and load the final dataset into a Postgres database. The aim is to practice data engineering skills like API consumption, container programming, and task scheduling and orchestration.
- Python
- Docker
- Spotify account (not premium required)
|-- dags
|-- spotify
|-- custom_dags
-- dags --
|-- etl
-- etl scripts --
|-- utils
-- etl utils --
|-- data
|-- raw
|-- curated
|-- logs
|-- plugins
-- setup scripts --
To prepare environment the example.env
file can be used. The variables that demand attention or explanation are:
Variable | Description |
---|---|
AIRFLOW_UID | Airflow user id |
CLIENT_ID | Spotify client id |
CLIENT_SECRET | Spotify client secret |
On linux, the right Airflow user id can be obtained with $(id -u)
. For other OS the 50000 id can be placed manually on this variable. To get Spotify client and secret credentials access your dashboard from Spotify for Developers, go to “create an app”, generate fresh tokens and set the values in your environment file.
To make Airflow directories structure:
$ make airflow-folders
To initialize Airflow services:
$ make airflow-init
To run full Airflow services and Postgres database:
$ make run
After that point, all Airflow services will be raised through compose. You can check the health of containers through:
$ make healthy-check
After that, you will be able to access the Airflow web server at http://localhost/8080/home. To run the ETL process just turn on the DAG and trigger it.
Finally, to see the result, run the Postgres container with:
$ docker exec -it <container_id> /bin/sh
$ psql --username <your_db_user>
and query the data with something like:
SELECT * FROM public."spotify-tracks";
To disarm compose and exclude associated volumes and networks:
$ make disarm
Note: all make "cmd" commands are made to run on macOS. To run on linux edit the Makefile replacing the occurrences of "docker compose" with "docker-compose".