ETL_pipeline_with_Spotify

ETL(Extract-Transform-Load) process on Spotify user data

Step 1 - Extract

We can extract data with FTP(File Transfer Protocol) or API(Application Programming Interface). I used an API that provided by Spotify. I got Current User's Recently Played Tracks. We need a token to access the API. This token provided by spotify (need a Spotify account) => https://developer.spotify.com/console/get-recently-played/

Limit is max items(songs here) to return daily. default=20, min=1, max=50.
After is a date but specified in unix milliseconds format. This mean download listened songs after this date.
Before is reverse of after. Download listened songs before this date.
Using only before or after is enough.
Then click the get token button, select below option and clic request token.

After extract our data looks like that

Step 2 - Transform(Validation)

Some times data vendors might send empty file, duplicated data, null columns or row etc. We need to clean up this mess(dirty data) before uploading it to the database. Because working with dirty data gives us false information. "Garbage in, garbage out."

In this code only checked basic things with "check_if_valid_data()" function. You can look at below images to see the most common transform types.

After calling the "check_if_valid_data()" function you will see this output if everything is alright.

Step 3 - Load

In this step we will load our data to database(SQLite). I used DBeaver to execute sql query and check my database. After run load operation an sqlite file created in script directory. Open DBeaver and connect to DB via this file.

Select path(sqlite file) and select finish that's all.

After load operation you can see your data in database.

Resources:

koyuboy/ETL_pipeline_with_Spotify

ETL_pipeline_with_Spotify

Step 1 - Extract

After extract our data looks like that

Step 2 - Transform(Validation)

Step 3 - Load