/ETL_pipeline_with_Spotify

ETL(Extract-Transform-Load) process on Spotify user data

Primary LanguagePython

ETL_pipeline_with_Spotify

ETL(Extract-Transform-Load) process on Spotify user data

Step 1 - Extract

We can extract data with FTP(File Transfer Protocol) or API(Application Programming Interface). I used an API that provided by Spotify. I got Current User's Recently Played Tracks. We need a token to access the API. This token provided by spotify (need a Spotify account) => https://developer.spotify.com/console/get-recently-played/

  • Limit is max items(songs here) to return daily. default=20, min=1, max=50.
  • After is a date but specified in unix milliseconds format. This mean download listened songs after this date.
  • Before is reverse of after. Download listened songs before this date.
  • Using only before or after is enough.
  • Then click the get token button, select below option and clic request token.

image

After extract our data looks like that

image

Step 2 - Transform(Validation)

Some times data vendors might send empty file, duplicated data, null columns or row etc. We need to clean up this mess(dirty data) before uploading it to the database. Because working with dirty data gives us false information. "Garbage in, garbage out."

image

In this code only checked basic things with "check_if_valid_data()" function. You can look at below images to see the most common transform types.

image image

After calling the "check_if_valid_data()" function you will see this output if everything is alright.

image

Step 3 - Load

In this step we will load our data to database(SQLite). I used DBeaver to execute sql query and check my database. After run load operation an sqlite file created in script directory. Open DBeaver and connect to DB via this file.

image

Select path(sqlite file) and select finish that's all.

After load operation you can see your data in database.

image

Resources: