/music-hit-decorate-datasets

Python console program that fetches data from Instagram and Music Story. It adds artists' social media information to the datasets that we will use to predict a song's popularity. This is a group project for the Data Science and Advanced Analytics course of the Big Data & Analytics Masters @ EAE, class of 2021.

Predicting a hit Song - Decorate Datasets - Instagram / Music Story

This is a Python console program that fetches data from Instagram and Music Story.

This program's purpose is to add artists' social media information to the datasets that we will use to predict a Spotify song's popularity, primarily based on its audio features.

Using machine learning models to predict a song's popularity is a group project for the Data Science and Advanced Analytics course of the Big Data & Analytics Masters @ EAE, class of 2021.

Diagram

This is part of the overall project: Predicting song popularity using machine learning

Decorate datasets is highlighted in a red box 🔴

[image: diagram of the overall project]

Usage

The console has three main purposes. For examples of how to use them, see console.out.

Search Music Story Entity & Instagram URL by Spotify Artist ID

 MODE_SEARCH_IG_PROFILE_BY_NAME 

Get Instagram profile and followers using Instagram username as input

 MODE_SEARCH_IG_FOLLLOWERS_BY_IG_USERNAME 

The previous step (#1) produced: music_story_enriched_artist_.csv

At first, the design we wanted was one where each step links to the next, so the functions return the filename and we could chain them through command-line pipes. Between steps, we need to exclude the records already found and let the next step complete the missing ones.
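The hand-off between steps described above could look roughly like the sketch below. It is only an illustration, not the project's code: the column names `artist_name` and `instagram_url`, the sample rows, and the helper `split_found_and_missing` are all assumptions made for the example.

```python
import csv
import io


def split_found_and_missing(rows, key="instagram_url"):
    """Separate rows that already have a value for `key` from rows that do not."""
    found, missing = [], []
    for row in rows:
        # Treat empty or whitespace-only values as "not found yet".
        if (row.get(key) or "").strip():
            found.append(row)
        else:
            missing.append(row)
    return found, missing


# Hypothetical output of the previous step, inlined so the sketch is runnable.
previous_step_csv = io.StringIO(
    "artist_name,instagram_url\n"
    "Artist A,https://www.instagram.com/artist_a/\n"
    "Artist B,\n"
)
rows = list(csv.DictReader(previous_step_csv))
found, missing = split_found_and_missing(rows)
print(len(found), "found;", len(missing), "left for the next step")
```

Only the `missing` rows would be handed to the next step, and the `found` rows would be joined back in afterwards.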

Search Instagram profile and followers using artist name as input

MODE_SEARCH_IG_FOLLLOWERS_BY_ARTIST_NAME 

The previous step (#2) produced: instagram_enriched_artist_.csv

This output was also meant to be piped into the final name search. Between steps, we need to exclude the records already found and let the next step complete the missing ones. This way we are more efficient, running name-based searches only for artists that were not found.

However, we realized that even after Music Story had returned the artist entity, the result could be a fan account, not the real one. That is why there is manual intervention between one step and the next.

decorate_instagram_followers_artist_based_on_name_search "instagram_enriched_artist"

  • MS: stands for Music Story Search
  • IS: stands for Instagram Name Search

We had to run several manual searches and join them together because:

  • Instagram locks out requests once a limit is reached
  • Artists found on Instagram were not considered if their follower count was < 2k. We didn't automate this logic
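The follower-count rule mentioned above was applied by hand. If it were automated, a minimal sketch might look like this. The dictionary layout, the sample profiles, and the function name `is_plausible_artist_account` are hypothetical. Only the 2k threshold comes from the text.

```python
MIN_FOLLOWERS = 2000  # the "< 2k" cut-off described above


def is_plausible_artist_account(profile, min_followers=MIN_FOLLOWERS):
    """Keep a profile only if its follower count is known and meets the threshold."""
    followers = profile.get("followers")
    return followers is not None and followers >= min_followers


# Hypothetical search results, for illustration only.
profiles = [
    {"username": "artist_official", "followers": 150_000},
    {"username": "artist_fanpage", "followers": 800},
    {"username": "unknown_count", "followers": None},
]
accepted = [p["username"] for p in profiles if is_plausible_artist_account(p)]
print(accepted)
```

A threshold like this only drops small accounts. A fan account with many followers would still pass, so a manual check would probably still be needed.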

Prepare test dataset using Spotify playlist as input

MODE_PRODUCE_TEST_DATASET_BY_PLAYLIST

Taking a Spotify playlist ID as input, this option iterates through the playlist's items to:

  • add Spotify audio features
  • add Instagram followers
  • save the playlist items to a CSV file
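The three steps above can be sketched as a single loop. This is an outline under assumptions, not the program's implementation: `fetch_audio_features` and `fetch_instagram_followers` are placeholder stubs standing in for the real Spotify and Instagram lookups, and the field names are invented for the example.

```python
import csv
import io


def fetch_audio_features(track_id):
    """Placeholder for a Spotify audio-features lookup (hypothetical stub)."""
    return {"danceability": None, "energy": None}


def fetch_instagram_followers(artist_name):
    """Placeholder for an Instagram follower lookup (hypothetical stub)."""
    return None


def build_test_dataset(playlist_items, out_file):
    """Decorate each playlist item and write it as one CSV row."""
    fieldnames = ["track_id", "artist_name", "danceability", "energy",
                  "instagram_followers"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for item in playlist_items:
        row = dict(item)
        row.update(fetch_audio_features(item["track_id"]))  # add audio features
        row["instagram_followers"] = fetch_instagram_followers(item["artist_name"])
        writer.writerow(row)  # save the decorated item
        count += 1
    return count


# Hypothetical playlist content. A real run would read it from a playlist ID.
playlist_items = [{"track_id": "track-1", "artist_name": "Artist A"}]
buffer = io.StringIO()
written = build_test_dataset(playlist_items, buffer)
print(written, "row(s) written")
```

Writing to any file-like object keeps the sketch testable. Passing an opened file instead of the `io.StringIO` buffer would produce the CSV on disk.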

Team

(music-hit-decorate-datasets/music-hit-analyze-data/music-hit-train-predict)

Professor

Professor Assistants

To-Do list

The "I didn't do it because we needed to submit this ASAP" list:

  • Have two different consoles, to separate concerns: Instagram / Music Story
  • Package them like a hero
  • A better (or at least one) pause/halt/fail - resume mechanism. Useful when APIs time out on too many requests.
  • Explore other services the Music Story database provides, such as artists' tweets and lyrics
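As a starting point for the resume mechanism listed above, one possible approach is to save progress after every processed item so that a rerun skips what is already done. The sketch below is an assumption about how that might look, not existing project code: `process_with_checkpoint`, `flaky_handler`, and the JSON checkpoint file are all hypothetical.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, handler, checkpoint_path):
    """Process items in order, saving finished ones so a rerun can skip them."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for item in items:
        if item in done:
            continue  # already handled in an earlier run
        done[item] = handler(item)  # may raise, e.g. when an API times out
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
    return done


calls = []


def flaky_handler(name):
    """Simulated lookup that fails the first time it sees 'Artist B'."""
    calls.append(name)
    if name == "Artist B" and calls.count("Artist B") == 1:
        raise TimeoutError("simulated API lock-out")
    return len(name)


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
artists = ["Artist A", "Artist B", "Artist C"]
try:
    process_with_checkpoint(artists, flaky_handler, checkpoint)
except TimeoutError:
    pass  # the first run stops at "Artist B"; "Artist A" is already saved
results = process_with_checkpoint(artists, flaky_handler, checkpoint)
print(sorted(results))
```

The second call resumes at the item that failed, so "Artist A" is not requested again.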