/exercice-audience-recommendation

Predict music for a group of users

Primary LanguageJupyter Notebook

Main goal: recommend songs for a given audience. The database is a set of tracks per user with a given score based on the user's tastes.

Installation

  • Setup environment
virtualenv venv
source venv/bin/activate
pip install -r requirements
  • Jupyter python 2 kernel (if necessary)
python2 -m ipykernel install --user
  • Download the data
mkdir data

Download the json file into this folder.

Structure of the data : Array of array of tracks

  {
    "artist": "Name",
    "genre": "Disco",
    "id": 1234,
    "score": 0.123456789
  }
  • Run the jupyter notebook
jupyter notebook
  • Run the tests
nosetests --rednose --force-color' tests

### Exploration

The exploratoring part can be seen using exploratory.ipynb

We examine the basics statistics of the database, some tops, the score distribution. We also compute the correlation between artists that we will use in our recommendation process.

Recommendation

Recommendation notebook reco.ipynb

We want to recommend songs for an audience who have different tastes. The main strategy is :

  • A track oriented recommendation
    • Find common tracks to each user with a high score
    • From this track find a correlated track unknow from the users
    • If one user doesn't know the song, the correlated song can be known by him. Thus N-1 people will like the song, and one user will be happy to say he knew this song.
  • An artist oriented recommendation
    • Same behavior to choose an artist
    • (not implemented) The song can then be pick in the artist catalog according to the knowledge of the users

We didn't use the genre field because of its unprecision :

    {
      "artist": "Mozart",
      "genre": "Electro",
      "id": 399137,
      "score": 0.6105205207831543
    },
    {
      "artist": "2be3",
      "genre": "Funk",
      "id": 988837,
      "score": 0.9371904213393983
    },
    {
      "artist": "The Offspring",
      "genre": "RnB",
      "id": 515570,
      "score": 0.8552045099017862
    },
    {
      "artist": "Booba",
      "genre": "Disco",
      "id": 346766,
      "score": 0.2840838314868537
    },

Ranking

The ranking will be :

  • First: commons songs liked sorted by number of people knowing the song and then the mean score of the track
  • Second: the correlated song with a score equal to the seed track score times the track mean score
  • Third: the songs by the common artist with a mean score above a "min score"
    • First for the artist known by everybody
    • Second less shares artist
    • Finally correlated artist songs still above a min score (in average)
  • Finally the last songs sorted by the average score

Future work

  • Have a min_score per field
  • Change cooccurrences priorities: just after common tracks reco
  • Metrics on ranking
    • Score evolution
    • Users distribution
  • Tests on engines
  • Refactoring (ranking-engine)