DeepMusicRecommendation
This work has reproduced the results of van den Oord et al. (2013) and shown that a CNN can successfully learn features from Mel-spectrogram representations of songs. The CNN model uses four convolutional layers, each followed by a ReLU non-linearity and a max-pooling layer, then a global pooling layer and three fully-connected layers to predict a song’s item-factors. These are a factorised representation of user-item play counts obtained using Weighted Matrix Factorisation. An AUC score of 0.71 was achieved when reconstructing user-item preference labels with the item-factors predicted by the CNN model. In the process of predicting item-factors, the neural network should learn musically significant features. These features can be used to find similarities between songs and hence to recommend songs that a user has not yet heard. The visualisation of these features using t-SNE did not show patterns clear enough to confirm their significance. Further investigation of neuron activations in the final fully-connected layer is required to determine what the features represent.
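To make the evaluation step concrete, the following is a minimal sketch of how an AUC score could be computed when user-item preference labels are reconstructed from predicted item-factors. It assumes that a preference score is the dot product of a user-factor vector and an item-factor vector, and that the reported score is a mean of per-user AUCs; neither detail is stated above. The function names (`auc_score`, `mean_user_auc`) and the toy arrays are hypothetical and are not taken from the original work.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Compare every positive score with every negative score; ties count as half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def mean_user_auc(preferences, user_factors, item_factors):
    """Average per-user AUC of reconstructed preference scores.

    Assumption: scores are reconstructed as user_factors @ item_factors.T,
    and users whose labels are all 0 or all 1 are skipped.
    """
    preferences = np.asarray(preferences, dtype=bool)
    scores = np.asarray(user_factors) @ np.asarray(item_factors).T
    aucs = [
        auc_score(p, s)
        for p, s in zip(preferences, scores)
        if 0 < p.sum() < p.size
    ]
    return float(np.mean(aucs))


# Toy data (hypothetical): 2 users, 3 songs, 2 latent factors.
user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])
predicted_item_factors = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 0.5]])
preferences = np.array([[1, 0, 0], [1, 0, 0]])

print(mean_user_auc(preferences, user_factors, predicted_item_factors))
```

The pairwise comparison in `auc_score` is quadratic in the number of items per user, which is fine for illustration; a rank-based formulation would be the usual choice for a full dataset.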