Self-supervised audio-visual scene analysis

PyTorch implementation of "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features".