MSc Thesis: "Audio-Visual Self-Supervised Representation Learning in-the-wild"
We provide checkpoints for two models pre-trained on a subset of VGGSound containing 50,000 videos. The first model is trained with Cross-modal Instance Discrimination (xID), while the second is based on the recently proposed VICReg method.
| Method | Checkpoint (100 epochs) |
| --- | --- |
| xID | download link |
| VICReg | download link |
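After downloading, a checkpoint can be inspected with plain PyTorch before loading it into a model. The snippet below is a minimal sketch: the file name and the layout of the saved dict (e.g. a `state_dict` key) are assumptions and depend on how the training script saved it.

```python
import torch

# Hypothetical file name; use the path of the checkpoint you downloaded.
ckpt = torch.load("xid_vggsound50k_100ep.pth", map_location="cpu")

# Training scripts usually save a dict of metadata plus model weights;
# inspect the keys to locate the state dict before loading it.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```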
To train a model with the xID method, run the following (assuming the DDP strategy is used):
```
python3 main-ssl.py configs/VGGSound-N1024.yaml --multiprocessing-distributed
```
For the VICReg method, run:
```
python3 main-vicreg.py configs/VGGSound-VICReg.yaml --multiprocessing-distributed
```
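For context on what `main-vicreg.py` optimizes: VICReg combines an invariance (MSE) term, a variance hinge term, and a covariance decorrelation term between two embeddings. The following PyTorch snippet is a minimal sketch of those terms as defined in the VICReg paper (Bardes et al.), not this repository's implementation; the coefficients shown are the paper's defaults and may differ here.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_coef=25.0, std_coef=25.0, cov_coef=1.0):
    """z_a, z_b: (batch, dim) projector outputs for the two views."""
    n, d = z_a.shape

    # Invariance term: MSE between the two embeddings.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss keeping each dimension's std above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    std_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance term: decorrelate dimensions by penalizing the squared
    # off-diagonal entries of each view's covariance matrix.
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)
    off_a = cov_a - torch.diag(torch.diagonal(cov_a))
    off_b = cov_b - torch.diag(torch.diagonal(cov_b))
    cov_loss = off_a.pow(2).sum() / d + off_b.pow(2).sum() / d

    return sim_coef * sim_loss + std_coef * std_loss + cov_coef * cov_loss
```

In the audio-visual setting, `z_a` and `z_b` would be the audio and video embeddings of the same clip.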
To avoid data parallelism, omit the `--multiprocessing-distributed` argument and set the `--gpu` argument of either script to a specific device id (e.g. `0` for the first GPU).
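For example, to run xID pre-training on the first GPU only:

```
python3 main-ssl.py configs/VGGSound-N1024.yaml --gpu 0
```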
To evaluate the pre-trained models by training a linear classifier on frozen features (linear probing), run the following (e.g. for the UCF-101 dataset and a model pre-trained with the xID method):
```
python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml --distributed
```
Note that this script does not yet support multi-node evaluation.
Final results on both UCF-101 and HMDB-51 datasets are shown in the following table:
| Method | Top-1 Acc. (UCF-101) | Top-5 Acc. (UCF-101) | Top-1 Acc. (HMDB-51) | Top-5 Acc. (HMDB-51) |
| --- | --- | --- | --- | --- |
| xID | 51.20% | 80.91% | 28.08% | 61.29% |
| VICReg | 39.75% | 71.30% | 21.85% | 52.69% |
To evaluate by fine-tuning the whole network end-to-end, run the following (e.g. for the HMDB-51 dataset and a model pre-trained with the VICReg method):
```
python3 eval-action-recg.py configs/hmdb51/8at16-fold1.yaml configs/VGGSound-VICReg.yaml --distributed
```
Note that this script does not yet support multi-node evaluation.
Final results on both UCF-101 and HMDB-51 datasets are shown in the following table:
| Method | Top-1 Acc. (UCF-101) | Top-5 Acc. (UCF-101) | Top-1 Acc. (HMDB-51) | Top-5 Acc. (HMDB-51) |
| --- | --- | --- | --- | --- |
| xID | 73.22% | 92.78% | 42.85% | 73.69% |
| VICReg | 59.53% | 85.94% | 34.65% | 68.96% |
In this experiment, we test the generalization performance of self-supervised models on data belonging to unknown classes (i.e. classes not found in the pre-training dataset). To split the classes into seen and unseen concepts, use the `label_similarities.ipynb` notebook. Based on our results, the resulting sets of unseen concepts for UCF-101 and HMDB-51 are provided in the `datasets/rest_classes/` directory.
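Once the unseen-concept lists exist, they can be read back for filtering the evaluation set. A minimal sketch, assuming the files in `datasets/rest_classes/` are plain-text lists with one class name per line; the file name below is hypothetical.

```python
from pathlib import Path

def load_unseen_classes(path):
    # One class name per line; skip blank lines. Adjust the parsing if
    # the actual files in datasets/rest_classes/ use a different format.
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]

# Hypothetical file name for the UCF-101 unseen-concept split.
unseen = load_unseen_classes("datasets/rest_classes/ucf101_unseen.txt")
print(f"{len(unseen)} unseen classes, e.g. {unseen[:3]}")
```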
To perform this experiment, run the following (e.g. for the xID model on UCF-101, using 20% of the training data per class to tune the linear classifier):
```
python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml --distributed --few-shot-ratio 0.2 --use-rest-classes
```
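To trace how accuracy scales with the amount of labeled data, one might sweep the ratio; a sketch, assuming `--few-shot-ratio` accepts any fraction in (0, 1]:

```
for r in 0.2 0.4 0.6 0.8 1.0; do
    python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml \
        --distributed --few-shot-ratio $r --use-rest-classes
done
```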
Final results are depicted in the following plots: