This open source project is not an official Huawei product. Huawei is not expected to provide support for this project.
This repository is the official code implementation of the NeurIPS 2022 paper "Self-Supervised Learning Through Efference Copies".
Authors: Franz Scherr, Qinghai Guo, Timoleon Moraitis
Keywords: Self-supervised Learning, Theoretical Neuroscience, Sensory-Motor Learning, Representation Learning, Visual Features, Visual Perception, Deep Learning, Neural Networks, Embodied Intelligence, Inverse Models
TL;DR: We show that the brain's motor commands could theoretically also supervise the learning of sensory representations, a framework that unifies various self-supervised machine-learning methods, extends them, and improves performance.
Self-supervised learning (SSL) methods aim to exploit the abundance of unlabelled data for machine learning (ML); however, the underlying principles are often method-specific. An SSL framework derived from biological first principles of embodied learning could unify the various SSL methods, help elucidate learning in the brain, and possibly improve ML. SSL commonly transforms each training datapoint into a pair of views, uses the knowledge of this pairing as a positive (i.e. non-contrastive) self-supervisory sign, and potentially opposes it to unrelated (i.e. contrastive) negative examples. Here, we show that this type of self-supervision is an incomplete implementation of a concept from neuroscience, the Efference Copy (EC). Specifically, the brain also transforms the environment through efference, i.e. motor commands; however, it sends itself an EC of the full commands, i.e. more than a mere SSL sign. In addition, its action representations are likely egocentric. From such a principled foundation we formally recover and extend SSL methods such as SimCLR, BYOL, and ReLIC under a common theoretical framework, i.e. Self-supervision Through Efference Copies (S-TEC). Empirically, S-TEC meaningfully restructures the within- and between-class representations. This manifests as improvements over recent strong SSL baselines in image classification, segmentation, object detection, and in audio. These results suggest a testable hypothesis: a positive influence from the brain's motor outputs onto its sensory representations.
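To make the idea concrete, below is a minimal, illustrative PyTorch sketch, not the repository's implementation: a SimCLR-style InfoNCE loss on the paired views, plus an inverse-model head that regresses the full augmentation parameters, standing in for the efference copy of the "motor command", weighted by a coefficient analogous to the --manip_lambda flag used in the commands below. All names here (info_nce, InverseModel, aug_params) are assumptions for illustration only.

# Minimal illustrative sketch (NOT the repository's implementation): a
# SimCLR-style pairing loss plus regression of the full augmentation
# parameters, standing in for the efference copy of the "motor command".
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    # Standard NT-Xent loss: each view's positive is its augmented partner.
    z = F.normalize(torch.cat([z1, z2]), dim=1)            # (2N, D)
    sim = z @ z.t() / temperature                          # cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                     float('-inf'))                        # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

class InverseModel(nn.Module):
    # Predicts the augmentation ("action") parameters from both embeddings,
    # so the full command, not just the pairing sign, supervises learning.
    def __init__(self, feat_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim))

    def forward(self, z1, z2):
        return self.net(torch.cat([z1, z2], dim=1))

def s_tec_style_loss(z1, z2, aug_params, inverse_model, manip_lambda=1.0):
    # Pairing (contrastive) term + efference-copy (action regression) term.
    return (info_nce(z1, z2)
            + manip_lambda * F.mse_loss(inverse_model(z1, z2), aug_params))

Here aug_params would encode, e.g., crop coordinates or colour-jitter strengths. The actual S-TEC objective differs in its details (for instance, egocentric action coding), so treat this only as a reading aid for the abstract above.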
To install the requirements:
pip install -r requirements.txt
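Optionally, install into an isolated environment first (a generic Python setup step, not a project requirement):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt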
To train a ResNet-18 with S-TEC on CIFAR-10, run this command:
python train.py --dataset cifar10 --data_dir <path-to-data> --arch resnet18 --temperature .5 \
--learning_rate 4. --manip_lambda 1. --feat_dim 64 --results_path ./s_tec_cifar10_resnet18
To train a ResNet-50 with S-TEC on CIFAR-10, run this command:
python train.py --dataset cifar10 --data_dir <path-to-data> --arch resnet50 --temperature .5 \
--learning_rate 4. --manip_lambda 1. --hidden_mlp 1024 --feat_dim 64 \
--results_path ./s_tec_cifar10_resnet50
To train on CIFAR-100 instead, replace cifar10 with cifar100.
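For example, the ResNet-18 command becomes (all other flags carried over unchanged from the CIFAR-10 example; whether CIFAR-100 warrants different hyperparameters is not stated here):

python train.py --dataset cifar100 --data_dir <path-to-data> --arch resnet18 --temperature .5 \
--learning_rate 4. --manip_lambda 1. --feat_dim 64 --results_path ./s_tec_cifar100_resnet18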
Training on STL-10 likely requires multiple GPUs to reach a total batch size of 1024. To train a ResNet-18, use the command below (assuming 2 GPUs):
python train.py --dataset stl10 --data_dir <path-to-data> --arch resnet18 --temperature .2 \
--learning_rate 1.2 --manip_lambda .3 --results_path ./s_tec_stl10_resnet18 \
--use_solarization 1 --use_gaussian_blur 1 --batch_size 512 --gpus 2
Subsequently, the linear classifier is trained separately using this command:
python train.py --dataset stl10 --data_dir <path-to-data> --arch resnet18 --batch_size 1024 \
--max_epochs 100 --optimizer sgd --learning_rate .04 --stop_gradient 2 \
--reinitialize_supervised_head 1 --nesterov 1 --p_color_jitter -1 --p_grayscale -1 \
--ckpt_path <path-to-checkpoint> --results_path ./s_tec_stl10_resnet18_linear_fit
To train a ResNet-50, replace all above occurrences of resnet18 with resnet50 and adapt the number of GPUs and batch size per GPU accordingly, as in the example below.
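For instance, assuming --batch_size is per GPU (as the 512 x 2 = 1024 split above suggests), a 4-GPU run could look like:

python train.py --dataset stl10 --data_dir <path-to-data> --arch resnet50 --temperature .2 \
--learning_rate 1.2 --manip_lambda .3 --results_path ./s_tec_stl10_resnet50 \
--use_solarization 1 --use_gaussian_blur 1 --batch_size 256 --gpus 4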
Training a ResNet-50 on ImageNet can be achieved by executing the following command (e.g. using 8 GPUs):
python train.py --dataset imagenet --data_dir <path-to-data> --arch resnet50 --temperature .1 \
--learning_rate 1.97 --manip_lambda .6 --results_path ./s_tec_imagenet_resnet50 \
--use_solarization 1 --use_gaussian_blur 1 --batch_size 210 --gpus 8 --max_epochs 100
Evaluation is performed using the following command:
python eval.py --dataset <dataset> --data_dir <path-to-data> --arch <resnetXY> --ckpt_path <path-to-checkpoint>
See the commands in the pre-trained models section for examples.
We include pretrained ResNet-18 models for CIFAR-100 (trained with S-TEC and with SimCLR) and plan to release all other models in a public version.
To evaluate a model trained with S-TEC, execute:
python eval.py --dataset cifar100 --data_dir <path-to-data> --arch resnet18 \
--ckpt_path pretrained/resnet18_stec_cifar100.ckpt
It should print 0.6680 (i.e. 66.80%).
To evaluate a model trained with SimCLR, execute:
python eval.py --dataset cifar100 --data_dir <path-to-data> --arch resnet18 \
--ckpt_path pretrained/resnet18_simclr_cifar100.ckpt
It should print 0.6540 (i.e. 65.40%).
To cite this work, please use the following reference:
@article{scherr2022self,
title={Self-supervised learning through efference copies},
author={Scherr, Franz and Guo, Qinghai and Moraitis, Timoleon},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={4543--4557},
year={2022}
}