By Thalles Silva, Helio Pedrini, Adín Ramírez Rivera.
This repo is the official implementation of Self-supervised Learning of Contextualized Local Visual Embeddings (CLoVE), presented at the 4th Visual Inductive Priors for Data-Efficient Deep Learning Workshop (ICCV 2023).
The code base is written in PyTorch.
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised, convolution-based method that learns representations suited for dense prediction tasks. Unlike current methods, CLoVE optimizes a single loss function that operates at the level of contextualized local embeddings learned from the output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multi-head self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE’s pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in four dense prediction downstream tasks: object detection, instance segmentation, keypoint detection, and dense pose estimation.
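To make the idea of combining local features by similarity more concrete, here is a minimal, framework-free sketch. It is not the layer implemented in this repository. It assumes that "normalized" means L2-normalizing the features so that dot products become cosine similarities, it uses no learned projections, and it forms heads by simply splitting the channels. The names `contextualize`, `num_heads`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(local_feats, num_heads=2, temperature=0.1):
    """Mix local features by cosine similarity, one channel group per head.

    local_feats: list of N vectors (one per feature-map location), each of
    length D, with D divisible by num_heads. Returns N vectors of length D.
    """
    dim = len(local_feats[0])
    assert dim % num_heads == 0, "feature dim must be divisible by num_heads"
    head_dim = dim // num_heads
    outputs = [[] for _ in local_feats]

    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        values = [f[lo:hi] for f in local_feats]
        # Assumption: queries and keys are the L2-normalized features
        # themselves (no learned projections), so each dot product is a
        # cosine similarity.
        unit = [l2_normalize(v) for v in values]
        for i, query in enumerate(unit):
            sims = [sum(a * b for a, b in zip(query, key)) / temperature
                    for key in unit]
            weights = softmax(sims)
            # Similarity-weighted average of this head's unnormalized values.
            mixed = [sum(w * v[d] for w, v in zip(weights, values))
                     for d in range(head_dim)]
            outputs[i].extend(mixed)
    return outputs
```

Calling `contextualize(feats)` on a flattened feature map returns one embedding per location, each a similarity-weighted average of all locations. A lower `temperature` sharpens the weights toward the most similar locations. A real layer would typically add learned query, key, and value projections and operate on batched tensors.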
@inproceedings{silva2023self,
title={Self-supervised Learning of Contextualized Local Visual Embeddings},
author={Silva, Thalles and Pedrini, Helio and Ram{\'\i}rez Rivera, Ad{\'\i}n},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={177--186},
year={2023}
}
Method | Epochs | Multicrop | URL |
---|---|---|---|
CLoVE | 50 | 2x224 + 6x96 | Checkpoints |
CLoVE | 200 | 2x224 + 6x96 | Checkpoints |
CLoVE | 400 | 2x224 + 6x96 | Checkpoints |
Method | Epochs | ||||||
---|---|---|---|---|---|---|---|
Supervised | 100 | 38.2 | 58.2 | 41.2 | 33.3 | 54.7 | 35.2 |
Rand init | - | 26.4 | 44.0 | 27.8 | 29.3 | 46.9 | 30.8 |
ReSim | 200 | 39.7 | 59.0 | 43.0 | 34.6 | 55.9 | 37.1 |
InsCon | 200 | 40.3 | 60.0 | 43.5 | 35.1 | 56.7 | 37.6 |
PixPro | 400 | 40.5 | 59.8 | 44.0 | 35.4 | 56.9 | 37.7 |
DetCo | 200 | 39.8 | 59.7 | 43.0 | 34.7 | 56.3 | 36.7 |
SlotCon | 200 | 39.9 | 59.8 | 43.0 | 34.9 | 56.5 | 37.3 |
CLoVE | 200 | 40.6 | 60.0 | 44.1 | 35.4 | 56.8 | 37.8 |
CLoVE | 400 | 41.0 | 60.3 | 44.2 | 35.5 | 57.2 | 38.1 |
Method | Epochs | ||||||
---|---|---|---|---|---|---|---|
Supervised | 100 | 20.2 | 33.4 | 21.4 | 19.6 | 31.2 | 20.8 |
Rand init | - | 12.4 | 21.8 | 12.5 | 12.1 | 20.2 | 12.5 |
DenseCL | 200 | 20.4 | 33.5 | 21.4 | 19.9 | 31.5 | 20.9 |
PixPro | 400 | 23.8 | 38.2 | 25.2 | 23.3 | 36.1 | 24.7 |
SlotCon | 200 | 23.2 | 37.6 | 24.3 | 22.9 | 35.6 | 24.3 |
VICRegL | 200 | 7.0 | 13.4 | 6.4 | 7.4 | 12.7 | 7.3 |
CLoVE | 200 | 23.6 | 37.7 | 25.2 | 23.3 | 35.9 | 24.8 |
CLoVE | 400 | 24.3 | 38.8 | 25.8 | 23.9 | 36.7 | 25.3 |
Method | Epochs | AP | |
---|---|---|---|
Supervised | 100 | 26.5 | 52.9 |
Rand init | - | 19.9 | 40.7 |
DenseCL | 200 | 33.1 | 61.7 |
PixPro | 400 | 35.8 | 63.7 |
VICRegL | 300 | 29.8 | 58.5 |
SlotCon | 200 | 35.2 | 63.8 |
CLoVE | 200 | 35.7 | 64.1 |
CLoVE | 400 | 37.2 | 65.3 |
This repository builds on several publicly available codebases, which we modified and integrated into this project.
We welcome pull requests and issues from the community.