TensorFlow/Keras implementation of Contrastive Reconstruction (ConRec), a self-supervised learning algorithm that learns image representations by jointly optimizing a contrastive and a self-reconstruction loss. Presented at the ICML 2021 Workshop: Self-Supervised Learning for Reasoning and Perception [Paper, Poster].
We used Python 3.7 in our experiments. Install the dependencies with:
pip install -r requirements.txt
For the Oxford Flowers and Stanford Dogs datasets, the data is downloaded automatically when the training script is invoked. For the Aptos2019 dataset, the data has to be downloaded manually:
Register for the Aptos 2019 Kaggle Competition and download the train_images folder (train_images.zip). Unzip the images into one folder (e.g. train_images), then resize them and place them in the resources folder with the following script:
python scripts/aptos2019/resize_images.py --image-dir train_images --output_dir resources/aptos2019/images
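If you want to adapt the resize step, the following is a minimal sketch of what it does. The 320x320 target size and Lanczos resampling are assumptions for illustration only; scripts/aptos2019/resize_images.py remains the authoritative implementation.

```python
# Minimal resize sketch (not the repository script); target size is assumed.
from pathlib import Path
from PIL import Image

SRC = Path("train_images")                # unzipped Kaggle images
DST = Path("resources/aptos2019/images")
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.glob("*.png"):
    img = Image.open(img_path).convert("RGB")
    img = img.resize((320, 320), Image.LANCZOS)  # assumed target resolution
    img.save(DST / img_path.name)
```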
Samples from the synthetic dataset can be generated by invoking the following script:
python scripts/create_synthetic_ds.py --type <rectangle-triangle|circle-square> \
--output dataset.npz --num-train <n> --num-test <m>
The generated datasets that we used in our paper are included in this repository as numpy arrays under resources/rectangle-triangle.npz and resources/circle-square.npz. The same datasets are also available as PNG files in resources/rectangle-triangle.zip and resources/circle-square.zip.
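To inspect these archives before training, here is a minimal sketch using plain numpy; the array key names are not documented here, so the snippet lists whatever keys the archive actually contains rather than assuming them:

```python
# Inspect a bundled synthetic dataset; key names are read from the archive
# itself rather than assumed.
import numpy as np

data = np.load("resources/rectangle-triangle.npz")
print(data.files)  # names of the stored arrays
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```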
To pretrain the models and reproduce our results in the paper, invoke the training script in the following way:
python train.py \
-o lars -lr 0.075 --lr-scaling sqrt -t 0.5 -wd 1e-4 \
--color-jitter-strength=1.0 --use-blur \
-bs 16 \
-m unet --depth 4 --filters 64 \
--logdir <logdir> \
-e <epochs> \
# For oxford_flowers102, stanford_dogs or any other tf dataset
--dataset <dataset> \
--linear-type categorical \
--eval-center-crop \
# For oxford_flowers102 additionally
--train-split train+validation \
# For aptos2019
--dataset aptos2019 \
--train-split all.csv \
--test-split all.csv \
--data-path resources/aptos2019 \
--linear-type diabetic \
# For the synthetic datasets
--dataset numpy \
--data-path <resources/rectangle-triangle.npz|resources/circle-square.npz> \
--height 128 --width 128 --channels 1 \
--linear-type categorical \
# For SimCLR
--lambda-con=1.0 \
--encoder-reduction ga_pooling \
--aug-impl simclr --simclr \
# For SimCLR + Attention
--lambda-con=1.0 \
--encoder-reduction ga_attention \
--aug-impl simclr --simclr \
# For Conrec
--lambda-rec=100.0 --lambda-con=1.0 \
--encoder-reduction ga_attention \
--aug-impl conrec \
# Optional Parameters
--validation-freq 20 \
--log-images \
--image-log-interval 20 \
--linear-interval 20 \
--save-epochs 100, 200
# Shuffle buffer size is batch_size x shuffle-buffer-multiplier (e.g. 16 x 10 = 160)
--shuffle-buffer-multiplier 10
# Performs sklearn linear evaluation in another thread
--async-eval
where epochs should be at least 1200 for the stanford_dogs and aptos2019 datasets, 2700 for the oxford_flowers102 dataset, and 1000 for the synthetic datasets. A ConRec example for the Oxford Flowers dataset would be:
python train.py \
-o lars -lr 0.075 --lr-scaling sqrt -t 0.5 -wd 1e-4 \
--color-jitter-strength=1.0 --use-blur -bs 16 -e 2700 \
-m unet --depth 4 --filters 64 \
--dataset oxford_flowers102 \
--linear-type categorical \
--eval-center-crop \
--lambda-rec=100.0 --lambda-con=1.0 \
--encoder-reduction ga_attention \
--aug-impl conrec \
--train-split train+validation \
--logdir <logdir>
It is also possible to train on images from an image folder with jpg/png files. The folder should have the following structure:
path/to/image_dir/
  split_name/  # Ex: 'train'
    label1/  # Ex: 'airplane' or '0015'
      xxx.png
      xxy.png
      xxz.png
    label2/
      xxx.png
      xxy.png
      xxz.png
  split_name/  # Ex: 'test'
If there are no labels, just put all images under <path>/train/0 and specify --data-path <path>.
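For this unlabeled case, a hypothetical helper along these lines can build the expected layout; the folder names my_images and my_dataset are placeholders:

```python
# Copy a flat folder of images into the <path>/train/0 layout expected by
# --dataset image-folder; source and destination names are placeholders.
import shutil
from pathlib import Path

src = Path("my_images")           # flat folder with jpg/png images
dst = Path("my_dataset/train/0")  # single dummy label "0"
dst.mkdir(parents=True, exist_ok=True)

for img in src.iterdir():
    if img.suffix.lower() in {".jpg", ".jpeg", ".png"}:
        shutil.copy(img, dst / img.name)
```

Training then works as follows: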
python train.py \
-o lars -lr 0.075 --lr-scaling sqrt -t 0.5 -wd 1e-4 \
--color-jitter-strength=1.0 --use-blur \
-e 1000 -bs 16 \
-m backbone --backbone densenet121 \
--lambda-rec=100.0 --lambda-con=1.0 \
--encoder-reduction ga_attention \
--aug-impl conrec \
--logdir <logdir> \
--dataset image-folder \
--data-path <path-to-image-folder> \
# Specify folder with train images
--train-split train # by default \
# Specify folder with test images
--test-split test # by default \
# or deactivate test data, no validation and eval will be performed
--no-test-data \
--linear-type none \
# It is also possible to supply a different evaluation dataset
--eval-dataset <..> \
--eval-data-path <..> \
# Center-crop data for eval if the images do not all have the same dimensions
--eval-center-crop
After pretraining, the model can be evaluated with logistic regression on various subsets of the data. This is done by creating a JSON file (e.g. models.json) that includes an entry for every model that should be evaluated, in the following format:
[
  {
    "name": "unet-conrec",
    "path": "<path-to-model.hdf5>",
    "preprocess": null,
    "output_layer": "encoder_output"
  },
  ...
]
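Instead of writing this file by hand, it can also be generated programmatically. In the sketch below, the checkpoints directory and the glob pattern are assumptions; the entry fields mirror the format shown above:

```python
# Generate models.json from a folder of checkpoints (folder name assumed).
import json
from pathlib import Path

entries = [
    {
        "name": ckpt.stem,
        "path": str(ckpt),
        "preprocess": None,  # serialized as null
        "output_layer": "encoder_output",
    }
    for ckpt in sorted(Path("checkpoints").glob("*.hdf5"))
]

with open("models.json", "w") as f:
    json.dump(entries, f, indent=2)
```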
Then we can compute the embeddings for each model, write them to a directory, and finally perform linear evaluation on them:
python scripts/aptos2019/compute_embeddings.py --models models.json --out-dir resources/aptos2019/embeddings --data-path resources/aptos2019
python scripts/aptos2019/evaluate_embeddings.py --embeddings-dir resources/aptos2019/embeddings \
--label-percentages 0.1 0.25 0.5 1.0 --repetitions 5 --output resources/aptos2019/results.csv
python scripts/tf_dataset/compute_embeddings.py --dataset <oxford_flowers102|stanford_dogs> \
--models models.json --out-dir <dir>
python scripts/tf_dataset/evaluate_embeddings.py --embeddings-dir <dir> \
--label-percentages 0.1 0.25 0.5 1.0 --output results.csv
To plot the results, use the output file generated by the evaluation script:
python scripts/plot_results.py --input results.csv --metric <accuracy|kappa_kaggle>
Furthermore, instead of using logistic regression, adding a dense layer on top of the frozen encoder and using augmentations during finetuning yielded better results for the Oxford Flowers and Stanford Dogs datasets. This can be reproduced in the following way:
python finetune.py -d oxford_flowers102 \
--classes 102 -lr 0.1 -o sgd \
-e 400 -bs 64 -wd 0 \
--train-split "train+validation" \
--freeze-until encoder_output \
--validation-freq 20 --preprocess simclr \
--gpu 0 --save-model \
--logdir <logdir> \
-p <path-to-model.hdf5>
python finetune.py -d stanford_dogs \
--classes 120 -lr 0.01 -o sgd \
-e 500 -bs 64 -wd 0 \
--freeze-until encoder_output \
--validation-freq 20 --preprocess simclr \
--gpu 0 --save-model \
--logdir <logdir> \
-p <path-to-model.hdf5>
With the same script, it is also possible to train the reported baselines:
python finetune.py -d oxford_flowers102 \
--classes 102 -lr 0.3 -o lars \
-e 1000 -bs 16 -wd 5e-4 \
--train-split "train+validation" \
--validation-freq 20 --preprocess simclr \
--gpu 0 --save-model \
-m unet --depth 4 --filters 64 \
--logdir <logdir>
python finetune.py -d stanford_dogs \
--classes 120 -lr 0.3 -o lars \
-e 500 -bs 16 -wd 1e-4 \
--validation-freq 20 --preprocess simclr \
--gpu 0 --save-model \
-m unet --depth 4 --filters 64 \
--logdir <logdir>
For the Aptos2019 dataset, we used the following script and configuration:
python scripts/aptos2019/finetune.py \
--output results.csv \
--models models.json \
-o adam \
-bs 32 \
-e 25 \
-lr 5e-5 \
--preprocess diabetic \
-wd 0 \
--gpu 0 \
--folds 5 \
--repetitions 5
where models.json has the same structure as for the logistic regression and, in this case, contained paths to randomly initialized models.
@article{dippel2021finegrained,
  title={Towards Fine-grained Visual Representations by Combining Contrastive Learning with Image Reconstruction and Attention-weighted Pooling},
  author={Jonas Dippel and Steffen Vogler and Johannes H\"ohne},
  year={2021},
  journal={arXiv preprint arXiv:2104.04323}
}