
Pre-training strategies and datasets for facial representation learning

This is the PyTorch implementation of the Facial Representation Learning (FRL) paper:

@inproceedings{bulat2022pre,
  title={Pre-training strategies and datasets for facial representation learning},
  author={Bulat, Adrian and Cheng, Shiyang and Yang, Jing and Garbett, Andrew and Sanchez, Enrique and Tzimiropoulos, Georgios},
  booktitle={ECCV},
  year={2022}
}

Model Zoo

We provide below some of the models trained in a self-supervised manner. More models will be added later.

data       | backbone  | url
VGG        | ResNet 50 | model
VGG (1M)   | ResNet 50 | model
FPR-Flickr | ResNet 50 | model

Code snippet for loading the weights into a standard torchvision ResNet-50:

import torch
from torchvision.models import resnet50

init_weights = torch.load('flr_r50_flickr_face.pth', map_location=torch.device('cpu'))['state_dict']
converted_weights = {k.replace('module.base_net.', ''): v for k, v in init_weights.items()}

model = resnet50(weights=None)
results = model.load_state_dict(converted_weights, strict=False)
# Note: the classifier layer (fc.weight and fc.bias) is not loaded;
# similarly, the projection layers used for pre-training are discarded.
print(results)
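
As a usage example, the loaded backbone can then serve as a feature extractor. The sketch below is our suggestion rather than the paper's evaluation pipeline: it swaps the classifier for an identity layer and uses standard ImageNet preprocessing, which may differ from the face crops and normalisation used during pre-training.

import torch
from PIL import Image
from torchvision import transforms

model.fc = torch.nn.Identity()  # expose the 2048-d pooled features
model.eval()

# Assumption: standard ImageNet statistics; the pre-training pipeline
# may use a face-specific crop and different normalisation.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open('face.jpg').convert('RGB')  # hypothetical input image
with torch.no_grad():
    embedding = model(preprocess(img).unsqueeze(0))  # shape: (1, 2048)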

Installation

To use the code, clone the repository and install the packages listed under Requirements below:

git clone https://github.com/1adrianb/unsupervised-face-representation
cd unsupervised-face-representation

Requirements

  • Python >= 3.8
  • NumPy
  • PyTorch: install instructions
  • torchvision: conda install torchvision -c pytorch
  • apex: install instructions
  • OpenCV: pip install opencv-python
  • h5py: conda install h5py
  • TensorBoard: pip install tensorboard
  • pandas
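
Once everything is installed, a quick sanity check of the environment can look like the sketch below (our suggestion, not part of the repo; apex is omitted since it is only needed when not using native AMP):

import torch
import torchvision
import cv2    # opencv-python
import h5py
import numpy
import pandas

print('torch', torch.__version__, '| torchvision', torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())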

Note: if you are using PyTorch > 1.10 and experience issues with apex, please see #1282. Alternatively, you can switch to native PyTorch AMP.
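
For reference, here is a minimal sketch of a mixed-precision training step with native PyTorch AMP (torch.cuda.amp); the model, optimiser, and batch are placeholders rather than this repo's training objects:

import torch
from torchvision.models import resnet50

# Placeholder model, optimiser, and batch; substitute the real ones.
model = resnet50(weights=None).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
images = torch.randn(8, 3, 224, 224, device='cuda')

scaler = torch.cuda.amp.GradScaler()

optimizer.zero_grad()
with torch.cuda.amp.autocast():   # run the forward pass in mixed precision
    loss = model(images).mean()   # dummy loss, for illustration only
scaler.scale(loss).backward()     # scale the loss to avoid fp16 underflow
scaler.step(optimizer)            # unscale gradients and apply the update
scaler.update()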

Training

bash scripts/run.sh

Before running the script, make sure to set the appropriate paths. The models released with the paper were trained using 64 K40 GPUs.

Preparing data

For instructions on obtaining the data, see DATASET.md.

Acknowledgement

We thank the authors of SwAV, MoCo, BYOL, and VISSL for releasing their code, upon which our code base is built.