
Code for "Self-Lifting: A Novel Framework For Unsupervised Voice-Face Association Learning", ICMR 2022





Download the dataset file from Baidu Disk (code: 9d0a) or Google Drive and unzip it into the project root. The dataset folder structure is shown below:

└── voxceleb
    ├── cluster
    │   ├── movie2jpg_path.pkl
    │   ├── movie2wav_path.pkl
    │   └── train_movie_list.pkl
    ├── eval
    │   ├── test_matching_10.pkl
    │   ├── test_matching_g.pkl
    │   ├── test_matching.pkl
    │   ├── test_retrieval.pkl
    │   ├── test_verification.pkl
    │   ├── test_verification_g.pkl
    │   └── valid_verification.pkl
    ├── face_input.pkl
    └── voice_input.pkl
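
After unzipping, it can help to confirm that nothing is missing before starting a training run. The following is a minimal sketch, not part of the repository: the function name `missing_dataset_files` is hypothetical, and the location of the `voxceleb` folder is passed in by the caller because the listing above does not show its parent directory.

```python
from pathlib import Path

# Relative paths copied from the folder structure listed above.
EXPECTED_FILES = [
    "cluster/movie2jpg_path.pkl",
    "cluster/movie2wav_path.pkl",
    "cluster/train_movie_list.pkl",
    "eval/test_matching_10.pkl",
    "eval/test_matching_g.pkl",
    "eval/test_matching.pkl",
    "eval/test_retrieval.pkl",
    "eval/test_verification.pkl",
    "eval/test_verification_g.pkl",
    "eval/valid_verification.pkl",
    "face_input.pkl",
    "voice_input.pkl",
]


def missing_dataset_files(voxceleb_dir):
    """Return the expected files that are absent under voxceleb_dir, sorted.

    voxceleb_dir is wherever the unzipped `voxceleb` folder ended up
    (an assumption left to the caller).
    """
    root = Path(voxceleb_dir)
    return sorted(rel for rel in EXPECTED_FILES if not (root / rel).is_file())
```

Calling `missing_dataset_files(path_to_voxceleb)` returns an empty list when every file from the listing is present, and otherwise the relative paths that still need to be downloaded.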


1. Train the Self-Lifting framework:

python sl.py

2. Train a baseline:

python baseline/1_ccae.py

python baseline/2_deepcluster.py

python baseline/3_barlow.py

Use wandb to view the training process:

  1. Create a wb_config.json file in the ./configs folder with the following content:

      {
        "WB_KEY": "Your wandb auth key"
      }
  2. Add --dryrun=False to the training command, for example: python sl.py --dryrun=False
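
The config file from step 1 can also be generated from Python. This is a rough sketch under two assumptions: that wb_config.json is a single JSON object holding the `WB_KEY` entry, and that the training scripts read it from ./configs as described above. The helper name `write_wb_config` is hypothetical and does not exist in the repository.

```python
import json
from pathlib import Path


def write_wb_config(auth_key, config_dir="./configs"):
    """Write <config_dir>/wb_config.json with a single WB_KEY entry.

    Assumes the file is a plain JSON object; returns the path written.
    """
    config_path = Path(config_dir) / "wb_config.json"
    # Create the configs folder if it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps({"WB_KEY": auth_key}, indent=2) + "\n")
    return config_path
```

Run `write_wb_config("Your wandb auth key")` from the project root, then start training with `--dryrun=False`. Since the file holds a credential, it is probably best kept out of version control.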

Model Checkpoints

The final model checkpoints are available here (code: 4ae6).

Backbone Models

The Inception-V1 model is based on facenet_pytorch.

The ECAPA-TDNN model is based on SpeechBrain. Because the SpeechBrain model is trained on Vox1+Vox2, we retrained one on Vox2 only. The checkpoint can be found here.

We also offer demo scripts for extracting the embeddings in scripts/.