JialeTao/MTIA

Cannot download checkpoints


I have no idea how to download the pretrained checkpoints. Could you please provide a baidudisk link? Thanks a lot.

Hi @yanerzidefanbaba, using git lfs to clone the project may help. The Baidu disk link for the checkpoints is here:
https://pan.baidu.com/s/1Zlr309OcsDuz5FaULQJYWQ passwd:qp6v
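
For reference, a typical git-lfs workflow for pulling the checkpoint files looks like the following (assuming the checkpoints are tracked as LFS objects in this repo):

```bash
git lfs install                                  # one-time LFS setup
git clone https://github.com/JialeTao/MTIA.git   # clone the repo
cd MTIA
git lfs pull                                     # fetch the large checkpoint files
```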

Thanks for your reply and your exciting work. However, the model I trained performs much worse than the one you provided. I think this may be caused by the loss weights: in the original vox.yaml, generator_gan, discriminator_gan, feature_matching, bg_fg_mask, and fg_mask_concentration are all set to 0. Could you please provide the correct values for these parameters?

The provided config file is the correct setting. Could you share some training details of your experiments, such as the loss logs?

Thanks a ton. My perceptual loss is 115, my equivariance_value loss is 0.2249, and my equivariance_jacobian loss is 0.4228. Are these far from your results?

Yes, those results are abnormal. After training the last epoch on the VoxCeleb dataset, the perceptual loss is usually around 80 and the equivariance losses are around 0.1. Currently I'm not sure of the reason. What is your training environment, and did you make any custom changes to the code?

I've decided to train it again because I'm not sure whether I changed the training environment. But as far as I remember, I didn't change anything except that my data is in .mp4 format. Does that affect the results?

That's OK. No, the mp4 format doesn't affect the result, only the training speed, because of its I/O overhead. When you change the way the data is read, make sure that the images/videos are normalized to [0, 1].
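
For what it's worth, here is a minimal sketch of an mp4 frame loader with that normalization, assuming imageio is used to read the video (the repo's own dataset class may differ):

```python
import imageio
import numpy as np

def read_video_frames(path):
    """Read an .mp4 and return float32 frames normalized to [0, 1]."""
    reader = imageio.get_reader(path)
    # uint8 [0, 255] -> float32 [0, 1], the range the model expects
    frames = [np.asarray(frame, dtype=np.float32) / 255.0 for frame in reader]
    reader.close()
    return np.stack(frames)  # shape (T, H, W, 3)
```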

Thanks for your tips, I will try it again.

Is it possible to put the checkpoints somewhere else? I'm having an impossible time downloading them through Baidu (or git).

Hi, I still can't reproduce the results. The perceptual loss starts converging at around 110 and equivariance_value at around 0.22, at roughly epoch 70 with repeat num = 2. My config file is as follows:
dataset_params:
  root_dir: /run/media/root/2/vox
  frame_shape: [256, 256, 3]
  id_sampling: True
  pairs_list:
  augmentation_params:
    flip_param:
      horizontal_flip: True
      time_flip: True
    jitter_param:
      brightness: 0.1
      contrast: 0.1
      saturation: 0.1
      hue: 0.1

model_params:
  use_bg_predictor: False
  common_params:
    num_kp: 10
    num_channels: 3
    estimate_jacobian: True
  generator_params:
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
    num_bottleneck_blocks: 6
    estimate_occlusion_map: True
    skips: True
    dense_motion_params:
      block_expansion: 64
      max_features: 1024
      num_blocks: 5
      scale_factor: 0.25
  discriminator_params:
    scales: [1]
    block_expansion: 32
    max_features: 512
    num_blocks: 4
    sn: True

train_params:
  num_epochs: 200
  num_repeats: 1
  epoch_milestones: [60, 90]
  lr_generator: 2.0e-4
  lr_discriminator: 2.0e-4
  lr_kp_detector: 2.0e-4
  lr_bg_predictor: 2.0e-4
  batch_size: 10
  scales: [1, 0.5, 0.25, 0.125]
  clip_generator_grad: False
  clip_kp_detector_grad: True
  clip: 1
  checkpoint_freq: 5
  transform_params:
    sigma_affine: 0.05
    sigma_tps: 0.005
    points_tps: 5
  loss_weights:
    generator_gan: 0
    discriminator_gan: 0
    feature_matching: [0, 0, 0, 0]
    perceptual: [10, 10, 10, 10, 10]
    equivariance_value: 10
    equivariance_jacobian: 10
    bg_fg_mask: 0
    fg_mask_concentration: 0

reconstruction_params:
  num_videos: 1000
  format: '.mp4'

animate_params:
  num_pairs: 50
  format: '.mp4'
  normalization_params:
    adapt_movement_scale: False
    use_relative_movement: True
    use_relative_jacobian: True

visualizer_params:
  kp_size: 5
  draw_border: True
  colormap: 'gist_rainbow'

MODEL:
  # default
  TAG_PER_JOINT: True
  HIDDEN_HEATMAP_DIM: -1
  MULTI_TRANSFORMER_DEPTH: [12, 12]
  MULTI_TRANSFORMER_HEADS: [16, 16]
  MULTI_DIM: [48, 48]
  NUM_BRANCHES: 1
  BASE_CHANNEL: 32
  # default
  ESTIMATE_JACOBIAN: True
  TEMPERATURE: 0.1
  DATA_PREPROCESS: False
  FIX_IMG2MOTION_ATTENTION: False
  INIT_WEIGHTS: False
  NAME: pose_tokenpose_b
  NUM_JOINTS: 10
  PRETRAINED: ''
  TARGET_TYPE: gaussian
  TRANSFORMER_DEPTH: 12
  TRANSFORMER_HEADS: 8
  TRANSFORMER_MLP_RATIO: 3
  POS_EMBEDDING_TYPE: 'sine-full'
  INIT: true
  DIM: 192 # 443
  PATCH_SIZE:
    - 4
    - 4
  IMAGE_SIZE:
    - 256
    - 256
  HEATMAP_SIZE:
    - 64
    - 64
  SIGMA: 2
  EXTRA:
    PRETRAINED_LAYERS:
      - 'conv1'
      - 'bn1'
      - 'conv2'
      - 'bn2'
      - 'layer1'
      - 'transition1'
      - 'stage2'
      - 'transition2'
      - 'stage3'
    FINAL_CONV_KERNEL: 1
    STAGE2:
      NUM_MODULES: 1
      NUM_BRANCHES: 2
      BLOCK: BASIC
      NUM_BLOCKS:
        - 4
        - 4
      NUM_CHANNELS:
        - 32
        - 64
      FUSE_METHOD: SUM
    STAGE3:
      NUM_MODULES: 4
      NUM_BRANCHES: 3
      BLOCK: BASIC
      NUM_BLOCKS:
        - 4
        - 4
        - 4
      NUM_CHANNELS:
        - 32
        - 64
        - 128
      FUSE_METHOD: SUM
As for the code, nothing was changed except two things: nn.DataParallelWithCallback in train.py was replaced because my PyTorch version is different, and the dataset code was modified because my data is in mp4 format. I don't know where the problem is. Are the epochs still not enough?
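
(For readers hitting the same version issue: a common stand-in for the swap described above, assuming the repo bundles a sync_batchnorm package as FOMM-style code usually does, is something like the sketch below. Note that plain DataParallel drops the synchronized-BatchNorm callback, which can subtly change training.)

```python
import torch.nn as nn

try:
    # Wrapper bundled with FOMM-style repos (synchronized BatchNorm across GPUs).
    from sync_batchnorm import DataParallelWithCallback as ParallelWrapper
except ImportError:
    # Fallback for incompatible PyTorch versions: no BN synchronization.
    ParallelWrapper = nn.DataParallel

# model = ParallelWrapper(model, device_ids=[0, 1])
```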

@yanerzidefanbaba It seems the training is not enough. num_repeats is set to 150 by default; with it set to 1, training for 60 epochs sees even less data than a single epoch of the default config. And since the learning rate is decayed by a factor of 10 at epochs 60 and 90, the losses may only appear to be converging.
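
To make the arithmetic concrete (illustrative numbers only), the amount of data seen scales with num_epochs × num_repeats:

```python
# Passes over the dataset = num_epochs * num_repeats
default_one_epoch = 1 * 150   # one epoch with the default num_repeats = 150
reported_run = 60 * 1         # 60 epochs with num_repeats = 1
print(default_one_epoch, reported_run)  # 150 60 -> less data than one default epoch
```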