garvita-tiwari/neuralgif

The pretrained ClothSeq models are unusable

bluestyle97 opened this issue · 2 comments

I'm trying to generate meshes with the pretrained ClothSeq model from this link: https://nextcloud.mpi-klsb.mpg.de/index.php/s/FweAP5Js58Q9tsq?path=%2Fpretrained_models%2Fsingle_shape%2Fclothseq_1. However, I got the following error:

Traceback (most recent call last):
  File "/public/home/xujl1/projects/human-animation/neuralgif/generator.py", line 34, in <module>
    train(opt)
  File "/public/home/xujl1/projects/human-animation/neuralgif/generator.py", line 18, in train
    gen = gen( opt=opt, checkpoint=args.checkpoint, resolution=resolution)
  File "/public/home/xujl1/projects/human-animation/neuralgif/models/generate_shape.py", line 47, in __init__
    self.load_checkpoint_path(checkpoint)
  File "/public/home/xujl1/projects/human-animation/neuralgif/models/generate_shape.py", line 178, in load_checkpoint_path
    self.model_occ.load_state_dict(checkpoint['model_state_occ_dict'])
  File "/public/home/xujl1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CanSDF:
        size mismatch for layers.0.weight: copying a param with shape torch.Size([960, 3075]) from checkpoint, the shape in current model is torch.Size([960, 75]).
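Working backwards from the shapes (an inference from the mismatch alone, not from the training code), CanSDF's first layer seems to take 3 * num_parts + 3 input features:

# Guessed relationship between num_parts and CanSDF's first-layer width,
# inferred only from the checkpoint shapes above.
def cansdf_in_features(num_parts: int) -> int:
    return 3 * num_parts + 3

assert cansdf_in_features(24) == 75      # what clothseq.yaml currently builds
assert cansdf_in_features(1024) == 3075  # what the checkpoint expects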

Based on that, I modified config['model']['CanSDF']['num_parts'] from 24 to 1024, and then got:

Traceback (most recent call last):
  File "/public/home/xujl1/projects/human-animation/neuralgif/generator.py", line 34, in <module>
    train(opt)
  File "/public/home/xujl1/projects/human-animation/neuralgif/generator.py", line 18, in train
    gen = gen( opt=opt, checkpoint=args.checkpoint, resolution=resolution)
  File "/public/home/xujl1/projects/human-animation/neuralgif/models/generate_shape.py", line 47, in __init__
    self.load_checkpoint_path(checkpoint)
  File "/public/home/xujl1/projects/human-animation/neuralgif/models/generate_shape.py", line 181, in load_checkpoint_path
    self.model_wgt.load_state_dict(checkpoint['model_state_wgt_dict'])
  File "/public/home/xujl1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for WeightPred:
        size mismatch for layers.0.weight: copying a param with shape torch.Size([960, 837]) from checkpoint, the shape in current model is torch.Size([960, 27]).
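The arithmetic is different here: 837 = 834 + 3, so WeightPred's first layer appears to take num_parts + 3 features (again, just a guess from the shapes):

# Same exercise for WeightPred; the formula is inferred, not documented.
def weightpred_in_features(num_parts: int) -> int:
    return num_parts + 3

assert weightpred_in_features(24) == 27    # what the current model builds
assert weightpred_in_features(834) == 837  # what the checkpoint expects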

So I modified config['model']['WeightPred']['num_parts'] from 24 to 834, and then got:

File "/public/home/xujl1/projects/human-animation/neuralgif/generation_iterator.py", line 41, in gen_iterator
    logits, min, max,can_pt = gen.generate_mesh(data)
  File "/public/home/xujl1/projects/human-animation/neuralgif/models/generate_shape.py", line 109, in generate_mesh
    weight_pred = self.model_wgt(pointsf, body_enc_feat, pose_in)
  File "/public/home/xujl1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/xujl1/projects/human-animation/neuralgif/models/network/net_modules.py", line 68, in forward
    x_net = self.actvn(self.layers[i](x_net))
  File "/public/home/xujl1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/xujl1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/public/home/xujl1/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1000000x27 and 837x960)
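Rather than guessing further, the layer shapes the checkpoint was actually trained with can be read straight out of the file (the path below is a placeholder for wherever you saved the download; the two state-dict keys are the ones appearing in the tracebacks above):

import torch

# Dump every layer shape stored in the pretrained checkpoint so the
# trained architecture can be compared against clothseq.yaml.
ckpt = torch.load("clothseq_1/checkpoint.tar", map_location="cpu")  # hypothetical path
for key in ("model_state_occ_dict", "model_state_wgt_dict"):
    print(f"--- {key} ---")
    for name, tensor in ckpt[key].items():
        print(name, tuple(tensor.shape))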

I think the problem is that the clothseq.yaml provided in this repo does not match the configuration you used during training. Note that the last error also shows the generator still building 27-dimensional inputs for WeightPred, so the feature encoding disagrees with the resized layer as well; bumping num_parts in the config alone is not enough. Can you fix this?

Sorry, I used the wrong architecture settings and have fixed this.

Hi, I still cannot use the pretrained checkpoint to generate human meshes. I followed the steps in prepare_data/clothseq_data.py to generate test data for the 3 ClothSeq sequences, then ran generator.py with the pretrained checkpoints. However, the generated meshes look like this:

JacketPants/000437
[screenshot of the generated mesh]

ShrugsPants/000016
[screenshot of the generated mesh]

I also modified the data preprocessing script prepare_data/clothseq_data.py: since no reg_mesh files are provided, I generate them by forwarding the SMPL body model with each frame's beta, pose, and transl parameters. All other code is unchanged.
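Concretely, my replacement for that step looks roughly like this (a sketch using the smplx package; the model path, output name, and the zeroed parameter tensors are placeholders for what each frame's registration actually provides):

import smplx
import torch
import trimesh

# Per-frame SMPL parameters; in my script these come from the ClothSeq
# registration files (shapes: betas (1, 10), pose (1, 72), transl (1, 3)).
betas = torch.zeros(1, 10)
pose = torch.zeros(1, 72)   # axis-angle, 24 joints x 3
transl = torch.zeros(1, 3)

model = smplx.create("smpl_models", model_type="smpl", gender="male")  # path is a placeholder
out = model(betas=betas, global_orient=pose[:, :3],
            body_pose=pose[:, 3:], transl=transl)
verts = out.vertices.detach().cpu().numpy()[0]
trimesh.Trimesh(verts, model.faces, process=False).export("reg_mesh.ply")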

Can you give me some advice on how to solve this problem? Thanks a lot!