training configuration and training problems
Dxk0103 opened this issue · 4 comments
We tested your latest uploaded checkpoint and found that it performs better than the previous one. You use eikonal_loss and man_loss in the new training code; what are the weights of these two losses?
In the new model/load_data, why do you still load the original amass_data? If flip=False is set, the amass_data is not in a format that can be fed to the network directly, and when flip=True, the amass_data is not used at all.
Besides, when training the model with your original code, we often had to re-initialize the model parameters several times to avoid vanishing gradients. Have you ever encountered this problem?
Hope to get your reply soon!
- We used the following weights: 0.5 for the eikonal loss, 0.5 for the distance loss, and 0.1 for the manifold loss.
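Concretely, the total objective is a weighted sum of the three terms. A minimal sketch (the loss tensors below are placeholders; only the weights come from our configuration):

```python
import torch

# Placeholder loss terms; in training these come from the network outputs.
loss_dist = torch.tensor(0.2, requires_grad=True)      # distance loss
loss_eikonal = torch.tensor(0.05, requires_grad=True)  # eikonal loss
loss_man = torch.tensor(0.1, requires_grad=True)       # manifold loss

# Weights as reported above: 0.5 / 0.5 for distance and eikonal, 0.1 for manifold.
total_loss = 0.5 * loss_dist + 0.5 * loss_eikonal + 0.1 * loss_man
total_loss.backward()
```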
- The relevant line should read `if self.flip: amass_poses, _ = quat_flip(amass_poses)`. The input should be amass_poses (it was a typo). We have updated this.
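For background, q and -q encode the same rotation, so the flip canonicalizes each quaternion to one hemisphere before training. Roughly (an illustrative sketch, not the exact quat_flip from this repository):

```python
import torch

def quat_flip_sketch(poses: torch.Tensor):
    """Map quaternions in (w, x, y, z) layout to the w >= 0 hemisphere.

    poses: (..., 4) tensor. Since q and -q represent the same rotation,
    flipping removes the sign ambiguity. Returns the flipped poses and
    a mask of which entries were flipped.
    """
    flip_mask = poses[..., 0:1] < 0           # negative scalar part
    flipped = torch.where(flip_mask, -poses, poses)
    return flipped, flip_mask.squeeze(-1)
```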
- We do see vanishing gradients with respect to the input, i.e., in the eikonal term.
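That term requires differentiating the predicted distance with respect to the input pose, which is where the issue shows up. The usual PyTorch pattern looks like this (a sketch of the standard recipe, not the exact training code):

```python
import torch

def eikonal_loss_sketch(model: torch.nn.Module, poses: torch.Tensor) -> torch.Tensor:
    """Eikonal regularizer: push the input-gradient norm of f towards 1.

    poses: (B, D) batch of inputs; model maps (B, D) -> (B, 1) distances.
    """
    poses = poses.clone().requires_grad_(True)
    dist = model(poses)
    grad = torch.autograd.grad(
        outputs=dist, inputs=poses,
        grad_outputs=torch.ones_like(dist),
        create_graph=True)[0]  # keep the graph so the loss can backprop
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```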
The amass_data is a batch × num_points × 69 tensor; it cannot be reshaped to (-1, 21, 4).
We run an offline script to convert the axis-angle poses to quaternions and pick 21 joints (ds_dir, seq, and outds_dir are defined elsewhere in the script):
```python
import os
import numpy as np
import torch
from pytorch3d.transforms import axis_angle_to_quaternion  # assumed source of this utility

# Keep the 63 body-pose values (21 joints x 3 axis-angle components).
pose_seq = np.load(os.path.join(ds_dir, seq))['pose_body'][:, :63]
pose_seq = torch.from_numpy(pose_seq.reshape(len(pose_seq), 21, 3))
pose_seq = axis_angle_to_quaternion(pose_seq).detach().numpy()  # (N, 21, 4)
print('done for....{}, pose_shape...{}'.format(seq, len(pose_seq)))
np.savez(os.path.join(outds_dir, seq), pose=pose_seq)
```
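With the converted files, the reshape in question becomes consistent (a quick check, assuming the output directory above):

```python
import os
import numpy as np

# Each converted sequence stores (N, 21, 4) quaternions under 'pose',
# so reshaping the flattened data to (-1, 21, 4) works as expected.
poses = np.load(os.path.join(outds_dir, seq))['pose']
assert poses.reshape(-1, 21, 4).shape[1:] == (21, 4)
```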
Thank you! The above problem has been solved!
I am sorry to bother you again, but in Section 4.1 of your thesis you mention a multi-stage training method; however, the load_data and create_data scripts do not show non-manifold poses sampled at different distances. Could you please explain the training process? We originally trained your network with the distance loss, but we could not reach the metrics of your newly uploaded checkpoint.
Also, could you share the download link for the SMPL model? When I use the model downloaded from https://smpl.is.tue.mpg.de, I get an SMPL model error in experiments/motion_denoisy.py at `pose_offsets = torch.matmul(pose_feature, posedirs).view(batch_size, -1, 3)`: posedirs has the wrong dimension.
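For reference, my understanding of the expected shapes at that line is below (a sketch of the standard SMPL convention, not your code). For SMPL, posedirs should have 207 rows (23 joints × 9 rotation-matrix entries); SMPL+H and SMPL-X ship a larger basis, which would produce exactly this kind of mismatch:

```python
import torch

def pose_offsets_checked(pose_feature: torch.Tensor,
                         posedirs: torch.Tensor) -> torch.Tensor:
    """Apply pose blend shapes with an explicit shape check.

    pose_feature: (B, P) flattened rotation features (P = 207 for SMPL).
    posedirs:     (P, V * 3) pose blend-shape basis, one row per feature.
    """
    batch_size, P = pose_feature.shape
    assert posedirs.shape[0] == P, (
        f'posedirs has {posedirs.shape[0]} rows but pose_feature has {P} '
        'features; the downloaded model variant may not match the code.')
    return torch.matmul(pose_feature, posedirs).view(batch_size, -1, 3)
```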
Your help would be appreciated!
Best Wishes!