JimmySuen/integral-human-pose

I think there is something wrong with steps in softargmax

bucktoothsir opened this issue · 5 comments

    heatmaps = heatmaps.reshape((heatmaps.shape[0], num_joints, z_dim, y_dim, x_dim))

    accu_x = heatmaps.sum(dim=2)
    accu_x = accu_x.sum(dim=2)
    accu_y = heatmaps.sum(dim=2)
    accu_y = accu_y.sum(dim=3)
    accu_z = heatmaps.sum(dim=3)
    accu_z = accu_z.sum(dim=3)

    accu_x = accu_x * torch.cuda.comm.broadcast(torch.arange(x_dim), devices=[accu_x.device.index])[0]
    accu_y = accu_y * torch.cuda.comm.broadcast(torch.arange(y_dim), devices=[accu_y.device.index])[0]
    accu_z = accu_z * torch.cuda.comm.broadcast(torch.arange(z_dim), devices=[accu_z.device.index])[0]

    accu_x = accu_x.sum(dim=2, keepdim=True)
    accu_y = accu_y.sum(dim=2, keepdim=True)
    accu_z = accu_z.sum(dim=2, keepdim=True)

    x = x / float(hm_width) - 0.5
    y = y / float(hm_height) - 0.5
    z = z / float(hm_depth) - 0.5
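For readers following along, here is a minimal CPU-only NumPy sketch of the same marginalization the snippet performs (the shapes and the softmax normalization are assumptions for illustration):

```python
import numpy as np

# assumed toy dimensions: batch, joints, depth, height, width
N, J, z_dim, y_dim, x_dim = 2, 17, 4, 8, 8
rng = np.random.default_rng(0)
heatmaps = rng.random((N, J, z_dim, y_dim, x_dim))
# normalize each joint's volume to sum to 1, as a softmax would
heatmaps /= heatmaps.sum(axis=(2, 3, 4), keepdims=True)

# marginalize out the two other axes, leaving one 1-D distribution per axis
accu_x = heatmaps.sum(axis=(2, 3))  # (N, J, x_dim)
accu_y = heatmaps.sum(axis=(2, 4))  # (N, J, y_dim)
accu_z = heatmaps.sum(axis=(3, 4))  # (N, J, z_dim)

# expected coordinate along each axis (the soft-argmax)
x = (accu_x * np.arange(x_dim)).sum(axis=2)
y = (accu_y * np.arange(y_dim)).sum(axis=2)
z = (accu_z * np.arange(z_dim)).sum(axis=2)
```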

As your code above shows, `torch.arange(x_dim)` yields `[0, 1, 2, 3, ..., x_dim-1]`, so the maximum possible value of the weighted sum `accu_x` is `x_dim-1`, and likewise for `accu_y` and `accu_z`.
So I think it is invalid to do `x / float(hm_width) - 0.5`;
you should change `x = x / float(hm_width) - 0.5` to `x = x / float(hm_width-1) - 0.5`.
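To make the off-by-one concrete, here is a small 1-D sketch (the dimensions are made up for illustration). With indices `[0, x_dim-1]`, dividing by `hm_width` can never reach +0.5, while dividing by `hm_width - 1` spans the full symmetric range:

```python
import numpy as np

x_dim, hm_width = 8, 8

# A one-hot heatmap on the last bin gives the largest possible
# expected index, x_dim - 1 (the heatmap is already normalized).
heatmap = np.zeros(x_dim)
heatmap[-1] = 1.0
x = float(np.sum(heatmap * np.arange(x_dim)))  # expected index = 7.0

print(x / hm_width - 0.5)        # 0.375: +0.5 is unreachable
print(x / (hm_width - 1) - 0.5)  # 0.5: full range is covered
```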

@bucktoothsir Could you be more specific?

@lck1201 check my updates

Yes, you are correct. We will fix the problem in the future. However, we currently generate the ground-truth heatmaps in a consistent way; see the function

def generate_joint_location_label(config, patch_width, patch_height, joints, joints_vis):

So there should not be any big problems.

@lck1201 Hi~ Might it be better to modify these three lines of code as shown below?

    accu_x = accu_x * torch.cuda.comm.broadcast(torch.arange(1, x_dim+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0]
    accu_y = accu_y * torch.cuda.comm.broadcast(torch.arange(1, y_dim+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0]
    accu_z = accu_z * torch.cuda.comm.broadcast(torch.arange(1, z_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0]

@fabro66 Hi, in essence the range should be [0, dim]. But I don't know whether performance will differ if it is changed to `torch.arange(1, x_dim+1)` or to `x = x / float(hm_width-1) - 0.5`. Maybe you can give it a try.
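For what it's worth, the two proposed fixes are not equivalent. A small sketch comparing the extreme one-hot heatmaps (assuming `hm_width == x_dim`, as in this thread): shifting the indices to `[1, dim]` only moves the unreachable endpoint from +0.5 to -0.5, while rescaling by `dim - 1` covers the full symmetric range:

```python
import numpy as np

dim = 8  # assume hm_width == x_dim

idx0 = np.arange(dim)         # original indices: 0 .. dim-1
idx1 = np.arange(1, dim + 1)  # proposed shift:   1 .. dim

# Normalized-coordinate range for each variant, taken from the
# extreme one-hot heatmaps (all mass on the first / last bin).
orig    = (idx0[0] / dim - 0.5,        idx0[-1] / dim - 0.5)        # (-0.5, 0.375)
shift   = (idx1[0] / dim - 0.5,        idx1[-1] / dim - 0.5)        # (-0.375, 0.5)
rescale = (idx0[0] / (dim - 1) - 0.5,  idx0[-1] / (dim - 1) - 0.5)  # (-0.5, 0.5)
```

Only the `hm_width - 1` rescaling yields a coordinate range that is symmetric around zero; the index shift just moves the bias to the other end.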