dluvizon/scene-aware-3d-multi-human

IndexError: tensors used as indices must be long, byte or bool tensors


Hi,
when I run

python -m mhmocap.predict_mupots --configs_yml D:\Seesea\human\sa-3d-mh\configs\predict_mupots.yml --ts_id 1 --num_iter 100 --output_path D:\Seesea\human\sa-3d-mh\data\mupots-3d-eval\TS1\output\result_output

I get this error:

Info: writing output to D:\Seesea\human\sa-3d-mh\data\mupots-3d-eval\TS1\output\result_output\TS1
DEBUG:: joint_confidence_thr>> 0.5
DEBUG:: erode_segmentation_iters>> 0
DEBUG:: erode_backmask_iters>> 5
DEBUG:: renormalize_depth>> True
DEBUG:: post_process_depth>> True
DEBUG:: H3DHCustomSequenceData
DEBUG:: erode_segmentation_iters 0
DEBUG:: erode_backmask_iters 5
DEBUG:: use_hrnet_pose False
DEBUG:: joint_coef_thr 0.5
DEBUG:: max_num_people None
Images_path: ./data/mupots-3d-eval/TS1\images
Image data: (201, 256, 256, 3) 0 255
Depth data: (201, 256, 256) 0.0 1.0
Segmentation data: (201, 256, 256) 0 4
Background mask data: (201, 256, 256) 0 1
ROMP predictions: 201 dict_keys(['cam', 'poses', 'betas'])
Found 201 images with predictions from AlphaPose with idx: [1, 2, 3, 4, 5, 6]
AlphaPose:: found max 4 predictions per frame from AlphaPose!
AlphaPose data: (201, 4, 17, 3)
DEBUG:: pvis [1. 1. 1. 0.] threshold is 0.125
ROMP predictions (final): 201 dict_keys(['cam', 'poses', 'betas', 'valid'])
Filtering 2D poses with One-Euro filter.
DEBUG:: H3DHCustomSequenceData: using cam: {'K': array([[1.8784500e+02, 0.0000000e+00, 1.2877887e+02],
[0.0000000e+00, 1.8923062e+02, 1.2986162e+02],
[0.0000000e+00, 0.0000000e+00, 1.2500000e-01]], dtype=float32), 'fov': 68.54204142245207, 'Kd': None, 'image_size': (256, 256)}
0%| | 0/201 [00:00<?, ?it/s]
0%| | 0/100 [00:09<?, ?it/s]
Traceback (most recent call last):
File "D:\Anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Seesea\human\sa-3d-mh\mhmocap\predict_mupots.py", line 100, in
log = predictor.run()
File "D:\Seesea\human\sa-3d-mh\mhmocap\predict.py", line 343, in run
log = self.optim_smpl.fit(self.dataloader, num_iter=self.num_iter, verbose=True)
File "D:\Seesea\human\sa-3d-mh\mhmocap\optimizer.py", line 401, in fit
idx_var = self.__eval_batch_optimized_variables(idx_data['idxs'])
File "D:\Seesea\human\sa-3d-mh\mhmocap\optimizer.py", line 683, in __eval_batch_optimized_variables
idx_var['min_z'] = softplus(self.zmin_lin[idxs])
IndexError: tensors used as indices must be long, byte or bool tensors
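
(For context: PyTorch only accepts long, byte, or bool tensors as indices, so this message means idxs arrived as a float tensor. The error can be reproduced in isolation with a minimal standalone sketch, unrelated to the repo code:)

import torch

zmin_lin = torch.zeros(201, 1, 1)    # stand-in for self.zmin_lin
idxs = torch.tensor([0., 1., 2.])    # float32 indices, as a collate step might yield them
try:
    zmin_lin[idxs]                   # rejected: float tensors cannot be used as indices
except IndexError as e:
    print(e)                         # tensors used as indices must be long, byte or bool tensors
print(zmin_lin[idxs.long()].shape)   # torch.Size([3, 1, 1]) once the dtype is fixed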

Then I tried to change the code in optimizer.py, like this:

def __eval_batch_optimized_variables(self, idxs):
    idx_var = {} # Indexed variables
    batch_size = idxs.shape[0]
    idx_var['scale_factor'] = torch.pow(1.1, self.xscale_factor)

    # changed from: idx_var['min_z'] = softplus(self.zmin_lin[idxs])
    idx_var['min_z'] = softplus(self.zmin_lin[idxs.long()])

    idx_var['max_z'] = (
        idx_var['min_z'].detach().clone()
        + self.min_delta_z
        # changed from: + softplus(self.zmax_lin[idxs])
        + softplus(self.zmax_lin[idxs.long()])
    ) # (batch, 1, 1)

    # changed from: idx_var['poses_smpl'] = self.poses_smpl[idxs].view(-1, 72)
    idx_var['poses_smpl'] = self.poses_smpl[idxs.long()].view(-1, 72)
    idx_var['betas_smpl'] = self.betas_smpl.tile((batch_size, 1, 1)).view(-1, 10)
    # changed from: idx_var['valid_smpl'] = self.valid_smpl[idxs]
    idx_var['valid_smpl'] = self.valid_smpl[idxs.long()]

    results = self.SMPLPY(betas=idx_var['betas_smpl'], poses=idx_var['poses_smpl'])
    verts = results['verts'].view(batch_size, self.num_people, -1, 3)
    joints_smpl24 = results[self.smpl_sparse_joints_key].view(batch_size, self.num_people, -1, 3)

    idx_var['poses_smpl'] = idx_var['poses_smpl'].view(batch_size, self.num_people, 72)
    idx_var['betas_smpl'] = idx_var['betas_smpl'].view(batch_size, self.num_people, 10)
    # changed from: idx_var['poses_T'] = self.poses_T[idxs]
    idx_var['poses_T'] = self.poses_T[idxs.long()]

    idx_var['verts_smpl_abs'] = idx_var['scale_factor'] * verts + idx_var['poses_T'] # (batch, N, V, 3)
    idx_var['joints_smpl_abs'] = idx_var['scale_factor'] * joints_smpl24 + idx_var['poses_T'] # (batch, N, J, 3)

    # changed from: idx_var['intrinsics'] = torch.tile(self.cam_intrinsics[idxs], (1, self.num_people, 1, 1)).view(-1, 3, 3)
    idx_var['intrinsics'] = torch.tile(self.cam_intrinsics[idxs.long()], (1, self.num_people, 1, 1)).view(-1, 3, 3)
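
(A note on this workaround: .long() truncates toward zero, so if idxs really contains fractional values the cast silently maps them to the wrong frames instead of failing. A more defensive variant, as a sketch with a hypothetical helper name:)

import torch

def as_index(idxs):
    # Hypothetical helper: cast to long, but fail loudly on fractional values
    # instead of silently truncating them to the wrong frame indices.
    if idxs.dtype in (torch.long, torch.uint8, torch.bool):
        return idxs
    assert torch.all(idxs == idxs.round()), f"non-integer indices: {idxs}"
    return idxs.long()

# e.g. idx_var['min_z'] = softplus(self.zmin_lin[as_index(idxs)])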

After that, I got this:

DEBUG:: H3DHCustomSequenceData
DEBUG:: erode_segmentation_iters 0
DEBUG:: erode_backmask_iters 5
DEBUG:: use_hrnet_pose False
DEBUG:: joint_coef_thr 0.5
DEBUG:: max_num_people None
Images_path: ./data/mupots-3d-eval/TS1\images
Image data: (201, 256, 256, 3) 0 255
Depth data: (201, 256, 256) 0.0 1.0
Segmentation data: (201, 256, 256) 0 4
Background mask data: (201, 256, 256) 0 1
ROMP predictions: 201 dict_keys(['cam', 'poses', 'betas'])
Found 201 images with predictions from AlphaPose with idx: [1, 2, 3, 4, 5, 6]
AlphaPose:: found max 4 predictions per frame from AlphaPose!
AlphaPose data: (201, 4, 17, 3)
DEBUG:: pvis [1. 1. 1. 0.] threshold is 0.125
ROMP predictions (final): 201 dict_keys(['cam', 'poses', 'betas', 'valid'])
Filtering 2D poses with One-Euro filter.
DEBUG:: H3DHCustomSequenceData: using cam: {'K': array([[1.8784500e+02, 0.0000000e+00, 1.2877887e+02],
[0.0000000e+00, 1.8923062e+02, 1.2986162e+02],
[0.0000000e+00, 0.0000000e+00, 1.2500000e-01]], dtype=float32), 'fov': 68.54204142245207, 'Kd': None, 'image_size': (256, 256)}
WARNING: Variable number of images in the batches. len(dataset)=201, batch_size=10
0%| | 0/201 [00:00<?, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [43:21<00:00, 26.01s/it]
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:157: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(loss_depth), c='b', label='Depth loss')
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:159: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(reg_vel), c='darkorange', label='Reg. 3D Pose Velocity')
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:160: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(reg_filter_verts), c='darkgreen', label='Reg. 3D Vert. Smooth')
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:161: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(reg_ref_poses), c='m', label='Reg. Ref. Poses')
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:162: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(reg_scale), c='y', label='Reg. Scale')
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:163: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(reg_contact), c='k', label='Reg. Contact')
D:\Seesea\human\sa-3d-mh\mhmocap\predict.py:164: RuntimeWarning: divide by zero encountered in log
axs.plot(np.log(reg_foot_sliding), c='gold', label='Reg. Food Slid.')
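
(These RuntimeWarnings mean some of the logged loss arrays contain exact zeros, for which np.log returns -inf; they are a symptom of the diverged fit rather than the cause. A guarded version of these plot calls could clip first, as a sketch with a hypothetical stand-in array:)

import numpy as np

loss_depth = np.array([0.0, 0.5, 0.25])       # hypothetical logged loss values
y = np.log(np.clip(loss_depth, 1e-12, None))  # clipping avoids log(0) -> -inf
# axs.plot(y, c='b', label='Depth loss')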

DEBUG:: >> stage1_optvar:
>> scale_factor: [[[nan]]

[[nan]]

[[nan]]]
>> scene_depth_min / scene_depth_max: [[0.6931472]] [[1.3862944]]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 201/201 [00:40<00:00, 4.94it/s]
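
(The NaN scale_factor above is the real symptom: once one optimized parameter diverges, everything it multiplies, including the vertices, becomes NaN, which would also explain the empty render below. A quick probe to find which tensors went bad, as a sketch with assumed attribute names:)

import torch

def report_nans(named_tensors):
    # Hypothetical debugging helper: list tensors that contain NaNs.
    for name, t in named_tensors.items():
        if torch.isnan(t).any():
            print(f"NaN detected in '{name}'")

# e.g. report_nans({'xscale_factor': self.xscale_factor,
#                   'zmin_lin': self.zmin_lin,
#                   'poses_T': self.poses_T})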

Then I tried to visualize the result:
python -m mhmocap.visualization --input_path data\mupots-3d-eval\TS1\output\result_output\TS1 --output_path data\mupots-3d-eval\TS1\output\vis_output

I got this:
[Image 1]

No humans, only the scene.

I tried to reinstall it, but the results didn't change. I don't know what the problem is.

Hi @Seeseallllll ,
Have you checked the values in idxs before casting it to long?
The first error that you mentioned is not happening on my side (and it should not).
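For instance, something like this sketch, placed inside __eval_batch_optimized_variables before the cast (assuming idxs arrives there as a float tensor):

print(idxs.dtype, idxs.min().item(), idxs.max().item())  # expect indices in [0, len(dataset) - 1]
print(torch.all(idxs == idxs.round()).item())            # False means the values are fractional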

It also seems that you are running on Windows. I only tried it on Linux machines.
Here are the specs where I tested the code: https://github.com/dluvizon/scene-aware-3d-multi-human#11-hwsw-requirements