About the processing of Human3.6M dataset
Hi, thank you for your great work. I have a question about processing the Human3.6M dataset. The problem appears when I visualize the joints_2d values of a single frame. For example, when I run the following in lib/data_utils/h36m_utils.py:
img = cv2.imread(img_paths[0])
temp = draw_skeleton(img, j2ds[0], dataset='spin', unnormalize=False, thickness=2)
cv2.imshow('img', temp)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.waitKey(1)
no body joints are drawn on the image. The computed joints_img values are all much larger than the actual image size of 1002 * 1000, and the same is true of the bbox coordinates. Please tell me if I did something wrong or missed a detail. The joint values at each stage are shown below:
img_path = '/s_01_act_02_subact_01_ca_01/s_01_act_02_subact_01_ca_01_000001.jpg'
joint_world = np.array([-91.67900,154.40401,907.26099
39.87789,145.00247,923.98785
-188.47031,14.07711,475.16879
-261.84055,186.55286,61.43892
-223.23566,163.80551,890.53418
-11.67599,160.89919,484.39148
-51.55030,220.14624,35.83440
-132.34781,215.73018,1128.83960
-97.16740,202.34435,1383.14661
-112.97073,127.96946,1477.44568
-120.03289,190.96477,1573.40002
25.89546,192.35947,1296.15710
107.10581,116.05029,1040.50623
129.83810,-48.02492,850.94806
-230.36955,203.17923,1311.96387
-315.40536,164.55284,1049.17468
-350.77136,43.44213,831.34729])
joint_cam = [2010.42700,4087.25537,1292.84644
1886.65796,4075.91113,1245.64563
1928.91736,4507.91309,1333.64587
1977.67346,4957.13770,1379.72803
2134.19580,4098.59961,1340.04712
2031.51624,4481.37549,1537.77136
2157.32617,4915.08838,1489.10266
2078.00024,3878.57568,1212.85498
2046.96667,3628.18262,1163.56909
2033.97595,3521.32764,1219.12964
2068.22290,3438.07544,1147.56335
1928.06787,3718.17041,1139.55640
1816.48694,3959.70459,1223.14160
1724.87122,4117.47461,1396.59180
2167.39746,3691.38574,1229.23682
2222.94946,3938.15894,1346.70288
2201.05029,4128.45068,1510.03296]
joint_img = [2293.13818,4131.44580,1292.84644
2246.83618,4258.04932,1245.64563
2168.68262,4381.59473,1333.64587
2153.83154,4624.87061,1379.72803
2336.17871,4013.76196,1340.04712
2025.24133,3848.66016,1537.77136
2171.42334,4290.73535,1489.10266
2474.36963,4173.13672,1212.85498
2526.92822,4081.93237,1163.56909
2422.92334,3819.14282,1219.12964
2576.23364,3942.19556,1147.56335
2449.90332,4247.40625,1139.55640
2213.05371,4218.24072,1223.14160
1926.74341,3887.58179,1396.59180
2531.49927,3950.21460,1229.23682
2402.62939,3860.20703,1346.70288
2181.58545,3642.56445,1510.03296]
Thank you.
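A quick way to confirm the symptom before drawing anything is to count how many projected joints fall inside the image. The sketch below is only an illustration: `fraction_in_image` is a hypothetical helper (not part of the repository), and it assumes the reported 1002 * 1000 size means height 1002 and width 1000.

```python
import numpy as np

def fraction_in_image(joints_2d, width, height):
    """Return the fraction of (x, y) joints that fall inside a width x height image."""
    joints_2d = np.asarray(joints_2d, dtype=np.float64)
    x, y = joints_2d[:, 0], joints_2d[:, 1]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return float(inside.mean())

# First three joint_img rows reported above (x and y only).
reported = [[2293.13818, 4131.44580],
            [2246.83618, 4258.04932],
            [2168.68262, 4381.59473]]

# Assumption: width=1000, height=1002.
print(fraction_in_image(reported, width=1000, height=1002))  # 0.0
```

A fraction of 0.0 means none of the joints can appear in the image, which points at the projection inputs rather than at the drawing code.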
Hi @Ironbrotherstyle ,
Sorry for the late reply. I somehow missed the notification.
I checked my code and didn't find any problem. In your data, the joint_cam values look wrong.
Did you use our h36m camera annotation?
How can I get your h36m camera annotation? Thank you very much.
Hi @Ironbrotherstyle ,
You can find our preprocessed annotations below
https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK
Sorry to bother you again.
I am sure that I downloaded your data from the link above and unzipped annotation.zip and smpl_param.zip. However, I cannot reproduce the result stored in the h36m_train_25fps_nosmpl_db.pt file that you provided. Take /s_01_act_02_subact_01_ca_01/s_01_act_02_subact_01_ca_01_000001.jpg as an example (the frame from my earlier question). The joints3D stored in h36m_train_25fps_nosmpl_db.pt for s_01_act_02_subact_01_ca_01_000001.jpg is
joints3D =
[[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.1468992 0.82783306 0.19625664]
[ 0.02108921 0.39412037 0.2449255 ]
[ 0.12376885 0.01134411 0.04720068]
[-0.12376909 -0.01134416 -0.04720068]
[-0.08150972 0.42065766 0.04079962]
[-0.03275359 0.86988246 0.08688211]
[ 0.19062322 0.04119563 0.21718645]
[ 0.21252236 -0.14909631 0.05385637]
[ 0.15697043 -0.39586952 -0.06360912]
[-0.08235921 -0.36908498 -0.1532898 ]
[-0.19394009 -0.12755072 -0.06970453]
[-0.28555584 0.03021911 0.10374594]
[ 0.03653957 -0.45907274 -0.12927723]
[ 0.05779592 -0.6491798 -0.14528322]
[ 0. 0. 0. ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.06757315 -0.20867959 -0.07999134]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.02354887 -0.5659276 -0.07371664]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]
[ 0.17673077 0.32104865 -5.2038817 ]]
The joints2D stored in h36m_train_25fps_nosmpl_db.pt for s_01_act_02_subact_01_ca_01_000001.jpg is
[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[506.216 622.7914 1. ]
[479.83392 530.7903 1. ]
[500.99265 447.99222 1. ]
[445.81503 441.72485 1. ]
[456.16092 537.1746 1. ]
[467.204 634.1008 1. ]
[515.4759 456.40582 1. ]
[520.3363 413.175 1. ]
[508.13968 355.92737 1. ]
[453.8017 359.16052 1. ]
[429.87268 415.51346 1. ]
[412.80936 452.7784 1. ]
[480.90833 339.61743 1. ]
[485.61975 296.0767 1. ]
[473.6541 444.88696 1. ]
[ 0. 0. 0. ]
[488.14777 397.20282 1. ]
[ 0. 0. 0. ]
[478.3514 317.69824 1. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]]
All of these joints2D and joints3D values are reasonable. However, the result I get from your annotation.zip and lib/data_utils/h36m_utils.py is clearly wrong. Do you know how to solve the problem? Many thanks!
Nothing to be sorry about. I appreciate your patience and interest.
The joint_world above is correct, so the camera parameters may be the problem.
The camera parameters for Subject 1 should look like this:
{'1': {'R': [[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265], [0.051548117060134555, 0.1803735689384521, -0.9822464900705729], [-0.399319034032262, -0.8977836111057917, -0.185819527201491]], 't': [-346.05078140028075, 546.9807793144001, 5474.481087434061], 'f': [1145.04940458804, 1143.78109572365], 'c': [512.541504956548, 515.4514869776]}, '2': {'R': [[0.9281683400814921, 0.3721538354721445, 0.002248380248018696], [0.08166409428175585, -0.1977722953267526, -0.976840363061605], [-0.3630902204349604, 0.9068559102440475, -0.21395758897485287]], 't': [251.42516271750836, 420.9422103702068, 5588.195881837821], 'f': [1149.67569986785, 1147.59161666764], 'c': [508.848621645943, 508.064917088557]}, '3': {'R': [[-0.9141549520542256, -0.40277802228118775, -0.045722952682337906], [-0.04562341383935874, 0.21430849526487267, -0.9756999400261069], [0.4027893093720077, -0.889854894701693, -0.214287280609606]], 't': [480.482559565337, 253.83237471361554, 5704.207679370455], 'f': [1149.14071676148, 1148.7989685676], 'c': [519.815837182153, 501.402658888552]}, '4': {'R': [[0.9141562410494211, -0.40060705854636447, 0.061905989962380774], [-0.05641000739510571, -0.2769531972942539, -0.9592261660183036], [0.40141783470104664, 0.8733904688919611, -0.2757767409202658]], 't': [51.88347637559197, 378.4208425426766, 4406.149140878431], 'f': [1145.51133842318, 1144.77392807652], 'c': [514.968197319863, 501.882018537695]}}
Are they the same as yours?
And did you use this function for the world-to-camera transformation?
def world2cam(world_coord, R, t):
    # world_coord: (N, 3), R: (3, 3), t: (3,); returns camera coordinates with shape (N, 3)
    cam_coord = np.dot(R, world_coord.transpose(1,0)).transpose(1,0) + t.reshape(1,3)
    return cam_coord
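As a worked check, the sketch below pushes the first joint_world row from the question through world2cam with the Subject 1, camera 1 parameters listed above. The cam2pixel function here is an assumption: the thread does not show the repository's version, so a plain pinhole projection is used in its place.

```python
import numpy as np

def world2cam(world_coord, R, t):
    # Same transformation as the function quoted above.
    return np.dot(R, world_coord.transpose(1, 0)).transpose(1, 0) + t.reshape(1, 3)

def cam2pixel(cam_coord, f, c):
    # Assumed pinhole projection (not copied from the repository).
    x = cam_coord[:, 0] / cam_coord[:, 2] * f[0] + c[0]
    y = cam_coord[:, 1] / cam_coord[:, 2] * f[1] + c[1]
    return np.stack([x, y, cam_coord[:, 2]], axis=1)

# Subject 1, camera 1 parameters as listed above.
R = np.array([[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265],
              [0.051548117060134555, 0.1803735689384521, -0.9822464900705729],
              [-0.399319034032262, -0.8977836111057917, -0.185819527201491]])
t = np.array([-346.05078140028075, 546.9807793144001, 5474.481087434061])
f = np.array([1145.04940458804, 1143.78109572365])
c = np.array([512.541504956548, 515.4514869776])

# First joint_world row from the question (millimetres).
joint_world = np.array([[-91.67900, 154.40401, 907.26099]])

joint_cam = world2cam(joint_world, R, t)
joint_img = cam2pixel(joint_cam, f, c)
print(joint_cam)  # camera-space position in millimetres
print(joint_img)  # pixel x, pixel y, depth
```

With these inputs the pixel position should come out close to (473.65, 444.89), which is inside the image and agrees with the [473.6541 444.88696 1.] row of the joints2D array shown earlier.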
Thank you for your reply. Yes, they are the same. My camera parameters are:
{'1': {'R': [[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265], [0.051548117060134555, 0.1803735689384521, -0.9822464900705729], [-0.399319034032262, -0.8977836111057917, -0.185819527201491]], 't': [1841.10702774543, 4955.28462344526, 1563.4453958977], 'f': [1145.04940458804, 1143.78109572365], 'c': [512.541504956548, 515.4514869776]}, '2': {'R': [[0.9281683400814921, 0.3721538354721445, 0.002248380248018696], [0.08166409428175585, -0.1977722953267526, -0.976840363061605], [-0.3630902204349604, 0.9068559102440475, -0.21395758897485287]], 't': [1761.27853428116, -5078.00659454077, 1606.2649598335], 'f': [1149.67569986785, 1147.59161666764], 'c': [508.848621645943, 508.064917088557]}, '3': {'R': [[-0.9141549520542256, -0.40277802228118775, -0.045722952682337906], [-0.04562341383935874, 0.21430849526487267, -0.9756999400261069], [0.4027893093720077, -0.889854894701693, -0.214287280609606]], 't': [-1846.7776610084, 5215.04650469073, 1491.97246576518], 'f': [1149.14071676148, 1148.7989685676], 'c': [519.815837182153, 501.402658888552]}, '4': {'R': [[0.9141562410494211, -0.40060705854636447, 0.061905989962380774], [-0.05641000739510571, -0.2769531972942539, -0.9592261660183036], [0.40141783470104664, 0.8733904688919611, -0.2757767409202658]], 't': [-1794.78972871109, -3722.69891503676, 1574.89272604599], 'f': [1145.51133842318, 1144.77392807652], 'c': [514.968197319863, 501.882018537695]}}
The steps for computing joints_img are taken from your code:
joints_world = np.array(joints_3d[str(int(action))][str(int(subaction))][str(index)]) # [-91.67900,154.40401,907.26099 ...
joints_cam = world2cam(joints_world, R, t) # [2010.42700,4087.25537,1292.84644 ...
joints_img = cam2pixel(joints_cam, f, c) # [2293.13818,4131.44580,1292.84644 ...
joints_valid = np.ones((h36m_joint_num, 1))
That is what confuses me.
The translation parameters are different from the one I downloaded.
So weird. I re-downloaded your data, and they look the same as yours. Thank you so much.
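For camera 1, the two translation vectors posted in this thread appear to be related by t = -R T. That is what one would expect if the mismatched file stored the camera position in world coordinates rather than the translation that world2cam adds after rotation. This is an inference from the numbers, not something stated in the thread, and the sketch below only checks it for Subject 1, camera 1. `world2cam_from_center` is a hypothetical helper for the camera-position convention.

```python
import numpy as np

# Subject 1, camera 1 rotation (identical in both dictionaries above).
R = np.array([[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265],
              [0.051548117060134555, 0.1803735689384521, -0.9822464900705729],
              [-0.399319034032262, -0.8977836111057917, -0.185819527201491]])

# Translation from the maintainer's annotation.
t_expected = np.array([-346.05078140028075, 546.9807793144001, 5474.481087434061])
# Translation from the mismatched file.
T_reported = np.array([1841.10702774543, 4955.28462344526, 1563.4453958977])

# If T_reported is a camera position in world coordinates, then t = -R @ T.
t_from_T = -R @ T_reported
print(t_from_T)
print(np.abs(t_from_T - t_expected).max())  # largest absolute difference, in mm

def world2cam_from_center(world_coord, R, T):
    """Hypothetical variant for files that store the camera position T."""
    return (R @ (world_coord - T.reshape(1, 3)).T).T

joint_world = np.array([[-91.67900, 154.40401, 907.26099]])
cam_a = world2cam_from_center(joint_world, R, T_reported)
cam_b = (R @ joint_world.T).T + t_expected.reshape(1, 3)
print(cam_a, cam_b)
```

If the two conventions are mixed, feeding a camera position into world2cam reproduces the oversized joint_cam values from the question, so checking which convention a camera file uses is worthwhile before projecting.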
Hi! I noticed that in the h36m pre-processing, all the code that loads the SMPL parameters is commented out. When should we use it?
You can use it. Just uncomment those lines :)
You can find our preprocessed annotations below
https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK
@hongsukchoi Hi, it seems that this link is no longer valid. Could you please share it again?
I am asking for the preprocessed annotations because something is wrong with my 'annotations'. My 'images' structure is:
However, my annotations for Subject 1 are:
It seems that the annotations only include camera 1 (ca=1) and not cam=2/3/4, and the number of dict entries is 1 less than the number of images. Is there a solution for this? Thank you very much.
Hi!
I think my former colleague changed something. Let me check.
And for the second question, which file are you using? Human36M has two evaluation protocols and one protocol only uses the frontal camera data (cam4).
Hi! Thank you for checking.
I think I need the annotations for all four cameras for my processed data. Could you please provide the complete version of the preprocessed annotations? It would be a great help to me. Thanks.
Hi! This link is invalid now, could you please share it again?
Hi, I updated the link.
Check it out here:
https://github.com/hongsukchoi/Pose2Mesh_RELEASE#data
Hello @hongsukchoi,
Could you tell me how you generated the SMPL parameters for H36M? Did you pass the videos/images to SMPLify-X together with the camera values,
or did you use another method (repo)?
I would like to generate the SMPL parameters directly from the 3D ground-truth keypoints, if possible.
Human3.6M has two sets of 3D ground-truth keypoint CDF files:
1. Original coordinate system -- Positions_3D cdf files
2. Transformed coordinate system (camera specific) -- Positions_3D_mono cdf files.
Which files should we use for SMPL parameter generation?
Thank you in advance for your response.
We used the camera coordinate values.
Check out this repo: https://github.com/mks0601/NeuralAnnot_RELEASE/blob/main/Human3.6M/demo_smplx.py
You will find what you want
@hongsukchoi,
Thanks for your response.
I saw from this issue that you ran Neural Body on Human3.6M: zju3dv/neuralbody#27 (comment)
I am planning to do the same, but I do not understand how to get accurate SMPL parameters in the Neural Body format. I tried ROMP and VIBE, but the rendering was not accurate at all. Because they use a weak-perspective camera, I could not render the result.
Could you tell me how you actually generated the SMPL parameters for Neural Body? I have prepared and carefully processed the segmentation masks, but I have not been able to get accurate SMPL parameters.
I hope to hear from you on this matter.
Thank you
Hi, refer to this code.
import torch.utils.data as data
from lib.utils import base_utils
from PIL import Image
import numpy as np
import json
import glob
import os
import imageio
import cv2
from lib.config import cfg
from lib.utils.if_nerf import if_nerf_data_utils as if_nerf_dutils
from lib.utils.feat_utils import *
from plyfile import PlyData
import os.path as osp
import random
import smplx
from pycocotools.coco import COCO
import torch
class Dataset(data.Dataset):
    def __init__(self, split):
        super(Dataset, self).__init__()
        self.root_path = osp.join('data', 'h36m')
        self.img_path = osp.join(self.root_path, 'images')
        self.mask_path = osp.join(self.root_path, 'masks')
        self.annot_path = osp.join(self.root_path, 'annotations')
        self.preprocessed_path = osp.join(self.root_path, 'preprocessed')
        self.split = split
        self.smpl_layer = smplx.create('./data', 'smpl')
        if self.split == 'train':
            subject_list = [1, 5, 6, 7, 8]
            sampling_ratio = 50
            input_cam_idxs = ['1', '2', '3', '4']
            render_cam_idxs = ['1', '2', '3', '4']
        else:
            subject_list = [9, 11]
            sampling_ratio = 500
            input_cam_idxs = ['1', '2', '3', '4']
            render_cam_idxs = ['1', '2', '3', '4']
        # aggregate annotations from each subject
        db = COCO()
        cameras = {}
        smpl_params = {}
        for subject in subject_list:
            # data load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_data.json'), 'r') as f:
                annot = json.load(f)
            if len(db.dataset) == 0:
                for k, v in annot.items():
                    db.dataset[k] = v
            else:
                for k, v in annot.items():
                    db.dataset[k] += v
            # camera load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_camera.json'), 'r') as f:
                cameras[str(subject)] = json.load(f)
            # smpl parameter load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_SMPL_NeuralAnnot.json'), 'r') as f:
                smpl_params[str(subject)] = json.load(f)
        db.createIndex()
        self.cam_info = {}
        self.datalist = {}
        self.data_idx = []
        for aid in db.anns.keys():
            ann = db.anns[aid]
            image_id = ann['image_id']
            img = db.loadImgs(image_id)[0]
            img_path = osp.join(self.img_path, img['file_name'])
            mask_path = osp.join(self.mask_path, img['file_name'][:-4] + '.png')
            img_shape = (img['height'], img['width'])
            if not osp.isfile(mask_path):
                continue
            # check subject and frame_idx
            frame_idx = img['frame_idx']
            if frame_idx % sampling_ratio != 0:
                continue
            # check smpl parameter exist
            subject = img['subject']
            action_idx = img['action_idx']
            subaction_idx = img['subaction_idx']
            frame_idx = img['frame_idx']
            cam_idx = img['cam_idx']
            if subject == 11 and action_idx == 2 and subaction_idx == 2:
                continue
            try:
                smpl_param = smpl_params[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]
            except KeyError:
                continue
            # camera parameter
            cam_param = cameras[str(subject)][str(cam_idx)]
            R, t, f, c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32).reshape(3, 1), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32)
            K = np.array([[f[0], 0, c[0]], [0, f[1], c[1]], [0, 0, 1]], dtype=np.float32).reshape(3, 3)
            # camera
            if str(subject) not in self.cam_info:
                self.cam_info[str(subject)] = {}
            if str(cam_idx) not in self.cam_info[str(subject)]:
                self.cam_info[str(subject)][str(cam_idx)] = {'R': R, 't': t, 'K': K}
            # path and smpl parameters
            if str(subject) not in self.datalist:
                self.datalist[str(subject)] = {}
            if str(action_idx) not in self.datalist[str(subject)]:
                self.datalist[str(subject)][str(action_idx)] = {}
            if str(subaction_idx) not in self.datalist[str(subject)][str(action_idx)]:
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)] = {}
            if str(frame_idx) not in self.datalist[str(subject)][str(action_idx)][str(subaction_idx)]:
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)] = {'img_path': {}, 'mask_path': {}, 'smpl_param': smpl_param}
                seq_name = f's_{subject:02d}_act_{action_idx:02d}_subact_{subaction_idx:02d}'
                filename = f'{frame_idx + 1:06d}'
                vertex_path = osp.join(self.preprocessed_path, 'vertices', seq_name, filename + '.npy')
                vertex_rgb_path = osp.join(self.preprocessed_path, 'vertices_rgb', seq_name, filename + '.npy')
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['vertices_path'] = vertex_path
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['vertices_rgb_path'] = vertex_rgb_path
            if str(cam_idx) not in self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['img_path']:
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['img_path'][str(cam_idx)] = img_path
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['mask_path'][str(cam_idx)] = mask_path
            if self.split == 'train':
                if str(cam_idx) in input_cam_idxs:
                    valid_render_cam_idxs = []
                    for render_cam_idx in render_cam_idxs:
                        render_mask_path = mask_path.replace('ca_0' + str(cam_idx), 'ca_0' + str(render_cam_idx))
                        if osp.isfile(render_mask_path) and osp.getsize(render_mask_path) > 1500:
                            valid_render_cam_idxs.append(render_cam_idx)
                    if len(valid_render_cam_idxs) == 0:
                        continue
                    self.data_idx.append(
                        {'subject': str(subject), 'action_idx': str(action_idx), 'subaction_idx': str(subaction_idx), 'frame_idx': str(frame_idx), 'input_cam_idx': str(cam_idx), 'render_cam_idxs': valid_render_cam_idxs})
            else:
                if str(cam_idx) in input_cam_idxs:
                    for render_cam_idx in render_cam_idxs:
                        render_mask_path = mask_path.replace('ca_0' + str(cam_idx), 'ca_0' + str(render_cam_idx))
                        if not osp.isfile(render_mask_path) or osp.getsize(render_mask_path) < 1500:
                            continue
                        self.data_idx.append(
                            {'subject': str(subject), 'action_idx': str(action_idx), 'subaction_idx': str(subaction_idx), 'frame_idx': str(frame_idx), 'input_cam_idx': str(cam_idx), 'render_cam_idxs': [render_cam_idx]})
    def load_3d_data(self, smpl_param, subject, cam_idx):
        pose = torch.FloatTensor(smpl_param['pose']).float().view(1, -1)
        shape = torch.FloatTensor(smpl_param['shape']).float().view(1, -1)
        trans = torch.FloatTensor(smpl_param['trans']).float().view(1, -1)
        output = self.smpl_layer(global_orient=pose[:, :3], body_pose=pose[:, 3:], betas=shape, transl=trans)
        xyz = output.vertices[0].detach().numpy()
        # obtain the original bounds for point sampling
        min_xyz = np.min(xyz, axis=0)
        max_xyz = np.max(xyz, axis=0)
        min_xyz -= 0.05
        max_xyz += 0.05
        bounds_world = np.stack([min_xyz, max_xyz], axis=0)
        mesh = xyz
        joint = np.dot(self.smpl_layer.J_regressor, mesh)
        # transform smpl from the world coordinate to the camera coordinate
        R_input = np.array(self.cam_info[subject][cam_idx]['R'], dtype=np.float32)
        T_input = np.array(self.cam_info[subject][cam_idx]['t'], dtype=np.float32) / 1000.
        xyz = np.dot(R_input, xyz.transpose(1, 0)).transpose(1, 0) + T_input.reshape(1, 3)
        # obtain the bounds for coord construction
        min_xyz = np.min(xyz, axis=0)
        max_xyz = np.max(xyz, axis=0)
        min_xyz -= 0.05
        max_xyz += 0.05
        bounds = np.stack([min_xyz, max_xyz], axis=0)
        # construct the coordinate
        dhw = xyz[:, [2, 1, 0]]
        min_dhw = min_xyz[[2, 1, 0]]
        max_dhw = max_xyz[[2, 1, 0]]
        voxel_size = np.array(cfg.voxel_size)
        coord = np.round((dhw - min_dhw) / voxel_size).astype(np.int32)
        # construct the output shape
        out_sh = np.ceil((max_dhw - min_dhw) / voxel_size).astype(np.int32)
        x = 32
        out_sh = (out_sh | (x - 1)) + 1
        return coord, out_sh, bounds_world, bounds, mesh, joint
    def affine_transform(self, img, mask, out_shape):
        bbox = cv2.boundingRect(mask.astype(np.uint8))  # x, y, w, h
        bbox = process_bbox(bbox, img.shape[1], img.shape[0], out_shape)
        trans = get_affine_trans_mat(bbox, out_shape)
        img = cv2.warpAffine(img, trans, (int(out_shape[1]), int(out_shape[0])), flags=cv2.INTER_LINEAR)
        mask = cv2.warpAffine(mask, trans, (int(out_shape[1]), int(out_shape[0])), flags=cv2.INTER_NEAREST)
        img[mask == 0] = 0
        return img, trans
    def load_mask(self, mask_path, img_shape):
        mask_cihp_cropped_resized = imageio.imread(mask_path)
        # restore mask to the original image space
        height, width, _ = img_shape
        mask_cihp = cv2.resize(mask_cihp_cropped_resized, (width, height), interpolation=cv2.INTER_NEAREST)
        mask = (mask_cihp != 0).astype(np.uint8)
        border = 5
        kernel = np.ones((border, border), np.uint8)
        mask_erode = cv2.erode(mask.copy(), kernel)
        mask_dilate = cv2.dilate(mask.copy(), kernel)
        mask[(mask_dilate - mask_erode) == 1] = 100
        return mask
    def load_2d_input_view(self, img_path, mask_path, subject, cam_idx, mesh):
        img = imageio.imread(img_path).astype(np.float32) / 255.
        # img = cv2.resize(img, (cfg.mask_shape[1], cfg.mask_shape[0]))
        mask = self.load_mask(mask_path, img.shape)
        assert img.shape[:2] == mask.shape[:2], print(img.shape, mask.shape)
        orig_img_shape = img.shape
        K = np.array(self.cam_info[subject][cam_idx]['K'], dtype=np.float32)
        R = np.array(self.cam_info[subject][cam_idx]['R'], dtype=np.float32)
        T = np.array(self.cam_info[subject][cam_idx]['t'], dtype=np.float32) / 1000.
        # affine transform for feature extraction
        img, affine_trans_mat = self.affine_transform(img, mask, cfg.input_img_shape)
        return img, R, T, K, affine_trans_mat
    def load_2d_render_view(self, img_path, mask_path, subject, cam_idx, bounds_world):
        img = imageio.imread(img_path).astype(np.float32) / 255.
        # img = cv2.resize(img, (cfg.mask_shape[1], cfg.mask_shape[0]))
        mask = self.load_mask(mask_path, img.shape)
        assert img.shape[:2] == mask.shape[:2], print(img.shape, mask.shape)
        orig_img_shape = img.shape
        K = np.array(self.cam_info[subject][cam_idx]['K'], dtype=np.float32)
        R = np.array(self.cam_info[subject][cam_idx]['R'], dtype=np.float32)
        T = np.array(self.cam_info[subject][cam_idx]['t'], dtype=np.float32) / 1000.
        H, W = cfg.render_img_shape[0], cfg.render_img_shape[1]
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_LINEAR)
        mask = cv2.resize(mask, (W, H), interpolation=cv2.INTER_NEAREST)
        img[mask == 0] = 0
        K[0] = K[0] / orig_img_shape[1] * cfg.render_img_shape[1]
        K[1] = K[1] / orig_img_shape[0] * cfg.render_img_shape[0]
        rgb, ray_o, ray_d, near, far, coord_, mask_at_box = if_nerf_dutils.sample_ray_h36m(img, mask, K, R, T, bounds_world, cfg.N_rand, self.split)
        return rgb, ray_o, ray_d, near, far, coord_, mask_at_box
    def __len__(self):
        return len(self.data_idx)

    def __getitem__(self, index):
        if self.split == 'train':
            subject, action_idx, subaction_idx, frame_idx, input_cam_idx, render_cam_idxs = self.data_idx[index]['subject'], self.data_idx[index]['action_idx'], self.data_idx[index]['subaction_idx'], \
                self.data_idx[index]['frame_idx'], self.data_idx[index]['input_cam_idx'], self.data_idx[index]['render_cam_idxs']
        else:
            subject, action_idx, subaction_idx, frame_idx, input_cam_idx, render_cam_idxs = self.data_idx[index]['subject'], self.data_idx[index]['action_idx'], self.data_idx[index]['subaction_idx'], \
                self.data_idx[index]['frame_idx'], self.data_idx[index]['input_cam_idx'], self.data_idx[index]['render_cam_idxs']
        data = self.datalist[subject][action_idx][subaction_idx][frame_idx]
        # load mesh
        coord, out_sh, bounds_world, bounds, mesh, joint = self.load_3d_data(data['smpl_param'], subject, input_cam_idx)
        # prepare input view data
        img, R, T, K, affine = self.load_2d_input_view(data['img_path'][input_cam_idx], data['mask_path'][input_cam_idx], subject, input_cam_idx, mesh)
        # prepare render view data
        rgb_list, ray_o_list, ray_d_list, near_list, far_list, mask_at_box_list = [], [], [], [], [], []
        for cam_idx in render_cam_idxs:
            rgb, ray_o, ray_d, near, far, _, mask_at_box = self.load_2d_render_view(data['img_path'][cam_idx], data['mask_path'][cam_idx], subject, cam_idx, bounds_world)
            rgb_list.append(rgb)
            ray_o_list.append(ray_o)
            ray_d_list.append(ray_d)
            near_list.append(near)
            far_list.append(far)
            mask_at_box_list.append(mask_at_box)
        rgb, ray_o, ray_d, near, far, mask_at_box = np.concatenate(rgb_list), np.concatenate(ray_o_list), np.concatenate(ray_d_list), np.concatenate(near_list), np.concatenate(far_list), np.concatenate(mask_at_box_list)
        """
        # for debug
        filename = str(random.randint(1,500))
        vis = img.copy() * 255
        cv2.imwrite(filename + '.jpg', vis)
        _mesh = np.dot(R, mesh.transpose(1,0)).transpose(1,0) + T.reshape(1,3)
        x = _mesh[:,0] / _mesh[:,2] * K[0][0] + K[0][2]
        y = _mesh[:,1] / _mesh[:,2] * K[1][1] + K[1][2]
        xy1 = np.stack((x,y,np.ones_like(x)),1)
        xy = np.dot(affine, xy1.transpose(1,0)).transpose(1,0)
        vis = img.copy()*255
        for v in range(len(xy)):
            vis = cv2.circle(vis, (int(xy[v][0]), int(xy[v][1])), 3, (255,0,0) ,-1)
        cv2.imwrite(filename + '_mesh.jpg', vis)
        """
        # intermediate supervision
        verts_rgb = np.load(data['vertices_rgb_path']).astype(np.float32)
        verts_mask = np.zeros(6890, dtype=np.float32)
        verts_mask[verts_rgb[:, 0] != 0] = 1
        verts_mask = verts_mask.astype(bool)
        ret = {
            'verts_rgb': verts_rgb, 'verts_mask': verts_mask,
            'img': img, 'R': R, 'T': T, 'K': K, 'affine': affine, 'coord': coord, 'out_sh': out_sh, 'bounds': bounds, 'mesh': mesh, 'joint': joint, 'rgb': rgb, 'ray_o': ray_o, 'ray_d': ray_d, 'near': near, 'far': far,
            'mask_at_box': mask_at_box, 'subject_idx': int(subject), 'action_idx': int(action_idx), 'subaction_idx': int(subaction_idx), 'frame_idx': int(frame_idx), 'input_cam_idx': int(input_cam_idx),
            'render_cam_idx': int(render_cam_idxs[0])}
        return ret
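One detail in load_2d_render_view that is easy to miss is that the intrinsic matrix K must be rescaled whenever the image is resized. The sketch below isolates that step. `scale_intrinsics` is a hypothetical helper, and the (501, 500) target size is only an example, since cfg.render_img_shape is not shown in the thread.

```python
import numpy as np

def scale_intrinsics(K, orig_hw, new_hw):
    """Rescale a 3x3 intrinsic matrix after resizing from orig_hw to new_hw (height, width)."""
    K = np.array(K, dtype=np.float32)       # work on a copy
    K[0] = K[0] / orig_hw[1] * new_hw[1]    # fx and cx follow the width ratio
    K[1] = K[1] / orig_hw[0] * new_hw[0]    # fy and cy follow the height ratio
    return K

# Subject 1, camera 1 focal lengths and principal point from the thread.
f = (1145.04940458804, 1143.78109572365)
c = (512.541504956548, 515.4514869776)
K = np.array([[f[0], 0, c[0]], [0, f[1], c[1]], [0, 0, 1]], dtype=np.float32)

# Assumption: the original image is 1002 x 1000 (height x width) and is halved.
K_half = scale_intrinsics(K, orig_hw=(1002, 1000), new_hw=(501, 500))
print(K_half)
```

The rotation and translation are left untouched by a resize; only the first two rows of K change.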
Thank you for the code.
So you used the NeuralAnnot json file and the camera.json file that you pointed me to in your previous response?
Is there any other data that I need to pass, or any other code that I need to run apart from these?
If so, please let me know.