Displaying poses_3D question
adammpolak opened this issue · 4 comments
@Daniil-Osokin thank you again for this great work!
Question regarding demo.py, line 101:

```python
poses_3d, poses_2d = parse_poses(inference_result, input_scale, stride, fx, is_video)
edges = []
if len(poses_3d):
    poses_3d = rotate_poses(poses_3d, R, t)
    poses_3d_copy = poses_3d.copy()
    x = poses_3d_copy[:, 0::4]
    y = poses_3d_copy[:, 1::4]
    z = poses_3d_copy[:, 2::4]
    poses_3d[:, 0::4], poses_3d[:, 1::4], poses_3d[:, 2::4] = -z, x, -y
    poses_3d = poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]
    edges = (Plotter3d.SKELETON_EDGES + 19 * np.arange(poses_3d.shape[0]).reshape((-1, 1, 1))).reshape((-1, 2))
plotter.plot(canvas_3d, poses_3d, edges)
```
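From what I can tell, each row of poses_3d holds 19 keypoints × (x, y, z, confidence) flattened into 76 values, which is what the stride-4 slicing and the final reshape to (N, 19, -1) suggest. A quick NumPy sketch with dummy values showing what those slices select, if that reading is right:

```python
import numpy as np

# Dummy stand-in for poses_3d: 1 person, 19 keypoints x (x, y, z, conf) = 76 values.
poses_3d = np.arange(76, dtype=np.float32).reshape(1, 76)

x = poses_3d[:, 0::4]     # every 4th value starting at index 0 -> the 19 x coordinates
y = poses_3d[:, 1::4]     # the 19 y coordinates
z = poses_3d[:, 2::4]     # the 19 z coordinates
conf = poses_3d[:, 3::4]  # the 19 confidences (left untouched by the swap)

print(x.shape, y.shape, z.shape, conf.shape)  # (1, 19) for each
```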
I understand that poses_3d = rotate_poses(poses_3d, R, t) is applied when extrinsics are provided (so the coordinates get updated to the world frame). What I can't follow is this part:

```python
x = poses_3d_copy[:, 0::4]
y = poses_3d_copy[:, 1::4]
z = poses_3d_copy[:, 2::4]
poses_3d[:, 0::4], poses_3d[:, 1::4], poses_3d[:, 2::4] = -z, x, -y
```

I can't wrap my head around what is happening to poses_3d here, and for what reason. It seems like poses_3d_copy is only used to build these slices and never again in demo.py, so what is it for? And why does poses_3d end up as -z, x, -y?

In case it helps, I printed poses_3d at each step. poses_3d after rotate_poses (before the axes swap):
[[ -83.382454 -131.38255 61.1158 0.81301624 -99.85116
-147.36206 54.76044 0.8590477 -79.05287 -81.69709
65.751114 -1. -86.26206 -129.5979 47.067123
0.8386035 -79.0436 -105.39628 35.807697 0.7910373
-77.33806 -83.324005 34.038242 0.8767383 -75.01585
-78.021454 54.871674 0.62968177 -74.41736 -42.293255
53.6231 -1. -65.61671 -8.406743 55.606236
-1. -89.153206 -133.53835 74.301636 0.7498561
-96.38407 -119.759384 96.08518 0.6998326 -107.42982
-126.904724 92.74384 0.59725463 -82.1068 -81.90032
74.50095 0.6391789 -79.82081 -45.627525 77.24726
-1. -70.16078 -11.660636 79.59954 -1.
-98.39009 -148.4475 53.131226 0.8671008 -87.54239
-145.86267 51.475685 0.7745691 -100.569756 -151.13942
58.977722 0.8082759 -95.438286 -148.54828 63.06624
0.71000135]]
poses_3d after the -z, x, -y swap (line 107):
[[ -61.1158 -83.382454 131.38255 0.81301624 -54.76044
-99.85116 147.36206 0.8590477 -65.751114 -79.05287
81.69709 -1. -47.067123 -86.26206 129.5979
0.8386035 -35.807697 -79.0436 105.39628 0.7910373
-34.038242 -77.33806 83.324005 0.8767383 -54.871674
-75.01585 78.021454 0.62968177 -53.6231 -74.41736
42.293255 -1. -55.606236 -65.61671 8.406743
-1. -74.301636 -89.153206 133.53835 0.7498561
-96.08518 -96.38407 119.759384 0.6998326 -92.74384
-107.42982 126.904724 0.59725463 -74.50095 -82.1068
81.90032 0.6391789 -77.24726 -79.82081 45.627525
-1. -79.59954 -70.16078 11.660636 -1.
-53.131226 -98.39009 148.4475 0.8671008 -51.475685
-87.54239 145.86267 0.7745691 -58.977722 -100.569756
151.13942 0.8082759 -63.06624 -95.438286 148.54828
0.71000135]]
poses_3d after the reshape (line 109):
[[[ -61.1158 -83.382454 131.38255 ]
[ -54.76044 -99.85116 147.36206 ]
[ -65.751114 -79.05287 81.69709 ]
[ -47.067123 -86.26206 129.5979 ]
[ -35.807697 -79.0436 105.39628 ]
[ -34.038242 -77.33806 83.324005]
[ -54.871674 -75.01585 78.021454]
[ -53.6231 -74.41736 42.293255]
[ -55.606236 -65.61671 8.406743]
[ -74.301636 -89.153206 133.53835 ]
[ -96.08518 -96.38407 119.759384]
[ -92.74384 -107.42982 126.904724]
[ -74.50095 -82.1068 81.90032 ]
[ -77.24726 -79.82081 45.627525]
[ -79.59954 -70.16078 11.660636]
[ -53.131226 -98.39009 148.4475 ]
[ -51.475685 -87.54239 145.86267 ]
[ -58.977722 -100.569756 151.13942 ]
[ -63.06624 -95.438286 148.54828 ]]]
It looks like the final transform at line 109 gives the joints as (x, y, z) triplets. Or are they actually in the swapped order, (-z, x, -y)? Also, the final result seems to have its origin relative to the detected body rather than to the camera position. How do I get the coordinates in camera space?
Hi! rotate_poses transforms coordinates from camera space to world space (so the poses_3d returned by parse_poses is in camera space). The axes swap after that is just there to match the 3D visualizer code; it looks like a legacy extra transform, which may be refactored. Hope it is clear now.
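If you want camera-space coordinates, you can skip rotate_poses and the axes swap entirely and just reshape poses_3d as returned by parse_poses. A minimal sketch, assuming the same (N, 76) layout of 19 keypoints × (x, y, z, confidence) that the demo's reshape implies (the helper name is just illustrative):

```python
import numpy as np

def camera_space_joints(poses_3d):
    """Illustrative helper: (N, 76) poses from parse_poses -> (N, 19, 3) camera-space joints."""
    poses_3d = np.asarray(poses_3d)
    if len(poses_3d) == 0:
        return poses_3d
    # No rotate_poses (that maps camera space to world space) and no axes swap
    # (that only matches the plotter's convention): keep camera coordinates
    # and drop the per-keypoint confidence column.
    return poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]
```

The axes swap is only needed when feeding the result to Plotter3d, so it can be dropped if you consume the coordinates yourself.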
Thank you!