Inaccuracy in Joint Annotations of Ev2Hands-R Dataset
pmkalshetti opened this issue · 5 comments
Hi @Chris10M,
Thank you for the great work and for making the code available.
To get an insight into the Ev2Hands-R data, I plotted the ground-truth joint annotations on the event frames (plot_data.py). I observed that for some frames, the joint annotations used to train the Ev2Hands model are not accurate: the projected 2D skeleton does not coincide with the event cloud, as seen in the attached images (column 1: input RGB image, column 2: input event frame, column 3: input event frame + ground-truth 2D joint skeleton). The fingers are straight in the input RGB and event frames, but the annotated skeleton has bent fingers. This would have adversely affected network training, leading to suboptimal accuracy.
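For reference, the overlay is essentially a pinhole projection of the 3D joints followed by skeleton drawing; a minimal sketch is below (not the repository's plot_data.py itself; the camera-space joint array, the intrinsics matrix K, and the bone list are assumptions on my side):

```python
import numpy as np
import cv2

def project_joints(joints_3d, K):
    """Project (N, 3) camera-space joints to (N, 2) pixel coordinates (pinhole model)."""
    uvw = (K @ joints_3d.T).T          # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # divide by depth

def overlay_skeleton(event_frame, joints_3d, K, bones):
    """Draw the projected 2D skeleton on top of an HxWx3 event-frame visualisation."""
    img = event_frame.copy()
    pts = project_joints(joints_3d, K)
    for i, j in bones:                 # bones: list of (parent, child) joint indices
        cv2.line(img, (int(pts[i, 0]), int(pts[i, 1])),
                      (int(pts[j, 0]), int(pts[j, 1])), (0, 255, 0), 1)
    for u, v in pts:
        cv2.circle(img, (int(u), int(v)), 2, (0, 0, 255), -1)
    return img
```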
Is this inaccuracy in the ground-truth joint annotations due to the motion tracking system (Captury)? Or could this be due to some misalignment in the synchronization of the event and RGB streams?
Hi,
Thanks for trying out the dataset. The bent fingers are an issue with the tracking: I did not use a proper skeleton size for the hand, which led the Captury software to fit the hand keypoints with a default (longer) bone length. Hence the fingers appear bent.
To resolve this problem, we have created MANO parameters that fit very closely to the GT. Below is the 3D overlay of the MANO mesh on the Event and RGB streams.
Best,
Christen
rgb_render.mp4
event_render.mp4
Oh yes, I tried plotting the joints obtained from the provided MANO parameters; they fit the input frames much better than the joints in the .pickle files of the Ev2Hands-R dataset (obtained from Captury). Thank you for your help :)
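For anyone else checking this, the joints can be recovered from the MANO parameters roughly as below. This is a sketch, not the dataset's loader: the parameter split into global_orient / hand_pose / betas / transl, the model folder path, and the use of the smplx MANO layer are my assumptions.

```python
import torch
import smplx  # pip install smplx; the MANO_RIGHT.pkl model file is downloaded separately

# Assumed parameter layout; adjust to however the released MANO file stores them.
mano = smplx.create('models', model_type='mano', is_rhand=True,
                    use_pca=False, batch_size=1)

params = {
    'global_orient': torch.zeros(1, 3),   # root rotation (axis-angle)
    'hand_pose':     torch.zeros(1, 45),  # 15 joints x 3 axis-angle parameters
    'betas':         torch.zeros(1, 10),  # shape coefficients
    'transl':        torch.zeros(1, 3),   # root translation
}

output = mano(**params)
joints_3d = output.joints[0].detach().numpy()   # MANO joints (wrist + 15) in meters
# Fingertips are not part of these joints; if a 21-joint skeleton is needed,
# they can be appended from the corresponding vertices of output.vertices.
```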
Given this new information, I have a couple of questions.
- How did you estimate these MANO parameters? Could you share the corresponding code?
- According to the code, the joints used as targets for finetuning and evaluation correspond to the ones obtained by Captury and not the ones corresponding to these MANO parameters (a small sketch of what I mean follows this list). This is based on the following observations:
  - The loss uses forward_non_mano_data (losses.py) as mano_gt is set to 0 (ev2hands_r.py).
  - The dataset uses the pickle file provided in the Ev2Hands-R dataset (evaluation_stream.py), which appears to contain the joints obtained from Captury.
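To make the second point concrete, this is the branching I am referring to, written as a rough paraphrase rather than the actual code; only forward_non_mano_data and the mano_gt flag come from the repository, while criterion and forward_mano_data are placeholders of mine.

```python
# Rough paraphrase of how I read the routing in losses.py / ev2hands_r.py;
# 'criterion' and 'forward_mano_data' are placeholder names.
def supervise(criterion, prediction, batch):
    if batch['mano_gt']:
        # Would supervise against joints derived from the fitted MANO parameters.
        return criterion.forward_mano_data(prediction, batch)
    # mano_gt is set to 0 in ev2hands_r.py, so this branch is taken:
    # the targets are the Captury joints from the Ev2Hands-R pickle files.
    return criterion.forward_non_mano_data(prediction, batch)
```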
Please correct me if I am missing something.
--
Pratik
Hi,
Answering your questions:
- The following procedure is used for estimating the MANO parameters (a rough sketch of the triangulation step is given after this list):
  - Compute 2D keypoints from the multi-view camera images.
  - Triangulate the 2D keypoints to obtain the 3D keypoints.
  - Use the 3D keypoints to perform IK and obtain the MANO parameters.
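For reference, the triangulation step is the standard DLT; a minimal sketch is below, assuming calibrated 3x4 projection matrices per view (this is an illustration in the spirit of EasyMocap, not the exact code I used):

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """DLT triangulation of one keypoint observed in several calibrated views.

    proj_mats: list of 3x4 projection matrices K [R | t], one per camera
    points_2d: list of (u, v) pixel observations, one per camera
    Returns the 3D point in world coordinates.
    """
    A = []
    for P, (u, v) in zip(proj_mats, points_2d):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(A))
    X = vt[-1]
    return X[:3] / X[3]   # de-homogenize

# Running this per hand keypoint gives the 3D keypoints; the MANO parameters
# are then obtained by IK, i.e. optimizing pose/shape so the MANO joints match them.
```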
I am not planning to release the code for this procedure, but it is very similar to EasyMocap.
- Unfortunately, I found this issue after doing the evaluations. So, the metrics reported in the paper are evaluated with the Captury joints.
We also tentatively plan to release more data with more participants (with better joint fitting) and the camera extrinsics for the event and RGB cameras.
Best,
Christen
Your answers help clear up my doubts about the dataset. Thank you!
The camera extrinsics between the Event and RGB cameras, along with the RGB camera intrinsics, would be very useful. I look forward to this release.
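In case it clarifies the use case: with that calibration one could transfer the MANO annotations between the two views with the usual rigid transform followed by a pinhole projection, roughly as below (R_e2rgb, t_e2rgb, and K_rgb are placeholders for the not-yet-released parameters):

```python
import numpy as np

def event_to_rgb_pixels(points_event, R_e2rgb, t_e2rgb, K_rgb):
    """Map 3D points given in the event-camera frame into RGB pixel coordinates.

    points_event: (N, 3) points (e.g. MANO joints) in the event-camera frame
    R_e2rgb, t_e2rgb: extrinsics (3x3 rotation, 3-vector translation), event -> RGB
    K_rgb: 3x3 RGB camera intrinsics
    """
    points_rgb = points_event @ R_e2rgb.T + t_e2rgb   # rigid transform into the RGB frame
    uvw = points_rgb @ K_rgb.T                        # pinhole projection
    return uvw[:, :2] / uvw[:, 2:3]                   # divide by depth
```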