Chris10M/Ev2Hands

Inaccuracy in Joint Annotations of Ev2Hands-R Dataset

pmkalshetti opened this issue · 5 comments

Hi @Chris10M,

Thank you for the great work and for making the code available.

To get insight into the Ev2Hands-R data, I plotted the ground-truth joint annotations on the event frames (plot_data.py). I observed that for some frames the joint annotations used to train the Ev2Hands model are inaccurate: the projected 2D skeleton does not align well with the event cloud, as seen in the attached images (column 1: input RGB image, column 2: input event frame, column 3: input event frame + ground-truth 2D joint skeleton). The fingers are straight in the input RGB and event frames, but the annotated skeleton has bent fingers. This would have adversely affected network training, leading to suboptimal accuracy.
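For concreteness, this is roughly how I am drawing the overlay (a minimal sketch with my own variable names, not the actual plot_data.py code; it assumes the joints are given in the event-camera frame and K is the event-camera intrinsic matrix):

```python
import cv2
import numpy as np

def overlay_skeleton(event_frame, joints_3d, K, bones):
    """Project 3D joints (assumed to be in the event-camera frame) with the
    pinhole intrinsics K and draw the skeleton on the accumulated event frame."""
    # Perspective projection: divide by depth, then apply the intrinsic matrix.
    uv = (K @ (joints_3d / joints_3d[:, 2:3]).T).T[:, :2].astype(int)

    canvas = event_frame.copy()
    for i, j in bones:  # bones: list of (parent, child) joint-index pairs
        cv2.line(canvas, tuple(map(int, uv[i])), tuple(map(int, uv[j])), (0, 255, 0), 1)
    for u, v in uv:
        cv2.circle(canvas, (int(u), int(v)), 2, (0, 0, 255), -1)
    return canvas
```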

Is this inaccuracy in the ground-truth joint annotations due to the motion tracking system (Captury)? Or could this be due to some misalignment in the synchronization of the event and RGB streams?

(Attached images: frames 0000001, 0000011, 0000061)

Hi,

Thanks for trying out the dataset. The bent fingers are an issue with the tracking: I did not use a proper skeleton size for the hand, which led the Captury software to fit the hand keypoints with a default (longer) bone length. Hence the fingers appear bent.

To resolve this problem, we have created MANO parameters that fit very closely to the GT. Below is the 3D overlay of the MANO mesh on the event and RGB streams.

Best,
Christen

(Attached videos: rgb_render.mp4, event_render.mp4)

Oh yes, I tried plotting the joints obtained from the provided MANO parameters; they fit the input frames much better than the joints in the .pickle files of the Ev2Hands-R dataset (obtained from Captury). Thank you for your help :)
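For reference, this is roughly how I obtained the joints from the MANO parameters, using the smplx MANO layer. The field names below come from my own loading code and are only a guess at the released file layout:

```python
import numpy as np
import smplx
import torch

# Hypothetical file layout; adjust the field names to the released files.
params = np.load('mano_params.npz')

mano = smplx.create(model_path='mano_models', model_type='mano',
                    is_rhand=True, use_pca=False, batch_size=1)

output = mano(
    global_orient=torch.from_numpy(params['global_orient']).float().view(1, 3),
    hand_pose=torch.from_numpy(params['hand_pose']).float().view(1, 45),
    betas=torch.from_numpy(params['betas']).float().view(1, 10),
    transl=torch.from_numpy(params['transl']).float().view(1, 3),
)
joints_3d = output.joints[0].detach().numpy()  # (16, 3), in metres
```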

Given this new information, I have a couple of questions.

  1. How did you estimate these MANO parameters? Could you share the corresponding code?

  2. According to the code, the joints used as targets for fine-tuning and evaluation are the ones obtained from Captury, not the ones corresponding to these MANO parameters. This is based on the following observations:

    • The loss uses forward_non_mano_data (losses.py) because mano_gt is set to 0 (ev2hands_r.py).
    • The dataset uses the pickle file provided in the Ev2Hands-R dataset (evaluation_stream.py), which appears to contain the joints obtained from Captury.

Please correct me if I am missing something.

--
Pratik

Hi,

Answering your questions:

  1. The following procedure is used to estimate the MANO parameters:
    1. Compute 2D keypoints from the multi-view camera images.
    2. Triangulate the 2D keypoints to obtain the 3D keypoints (a rough sketch of this step is included below).
    3. Use the 3D keypoints to perform IK and obtain the MANO parameters.

I am not planning to release the code for this procedure, but it is very similar to EasyMocap.

  2. Unfortunately, I found this issue after running the evaluations, so the metrics reported in the paper are evaluated with the Captury joints.
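For reference, the triangulation step can be sketched with the standard DLT formulation (this is just the idea, not the actual code; the approach in EasyMocap is essentially the same):

```python
import numpy as np

def triangulate_point(projection_matrices, points_2d):
    """Direct Linear Transform (DLT) triangulation of one keypoint.

    projection_matrices: list of 3x4 matrices P_i = K_i [R_i | t_i], one per view
    points_2d:           list of (u, v) observations of the same keypoint
    Returns the 3D point that minimises the algebraic reprojection error.
    """
    A = []
    for P, (u, v) in zip(projection_matrices, points_2d):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]
```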

We also tentatively plan to release more data with additional participants (with better joint fitting), as well as the camera extrinsics for the event and RGB cameras.

Best,
Christen

Your answers help clear up my doubts about the dataset. Thank you!

The camera extrinsics between the event and RGB cameras, along with the RGB camera intrinsics, would be very useful. I look forward to this release.
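To make the use case concrete, what I have in mind is projecting the MANO joints onto the event frames, roughly as below (the names are placeholders; I am assuming R, t map points from the RGB-camera frame into the event-camera frame):

```python
import numpy as np

def project_to_event_camera(joints_rgb_cam, R, t, K_event):
    """Transform 3D joints from the RGB-camera frame to the event-camera frame
    with the extrinsics (R, t), then project with the event-camera intrinsics."""
    joints_event_cam = joints_rgb_cam @ R.T + t                               # (N, 3)
    uv = (K_event @ (joints_event_cam / joints_event_cam[:, 2:3]).T).T[:, :2]  # (N, 2)
    return uv
```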

Hi Pratik,

The camera parameters have been added along with the MANO parameters.

Best,
Christen