facebookresearch/projectaria_tools

ADT skeleton mismatch with grayscale image

ArmandB opened this issue · 2 comments

Hello, I'm looking to use the ADT dataset for depth estimation using the video and IMU data, taking into consideration dynamic factors such as people.

There seem to be some cases where the depth image doesn't match the grayscale image when humans are involved. This example is from Apartment_release_golden_skeleton_seq100_10s_sample, using depth_images_with_skeleton.vrs and video.vrs for the 'camera-slam-left' stream and the 345-2 streams. The hands and arm in the depth image don't seem to match the grayscale image, which makes the utensils appear to float in the air throughout the sequence.

I'm noticing that the depth image also seems to omit the suit and IR markers, while the synthetic grayscale file completely omits the humans, so it's possible that my use case is a poor fit for this dataset. Any help with depth estimation using the video and IMU data of the ADT dataset would be much appreciated. Thank you in advance! :D

Hi @ArmandB, thanks for your question and interest in ADT!

The misalignment between the body mesh and the true hand/arm positions is a known limitation of ADT. Our ground-truthing system often had a hard time properly tracking the hand and wrist angle. We think this poor tracking performance is because the bodysuit we were using had only a few hand markers, and the cameras were mounted so high relative to the human that they provided poor visibility. For that reason, we offer a version of the data which fully excludes the human meshes.

We understand this is not ideal, and we hope that we can add better hand tracking to the dataset in the future. If you do have access to an Aria Research Kit, you can run our Machine Perception Services (MPS), which now provide hand tracking results. We hope to integrate this into the publicly available ADT data in the near future.

@nickcharron Thank you so much for your speedy response! I appreciate you. Good to know. I will stick to the synthetic data without human meshes as you suggest. Will also stay on the lookout for the MPS hand tracking getting integrated into the ADT dataset. All the best!
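
For anyone who still wants to train on the skeleton-rendered depth, one option that was not proposed in this thread is to flag the pixels where the human mesh changes the depth and leave them out of the loss. The sketch below is plain Python and does not use the projectaria_tools API. It assumes you have already decoded two depth frames for the same timestamp and camera, one from depth_images_with_skeleton.vrs and one from the version without human meshes, into 2D arrays of the same shape. The names `human_mesh_mask`, `masked_fraction`, and `tolerance` are hypothetical, and the depth values are toy numbers.

```python
from typing import List, Sequence

# A depth map as rows of per-pixel depth values (units are whatever the file uses).
DepthMap = Sequence[Sequence[float]]


def human_mesh_mask(
    with_skeleton: DepthMap,
    without_skeleton: DepthMap,
    tolerance: float = 0.0,
) -> List[List[bool]]:
    """Return a mask that is True where the two depth maps differ by more than tolerance.

    Assumption: both maps come from the same camera and timestamp, so any
    difference is caused by the rendered human mesh.
    """
    if len(with_skeleton) != len(without_skeleton):
        raise ValueError("depth maps must have the same number of rows")
    mask = []
    for row_with, row_without in zip(with_skeleton, without_skeleton):
        if len(row_with) != len(row_without):
            raise ValueError("depth maps must have the same row length")
        mask.append([abs(a - b) > tolerance for a, b in zip(row_with, row_without)])
    return mask


def masked_fraction(mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of pixels flagged as affected by the human mesh."""
    total = sum(len(row) for row in mask)
    flagged = sum(sum(row) for row in mask)
    return flagged / total if total else 0.0


# Toy 2x3 frames: the mesh occludes three pixels in the skeleton version.
with_skeleton = [[1000, 1000, 800], [1000, 750, 750]]
without_skeleton = [[1000, 1000, 1000], [1000, 1000, 1000]]

mask = human_mesh_mask(with_skeleton, without_skeleton)
print(mask)
print(masked_fraction(mask))
```

Pixels flagged by the mask are the ones where the mesh, and therefore the possibly mistracked hands and arms, determines the depth value. This only marks where the mesh was rendered; it does not recover where the real hands were in the grayscale image.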