facebookresearch/projectaria_tools

Inquiry Regarding Depth Camera Integration in Aria Glasses

RunsenXu opened this issue · 2 comments

Dear Project Aria Team,

The Aria glasses are truly impressive, and I appreciate the innovative technology and design that have gone into their development.

However, I have a query regarding the design choices, specifically about depth perception capabilities. While I understand that depth can be obtained through stereo cameras, I am curious as to why a dedicated depth camera was not included in the Aria glasses. Depth is a crucial piece of information for many applications, and a hardware depth camera could potentially offer greater accuracy than software-based depth perception.

Could you please share the considerations or constraints that led to the decision not to integrate a depth camera into the Aria glasses? Understanding the rationale behind this choice would provide valuable insights into the design and technological priorities of the Aria project.

Thank you for your time and for any information you can provide. I am genuinely interested in learning more about your innovative approach to smart glasses technology.

Best regards,
Runsen

Hello @RunsenXu ,
You can find details about the design choices made for the Project Aria glasses in the whitepaper Project Aria: A New Tool for Egocentric Multi-Modal AI Research.

Designing a device that captures as much data as possible while fitting into a small form factor forces trade-offs between the sensors you can include, the battery life you can expect, and the final weight of the device. For example, active depth sensing is power-hungry: it is not a passive sensor, since it has to emit its own light.

In conclusion:

  • not including a depth sensor is a design choice driven by form factor and battery life
  • treat the lack of a depth sensor as a challenge for egocentric capabilities ;-)
    • as the user moves, the device captures multiple frames of the scene, so multi-view stereo can recover depth information from them
    • or use ML depth-estimation networks and make their output metric (using the camera baseline and multiple frames)
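The multi-view stereo bullet above can be illustrated with two-view linear (DLT) triangulation: once the camera poses of two frames are known in metric units (for example from a SLAM or visual-inertial tracking system), a point observed in both frames can be recovered in 3D. This is a minimal sketch in Python with NumPy, not projectaria_tools code; the function names and the pinhole intrinsics are hypothetical, and a real pipeline would also handle lens distortion, pixel noise, and many points across many frames.

```python
import numpy as np


def projection_matrix(K, R, t):
    """Build the 3x4 camera matrix P = K [R | t] (world -> pixel)."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D world point to pixel coordinates with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two frames.

    x1, x2 are the pixel coordinates (u, v) of the same scene point in
    frame 1 and frame 2. Returns the 3D point in world coordinates.
    """
    # Each observation gives two linear constraints on the homogeneous
    # point X: u * P[2] - P[0] = 0 and v * P[2] - P[1] = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


# Hypothetical pinhole intrinsics (NOT real Aria calibration values).
K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])
# Frame 1 at the world origin; frame 2 after the wearer moved 10 cm along +x.
# For a camera centred at C with rotation R, t = -R @ C.
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-0.10, 0.0, 0.0]))

X_true = np.array([0.2, -0.1, 2.0])  # a point 2 m in front of frame 1
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)  # approximately [0.2, -0.1, 2.0]
```

The baseline here is the distance the wearer moved between the two frames (10 cm in this made-up example); because that motion is known in metres, the recovered point is in metres too.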
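For the last bullet, one common way to make a relative (up-to-scale) ML depth prediction metric is to fit a scale, and optionally a shift, against a sparse set of metric depths, such as points triangulated from multiple frames with a known camera baseline. The sketch below assumes the network output is related to metric depth by an affine map; `align_scale_shift` is a hypothetical helper, the data are synthetic, and for networks that predict inverse depth the same fit would be done in inverse-depth space instead.

```python
import numpy as np


def align_scale_shift(pred, metric, mask):
    """Fit metric ~= s * pred + b by least squares where mask is True.

    pred   : relative depth map from an ML network (unknown scale/offset)
    metric : sparse metric depths (e.g. triangulated from several frames
             with a known baseline); only meaningful where mask is True
    Returns (s, b) such that s * pred + b is in metres.
    """
    p = pred[mask]
    m = metric[mask]
    # Design matrix [pred, 1] for the two unknowns (scale, shift).
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, m, rcond=None)
    return s, b


rng = np.random.default_rng(0)
true_depth = rng.uniform(0.5, 5.0, size=(48, 64))  # synthetic depth in metres
pred = 0.37 * true_depth + 0.05                    # right shape, wrong scale/offset
mask = rng.random(true_depth.shape) < 0.02         # ~2% pixels with metric depth

s, b = align_scale_shift(pred, true_depth, mask)
metric_pred = s * pred + b
print(s, b)  # close to 1/0.37 and -0.05/0.37 for this noise-free example
```

Plain least squares is sensitive to outliers among the sparse metric points; on real data a robust fit (for example RANSAC or a median-based scale estimate) may be preferable.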

Hi @SeaOtocinclus,

That makes sense! Thank you for your reply. Great work!

Best,
Runsen