facebookresearch/projectaria_tools

ASE : Insights About Long Processing Time for Semantic Segmentation of Scenes

anassmu opened this issue · 2 comments

I am currently working with the Aria dataset for semantic segmentation tasks. Each scene in the dataset contains around 350-1700 depth and instance images. My current workflow ( #48 #49 #1 ) involves undistorting these images, unprojecting them to 3D space, applying transformations, and creating a 3D scene with semantic information. Additionally, due to the large size of the generated point clouds, downsampling is necessary, which further adds to the processing time. On average, this workflow takes about 1-5 hours per scene. Given the size of the dataset (around 100,000 scenes), this approach is proving to be impractical.

Current Workflow

  • Undistortion of Depth and Instance Images: Each scene's images are processed to correct for distortion.
  • Unprojection to 3D Space: The undistorted images are then converted into 3D point clouds.
  • Transformation and Scene Creation: These point clouds are transformed and combined to create a complete aligned 3D scene.
  • Instance Mapping and Downsampling: The instances are mapped to their respective classes, and the point cloud is downsampled to manage its size (a rough sketch of these steps is included after this list).
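For reference, here is a rough sketch of steps 2–4 (unprojection, world-frame fusion, downsampling), using plain NumPy plus Open3D for the voxel downsampling. The `scene_frames` iterable, the per-frame `intrinsics`, and the `T_world_camera` poses are placeholders for however the ASE depth/instance data is actually read, and a simple pinhole model stands in for the real fisheye unprojection:

```python
import numpy as np
import open3d as o3d

def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into camera-frame points (simple pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1).astype(np.float64)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    valid = z > 0  # drop empty / invalid depth pixels
    return np.stack([x, y, z], axis=1)[valid], valid

def frame_to_world(depth, instances, intrinsics, T_world_camera):
    """Unproject one frame and move its points (plus instance ids) into the world frame."""
    fx, fy, cx, cy = intrinsics
    pts_cam, valid = unproject_depth(depth, fx, fy, cx, cy)
    labels = instances.reshape(-1)[valid]
    pts_world = (T_world_camera[:3, :3] @ pts_cam.T).T + T_world_camera[:3, 3]
    return pts_world, labels

def fuse_scene(scene_frames, voxel_size=0.02):
    """Fuse all frames of one scene into a single downsampled world-frame point cloud.

    scene_frames: iterable of dicts with "depth", "instances", "intrinsics",
    "T_world_camera" -- placeholders for however the ASE data is actually loaded.
    """
    all_pts, all_labels = [], []
    for frame in scene_frames:
        pts, labels = frame_to_world(
            frame["depth"], frame["instances"], frame["intrinsics"], frame["T_world_camera"]
        )
        all_pts.append(pts)
        all_labels.append(labels)

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.concatenate(all_pts))
    # Note: voxel_down_sample does not carry per-point labels along; for segmentation
    # a label-aware downsampler (e.g. majority vote per voxel) is needed instead.
    return pcd.voxel_down_sample(voxel_size), np.concatenate(all_labels)
```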

Issues Encountered

  • Extensive Processing Time: It takes approximately 30 minutes to process one scene, making it infeasible to handle the entire dataset.
  • Large Point Clouds: The size of the point clouds is substantial, necessitating downsampling, which adds to the processing time.

Questions and Requests for Alternatives

  • Are there any existing functions or tools within the Aria framework that can expedite this process, particularly for semantic segmentation tasks?
  • Is it possible to directly retrieve these semantically segmented point clouds without going through the entire workflow mentioned above?
  • Regarding the semi-dense point cloud and the bounding boxes provided in the dataset, is it feasible to use them for segmentation tasks? I noticed that not all classes have corresponding bounding boxes.
  • Any suggestions or guidance on optimizing this process or alternative approaches that can be adopted would be greatly appreciated.
  • Will a C++ script run faster than Python?

Thank you so much !

Hi @anassmu, thanks for this thorough post; I really appreciate you going into such detail with each question. I will try to answer them one by one.

Are there any existing functions or tools within the Aria framework that can expedite this process, particularly for semantic segmentation tasks?

Currently there are no functions within projectaria_tools to expedite semantic segmentation tasks. Please note that some level of parallelism has already been incorporated; e.g. the undistort function in projectaria_tools is already multithreaded.
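For context, the rectification itself is a single call per image. A minimal sketch, assuming the `get_linear_camera_calibration` and `distort_by_calibration` helpers from the projectaria_tools calibration tutorial (the 512×512 output size and 150 px focal length are only illustrative, and `src_calib` must be the scene's fisheye calibration object):

```python
from projectaria_tools.core import calibration

def rectify(image, src_calib, out_size=512, focal_px=150.0):
    """Undistort one fisheye image into a linear (pinhole) target camera.

    src_calib is the source CameraCalibration; for ASE it comes from the dataset's
    calibration helper (the exact helper name varies by tutorial version).
    """
    dst_calib = calibration.get_linear_camera_calibration(
        out_size, out_size, focal_px, "camera-rgb"
    )
    return calibration.distort_by_calibration(image, dst_calib, src_calib)
```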

Having said that, there are two things to remember here:

  1. This is one of the largest indoor datasets available (100K scenes), and you can use a fraction of it according to your needs, e.g. 1K, 10K, or 20K scenes depending on your available compute. If you want to use the whole 100K, you will need a corresponding amount of compute. You can speed up the semantic segmentation process with techniques like multiprocessing to make maximal use of the compute you have (a sketch follows after this list).
  2. The dataset was released with indoor reconstruction in mind, so point clouds, poses, and images were provided along with additional cues like depth and instance maps. Following further requests we extended the dataset to associate instances with classes. The onus is therefore currently on the user to generate derivatives such as semantic segmentation data. Shipping more data (e.g. segmented points) from our side would also make each dataset chunk heavier, which becomes infeasible in terms of both storage and download sizes.
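To make the multiprocessing suggestion in point 1 concrete: since scenes are independent of each other, a plain process pool over scene directories already scales nearly linearly with the number of cores. A minimal sketch, where `process_scene` and the dataset path are placeholders for your existing per-scene pipeline:

```python
from multiprocessing import Pool
from pathlib import Path

def process_scene(scene_dir: str) -> str:
    """Placeholder for the existing per-scene pipeline:
    undistort -> unproject -> transform/fuse -> map instances -> downsample -> save.
    """
    # ... single-scene code goes here ...
    return scene_dir

if __name__ == "__main__":
    root = Path("/path/to/ase_scenes")  # placeholder dataset location
    scene_dirs = sorted(str(p) for p in root.iterdir() if p.is_dir())
    with Pool() as pool:  # one worker per core by default
        for finished in pool.imap_unordered(process_scene, scene_dirs, chunksize=4):
            print(f"done: {finished}")
```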

Is it possible to directly retrieve these semantically segmented point clouds without going through the entire workflow mentioned above?

You have to follow the workflow above; there is no direct way at the moment. You can speed it up through better utilization of your compute (e.g. multiprocessing) and/or with more compute.

Regarding the semi-dense point cloud and the bounding boxes provided in the dataset, is it feasible to use them for segmentation tasks? I noticed that not all classes have corresponding bounding boxes.

Currently, bounding boxes / language commands exist only for walls, doors, and windows. As stated in issue #21, we will provide more information about object poses and bounding boxes for other classes in a future version.

Any suggestions or guidance on optimizing this process or alternative approaches that can be adopted would be greatly appreciated.

It's not entirely clear what you mean here; I have already provided some options above.

Will C++ script run faster than Python ?

You can already achieve faster computation using multiprocessing in Python, and you can definitely get another level of speedup with C++, exposed through pybind11 if you want to stick with a Python interface.

Hope this helps!

Seems there is no more activity here. Closing the task for now. Please feel free to open it back if you have more questions.