umautobots/LONER

Open3D error during running LONERSLAM

alexjunholee opened this issue · 12 comments

Hi,

Thanks for your amazing work and for opening the repository!

I tried running LONER with the following command in the example folder...
python run_loner.py ../cfg/fusion_portable/garden.yaml

The program runs for a bit, then crashes with a runtime error.

Elapsed Time: 7.327515125274658. Per Iteration: 0.07327515125274658, Its/Sec: 13.647191208800361
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  7.01it/s]
Elapsed Time: 7.13842511177063. Per Iteration: 0.0713842511177063, Its/Sec: 14.008692174287697
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  7.07it/s]
Elapsed Time: 7.073652505874634. Per Iteration: 0.07073652505874634, Its/Sec: 14.13696812459341
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:06<00:00,  7.78it/s]
Elapsed Time: 6.430612802505493. Per Iteration: 0.06430612802505493, Its/Sec: 15.5506175027422
 16%|███████████████▌                                                                                 | 8/50 [00:01<00:07,  5.29it/s]
[Open3D WARNING] [KDTreeFlann::SetRawData] Failed due to no data.
 18%|█████████████████▍                                                                               | 9/50 [00:01<00:06,  6.13it/s]
[Open3D WARNING] [KDTreeFlann::SetRawData] Failed due to no data.
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/LONER/src/tracking/tracker.py", line 156, in run
    self.update()
  File "/home/user/LONER/src/tracking/tracker.py", line 117, in update
    tracked = self.track_frame(frame)
  File "/home/user/LONER/src/tracking/tracker.py", line 213, in track_frame
    registration = o3d.pipelines.registration.registration_icp(
RuntimeError: [Open3D Error] (open3d::pipelines::registration::RegistrationResult open3d::pipelines::registration::RegistrationICP(const open3d::geometry::PointCloud&, const open3d::geometry::PointCloud&, double, const Matrix4d&, const open3d::pipelines::registration::TransformationEstimation&, const open3d::pipelines::registration::ICPConvergenceCriteria&)) /root/Open3D/cpp/open3d/pipelines/registration/Registration.cpp:128: TransformationEstimationPointToPlane and TransformationEstimationColoredICP require pre-computed normal vectors for target PointCloud.

It looks like the point cloud registration is failing due to a lack of points. Are you familiar with this issue, or could it be related to my environment, since I'm not using Docker? I'm currently on Ubuntu 20.04 with CUDA 11.7.
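
For reference, here is a minimal standalone sketch (hypothetical, not taken from the LONER code; the point counts and search radius are arbitrary) that seems to reproduce the same warning and error when the target cloud for point-to-plane ICP ends up empty:

    # Hypothetical reproduction: an empty target point cloud cannot get normals
    # estimated, and point-to-plane ICP then raises the same error about
    # missing pre-computed normal vectors on the target.
    import numpy as np
    import open3d as o3d

    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(np.random.rand(100, 3))

    target = o3d.geometry.PointCloud()  # empty, e.g. after aggressive filtering
    target.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
    # prints: [Open3D WARNING] [KDTreeFlann::SetRawData] Failed due to no data.

    o3d.pipelines.registration.registration_icp(
        source, target, 0.5, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    # raises: RuntimeError ... require pre-computed normal vectors for target PointCloud.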

Thanks!
Alex

sethgi commented

Interesting, I haven't seen that one before. Can you also share your PyTorch and Open3D versions, and I'll take a look?

sethgi commented

So far, I've been unable to reproduce this on my system. Can you also please clarify:

  • Have you observed this multiple times?
  • If so, does this happen every time you run? Does it happen at roughly the same time/iteration?
  • Have you made any changes to the settings? Anything to do with downsampling different amounts?
  • Have you tried any other dataset? If so, does this happen on those?
  • Specifically what data did you download? Can you provide an exact path?

Unfortunately this may be hard to debug without lots more detailed information - the simplest short-term solution would be pulling our docker configuration and trying that.

I'd like to make sure this gets fixed even if it is platform-specific - so far we've done limited testing outside the container so any information you could provide would be a great help!

Hi sethgi,

Thanks for your detailed response!
To follow up on your questions:

  • Yes, this issue is reproducible on every run, at the same iteration.
  • No, I did not change the settings, and I only tried 20220216_garden_day.bag for now. The bag is located at /home/user/git/LONER/examples/20220216_garden_day.bag, with files extracted to /home/user/LonerSLAM.

I am currently using...

  • PyTorch = 2.0.1+cu117
  • open3d = 0.17.0

I will update you with the results from other bags if possible.
Thanks!
Alex

sethgi commented

Thanks. I still haven't been able to reproduce this. I have some other things I need to prioritize for the moment, but I'll get back to this when I can. I haven't tested on PyTorch 2 yet either... though I don't see why that would cause this particular problem.

In the meantime, the quickest path for you would be to use the Docker configuration we supply. Sorry for the trouble!

Hi sethgi!
I also hit the same runtime error as alexjunholee.

I tried to run run_loner.py with sequence 20220216_garden_day and received the errors below:

Elapsed Time: 3.740060806274414. Per Iteration: 0.03740060806274414, Its/Sec: 26.73753320594084
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:03<00:00, 13.29it/s]
Elapsed Time: 3.7640249729156494. Per Iteration: 0.03764024972915649, Its/Sec: 26.56730513733522
 42%|███████████████████████████████████████████████████████████████████████████████████████████████████▉                                                                                                                                          | 21/50 [00:01<00:02, 13.61it/s][Open3D WARNING] [KDTreeFlann::SetRawData] Failed due to no data.
[Open3D WARNING] [KDTreeFlann::SetRawData] Failed due to no data.
Process Process-6:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/LonerSLAM/src/tracking/tracker.py", line 156, in run
    self.update()
  File "/home/user/LonerSLAM/src/tracking/tracker.py", line 117, in update
    tracked = self.track_frame(frame)
  File "/home/user/LonerSLAM/src/tracking/tracker.py", line 213, in track_frame
    registration = o3d.pipelines.registration.registration_icp(
RuntimeError: [Open3D Error] (open3d::pipelines::registration::RegistrationResult open3d::pipelines::registration::RegistrationICP(const open3d::geometry::PointCloud&, const open3d::geometry::PointCloud&, double, const Matrix4d&, const open3d::pipelines::registration::TransformationEstimation&, const open3d::pipelines::registration::ICPConvergenceCriteria&)) /root/Open3D/cpp/open3d/pipelines/registration/Registration.cpp:128: TransformationEstimationPointToPlane and TransformationEstimationColoredICP require pre-computed normal vectors for target PointCloud.

I can reproduce this error every time I run. I am currently using:

  • open3d = 0.17.0
  • torch = 1.10.0a0+0aef44c

Thanks a lot for your nice code and help!

OK... I'll look into this more. So far I've been unable to reproduce it, but I'll try again now that it's happening to more people.

Also tagging @rahulswa08 @5hloke in case either of you have time to look.

Ok... this is a fairly inelegant way to debug, but can either of you @RoyAPTX4869 or @alexjunholee do one of the following:

a) pull the debug_icp branch
b) make the change manually by pasting the code below just after line 205 of tracking/tracker.py (right under for icp_settings in self._settings.icp.schedule)

            # Dump the ICP inputs for this frame so the failure can be
            # reproduced offline. Relies on os, np, and o3d already being
            # imported in tracker.py, and on the local variables
            # frame_point_cloud, initial_guess, and icp_settings.
            import pickle
            logdir = f"{self._settings.log_directory}/icp_debug/frame_{self._frame_count}"
            os.makedirs(logdir, exist_ok=True)
            o3d.io.write_point_cloud(f"{logdir}/frame.pcd", frame_point_cloud)
            o3d.io.write_point_cloud(f"{logdir}/reference.pcd", self._reference_point_cloud)
            np.save(f"{logdir}/initial", initial_guess)
            with open(f"{logdir}/settings.pkl", 'wb+') as pkl_file:
                pickle.dump(icp_settings, pkl_file)

Then when you run, it'll dump a bunch of debug files into the output directory garden_<datecode>/icp_debug/...
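
If it's useful, here's a rough sketch of how those dumped files could be loaded to poke at the failing frame offline (the folder path, frame number, and correspondence threshold below are placeholders, not values from the repo):

    # Hypothetical offline inspection of one icp_debug dump folder.
    import pickle
    import numpy as np
    import open3d as o3d

    logdir = "garden_<datecode>/icp_debug/frame_123"  # placeholder path

    frame = o3d.io.read_point_cloud(f"{logdir}/frame.pcd")
    reference = o3d.io.read_point_cloud(f"{logdir}/reference.pcd")
    initial_guess = np.load(f"{logdir}/initial.npy")

    # Unpickling the settings may require the LONER source tree on PYTHONPATH.
    with open(f"{logdir}/settings.pkl", "rb") as pkl_file:
        icp_settings = pickle.load(pkl_file)

    print("frame points:", len(frame.points))
    print("reference points:", len(reference.points))
    print("reference has normals:", reference.has_normals())

    # Re-run point-to-plane ICP with the saved initial guess; 0.5 is an
    # arbitrary correspondence threshold, the real one lives in icp_settings.
    result = o3d.pipelines.registration.registration_icp(
        frame, reference, 0.5, initial_guess,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    print(result)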

Then please upload the folder from the frame where it crashed here, and I can look into it that way. Sorry for all the trouble!


debug_icp.zip

Thanks for your help! I uploaded the last 10 frames in the zip file because I'm not quite certain which frame the error occurred in. I hope the files help.

Thanks again for your help!

Thanks! Looking into this. Also, I realize I never asked whether you're able to run successfully on other sequences: does this happen on Canteen and MCR as well?


I just tested on Canteen and it runs perfectly.

OK, I think I've finally reproduced the error... it looks like the data sequence I originally used for testing is slightly different from the version currently listed on the website. I'm going to reach out to the dataset authors and I'll let you know what I hear back.

Sorry for the wait on this.

-Seth

Going to temporarily reopen this in case anyone still has issues.