drprojects/superpoint_transformer

FRNN - RuntimeError: Unknown layout

Closed this issue · 8 comments

Hello, the following error occurred while I was training on the DALES dataset with the command python src/train.py experiment=semantic/dales_11g. I am sure I have placed the dataset in the location described in the dataset documentation. How can I solve it? My environment is CUDA 12.1 with an RTX 3090.

Error executing job with overrides: ['experiment=semantic/dales_11g']
Traceback (most recent call last):
  File "src/train.py", line 167, in main
    metric_dict, _ = train(cfg)
  File "/home/user/桌面/Python/superpoint_transformer-master/src/utils/utils.py", line 48, in wrap
    raise ex
  File "/home/user/桌面/Python/superpoint_transformer-master/src/utils/utils.py", line 45, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "src/train.py", line 132, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 947, in _run
    self._data_connector.prepare_data()
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 94, in prepare_data
    call._call_lightning_datamodule_hook(trainer, "prepare_data")
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/user/桌面/Python/superpoint_transformer-master/src/datamodules/base.py", line 144, in prepare_data
    self.dataset_class(
  File "/home/user/桌面/Python/superpoint_transformer-master/src/datasets/base.py", line 223, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 57, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log)
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 97, in __init__
    self._process()
  File "/home/user/桌面/Python/superpoint_transformer-master/src/datasets/base.py", line 647, in _process
    self.process()
  File "/home/user/桌面/Python/superpoint_transformer-master/src/datasets/base.py", line 682, in process
    self._process_single_cloud(p)
  File "/home/user/桌面/Python/superpoint_transformer-master/src/datasets/base.py", line 710, in _process_single_cloud
    nag = self.pre_transform(data)
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
  File "/home/user/桌面/Python/superpoint_transformer-master/src/transforms/transforms.py", line 23, in __call__
    return self._process(x)
  File "/home/user/桌面/Python/superpoint_transformer-master/src/transforms/neighbors.py", line 46, in _process
    neighbors, distances = knn_1(
  File "/home/user/桌面/Python/superpoint_transformer-master/src/utils/neighbors.py", line 53, in knn_1
    distances, neighbors, _, _ = frnn.frnn_grid_points(
  File "/home/user/桌面/Python/superpoint_transformer-master/src/dependencies/FRNN/frnn/frnn.py", line 331, in frnn_grid_points
    idxs, dists, sorted_points2, pc2_grid_off, sorted_points2_idxs, grid_params_cuda = _frnn_grid_points.apply(
  File "/home/user/anaconda3/envs/spt/lib/python3.8/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/user/桌面/Python/superpoint_transformer-master/src/dependencies/FRNN/frnn/frnn.py", line 174, in forward
    idxs, dists = _C.find_nbrs_cuda(sorted_points1, sorted_points2,
RuntimeError: Unknown layout

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Hi @xiarobin, as you can see in the traceback, the error seems to come from FRNN. This is the library we use for fast neighbor search on GPU. Several users have reported issues with this dependency. Please make sure FRNN is properly installed. Also, look into past issues related to FRNN to see whether a solution is already there.

It is possible that the error trace is not returned in full. You can try setting HYDRA_FULL_ERROR=1 as suggested; this may return more informative feedback.
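As a minimal sketch of the suggestion above, the variable can be set for a single run by passing a modified environment to the training command. The helper names `build_launch` and `run` are hypothetical and not part of the repository; the script path and override are the ones quoted in this thread, and the command is assumed to be run from the repository root.

```python
import os
import subprocess
import sys


def build_launch(overrides):
    """Return (cmd, env) for src/train.py with HYDRA_FULL_ERROR enabled."""
    # Copy the current environment and add the variable named in the
    # error message, spelled with underscores only.
    env = dict(os.environ, HYDRA_FULL_ERROR="1")
    cmd = [sys.executable, "src/train.py", *overrides]
    return cmd, env


def run(overrides):
    """Launch training and return the exit code (not called here)."""
    cmd, env = build_launch(overrides)
    return subprocess.run(cmd, env=env).returncode


cmd, env = build_launch(["experiment=semantic/dales_11g"])
print("HYDRA_FULL_ERROR=" + env["HYDRA_FULL_ERROR"], " ".join(cmd))
```

Calling `run(["experiment=semantic/dales_11g"])` would then start the job with the complete stack trace enabled.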

PS: if you ❤️ or simply use this project, don't forget to give it a ⭐, it means a lot to us!

Hi @drprojects, I have installed the FRNN library according to install.sh and set HYDRA_FULL_ERROR=1 as you suggested, but the above error still occurs.

[Screenshot: QQ20240419-152401]

From your screenshot, we can't really tell whether the installation went through all the way.
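One way to check the installation independently of the training pipeline is a tiny standalone query. The sketch below is an assumption-laden smoke test, not the project's own check: the function name `frnn_smoke_test` is hypothetical, and the keyword arguments `K` and `r` are assumed to match the installed FRNN build. The call mirrors the `frnn.frnn_grid_points` call shown in the traceback, which returns distances first and neighbor indices second.

```python
def frnn_smoke_test(num_points=1000, k=8, radius=0.5):
    """Run one small FRNN query and return a short status string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available to torch"
    try:
        import frnn
    except Exception as err:  # a broken compiled extension may not raise ImportError
        return f"frnn import failed: {err}"

    # One batch of random 3D points on the GPU, shape (1, N, 3).
    points = torch.rand(1, num_points, 3, device="cuda")
    try:
        # Keyword names K and r are assumed; adjust if your build differs.
        dists, idxs, _, _ = frnn.frnn_grid_points(points, points, K=k, r=radius)
    except RuntimeError as err:
        return f"frnn query failed: {err}"
    return f"frnn OK: idxs shape {tuple(idxs.shape)}"


print(frnn_smoke_test())
```

If this reports the same "Unknown layout" message, the problem lies in the FRNN build rather than in the dataset or the training configuration.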

In any case, this is an FRNN-related issue, so you should investigate in that direction.

A one-minute Google search of your error message suggests that people solve this by downgrading PyTorch to version 2.1.0. Can you please try this and let us know?
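After downgrading, it can help to confirm which torch version the environment actually resolves. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the pin value 2.1.0 is the one mentioned in this thread.

```python
from importlib import metadata


def base_version(version):
    """Drop a local build suffix such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def matches_pin(installed, pin="2.1.0"):
    """True if the installed version equals the pin, ignoring the suffix."""
    return base_version(installed) == pin


def installed_torch_version():
    """Return the installed torch version string, or None if absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


found = installed_torch_version()
if found is None:
    print("torch is not installed in this environment")
elif matches_pin(found):
    print(f"torch {found} matches the 2.1.0 pin")
else:
    print(f"torch {found} differs from the 2.1.0 pin")
```

Compiled extensions such as FRNN generally need to be rebuilt after the torch version changes, so a matching version alone may not be sufficient.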

Pinning torch to 2.1.0 works for me, on Ubuntu 22.04 and Arch.

I could install and run with torch 2.2.0 without problems on my end. It seems this issue is machine-dependent and can be fixed by downgrading to torch 2.1.0. Closing this now.

Hello, I would like to ask about the specific configuration that works. I have tried many torch versions, including 2.1.0, 2.2.0, 2.2.2, and 2.3.0.
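Since the issue appears to be machine-dependent, comparing configurations is easier when each report lists the same fields. The sketch below is one possible way to collect them; the function name `environment_report` and the choice of fields are assumptions, not something the maintainers requested.

```python
import platform
import sys


def environment_report():
    """Collect the Python, torch, and GPU details relevant to this issue."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # CUDA version torch was built against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Posting this output alongside the torch versions already tried would let others compare a failing setup with a working one.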