RuntimeError & Machine Type Inquiry
Mollylulu opened this issue · 5 comments
Running pre-collate on 3D data...
Traceback (most recent call last):
  File "s3dis_vis.py", line 100, in <module>
    dataset = S3DISFusedDataset(cfg.data)
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 767, in __init__
    self.train_dataset = S3DISSphereMM(
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 596, in __init__
    super().__init__(root, *args, **kwargs)
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 178, in __init__
    super(S3DISOriginalFusedMM, self).__init__(
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 56, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 87, in __init__
    self._process()
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 170, in _process
    self.process()
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 655, in process
    super().process()
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 418, in process
    data_list = self.pre_collate_transform(data_list)
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 19, in __call__
    data = [transform(d) for d in data]
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 19, in <listcomp>
    data = [transform(d) for d in data]
  File "/xxx/torch_points3d/core/data_transform/features.py", line 541, in __call__
    data = self._process(data)
  File "/xxx/torch_points3d/core/data_transform/features.py", line 500, in _process
    neighbors = nn_finder(xyz_search, xyz_query, None, None)
  File "/xxx/torch_points3d/core/spatial_ops/neighbour_finder.py", line 17, in __call__
    return self.find_neighbours(x, y, batch_x, batch_y)
  File "/xxx/torch_points3d/core/spatial_ops/neighbour_finder.py", line 263, in find_neighbours
    return torch.LongTensor(gpu_index_flat.search(y_np, k)[1]).to(x.device)
  File "/xxx/lib/python3.8/site-packages/faiss/__init__.py", line 322, in replacement_search
    self.search_c(n, swig_ptr(x), k, swig_ptr(D), swig_ptr(I))
  File "/xxx/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 9009, in search
    return _swigfaiss_avx2.GpuIndex_search(self, n, x, k, distances, labels)
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1639741185190/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 0 space Device stream 0x558ecfc66c70 size 22479120128 bytes (cudaMalloc error out of memory [2])
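For context, the failing call boils down to a brute-force FAISS k-NN search on the GPU over all query points at once, roughly like the sketch below. This is not the actual torch_points3d code, and the point counts are made up for illustration:

import numpy as np
import faiss

# Hypothetical sizes; the real point clouds come from the S3DIS rooms being pre-collated.
xyz_search = np.random.rand(5_000_000, 3).astype('float32')
xyz_query = np.random.rand(5_000_000, 3).astype('float32')
k = 50

res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatL2(res, 3)  # brute-force L2 index on the GPU
index.add(xyz_search)
# Searching every query in a single call makes FAISS request a large temporary
# distance buffer on the GPU, which is what overflows in the traceback above
# (a ~22 GB cudaMalloc).
distances, neighbors = index.search(xyz_query, k)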
Hi, I ran the s3dis_visualization.ipynb notebook under notebooks for the S3DIS dataset. It seems to need a huge amount of memory on both CPU and GPU, and I got this OOM error, which suggests that over 20 GB of GPU memory is required just to preprocess the data. 😢
Therefore, I would like to know the machine type you use, as a reference. The preprocessing also does not look memory-friendly; is there any way to work around this 20+ GB GPU memory requirement?
Thanks, and looking forward to your help.
Hi, thanks for using this repo and for the feedback!
Indeed, you seem to be encountering issues with the GPU-accelerated nearest neighbor search using FAISS. It is a problem I have not solved yet, but in the meantime you can try running this step on the CPU instead.
To this end, please set use_faiss: False in conf/data/segmentation/multimodal/s3disfused-sparse.yaml:
- transform: PCAComputePointwise
  params:
    num_neighbors: 50  # heuristic: at least 30
    # r: 0.1  # heuristic: 2 * voxel - using r will force CPU computation
    # use_full_pos: True  # Possible if GridSampling3D.setattr_full_pos = True
    use_faiss: False
This will move the neighbor computation to the CPU using KeOps.
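For intuition, a KeOps-based k-NN search boils down to something like the sketch below. This is not the exact code in neighbour_finder.py, just an illustration; the function name knn_keops and the point counts are made up:

import torch
from pykeops.torch import LazyTensor

def knn_keops(xyz_search, xyz_query, k):
    # xyz_search: (N, 3) support points, xyz_query: (M, 3) query points
    x_i = LazyTensor(xyz_query[:, None, :])   # (M, 1, 3) symbolic tensor
    x_j = LazyTensor(xyz_search[None, :, :])  # (1, N, 3) symbolic tensor
    d_ij = ((x_i - x_j) ** 2).sum(-1)         # (M, N) squared distances, never materialized
    return d_ij.argKmin(k, dim=1)             # (M, k) indices of the k nearest neighbors

# Example: 50 neighbors for every query point
neighbors = knn_keops(torch.rand(100_000, 3), torch.rand(100_000, 3), 50)

Because KeOps evaluates the (M, N) distance matrix blockwise instead of allocating it in full, this path avoids the huge temporary buffer that FAISS requests on the GPU.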
In any case, this preprocessing step will always be quite memory-hungry, even on the CPU. So I recommend you do not have any other important tasks running on your machine when you start preprocessing the datasets.
FYI, I have 64 GB of RAM and a 32 GB GPU on my machine and have not tested this project with less memory. If you do not have access to a 30+ GB GPU, you will still be able to run inference from pretrained models, but training large multimodal models may be tricky. If you run into this problem, please let me know in a separate issue; I may have some tricks to help.
Please let me know how that goes!
well noted, thank you for your kind help 🌹
Sure! Please let me know if you manage to preprocess and train as you wanted 😉
Hello @Mollylulu, have you succeeded in running the preprocessing on S3DIS?
Closing this issue since I think the new default config with CPU preprocessing should solve this.