RuntimeError & Machine Type Inquiry
Mollylulu opened this issue · 5 comments
Running pre-collate on 3D data...
Traceback (most recent call last):
  File "s3dis_vis.py", line 100, in <module>
    dataset = S3DISFusedDataset(cfg.data)
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 767, in __init__
    self.train_dataset = S3DISSphereMM(
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 596, in __init__
    super().__init__(root, *args, **kwargs)
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 178, in __init__
    super(S3DISOriginalFusedMM, self).__init__(
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 56, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 87, in __init__
    self._process()
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 170, in _process
    self.process()
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 655, in process
    super().process()
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 418, in process
    data_list = self.pre_collate_transform(data_list)
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 19, in __call__
    data = [transform(d) for d in data]
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 19, in <listcomp>
    data = [transform(d) for d in data]
  File "/xxx/torch_points3d/core/data_transform/features.py", line 541, in __call__
    data = self._process(data)
  File "/xxx/torch_points3d/core/data_transform/features.py", line 500, in _process
    neighbors = nn_finder(xyz_search, xyz_query, None, None)
  File "/xxx/torch_points3d/core/spatial_ops/neighbour_finder.py", line 17, in __call__
    return self.find_neighbours(x, y, batch_x, batch_y)
  File "/xxx/torch_points3d/core/spatial_ops/neighbour_finder.py", line 263, in find_neighbours
    return torch.LongTensor(gpu_index_flat.search(y_np, k)[1]).to(x.device)
  File "/xxx/lib/python3.8/site-packages/faiss/__init__.py", line 322, in replacement_search
    self.search_c(n, swig_ptr(x), k, swig_ptr(D), swig_ptr(I))
  File "/xxx/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 9009, in search
    return _swigfaiss_avx2.GpuIndex_search(self, n, x, k, distances, labels)
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1639741185190/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 0 space Device stream 0x558ecfc66c70 size 22479120128 bytes (cudaMalloc error out of memory [2])
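For context, the failing call boils down to a brute-force FAISS k-NN search on the GPU over all query points at once, roughly like the sketch below. This is not the actual torch_points3d code, and the point counts are made up for illustration:

import numpy as np
import faiss

# Hypothetical sizes; the real point clouds come from the S3DIS rooms being pre-collated.
xyz_search = np.random.rand(5_000_000, 3).astype('float32')
xyz_query = np.random.rand(5_000_000, 3).astype('float32')
k = 50

res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatL2(res, 3)  # brute-force L2 index on the GPU
index.add(xyz_search)
# Searching every query in a single call makes FAISS request a large temporary
# distance buffer on the GPU, which is what overflows in the traceback above
# (a ~22 GB cudaMalloc).
distances, neighbors = index.search(xyz_query, k)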
Hi, I ran the s3dis_visualization.ipynb notebook under notebooks for the S3DIS dataset. It seems to need a huge amount of memory on both CPU and GPU, and I got this OOM error, which suggests that over 20 GB of GPU memory is required just to preprocess the data. 😢
Therefore, I would like to know the machine type you use, as a reference. The preprocessing also does not look memory-friendly; is there any way to work around this 20+ GB GPU memory requirement?
Thanks, and looking forward to your help.
Hi, thanks for using this repo and for the feedback!
Indeed, you seem to be encountering issues with the GPU-accelerated nearest neighbor search using FAISS. It is a problem I have not solved yet, but in the meantime you can try running this step on the CPU instead.
To this end, please set use_faiss: False in conf/data/segmentation/multimodal/s3disfused-sparse.yaml:
- transform: PCAComputePointwise
  params:
    num_neighbors: 50  # heuristic: at least 30
    # r: 0.1  # heuristic: 2 * voxel - using r will force CPU computation
    # use_full_pos: True  # Possible if GridSampling3D.setattr_full_pos = True
    use_faiss: False
This will move the neighbor computation to the CPU using KeOps.
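For intuition, a KeOps-based k-NN search boils down to something like the sketch below. This is not the exact code in neighbour_finder.py, just an illustration; the function name knn_keops and the point counts are made up:

import torch
from pykeops.torch import LazyTensor

def knn_keops(xyz_search, xyz_query, k):
    # xyz_search: (N, 3) support points, xyz_query: (M, 3) query points
    x_i = LazyTensor(xyz_query[:, None, :])   # (M, 1, 3) symbolic tensor
    x_j = LazyTensor(xyz_search[None, :, :])  # (1, N, 3) symbolic tensor
    d_ij = ((x_i - x_j) ** 2).sum(-1)         # (M, N) squared distances, never materialized
    return d_ij.argKmin(k, dim=1)             # (M, k) indices of the k nearest neighbors

# Example: 50 neighbors for every query point
neighbors = knn_keops(torch.rand(100_000, 3), torch.rand(100_000, 3), 50)

Because KeOps evaluates the (M, N) distance matrix blockwise instead of allocating it in full, this path avoids the huge temporary buffer that FAISS requests on the GPU.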
In any case, this preprocessing step will always be quite memory-hungry, even on the CPU. So I recommend you do not have any other important tasks running on your machine when you start preprocessing the datasets.
FYI, I have 64 GB of RAM and a 32 GB GPU on my machine and have not tested this project with less memory. If you do not have access to a 30+ GB GPU, you will still be able to run inference from pretrained models, but training large multimodal models may be tricky. If you run into this problem, please let me know in a separate issue; I may have some tricks to help.
Please let me know how that goes!
well noted, thank you for your kind help 🌹
Sure! Please let me know if you manage to preprocess and train as you wanted 😉
Hello @Mollylulu, have you succeeded in running the preprocessing on S3DIS?
Closing this issue since I think the new default config with CPU preprocessing should solve this.