google-research/jax3d

Out of Memory when trying to train real360 scene

cyz2727327 opened this issue · 0 comments

Hi all, I am running this on an RTX 8000 (48 GB GPU RAM) with 128 GB of CPU RAM, on Ubuntu Linux. I was able to train the synthetic and forwardFacing scenes successfully, but when I tried to train a real360 scene, I got a series of out-of-memory errors. Can anyone give me some tips on how to fix this? Thanks a lot!

2022-10-04 22:39:41.417477: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
2022-10-04 22:39:42.742658: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-10-04 22:39:42.742753: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
2022-10-04 22:39:54.170550: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-10-04 22:39:54.170644: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
2022-10-04 22:39:55.645782: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-10-04 22:39:55.645855: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
2022-10-04 22:39:55.645870: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (xla_gpu_host_bfc) ran out of memory trying to allocate 720B (rounded to 768)requested by op
2022-10-04 22:39:55.645919: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491]
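
For scale, the failing requests in the log above are for pinned host (CPU-side) memory, not GPU memory, and each one asks for 17179869184 bytes. The sketch below is only a reading aid: it pulls those sizes out of two of the log lines and converts them to GiB. The helper name `failed_host_alloc_sizes` is made up for this illustration, and the file paths in the sample lines are shortened to `...`. It does not diagnose the cause or suggest a fix.

```python
import re

# Two lines taken from the log above; the leading timestamps and
# source paths are shortened to "..." for readability.
LOG = """\
... device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
... cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
"""


def failed_host_alloc_sizes(log_text):
    """Return the byte size of every failed host allocation in log_text.

    Hypothetical helper: it matches only the two message shapes that
    appear in the log above.
    """
    pattern = re.compile(r"(?:of size: |failed to alloc )(\d+)")
    return [int(m.group(1)) for m in pattern.finditer(log_text)]


sizes = failed_host_alloc_sizes(LOG)
gib = [s / 2**30 for s in sizes]  # 1 GiB = 2**30 bytes
print(gib)  # [16.0, 16.0]
```

Each failed request is therefore a single 16 GiB block of pinned host memory, which is the number to compare against the 128 GB of CPU RAM and any limits on locked or pinned memory on the machine.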