XinJCheng/CSPN

CUDA OOM error

terechsama opened this issue · 0 comments

I'm trying to run the eval.py script on a GTX 1070 card with 8 GB of VRAM. I've set batch_size to 1 and tried various n_samples values, but the error persists.
The error I'm getting is:
====TOTAL MEMORY====
8589934592
GeForce GTX 1070
Memory Usage:
Allocated: 0.0 GB
Cached: 0.0 GB
Traceback (most recent call last):
  File "eval.py", line 122, in <module>
    net.cuda()
  File "C:\Users\Lab_admin\anaconda3\envs\cspn\lib\site-packages\torch\nn\modules\module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\Lab_admin\anaconda3\envs\cspn\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply
    module._apply(fn)
  File "C:\Users\Lab_admin\anaconda3\envs\cspn\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply
    module._apply(fn)
  File "C:\Users\Lab_admin\anaconda3\envs\cspn\lib\site-packages\torch\nn\modules\module.py", line 191, in _apply
    param.data = fn(param.data)
  File "C:\Users\Lab_admin\anaconda3\envs\cspn\lib\site-packages\torch\nn\modules\module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
==> evaluating model with cspn and unet on nyudepth
==> Preparing data..
==> Building model..
{'norm_type': '8sum', 'step': 24, 'kernel': 3}
==> Resuming from best model..
==> model dict with addtional module, remove it...

The only change I've made to eval.py is adding this at the beginning:
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
print("====TOTAL MEMORY====")
print(torch.cuda.get_device_properties(0).total_memory)
print(torch.cuda.get_device_name(0))
print('Memory Usage:')
print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
print('Cached: ', round(torch.cuda.memory_cached(0)/1024**3,1), 'GB')
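For anyone who wants to reproduce the check, here is a minimal, self-contained sketch of the same diagnostics. The helper names `bytes_to_gib` and `report_gpu_memory` are hypothetical (they are not part of eval.py or the CSPN repo), and the fallback from `torch.cuda.memory_reserved` to the older `torch.cuda.memory_cached` assumes the installed PyTorch exposes at least one of the two. The byte-to-GiB conversion is the same division by 1024**3 used above, so 8589934592 bytes comes out as 8.0.

```python
def bytes_to_gib(n_bytes, ndigits=1):
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes), rounded."""
    return round(n_bytes / 1024**3, ndigits)


def report_gpu_memory(device=0):
    """Return a dict of memory figures for `device`, or None if unavailable.

    Hypothetical helper: mirrors the print statements added to eval.py.
    The allocated/reserved numbers only cover memory held by PyTorch in
    this process, not memory used by other programs on the same GPU.
    """
    try:
        import torch  # imported lazily so the conversion helper works without it
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    # Newer PyTorch names this memory_reserved; older releases use memory_cached.
    reserved_fn = getattr(torch.cuda, "memory_reserved", None) or getattr(
        torch.cuda, "memory_cached"
    )
    props = torch.cuda.get_device_properties(device)
    return {
        "name": props.name,
        "total_gib": bytes_to_gib(props.total_memory),
        "allocated_gib": bytes_to_gib(torch.cuda.memory_allocated(device)),
        "reserved_gib": bytes_to_gib(reserved_fn(device)),
    }


if __name__ == "__main__":
    print(bytes_to_gib(8589934592))  # total_memory value from the log above
    print(report_gpu_memory())
```

Since these counters are read before net.cuda() is called, "Allocated: 0.0 GB" only shows that this process has not allocated anything yet; it does not by itself show how much of the card's memory is actually free.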

Does anyone know how to fix this problem?