Out of memory?
ulatekh opened this issue · 0 comments
I've been learning CUDA and PyTorch just so that I could run this project. (Doing so has been something of a trial by fire.)
I built my own PyTorch from the repo's v0.4.0 tag and have it running (partially) on two machines, both running Fedora 30. The first has a Quadro P2000, 4 GB of main memory, and 5 GB of video memory, and uses SM 6.0, CUDA 9.1, and gcc 5.1.1. The second has an RTX 2060, 32 GB of main memory, and 6 GB of video memory, and uses SM 6.0/7.0, CUDA 9.2 (10.1 had terrible build problems with PyTorch 0.4.0), and gcc 6.2.1.
Both machines can run the data/bag.avi test, but when I try to run the data/Human6 test, once it gets to the inpainting part, the RTX 2060 machine gets this:
THCudaCheck FAIL file=$(PYTORCH)/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    inpaint(args)
  File "$(VOM)/inpaint.py", line 96, in inpaint
    inputs = to_var(inputs)
  File "$(VOM)/inpainting/utils.py", line 170, in to_var
    x = x.cuda()
RuntimeError: cuda runtime error (2) : out of memory at $(PYTORCH)/aten/src/THC/generic/THCStorage.cu:58
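To get a rough sense of scale for the `x.cuda()` call that fails, here is a back-of-the-envelope sketch of how much video memory a single dense float32 input tensor would need. The shape used below is purely hypothetical (I have not measured what `inpaint.py` actually builds for data/Human6), and the helper names `tensor_bytes` and `fits_on_gpu` are my own. It only counts the input itself, not the model weights or intermediate activations, which are likely to need considerably more.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes needed to hold one dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


def fits_on_gpu(shape, gpu_bytes, bytes_per_element=BYTES_PER_FLOAT32):
    """True if the tensor alone (no weights, no activations) fits in gpu_bytes."""
    return tensor_bytes(shape, bytes_per_element) <= gpu_bytes


if __name__ == "__main__":
    # Hypothetical input: batch x channels x frames x height x width.
    # These numbers are assumptions for illustration, not taken from the project.
    hypothetical_shape = (1, 3, 100, 512, 512)
    gpu_bytes = 6 * 1024**3  # the RTX 2060's 6 GB of video memory

    needed_mib = tensor_bytes(hypothetical_shape) / 1024**2
    print(f"input tensor: {needed_mib:.0f} MiB")
    print(f"fits by itself: {fits_on_gpu(hypothetical_shape, gpu_bytes)}")
```

If the real input turns out to be far smaller than the card's memory, that would suggest the memory is going to something allocated earlier in the run rather than to this transfer.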
The Quadro P2000 machine fails the inpainting part of the data/Human6 test with:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    inpaint(args)
  File "$(VOM)/inpaint.py", line 74, in inpaint
    for seq, (inputs, masks, info) in enumerate(DTloader):
  File "$(PYTORCH)/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "$(PYTORCH)/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/usr/lib64/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "$(PYTORCH)/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid
Insufficient RAM, I guess? Any insight into what part is so memory-intensive, and what could be done about it?
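Since the Quadro machine's error specifically mentions shared memory (shm), one thing that seems worth checking is how much space the shared-memory filesystem has. The sketch below assumes it is mounted at `/dev/shm`, which is the usual location on Linux. The helper names `shm_report` and `has_room` are made up for illustration.

```python
import os
import shutil


def shm_report(path="/dev/shm"):
    """Return total and free space, in MiB, of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return {
        "total_mib": usage.total / 1024**2,
        "free_mib": usage.free / 1024**2,
    }


def has_room(path, needed_bytes):
    """True if the filesystem holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Only report when the assumed mount point exists on this machine.
    if os.path.isdir("/dev/shm"):
        report = shm_report()
        print(f"/dev/shm: {report['free_mib']:.0f} MiB free "
              f"of {report['total_mib']:.0f} MiB")
```

If shared memory does turn out to be the bottleneck, a commonly suggested workaround is to construct the `DataLoader` with `num_workers=0`, so that batches are loaded in the main process instead of being passed back from worker processes. I don't know whether that is practical for this project's loader.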
Assuming these problems are surmountable, do you know whether the algorithm is amenable to removing something that doesn't appear in the first frame and that fades in and out? My first intended project is to remove the credit text from this video.
Thank you for any insights into these issues!