CUDA out of memory
MJHutchinson opened this issue · 1 comment
First off, thanks for making this, looks great!
I downloaded the repo and I'm trying to run the examples to test it out before moving on. Unfortunately, I'm running into a problem: CUDA runs out of memory almost immediately after training starts. I'm running on a GTX 1050 with 4GB of RAM (about 3GB available for training), similar to the 980 you mentioned you were running on. I was just wondering if you had any ideas about what could be causing this issue! Full error message below.
python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
2018-02-16 22:22:54,351:INFO::[*] Make directories : logs/ptb_2018-02-16_22-22-54
2018-02-16 22:22:59,204:INFO::# of parameters: 146,014,000
2018-02-16 22:22:59,315:INFO::[*] MODEL dir: logs/ptb_2018-02-16_22-22-54
2018-02-16 22:22:59,316:INFO::[*] PARAM path: logs/ptb_2018-02-16_22-22-54/params.json
train_shared: 0%| | 0/14524 [00:00<?, ?it/s]
/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/models/controller.py:96: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
probs = F.softmax(logits)
/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/models/controller.py:97: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
log_prob = F.log_softmax(logits)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main.py", line 45, in <module>
main(args)
File "main.py", line 34, in main
trainer.train()
File "/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/trainer.py", line 87, in train
self.train_shared()
File "/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/trainer.py", line 143, in train_shared
loss.backward()
File "/home/mjhutchinson/.conda/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/mjhutchinson/.conda/envs/pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58
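Side note: I assume the softmax deprecation warnings are unrelated to the memory issue. Per the warning text they can be silenced by passing an explicit dim to the two calls in models/controller.py (dim=-1, i.e. the last axis, is my guess at what was intended there):

probs = F.softmax(logits, dim=-1)        # explicit dim, as the warning suggests
log_prob = F.log_softmax(logits, dim=-1)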
If there's any other info that would be helpful please let me know!
What I'm using is a GTX 980 Ti, which has 6GB of memory. You can reduce --shared_embed and --shared_hid, which are the major factors controlling the required memory size.
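For example, on a ~3GB card you could try something like the following (the values 200 are only an illustrative guess, smaller than the defaults; tune them until the model fits in memory):

python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001 --shared_embed 200 --shared_hid 200

Since the parameter count is dominated by the shared model's embedding and hidden sizes, shrinking these two values reduces memory usage roughly quadratically in the hidden size.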