nshepperd/gpt-2

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,12,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU

josai opened this issue · 12 comments

josai commented

Caused by op 'model/h3/attn/truediv_1', defined at:
File "train.py", line 293, in
main()
File "train.py", line 138, in main
opt_grads = memory_saving_gradients.gradients(loss, train_vars)
File "C:\Users\The Atomizer\Desktop\text\gpt2\memory_saving_gradients.py", line 250, in gradients
copied_sgv, info = ge.copy_with_input_replacements(ge.sgv(ops_to_copy), {})
File "C:\Users\The Atomizer\Miniconda3\envs\gtext\lib\site-packages\tensorflow\contrib\graph_editor\transform.py", line 673, in copy_with_input_replacements
sgv, dst_graph, dst_scope, src_scope, reuse_dst_scope=reuse_dst_scope)
File "C:\Users\The Atomizer\Miniconda3\envs\gtext\lib\site-packages\tensorflow\contrib\graph_editor\transform.py", line 453, in call
self.copy_ops(info)
File "C:\Users\The Atomizer\Miniconda3\envs\gtext\lib\site-packages\tensorflow\contrib\graph_editor\transform.py", line 467, in copy_ops
op_, op_outputs_ = self.transform_op_handler(info, op, new_inputs)
File "C:\Users\The Atomizer\Miniconda3\envs\gtext\lib\site-packages\tensorflow\contrib\graph_editor\transform.py", line 177, in copy_op_handler
[], input_types_, None, op_def_)
File "C:\Users\The Atomizer\Miniconda3\envs\gtext\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,12,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/h3/attn/truediv_1 (defined at C:\Users\The Atomizer\Miniconda3\envs\gtext\lib\site-packages\tensorflow\contrib\graph_editor\transform.py:177) = RealDiv[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/h3/attn/Exp_1, model/h3/attn/Sum_1)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
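For reference, that hint can be enabled like this in a TF 1.x session loop; the session setup and the loss_op name below are placeholders rather than the repo's actual training code, so treat this as a sketch:

import tensorflow as tf

# Ask TensorFlow to list the live tensors whenever an allocation OOMs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Pass the options on the run call that is OOMing (e.g. the training step).
    sess.run(loss_op, options=run_options)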

josai commented

What's going on here? Am I out of memory? I can't get it to train.

josai commented

conda install -c anaconda cudnn==6.0.0 --yes

seemed to fix the problem...

@josai, how did you know that you needed to do conda install -c anaconda cudnn==6.0.0 --yes from the error message?

josai commented

@josai, how did you know that you needed to do conda install -c anaconda cudnn==6.0.0 --yes from the error message?

Googling similar errors and trying their solutions until one worked.

Is this with the 345M model? I've found it only just fits on a 1080 Ti, so anything using substantial VRAM, like a browser running in the background, can push it over the edge.

josai commented

Is this with the 345M model? I've found it only just fits on a 1080 Ti, so anything using substantial VRAM, like a browser running in the background, can push it over the edge.

No, neither model was working until I conda installed cuDNN. I am currently retraining the 345M model on a GTX 970 with several applications, including Chrome, running in the background with no problems.

I have a GTX 1060 6GB and I also have this problem.
Searching on Google, I read that the batch size should be reduced, so I launched
PYTHONPATH=src ./train.py --batch_size 1 --dataset test.txt
but I had the same problem.
I then changed this line in train.py:
return [data_sampler.sample(1024) for _ in range(args.batch_size)]
to:
return [data_sampler.sample(512) for _ in range(args.batch_size)]
but I don't know what this line does. How will the training change?
If this change is not good, how can I fix it?

Same problem. Are there any diagnostics we can run?

iocaposk8, that change is one way to reduce memory usage. You are basically shortening the model's memory there: during training it only sees the last 512 tokens instead of the full 1024. I'm not sure how much of an effect that has on output quality.
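To make that concrete, here is the quoted line from train.py with the shorter window; only the return line itself is from the repo, the comment above it is just an annotation:

# Each training example is a window of this many BPE tokens. The attention
# tensors that OOM here have shape [batch, heads, seq, seq], so halving the
# window from 1024 to 512 shrinks that buffer roughly 4x.
return [data_sampler.sample(512) for _ in range(args.batch_size)]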

Thanks for the answer. So can I use, for example, 895 as the value, or is it better to use a number like 128, 512, 1024, etc.?

Another question: I am training a model for my language. In your opinion, how low should the loss get for a good model that doesn't overfit?

Last question: how can I generate texts about a certain topic?

Thank you

I'm having the same problem trying to train the 355M model on an RTX 2070 8GB. Even with both --memory_saving_gradients and --optimizer sgd, I get the following error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,1024,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/h23/attn/MatMul_1_1}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/h23/attn/truediv_1, model/h23/attn/transpose_2_1)]]

I didn't use Conda, but I have cuDNN installed manually (cuDNN v7.6.2.24 on CUDA 9.0).
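For what it's worth, the --memory_saving_gradients path just swaps the stock gradient call for the checkpointed one from memory_saving_gradients.py; that call appears in the traceback at the top of this issue, while the if/else framing below is only a sketch of the idea, not the exact train.py code:

# Checkpointed gradients recompute activations during the backward pass,
# trading extra compute time for lower peak memory.
if args.memory_saving_gradients:
    opt_grads = memory_saving_gradients.gradients(loss, train_vars)
else:
    opt_grads = tf.gradients(loss, train_vars)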

I have a GTX 1060 6GB and I also have this problem.
Searching on Google, I read that the batch size should be reduced, so I launched
PYTHONPATH=src ./train.py --batch_size 1 --dataset test.txt
but I had the same problem.
I then changed this line in train.py:
return [data_sampler.sample(1024) for _ in range(args.batch_size)]
to:
return [data_sampler.sample(512) for _ in range(args.batch_size)]
but I don't know what this line does. How will the training change?
If this change is not good, how can I fix it?

This fix worked for me.