hsinyuan-huang/FlowQA

RuntimeError: CUDA error: out of memory

zysNLP opened this issue · 10 comments

When I execute python train_QuAC.py, I get the following errors:

After Input LSTM, the vector_sizes [doc, query] are [ 250 250 ] * 2
Self deep-attention 250 rays in 750-dim space
Before answer span finding, hidden size are 250 250
12/20/2018 04:43:43 [dev] Total number of params: 11852394
12/20/2018 04:43:45 Epoch 1
12/20/2018 04:43:46 updates[ 1] train loss[15.87973] remaining[1:26:47]
12/20/2018 04:44:00 updates[ 21] train loss[10.87507] remaining[0:45:12]
12/20/2018 04:44:14 updates[ 41] train loss[10.16175] remaining[0:44:17]
12/20/2018 04:44:29 updates[ 61] train loss[9.96472] remaining[0:45:30]
12/20/2018 04:44:48 updates[ 81] train loss[9.56536] remaining[0:49:21]
12/20/2018 04:45:03 updates[ 101] train loss[9.38102] remaining[0:48:40]
12/20/2018 04:45:17 updates[ 121] train loss[9.11970] remaining[0:47:07]
12/20/2018 04:45:35 updates[ 141] train loss[8.99858] remaining[0:48:30]
12/20/2018 04:45:49 updates[ 161] train loss[8.76992] remaining[0:47:21]
12/20/2018 04:46:03 updates[ 181] train loss[8.64918] remaining[0:46:42]
12/20/2018 04:46:17 updates[ 201] train loss[8.58265] remaining[0:46:12]
12/20/2018 04:46:30 updates[ 221] train loss[8.47283] remaining[0:45:20]
12/20/2018 04:46:43 updates[ 241] train loss[8.38734] remaining[0:44:24]
12/20/2018 04:46:57 updates[ 261] train loss[8.36940] remaining[0:44:00]
Traceback (most recent call last):
  File "train_QuAC.py", line 324, in <module>
    main()
  File "train_QuAC.py", line 209, in main
    model.update(batch)
  File "/home/zys/文档/FlowQA-master/QA_model/model_QuAC.py", line 83, in update
    score_s, score_e, score_no_answ = self.network(*inputs)
  File "/home/zys/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zys/文档/FlowQA-master/QA_model/detail_model.py", line 306, in forward
    highlvl_self_attn_hiddens = self.highlvl_self_att(x1_att, x1_att, x1_mask, x3=doc_hiddens, drop_diagonal=True)
  File "/home/zys/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zys/文档/FlowQA-master/QA_model/layers.py", line 285, in forward
    alpha = F.softmax(scores, dim=2)
  File "/home/zys/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 889, in softmax
    return input.softmax(dim)
RuntimeError: CUDA error: out of memory

Where in the code could I optimize to reduce memory usage as much as possible? My machine has a 1080 Ti.

Is your batch_size = 1?
You need to set the batch size to 1 to avoid running out of memory.
Besides, you could add torch.cuda.empty_cache() at the end of the update() function in train_CoQA.py.
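For reference, a minimal sketch of what that suggestion might look like, assuming update() roughly follows the traceback above; prepare_inputs and compute_loss are illustrative stand-ins, not the actual FlowQA helpers:

import torch

def update(self, batch):
    self.network.train()
    inputs = self.prepare_inputs(batch)            # hypothetical helper
    score_s, score_e, score_no_answ = self.network(*inputs)
    loss = self.compute_loss(score_s, score_e,     # hypothetical helper
                             score_no_answ, batch)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    # Return cached-but-unused blocks to the driver; this does not shrink
    # live tensors, but can reduce fragmentation-induced OOMs.
    torch.cuda.empty_cache()

Note that empty_cache() only releases PyTorch's cached blocks; if the live activations themselves exceed the card's memory, it will not help on its own.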

Where in the code could I optimize to reduce memory usage as much as possible? My machine has a 1080 Ti.

Have you solved this problem? @zysNLP

Hi, I met the same problem as you. Have you solved it?

May I ask, if I want to run the train_QuAC.py part, roughly how much GPU memory is required?

@a410661 More than 8 GB. I have tried it on a 1070 Ti (8 GB).
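If you want to check the peak usage on your own card, here is a minimal sketch; torch.cuda.max_memory_allocated() reports the high-water mark of tensor allocations, and the exact reset function name depends on your PyTorch version:

import torch

torch.cuda.reset_max_memory_allocated()        # restart the peak counter
model.update(batch)                            # one training step (illustrative)
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print('peak GPU memory: %.2f GB' % peak_gb)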

Thanks! But I tried the batch_size=1 case and still ran out of memory.

I see, thanks for the answer!
Looks like I will have to think of another solution....

QuAC needs about 11 GB, so 8 GB is not enough, even with the batch size set to 1.

Thank you all for replying to this issue. It seems the problem now has some answers. I think setting batch_size=1 is not a good solution, even if it may work; a better algorithm is probably needed. I hope there will be more answers.
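One generic way to keep the effective batch size above 1 without raising peak memory is gradient accumulation: run several small forward/backward passes and call optimizer.step() once. A minimal sketch, not taken from the FlowQA code; model, loss_fn, optimizer, and loader are illustrative:

import torch

accum_steps = 4                                  # effective batch = 4 small batches
optimizer.zero_grad()
for i, batch in enumerate(loader):
    loss = loss_fn(model(batch)) / accum_steps   # scale so gradients average
    loss.backward()                              # .grad accumulates across passes
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()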