EagleW/Stage-wise-Fine-tuning

GPU memory issue

Closed this issue · 4 comments

Hi Qingyun, this is really nice work. Thank you so much for providing the code and models. When I was trying to decode with your model, even with batch size 1, my machine ran into a "CUDA out of memory" error. I am not sure whether the problem is on my end or with my machine. I am using a V100 GPU with 32 GB of memory. May I ask how much GPU memory your machine had when you fine-tuned and decoded the T5-large model? Hope to hear from you. Thank you so much!
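For anyone hitting a similar out-of-memory error during decoding, it is worth confirming that generation runs under `torch.no_grad()` and with a small beam size. Below is a minimal sketch, assuming a plain Hugging Face `transformers` setup rather than the repository's own decoding script; the model name, input prefix, lengths, and beam size are illustrative assumptions.

```python
# Minimal memory-conscious decoding sketch (not the repository's script).
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large").to(device)
model.eval()
if device == "cuda":
    model.half()  # fp16 inference roughly halves weight/activation memory

inputs = tokenizer(
    "summarize: <your linearized table or input text here>",  # placeholder input
    return_tensors="pt", truncation=True, max_length=512,
).to(device)

with torch.no_grad():  # no gradient buffers are kept during generation
    output_ids = model.generate(
        **inputs,
        num_beams=2,      # larger beams multiply decoder memory
        max_length=256,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```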

Hi @shixiao9941, thank you very much for your interest in our research. Could you share more details and the training bash script you used? Are you using the recommended transformers version? Thank you!
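To report the environment details being asked for, a quick version check like the following can help; the recommended versions themselves are the ones listed in the repository's requirements, so these prints are just for comparison.

```python
# Print the installed library versions and GPU memory for the bug report.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name, props.total_memory // 2**20, "MiB")
```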

Great! Have a good day!