GPU-Util is zero
purse1996 opened this issue · 7 comments
Thank you for sharing your code. I ran into a problem: when I train the model on my data with a GPU, the memory usage is high but GPU-Util is always zero, so training is very slow. This confuses me a lot.
Not sure what this means, I haven't experienced this error. Could you give me some steps to reproduce the error?
11%|███▍ | 1058/10000 [9:15:04<77:40:33, 31.27s/it]
Is this normal on a GPU? The speed seems a bit slow, and GPU-Util is zero, with 10000 images of 100*100, batchsize=32, and a 32-layer net. thx~~
Hey there, I believe you are right that the timing you show is quite slow. I think there may be a problem with your CUDA setup, unrelated to the code here, and that would also explain the zero GPU utilization. Nothing in this project directly controls what runs on the GPU; that is handled behind the scenes by TensorFlow.
thx~~ little brother, hahah. Could you tell me your TensorFlow, CUDA, and cuDNN versions? (I use a python2.7 env)
I solved the issue. The reason was that data was not ready when the GPU needed to compute. Now I write the data and labels into files ahead of time, so the GPU can get data immediately during training.
Although your code decouples data processing from network training, it still runs them serially rather than in parallel, and most of the cost falls on the CPU. That is fine when the amount of data is small, but it becomes extremely inefficient once the dataset gets a bit larger. A sketch of what I mean is below.
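For reference, here is a minimal sketch of a decoupled input pipeline, assuming TensorFlow 1.x with the `tf.data` API available; the file name `train.tfrecords`, the record layout, and the 100x100 image size are illustrative, not taken from this repo:

```python
# Minimal sketch: overlap CPU-side data preparation with GPU compute
# by reading pre-serialized examples through a tf.data pipeline.
import tensorflow as tf

def _parse_example(serialized):
    # Assumed TFRecord layout: a raw image buffer plus an integer label.
    features = tf.parse_single_example(
        serialized,
        features={
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64),
        })
    image = tf.decode_raw(features["image"], tf.uint8)
    image = tf.reshape(image, [100, 100, 3])           # 100x100 images, as in the thread
    image = tf.cast(image, tf.float32) / 255.0
    return image, features["label"]

dataset = (tf.data.TFRecordDataset("train.tfrecords")  # hypothetical file name
           .map(_parse_example, num_parallel_calls=4)  # decode on CPU threads
           .shuffle(buffer_size=1000)
           .batch(32)
           .prefetch(1))                               # keep one batch ready for the GPU

images, labels = dataset.make_one_shot_iterator().get_next()
# Feed `images`/`labels` directly into the model graph instead of a feed_dict,
# so each training step no longer waits on Python-side data preparation.
```

With prefetching, the next batch is prepared while the current one is still on the GPU, so GPU-Util stays high instead of dropping to zero between steps.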
I see, good catch. Do you have code for this? If so, please make a PR and I'll review it