Bad address
pragna96 opened this issue · 0 comments
Describe the bug
I took the T5 trivia code and set it up on my GPU server. I copied all of the 3B pretrained model files from Google Cloud to a local folder and updated the init_checkpoint path in the operative_config.gin file to point there. The GPU is detected, but I then get a "Bad address" error on one of the checkpoint shards:
2021-09-16 12:30:47.882408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30996 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:15:00.0, compute capability: 7.0
2021-09-16 12:31:30.153454: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at save_restore_v2_ops.cc:207 : Invalid argument: /projects/prma7604/T5/3B/model.ckpt-1000000.data-00025-of-00064; Bad address
I'm using CUDA 11.2 and cuDNN 8.1.
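Since the error names a single shard (`data-00025-of-00064`), one thing that might help narrow this down is checking that every shard of the local copy exists and can be read from start to finish. The following is only a minimal sketch under that assumption: `check_shards` is a hypothetical helper, not part of T5 or TensorFlow, and it uses plain file reads rather than TensorFlow's own restore path. A clean result would only rule out missing or unreadable files, not corrupted contents.

```python
import glob
import os
import re
import sys

# Matches the shard suffix used in the log above, e.g. ".data-00025-of-00064".
SHARD_RE = re.compile(r"\.data-(\d+)-of-(\d+)$")


def check_shards(prefix, chunk_size=1 << 20):
    """Hypothetical helper: return (path, problem) pairs for shards that
    are missing or cannot be read end-to-end with ordinary file I/O."""
    found = [p for p in glob.glob(glob.escape(prefix) + ".data-*-of-*")
             if SHARD_RE.search(p)]
    if not found:
        return [(prefix, "no shard files found")]

    # Take the expected shard count from the "-of-NNNNN" part of a file name.
    total = int(SHARD_RE.search(found[0]).group(2))

    problems = []
    for i in range(total):
        path = f"{prefix}.data-{i:05d}-of-{total:05d}"
        if not os.path.isfile(path):
            problems.append((path, "missing"))
            continue
        try:
            # Read the whole file in chunks so I/O errors surface here.
            with open(path, "rb") as f:
                while f.read(chunk_size):
                    pass
        except OSError as e:
            problems.append((path, f"read error: {e}"))
    return problems


if __name__ == "__main__":
    # Checkpoint prefix taken from the error message; override via argv.
    ckpt = sys.argv[1] if len(sys.argv) > 1 else "/projects/prma7604/T5/3B/model.ckpt-1000000"
    for path, problem in check_shards(ckpt):
        print(f"{path}: {problem}")
```

If this reports a missing or unreadable shard, re-copying that file would be the first thing to try. If it reports nothing, the files are at least readable with ordinary I/O, and the problem is more likely in how TensorFlow accesses them on this filesystem.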