titu1994/tf-TabNet

Check failed: work_element_count > 0 (0 vs. 0)

SirJohnFranklin opened this issue · 11 comments

Hi,
If I run your mnist example I get an error message like this:

"Check failed: work_element_count > 0 (0 vs. 0)"

It seems to be related to TensorFlow, but TensorFlow generally works fine for me, so I'm posting it here. Do you have any idea?

Thanks!

Full error log is:

2020-03-05 21:26:48.868876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.607GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-03-05 21:26:48.869296: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-03-05 21:26:48.869429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-05 21:26:48.869561: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-03-05 21:26:48.869684: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-03-05 21:26:48.869806: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-03-05 21:26:48.869930: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-03-05 21:26:48.870056: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-05 21:26:48.870515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-05 21:26:48.870902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.607GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-03-05 21:26:48.871150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-03-05 21:26:48.871277: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-05 21:26:48.871400: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-03-05 21:26:48.871526: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-03-05 21:26:48.871655: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-03-05 21:26:48.871784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-03-05 21:26:48.871912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-05 21:26:48.872321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-05 21:26:48.872526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-05 21:26:48.872759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-03-05 21:26:48.872893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-03-05 21:26:48.873502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9011 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Epoch 1/5
2020-03-05 21:26:56.571332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-05 21:26:56.963017: F .\tensorflow/core/util/gpu_launch_config.h:129] Check failed: work_element_count > 0 (0 vs. 0)
Process finished with exit code -1073740791 (0xC0000409)

I'm able to run the examples without issue on 2.1. Perhaps it's an issue with your GPU config? Negative error codes often indicate issues with the TensorFlow setup.

Thanks for checking and your fast reply! I'll check my setup.

@SirJohnFranklin did you manage to fix this issue? I'm facing the same problem right now

usufu commented

Any updates here? Same issue.

I think the issue is related to TensorFlow. If I disable the GPU, it works.

To disable the GPU, add the following before TensorFlow is imported:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
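For anyone trying this workaround, the ordering matters: the variable has to be set before TensorFlow initializes CUDA, which in practice means before the import. Below is a minimal sketch of that ordering with a quick check of what TensorFlow ends up seeing. The helper name `hide_gpus_from_cuda` is made up for illustration, and the `try`/`except` around the import is only there so the snippet also runs on a machine without TensorFlow installed.

```python
import os


def hide_gpus_from_cuda():
    """Hide all CUDA devices from libraries loaded later in this process.

    Hypothetical helper name; it only sets the environment variable
    suggested in this thread.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


# Set the variable first, then import TensorFlow.
hide_gpus_from_cuda()

try:
    import tensorflow as tf  # imported only after the variable is set
except ImportError:
    tf = None  # lets the sketch run where TensorFlow is not installed

if tf is not None:
    # Print the GPUs TensorFlow can see after the variable was set.
    print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
```

This only sidesteps the crash by running on the CPU; it does not address whatever triggers the failed check on the GPU.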

Same issue here with TensorFlow 2.2.0:

2020-05-18 16:55:46.901545: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-18 16:55:46.911811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:46.912141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 750 Ti computeCapability: 5.0
coreClock: 1.1105GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 80.47GiB/s
2020-05-18 16:55:46.916367: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-18 16:55:47.008013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-18 16:55:47.062914: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-18 16:55:47.075725: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-18 16:55:47.167468: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-18 16:55:47.179648: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-18 16:55:47.358981: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-18 16:55:47.359320: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.360100: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.360733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-18 16:55:47.361377: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-18 16:55:47.390053: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3099055000 Hz
2020-05-18 16:55:47.390434: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc8c4000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-18 16:55:47.390461: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-18 16:55:47.422871: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.423235: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3bd7330 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-18 16:55:47.423252: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 750 Ti, Compute Capability 5.0
2020-05-18 16:55:47.423462: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.423736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 750 Ti computeCapability: 5.0
coreClock: 1.1105GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 80.47GiB/s
2020-05-18 16:55:47.423797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-18 16:55:47.423819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-18 16:55:47.423829: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-18 16:55:47.423849: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-18 16:55:47.423868: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-18 16:55:47.423878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-18 16:55:47.423888: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-18 16:55:47.423954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.424241: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.424469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-18 16:55:47.424515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-18 16:55:47.425811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-18 16:55:47.425844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-05-18 16:55:47.425852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-05-18 16:55:47.426014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.426303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-18 16:55:47.426557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1036 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
Epoch 1/5
2020-05-18 16:57:11.648760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-18 16:57:13.682660: F ./tensorflow/core/util/gpu_launch_config.h:129] Check failed: work_element_count > 0 (0 vs. 0)
Aborted (core dumped)

This happens while running the mnist example. Same issue when I run my desired tabnet regression experiment. Both work fine if I disable the GPU by adding tf.config.set_visible_devices([], 'GPU').
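The `tf.config.set_visible_devices([], 'GPU')` workaround mentioned above can be sketched as follows. This is an illustration rather than a fix: the function name `run_on_cpu_only` is made up, the call has to happen at program start before TensorFlow has created any tensors or models, and it assumes a TF 2.1+ install where `tf.config.set_visible_devices` is available outside the `experimental` namespace.

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None  # lets the sketch run where TensorFlow is not installed


def run_on_cpu_only():
    """Hide all GPUs from TensorFlow and return the GPUs still visible.

    Hypothetical helper name. Returns None when TensorFlow is missing.
    Must be called before TensorFlow initializes its devices, otherwise
    set_visible_devices raises a RuntimeError.
    """
    if tf is None:
        return None
    tf.config.set_visible_devices([], "GPU")
    return tf.config.get_visible_devices("GPU")


visible_gpus = run_on_cpu_only()
print("Visible GPUs after the call:", visible_gpus)
```

Build and train the TabNet model only after this call, so that all ops are placed on the CPU.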

Please note that TF works normally on my machine, including on the GPU, when running other projects (other Sequential or Functional Keras models).

Tested with CUDA 10.1, libcudnn 7 (7.6.5.32-1+cuda10.1), nvidia drivers 440.64, Ubuntu 18.04

Downgraded TF to 2.1.0 and re-tested: it did not help (same issue)

I'm able to run the examples without issue on 2.1. Perhaps it's an issue with your GPU config? Negative error codes often indicate issues with the TensorFlow setup.

@titu1994 could you please tell us which versions of CUDA, cuDNN, and the NVIDIA drivers you used? I would like to replicate a successful run on GPU.

Any updates? Same issue here with TF 2.2

Same issue running examples on GPU with TF 2.3.0, CUDA 10.1, Windows 10.
Examples execute successfully on CPU.

Closed via 5a32893