tensorflow/tensorrt

Image classification example, INT8 quantization: out-of-memory error

abc3698 opened this issue · 3 comments

I'm trying to build a TF-TRT INT8 quantized model by running Colab-TF-TRT-inference-from-Keras-saved-model.ipynb in a Jupyter notebook.
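
For reference, the conversion step is essentially the following (a minimal sketch; the SavedModel path, input shape, and random calibration data are placeholders — the notebook calibrates on real images):

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths; the notebook uses a ResNet-50 Keras SavedModel.
INPUT_SAVED_MODEL_DIR = 'resnet50_saved_model'
OUTPUT_SAVED_MODEL_DIR = 'resnet50_saved_model_TFTRT_INT8'

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.INT8,
    max_workspace_size_bytes=1 << 30,  # workspace for TensorRT tactics
    use_calibration=True)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=INPUT_SAVED_MODEL_DIR,
    conversion_params=conversion_params)

def calibration_input_fn():
    # INT8 calibration needs representative inputs; random data stands in
    # for the real calibration images here.
    for _ in range(8):
        yield (np.random.random((1, 224, 224, 3)).astype(np.float32),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save(OUTPUT_SAVED_MODEL_DIR)
```

The OOM shows up while convert(...) runs, i.e. while TensorRT is building and calibrating the engine.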

I'm hitting a GPU out-of-memory error, but I think I have enough GPU memory:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0  On |                  N/A |
| 34%   48C    P8    14W / 250W |  12026MiB / 12187MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

But the example throws the following error:

Cuda error in file src/winograd.cu at line 715: out of memory
Cuda error in file src/implicit_gemm.cu at line 648: out of memory
2020-03-05 14:09:13.009317: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-05 14:09:13.070448: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7

Is there any solution?

Can you try using set_memory_growth to prevent TF from allocating all of the GPU memory on startup? Does that address this issue?
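
For example (a minimal sketch; this must run at the very top of the notebook, before anything else initializes the GPU):

```python
import tensorflow as tf

# Enable memory growth so TF allocates GPU memory on demand instead of
# reserving nearly all of it at startup.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

With memory growth enabled, TensorFlow grabs GPU memory incrementally, which leaves more headroom for TensorRT's own allocations during engine building.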

@sanjoy Thanks for your reply. I applied set_memory_growth, but I still hit the same error:

2020-03-06 05:08:41.864093: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:41.864415: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-03-06 05:08:41.864473: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-03-06 05:08:41.864768: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:41.865073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.90GiB deviceMemoryBandwidth: 447.48GiB/s
2020-03-06 05:08:41.865100: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-06 05:08:41.865110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-06 05:08:41.865122: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-06 05:08:41.865131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-06 05:08:41.865143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-06 05:08:41.865155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-06 05:08:41.865166: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-06 05:08:41.865212: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:41.865536: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:41.865840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-06 05:08:41.865858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-06 05:08:41.865864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-03-06 05:08:41.865870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-03-06 05:08:41.865930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:41.866257: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:41.866568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10803 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-03-06 05:08:41.968599: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: graph_to_optimize
2020-03-06 05:08:41.968629: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   function_optimizer: Graph size after: 1254 nodes (931), 2553 edges (2230), time = 54.414ms.
2020-03-06 05:08:41.968635: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   function_optimizer: function_optimizer did nothing. time = 0.442ms.
2020-03-06 05:08:43.423059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:43.423386: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-03-06 05:08:43.423443: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-03-06 05:08:43.423720: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:43.424015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.90GiB deviceMemoryBandwidth: 447.48GiB/s
2020-03-06 05:08:43.424042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-06 05:08:43.424053: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-06 05:08:43.424066: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-06 05:08:43.424077: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-06 05:08:43.424088: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-06 05:08:43.424098: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-06 05:08:43.424107: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-06 05:08:43.424145: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:43.424451: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:43.424730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-06 05:08:43.424749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-06 05:08:43.424755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-03-06 05:08:43.424761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-03-06 05:08:43.424822: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:43.425131: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-06 05:08:43.425414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10803 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-03-06 05:08:44.396934: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 6 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-03-06 05:08:44.469006: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:636] Number of TensorRT candidate segments: 1
2020-03-06 05:08:44.595068: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:737] Replaced segment 0 consisting of 507 nodes by TRTEngineOp_0.
2020-03-06 05:08:45.026797: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: tf_graph
2020-03-06 05:08:45.026824: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 881 nodes (-320), 1860 edges (-640), time = 331.343ms.
2020-03-06 05:08:45.026828: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   layout: Graph size after: 885 nodes (4), 1864 edges (4), time = 182.591ms.
2020-03-06 05:08:45.026832: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 883 nodes (-2), 1862 edges (-2), time = 112.166ms.
2020-03-06 05:08:45.026835: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   TensorRTOptimizer: Graph size after: 377 nodes (-506), 430 edges (-1432), time = 411.723ms.
2020-03-06 05:08:45.026839: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 377 nodes (0), 430 edges (0), time = 4.949ms.
2020-03-06 05:08:45.026844: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: TRTEngineOp_0_native_segment
2020-03-06 05:08:45.026848: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 509 nodes (0), 792 edges (0), time = 62.461ms.
2020-03-06 05:08:45.026853: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   layout: Graph size after: 509 nodes (0), 792 edges (0), time = 91.355ms.
2020-03-06 05:08:45.026856: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 509 nodes (0), 792 edges (0), time = 64.46ms.
2020-03-06 05:08:45.026860: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   TensorRTOptimizer: Graph size after: 509 nodes (0), 792 edges (0), time = 10.872ms.
2020-03-06 05:08:45.026864: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 509 nodes (0), 792 edges (0), time = 64.135ms.
2020-03-06 05:08:46.942320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-03-06 05:08:46.994748: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-03-06 05:08:47.195947: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
2020-03-06 05:08:47.205115: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
Cuda error in file src/implicit_gemm.cu at line 585: out of memory
Cuda error in file src/winograd.cu at line 715: out of memory
Cuda error in file src/implicit_gemm.cu at line 585: out of memory
2020-03-06 05:09:08.673772: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger ../rtSafe/cuda/cudaPaddingRunner.cpp (51) - Cuda Error in execute: 2 (out of memory)
2020-03-06 05:09:08.690029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-06 05:09:08.700573: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger FAILED_EXECUTION: std::exception
2020-03-06 05:09:08.767901: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at

@tfeher Can you PTAL?