tensorflow/tensorrt

Jupyter Notebook kernel dies automatically

WeiFoo opened this issue · 8 comments

I was trying to run the following notebook on Ubuntu 18.04 with a T4 GPU on EC2:

https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/TFv2-TF-TRT-inference-from-Keras-saved-model.ipynb

I can run most cells, but at the TF-TRT FP32 model section the kernel dies automatically.

I even restarted the runtime and ran just the following code; the kernel still dies:

# The notebook imports the converter as:
from tensorflow.python.compiler.tensorrt import trt_convert as trt

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP32,
    max_workspace_size_bytes=8000000000)

converter = trt.TrtGraphConverterV2(input_saved_model_dir='resnet50_saved_model',
                                    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir='resnet50_saved_model_TFTRT_FP32')
print('Done Converting to TF-TRT FP32')

Does anyone have an idea? Thanks!

I am also experiencing the same issue. Any pointers @pooyadavoodi?

I have seen an issue related to running INT8 calibration in the same process that previously ran an FP32/FP16 conversion. But if you run each conversion once per process, I expect it to work.
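A minimal sketch of that workaround, assuming the directory names from this thread: each precision mode is converted in its own spawned process, so nothing (e.g. INT8 calibration state) carries over between conversions. convert_one is a hypothetical helper, not part of the TF-TRT API.

import multiprocessing as mp

def convert_one(precision_mode, output_dir):
    # Import inside the worker so TensorFlow/CUDA state stays per-process.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=precision_mode,
        max_workspace_size_bytes=8000000000)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir='resnet50_saved_model',
        conversion_params=params)
    converter.convert()
    converter.save(output_saved_model_dir=output_dir)

if __name__ == '__main__':
    # One precision mode per child process; 'spawn' gives each a clean start.
    for precision, out_dir in [('FP32', 'resnet50_saved_model_TFTRT_FP32'),
                               ('FP16', 'resnet50_saved_model_TFTRT_FP16')]:
        p = mp.get_context('spawn').Process(target=convert_one,
                                            args=(precision, out_dir))
        p.start()
        p.join()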

I just tried TF-TRT FP32 and it worked.
I got the following perf on a P100:

Step 0: 10.8ms
Step 50: 10.8ms
Step 100: 10.8ms
Step 150: 10.8ms
Step 200: 10.8ms
Step 250: 10.8ms
Step 300: 10.8ms
Step 350: 10.8ms
Step 400: 10.8ms
Step 450: 10.8ms
Step 500: 10.8ms
Step 550: 10.8ms
Step 600: 10.8ms
Step 650: 10.8ms
Step 700: 10.8ms
Step 750: 10.8ms
Step 800: 10.8ms
Step 850: 10.8ms
Step 900: 10.8ms
Step 950: 10.8ms
Throughput: 742 images/s
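(For reference, numbers like these come from a benchmarking loop along the lines of the notebook's; a rough sketch, where the batch size of 8 and the 224x224x3 input shape are assumptions:)

import time
import numpy as np
import tensorflow as tf

infer = tf.saved_model.load(
    'resnet50_saved_model_TFTRT_FP32').signatures['serving_default']
batch = tf.constant(np.random.rand(8, 224, 224, 3).astype(np.float32))

for _ in range(50):
    infer(batch)  # warm-up: TensorRT engine builds happen here, untimed

start = time.time()
n_steps = 1000
for step in range(n_steps):
    t0 = time.time()
    infer(batch)
    if step % 50 == 0:
        print(f'Step {step}: {(time.time() - t0) * 1000:.1f}ms')
print(f'Throughput: {n_steps * batch.shape[0] / (time.time() - start):.0f} images/s')

The warm-up loop matters: the first calls trigger the TensorRT engine builds, which would otherwise dominate the timings.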

Perhaps some Colab nodes aren't stable?

Yeah, it might be the case. Worth propagating to the Colab team, I guess.

My kernel is also dying at this step. I'm running Jupyter inside the Docker container with FP32 and FP16 on a GTX 1650.

I had the same symptoms. In my case, it was caused by not adding <your TensorRT path>/lib to LD_LIBRARY_PATH before running Jupyter Lab. Adding the path and restarting Jupyter Lab solved it.
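For reference, that amounts to running something like export LD_LIBRARY_PATH=<your TensorRT path>/lib:$LD_LIBRARY_PATH in the same shell (or Dockerfile) before launching jupyter lab, so that TensorFlow can dlopen libnvinfer and the other TensorRT libraries.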

azayz commented

Hey, my kernel dies not during conversion and optimization of the model but during inference. It converts the model smoothly with both FP16 and FP32, but during inference (predicting one image) the kernel dies and automatically restarts. Any help?
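A minimal repro along those lines, in case it helps isolate the crash: load the converted model using this thread's directory name and push one dummy image through it (the 224x224x3 shape assumes the notebook's ResNet-50).

import numpy as np
import tensorflow as tf

# Load the TF-TRT converted SavedModel and grab the default signature.
loaded = tf.saved_model.load('resnet50_saved_model_TFTRT_FP32')
infer = loaded.signatures['serving_default']

# One dummy image; if the kernel dies here, it dies while building/running
# the TensorRT engine rather than during conversion.
image = tf.constant(np.random.rand(1, 224, 224, 3).astype(np.float32))
outputs = infer(image)
print({name: tensor.shape for name, tensor in outputs.items()})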

I actually put together a tutorial a few days back that shows how to use TensorRT in an end-to-end manner for accelerating inference: https://sayak.dev/tf.keras/tensorrt/tensorflow/2020/07/01/accelerated-inference-trt.html. Hope this will be helpful.