NVIDIA-AI-IOT/tf_to_trt_image_classification

Killed

Davidnet opened this issue · 5 comments

On a brand-new Jetson, the test script gets killed partway through:

ubuntu@tegra-ubuntu:~/Documents/Projects/tf_to_trt_image_classification$ python scripts/test_tf.py
Testing mobilenet_v1_0p25_128
2018-03-15 01:29:49.697714: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero                                                                                              
2018-03-15 01:29:49.697838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 6.14GiB
2018-03-15 01:29:49.697886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 01:29:50.859554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5664 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['Gordon setter\n', 'Rottweiler\n', 'Tibetan mastiff\n', 'black-and-tan coonhound\n', 'flat-coated retriever\n']
Testing resnet_v1_50
2018-03-15 01:29:57.964211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 01:29:57.964345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4491 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['Gordon setter\n', 'Irish setter, red setter\n', 'cocker spaniel, English cocker spaniel, cocker\n', 'black-and-tan coonhound\n', 'Rottweiler\n']                                                                                          
Testing mobilenet_v1_1p0_224
2018-03-15 01:30:06.583719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 01:30:06.583871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2178 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['Gordon setter\n', 'Irish setter, red setter\n', 'black-and-tan coonhound\n', 'cocker spaniel, English cocker spaniel, cocker\n', 'Tibetan mastiff\n']                                                                                     
Testing inception_v2
2018-03-15 01:30:11.103292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 01:30:11.103426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2131 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['Gordon setter\n', 'Irish setter, red setter\n', 'cocker spaniel, English cocker spaniel, cocker\n', 'Rottweiler\n', 'black-and-tan coonhound\n']                                                                                          
Testing inception_v3
2018-03-15 01:30:19.184979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 01:30:19.185114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2030 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['Gordon setter\n', 'Rottweiler\n', 'Irish setter, red setter\n', 'black-and-tan coonhound\n', 'English setter\n']
Testing resnet_v2_152
2018-03-15 01:30:38.026162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 01:30:38.026349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 27 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Killed

This is on a new JetPack install. Any ideas why the Jetson is not returning the full memory?
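To make the trend in the log easier to see, the memory TensorFlow reports each time it creates the GPU device can be extracted with a short script. This is only a sketch: the `LOG` string is an abbreviated copy of the output above (with `[...]` marking trimmed text), and `reported_memory_mb` is a helper name made up for illustration.

```python
import re

# Abbreviated copy of the log above: one "Testing" line and one
# "Creating TensorFlow device" line per model. "[...]" marks trimmed text.
LOG = """\
Testing mobilenet_v1_0p25_128
[...] Creating TensorFlow device ([...]/device:GPU:0 with 5664 MB memory) -> [...]
Testing resnet_v1_50
[...] Creating TensorFlow device ([...]/device:GPU:0 with 4491 MB memory) -> [...]
Testing mobilenet_v1_1p0_224
[...] Creating TensorFlow device ([...]/device:GPU:0 with 2178 MB memory) -> [...]
Testing inception_v2
[...] Creating TensorFlow device ([...]/device:GPU:0 with 2131 MB memory) -> [...]
Testing inception_v3
[...] Creating TensorFlow device ([...]/device:GPU:0 with 2030 MB memory) -> [...]
Testing resnet_v2_152
[...] Creating TensorFlow device ([...]/device:GPU:0 with 27 MB memory) -> [...]
"""


def reported_memory_mb(log_text):
    """Return (model, MB) pairs: the memory reported for each tested model."""
    results = []
    model = None
    for line in log_text.splitlines():
        started = re.match(r"Testing (\S+)", line)
        if started:
            # Remember which model the following device line belongs to.
            model = started.group(1)
            continue
        created = re.search(r"with (\d+) MB memory", line)
        if created and model is not None:
            results.append((model, int(created.group(1))))
    return results


for model, mb in reported_memory_mb(LOG):
    print(f"{model:24s} {mb:5d} MB")
```

The reported figure drops with every model until only 27 MB is left for resnet_v2_152, which is the point where the process is killed.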

We are currently looking into the TensorFlow-related memory issues. However, after restarting the Jetson TX2 you should be able to run the entire script.

No, I haven't been able to run the script to completion; every time, it gets killed because of memory consumption.

I have seen that creating and closing sessions actually reduces the available GPU memory. Maybe a bug on the TF side?
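If closing a session really does not give the memory back within the same process, one workaround that might be worth trying is to run each model in its own Python process, so the OS reclaims everything when that process exits. The following is a minimal sketch under stated assumptions, not a confirmed fix: `CHILD_CODE` is a hypothetical stand-in for the per-model part of scripts/test_tf.py (how that script is actually structured has not been checked here), and `run_isolated` is an illustrative name.

```python
import subprocess
import sys

# Model names as they appear in the log above.
MODEL_NAMES = [
    "mobilenet_v1_0p25_128",
    "resnet_v1_50",
    "mobilenet_v1_1p0_224",
    "inception_v2",
    "inception_v3",
    "resnet_v2_152",
]

# Hypothetical placeholder for the real per-model test. In practice this
# would import TensorFlow, load the model named in sys.argv[1], and run it.
CHILD_CODE = "import sys; print('tested ' + sys.argv[1])"


def run_isolated(model_name):
    """Run one model test in a fresh interpreter; return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, model_name],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # On POSIX, a negative return code means the child was killed by a
    # signal (for example -9 for SIGKILL), so a failure is still reported.
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    for name in MODEL_NAMES:
        code, output = run_isolated(name)
        print(name, code, output)
```

Because each child starts from scratch, memory held by one model's session cannot carry over into the next test, and a killed child shows up as a non-zero return code instead of taking the whole run down with it.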

@jaybdub-nv I can report the same issue. A reboot does not help. Memory consumption stays around 6.5GB, but with resnet_v2_152 it exceeds the 8GB limit.

I am also reporting this problem. Please let me know if you find any solutions :)