File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 1034, in GetNext return _pywrap_tensorflow_internal.PyRecordReader_GetNext(self) tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 Execution status: FAIL
monajalal opened this issue · 1 comment
I get this error during training. All the steps before this one pass without errors. Could you please guide me?
!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
    detectnet_v2 train -e /workspace/tao-experiments/local/training/tao/specs/training/resnet18_distractors.txt \
    -r /workspace/tao-experiments/local/training/tao/detectnet_v2/resnet18_palletjack -k $KEY --gpus $NUM_GPUS
==============================
=== TAO Toolkit TensorFlow ===
==============================
NVIDIA Release 4.0.0-TensorFlow (build )
TAO Toolkit Version 4.0.0
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/tao-toolkit-software-license-agreement
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TAO Toolkit. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
Using TensorFlow backend.
2024-02-12 19:31:44.622414: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
2024-02-12 19:31:51,462 [INFO] root: Starting DetectNet_v2 Training job
2024-02-12 19:31:51,462 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/local/training/tao/specs/training/resnet18_distractors.txt.
2024-02-12 19:31:51,464 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/local/training/tao/specs/training/resnet18_distractors.txt
2024-02-12 19:31:51,467 [INFO] root: Training gridbox model.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2024-02-12 19:31:51,467 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/scripts/train.py>", line 3, in <module>
File "<frozen iva.detectnet_v2.scripts.train>", line 1022, in <module>
File "<frozen iva.detectnet_v2.scripts.train>", line 1011, in <module>
File "<decorator-gen-117>", line 2, in main
File "<frozen iva.detectnet_v2.utilities.timer>", line 46, in wrapped_fn
File "<frozen iva.detectnet_v2.scripts.train>", line 994, in main
File "<frozen iva.detectnet_v2.scripts.train>", line 853, in run_experiment
File "<frozen iva.detectnet_v2.scripts.train>", line 625, in train_gridbox
File "<frozen iva.detectnet_v2.dataloader.build_dataloader>", line 273, in build_dataloader
File "<frozen iva.detectnet_v2.dataloader.drivenet_dataloader>", line 491, in __init__
File "<frozen iva.detectnet_v2.dataloader.drivenet_dataloader>", line 548, in _construct_data_sources
File "<frozen iva.detectnet_v2.dataloader.drivenet_dataloader>", line 395, in __init__
File "<frozen iva.detectnet_v2.dataloader.drivenet_dataloader>", line 395, in <listcomp>
File "<frozen iva.detectnet_v2.dataloader.drivenet_dataloader>", line 394, in <genexpr>
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/lib/io/tf_record.py", line 181, in tf_record_iterator
reader.GetNext()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 1034, in GetNext
return _pywrap_tensorflow_internal.PyRecordReader_GetNext(self)
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
Execution status: FAIL
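The traceback ends in `tf_record_iterator`, called from the DetectNet_v2 dataloader while it opens each data source, and "corrupted record at 0" means the checksum already failed on the very first record of some file. That usually suggests the file is not a valid TFRecord at all (or was truncated or overwritten), rather than a problem with the model. One way to narrow it down is to check, outside the training job, which of the files that the spec's TFRecord path pattern matches fail. Below is a minimal pure-Python sketch, assuming the standard TFRecord framing (little-endian uint64 length, masked CRC-32C of the length, payload, masked CRC-32C of the payload). It does not use TensorFlow, and the function names and the script name `check_tfrecords.py` are hypothetical. Pure-Python CRC is slow, so expect large shards to take a while.

```python
import io
import struct
import sys
from typing import BinaryIO, List, Optional


# CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, as used by the TFRecord format.
def _make_crc32c_table() -> List[int]:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # Rotate right by 15 bits, then add the TFRecord mask delta (mod 2**32).
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(payload: bytes) -> bytes:
    """Frame one payload as a TFRecord: length, CRC(length), data, CRC(data)."""
    length = struct.pack("<Q", len(payload))
    return (length + struct.pack("<I", masked_crc32c(length))
            + payload + struct.pack("<I", masked_crc32c(payload)))


def check_tfrecord_stream(stream: BinaryIO) -> Optional[str]:
    """Return None if every record's framing and CRCs check out, else a description."""
    offset = 0
    while True:
        header = stream.read(12)
        if not header:
            return None  # clean end of file (an empty file has zero records)
        if len(header) < 12:
            return "truncated header at offset %d" % offset
        length_bytes = header[:8]
        (length_crc,) = struct.unpack("<I", header[8:])
        if masked_crc32c(length_bytes) != length_crc:
            return "corrupted record at %d (length CRC mismatch)" % offset
        (length,) = struct.unpack("<Q", length_bytes)
        payload = stream.read(length)
        footer = stream.read(4)
        if len(payload) < length or len(footer) < 4:
            return "truncated record at offset %d" % offset
        if masked_crc32c(payload) != struct.unpack("<I", footer)[0]:
            return "corrupted record at %d (data CRC mismatch)" % offset
        offset += 12 + length + 4


def check_tfrecord_bytes(data: bytes) -> Optional[str]:
    return check_tfrecord_stream(io.BytesIO(data))


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_tfrecords.py /path/to/tfrecords/*
    failed = 0
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            error = check_tfrecord_stream(f)
        if error:
            failed += 1
            print("%s: %s" % (path, error))
    print("%d of %d file(s) failed" % (failed, len(sys.argv) - 1))
```

Running it over the same glob the spec uses would list any file that is not a well-formed TFRecord; if non-TFRecord files show up in that list, the path pattern or the dataset conversion step is the likely place to look.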
I also encountered the same problem and couldn't solve it. How did you solve it?