MCZhi/DIPP

An error encountered when running open_loop_test.py

Closed this issue · 2 comments

J1dan commented

Hello, hope you are doing well. After training the model, when I run the command
python open_loop_test.py --name open_loop --test_set ./ValData/ --model_path ./training_log/DIPP/model_1_1.6529.pth --use_planning --render --save --device cpu
I encounter the error
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 [Op:IteratorGetNext]
Below is the full log. ValData is the directory where I put the validation data. Could you please help with it? Thanks.

2022-10-24 11:54:56.033651: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-24 11:54:56.033936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-24 11:54:56.034017: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
------------- open_loop -------------
Use integrated planning module: True
Use device: cpu
2022-10-24 11:54:56.166509: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-24 11:54:56.201599: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Traceback (most recent call last):
File "open_loop_test.py", line 227, in <module>
open_loop_test()
File "open_loop_test.py", line 52, in open_loop_test
for scenario in scenarios:
File "/home/jidan/anaconda3/envs/DIPP/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 761, in __next__
return self._next_internal()
File "/home/jidan/anaconda3/envs/DIPP/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 744, in _next_internal
ret = gen_dataset_ops.iterator_get_next(
File "/home/jidan/anaconda3/envs/DIPP/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2728, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/home/jidan/anaconda3/envs/DIPP/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0 [Op:IteratorGetNext]

MCZhi commented

Hi @J1dan, I guess that's a problem with TensorFlow when reading the TFRecord file. I don't know exactly what caused this error, but you could try reinstalling the package waymo-open-dataset-tf-2-6-0.
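Before reinstalling, it may be worth checking whether the files in the test directory are actually valid TFRecords. The on-disk TFRecord format is documented (each record is a little-endian uint64 length, a 4-byte masked CRC of the length, the payload, and a 4-byte masked CRC of the payload). The helper below is a hypothetical diagnostic, not part of this repo; it only verifies the length framing and skips the CRC checks, so it catches truncation and files that are not TFRecords at all (e.g. already-processed .npz output), which both surface as "corrupted record at 0":

```python
import struct

def check_tfrecord_framing(path):
    """Walk a TFRecord file record by record, verifying only the length
    framing (CRC fields are skipped, not validated).  Returns the number
    of complete records, or raises ValueError at the first bad record."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)          # uint64 little-endian payload length
            if not header:
                return count            # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length header at record {count}")
            (length,) = struct.unpack("<Q", header)
            # 4-byte length CRC + payload + 4-byte payload CRC
            body = f.read(4 + length + 4)
            if len(body) < 4 + length + 4:
                raise ValueError(f"truncated record at {count}")
            count += 1
```

Running this over each file in ValData should show immediately whether TensorFlow is being pointed at real TFRecord scenario files or at something else.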

J1dan commented

I just found that the test directory I chose was wrong: I had pointed it at the processed data instead of the raw validation data. After changing the path, the problem is solved. Thank you for your reply.