FLming/CRNN.tf2

Handling invalid image path or corrupted image files.

somso2e opened this issue

The problem:
In the current implementation, if a path listed in the annotation file passed to DatasetBuilder does not exist, or if one of the image files is corrupted, the entire training run halts and all progress is lost.
Since it can take hours to iterate through all the images, this is very frustrating.
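As a stopgap on my end, pre-filtering the annotation file so that entries pointing to missing files never reach DatasetBuilder avoids at least the "path does not exist" case. A minimal sketch; the `path<TAB>label` line format, file names, and `img_root` argument are assumptions, not the repo's actual annotation format:

```python
import os

def filter_missing(ann_in: str, ann_out: str, img_root: str = "") -> None:
    """Copy annotation lines whose image file exists; drop the rest."""
    kept, dropped = 0, 0
    with open(ann_in, encoding="utf-8") as fin, \
         open(ann_out, "w", encoding="utf-8") as fout:
        for line in fin:
            path = line.split("\t", 1)[0].strip()  # assumed "path<TAB>label" format
            if os.path.exists(os.path.join(img_root, path)):
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    print(f"kept {kept} entries, dropped {dropped} missing files")

# filter_missing("annotation_train.txt", "annotation_train.clean.txt", img_root="mjsynth/")
```

This only handles missing paths, though, not files that exist but are corrupted.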

I ran into this because a few images (around 50) were somehow corrupted while downloading the MJSynth dataset. I did try to clean them up as this solution suggested, but I'm still getting strange, nonsensical errors:

     [30] try:
---> [31]    model.fit(train_ds,
     [32]              epochs=EPOCHS,
     [33]              callbacks=callbacks,
     [34]              validation_data=val_ds,
     [35]              use_multiprocessing=True)
     [36] except KeyboardInterrupt:
     [37]    pass

File c:\Users\somso\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File c:\Users\somso\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\eager\execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Graph execution error:

2 root error(s) found.
  (0) INVALID_ARGUMENT:  jpeg::Uncompress failed. Invalid JPEG data or crop window.
	 [[{{node DecodeJpeg}}]]
	 [[IteratorGetNext]]
	 [[assert_equal_3/Assert/Assert/data_0/_4]]
  (1) INVALID_ARGUMENT:  jpeg::Uncompress failed. Invalid JPEG data or crop window.
	 [[{{node DecodeJpeg}}]]
	 [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11983]
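For reference, the cleanup I attempted was essentially a full offline scan that tries to decode every JPEG and reports the ones that fail. A rough sketch, assuming Pillow is installed; the directory path and function name are placeholders:

```python
from pathlib import Path
from PIL import Image

def find_corrupted(root: str) -> list:
    """Return paths of JPEGs that fail to fully decode."""
    bad = []
    for p in Path(root).rglob("*.jpg"):
        try:
            with Image.open(p) as img:
                img.load()  # force a full decode; verify() alone misses some truncated files
        except Exception as exc:
            bad.append(p)
            print(f"corrupted: {p} ({exc})")
    return bad

# bad_files = find_corrupted("mjsynth/")
```

Even after removing the files this flags, the training run above still dies on DecodeJpeg, so something is apparently slipping through.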

What could be done:
Catch the exceptions thrown while loading images, log the caught exception to the terminal, and skip the offending sample instead of aborting training.
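One possible way to do this inside the tf.data pipeline (a sketch, not the project's actual DatasetBuilder code) is to drop any element whose load/decode step raises, e.g. with tf.data.experimental.ignore_errors(); newer TF versions also accept a log_warning argument there, I believe:

```python
import tensorflow as tf

def load_image(path, label):
    # Illustrative loader; raises InvalidArgumentError on corrupted JPEGs
    # and NotFoundError on missing files.
    raw = tf.io.read_file(path)
    img = tf.io.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, (32, 100)) / 255.0
    return img, label

paths = ["img_0001.jpg", "img_0002.jpg"]   # placeholder paths
labels = ["hello", "world"]                # placeholder labels

ds = tf.data.Dataset.from_tensor_slices((paths, labels))
ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
# Silently drop any element whose map step raised (missing or corrupted image).
ds = ds.apply(tf.data.experimental.ignore_errors())
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)
```

The downside is that skipped samples are not logged by default, so some explicit logging around the decode step would still be nice to have.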

I honestly tried to come up with a fix myself, but I still can't work out how DatasetBuilder is put together.
I'd be happy to open a PR if you have an idea of how this should be fixed.