Issue while using the model and json
rajban94 opened this issue · 8 comments
I am getting the error below while trying to get the data using this code:
file = 'out_10.jpg'
cv_img = cv2.imread(file,0)
predictor = Predictor.from_checkpoint(
    params=PredictorParams(),
    checkpoint='./models/cal_model.ckpt')
calamari_output = {}
for sample in predictor.predict_raw(cv_img):
    inputs, prediction, meta = sample.inputs, sample.outputs, sample.meta
    pred_text = prediction.sentence
    avg_char_probability = 0
    for p in prediction.positions:
        if len(p.chars) > 0:
            avg_char_probability += p.chars[0].probability
    avg_char_probability /= len(prediction.positions) if len(prediction.positions) > 0 else 1
    #print(prediction.avg_char_probability)
    pred_confidence = round(avg_char_probability * 100, 1)
    calamari_output['name'] = [pred_text, pred_confidence]
Error:
2023-01-23 18:20:15.833943: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-01-23 18:20:15.834081: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow_addons\utils\ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.9.0 and strictly below 2.12.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.6.5 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
UserWarning,
INFO 2023-01-23 18:20:17,488 tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:17.494646: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2023-01-23 18:20:17.494742: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-23 18:20:17.497412: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LAPTOP-U30QQNA8
2023-01-23 18:20:17.497586: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LAPTOP-U30QQNA8
INFO 2023-01-23 18:20:17,495 calamari_ocr.ocr.savedmodel.sa: Checkpoint version 5 is up-to-date.
INFO 2023-01-23 18:20:17,519 tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:17.530284: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING 2023-01-23 18:20:18,624 tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING 2023-01-23 18:20:18,624 tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Prediction: 0%| | 0/262 [00:00<?, ?it/s]2023-01-23 18:20:18.929519: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2023-01-23 18:20:19.885121: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-01-23 18:20:19.885257: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow_addons\utils\ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.9.0 and strictly below 2.12.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.6.5 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
UserWarning,
INFO 2023-01-23 18:20:21,432 tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:21.435367: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2023-01-23 18:20:21.435484: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-23 18:20:21.438304: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LAPTOP-U30QQNA8
2023-01-23 18:20:21.438477: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LAPTOP-U30QQNA8
INFO 2023-01-23 18:20:21,436 calamari_ocr.ocr.savedmodel.sa: Checkpoint version 5 is up-to-date.
INFO 2023-01-23 18:20:21,456 tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:21.516634: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING 2023-01-23 18:20:22,690 tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING 2023-01-23 18:20:22,690 tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Prediction: 0%| | 0/262 [00:00<?, ?it/s]2023-01-23 18:20:23.067125: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2023-01-23 18:20:23.460350: W tensorflow/core/framework/op_kernel.cc:1680] Unknown: RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
ret = func(*args)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
return func(*args, **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
for s in samples:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
for s in generate:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
with parallel_pipeline as output_generator:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
maxtasksperchild=self.max_tasks_per_child,
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
super().__init__(initializer=Initializer(worker_constructor), **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
self._repopulate_pool()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Prediction: 0%| | 0/262 [00:00<?, ?it/s]
CRITICAL 2023-01-23 18:20:23,456 tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\RISHAV\Documents\ML_Flow\extraction.py", line 86, in <module>
for sample in predictor.predict_raw(cv_img):
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 215, in predict_pipeline
total=n_samples,
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tqdm\std.py", line 1195, in __iter__
for obj in iterable:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 84, in _apply
for sample in samples:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 279, in predict_dataset
r = predict_function(iterator) # hack to access inputs
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 957, in _call
filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 1964, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 596, in call
ctx=ctx)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
ret = func(*args)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
return func(*args, **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
for s in samples:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
for s in generate:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
with parallel_pipeline as output_generator:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
maxtasksperchild=self.max_tasks_per_child,
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
super().__init__(initializer=Initializer(worker_constructor), **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
self._repopulate_pool()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_predict_function_2234]
Function call stack:
predict_function
2023-01-23 18:20:23.484051: W tensorflow/core/framework/op_kernel.cc:1680] Unknown: BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
ret = func(*args)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
return func(*args, **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
for s in samples:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
for s in generate:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
with parallel_pipeline as output_generator:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
maxtasksperchild=self.max_tasks_per_child,
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
super().__init__(initializer=Initializer(worker_constructor), **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
self._repopulate_pool()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
Prediction: 0%| | 0/262 [00:04<?, ?it/s]
CRITICAL 2023-01-23 18:20:23,487 tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
File "extraction.py", line 86, in <module>
for sample in predictor.predict_raw(cv_img):
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 215, in predict_pipeline
total=n_samples,
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tqdm\std.py", line 1195, in __iter__
for obj in iterable:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 84, in _apply
for sample in samples:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 279, in predict_dataset
r = predict_function(iterator) # hack to access inputs
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 957, in _call
filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 1964, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 596, in call
ctx=ctx)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
ret = func(*args)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
return func(*args, **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
for s in samples:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
for s in generate:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
with parallel_pipeline as output_generator:
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
maxtasksperchild=self.max_tasks_per_child,
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
super().__init__(initializer=Initializer(worker_constructor), **kwargs)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
self._repopulate_pool()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_predict_function_2234]
Function call stack:
predict_function
Please help me resolve this issue.
predictor.predict_raw expects Iterable[np.ndarray], but you are providing only a single numpy.ndarray. Try:
raw_image_generator = [cv_img]
for sample in predictor.predict_raw(raw_image_generator):
    ...
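To illustrate why the wrapping matters, here is a minimal sketch that does not use calamari at all. The helper count_samples is hypothetical and only counts what a consumer sees when it loops over its argument; a nested list stands in for the image so the snippet has no dependencies. A NumPy array iterates along its first axis in the same way, so a bare 2-D image looks like a sequence of rows. That would be consistent with the 0/262 progress bar in the log above if the image is 262 pixels high, but that is an assumption and not something the log confirms.

```python
def count_samples(images):
    """Count how many items a consumer sees when it loops over `images`."""
    return sum(1 for _ in images)


# Stand-in for a 3x4 grayscale image. A real np.ndarray iterates the same way,
# yielding one row per step along its first axis.
image = [[0, 0, 0, 0],
         [0, 255, 255, 0],
         [0, 0, 0, 0]]

rows_seen = count_samples(image)      # bare image: every row is treated as a sample
images_seen = count_samples([image])  # wrapped in a list: one sample, the whole image
```

Wrapping the image in a list costs nothing and makes the single image the unit of iteration.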
I will try and let you know. But I have already loaded the image with cv2, which gives a numpy.ndarray, so why am I getting this error? Is it because of tfaip?
@andbue I am still getting the same error after passing the image as a numpy.ndarray. Can you tell me what error I am getting?
The image was already a numpy.ndarray before; now it should be wrapped in a list or any other kind of iterable before you pass it to the predictor. Could you post a full example of your code as it looks right now, ideally with all imports and maybe even the out_10.jpg you're using?
@andbue I am sharing the code that I am using for end-to-end prediction. Please let me know where I am going wrong.
import cv2
import numpy as np
import os
import glob
from pdf2image import convert_from_path
import subprocess
import pandas as pd
import re
from calamari_ocr.ocr.predict.predictor import Predictor, PredictorParams

def generateImage(pdfFile, des = './images'):
    if pdfFile.split('.')[-1]=='pdf':
        name = os.path.basename(pdfFile).replace('.pdf','')
        images = convert_from_path(pdfFile,dpi=500,poppler_path = "C:\\Program Files (x86)\\poppler-0.68.0\\bin")
        for i in range(len(images)):
            images[i].save(des+'/'+name+'_page_'+ str(i) +'.jpg', 'JPEG')

def get_calamari_output(cropImg, index):
    predictor = Predictor.from_checkpoint(
        params=PredictorParams(),
        checkpoint='./models/cal_model.ckpt')
    calamari_output = {}
    for sample in predictor.predict_raw([cropImg]):
        inputs, prediction, meta = sample.inputs, sample.outputs, sample.meta
        pred_text = prediction.sentence
        avg_char_probability = 0
        for p in prediction.positions:
            if len(p.chars) > 0:
                avg_char_probability += p.chars[0].probability
        avg_char_probability /= len(prediction.positions) if len(prediction.positions) > 0 else 1
        #print(prediction.avg_char_probability)
        pred_confidence = round(avg_char_probability * 100, 1)
        calamari_output[index] = [pred_text, pred_confidence]
    return calamari_output

def drawBoundBox(imageFile):
    orig_img = cv2.imread(imageFile)
    gray = cv2.cvtColor(orig_img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray,(11,11),0)
    _, thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    kernal = cv2.getStructuringElement(cv2.MORPH_RECT,(11,19))
    dilate = cv2.dilate(thresh,kernal, iterations=9)
    cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts)==2 else cnts[1]
    cnts = sorted(cnts, key=lambda x: cv2.boundingRect(x)[0])
    boxes = []
    for c in cnts:
        x,y,w,h = cv2.boundingRect(c)
        boxes.append([x,y,w,h])
    return boxes

def get_crops_dtls(imageFile):
    res = cv2.imread(imageFile)
    boxlist = drawBoundBox(imageFile)
    for idx, box in enumerate(boxlist):
        x,y,w,h = box[0],box[1],box[2],box[3]
        crop = res[y:y+h,x:x+w]
        #cv2.imwrite(dst+"/"+"out_"+str(idx)+'.jpg',crop)
        pred_text_dict = get_calamari_output(crop, idx)
    return pred_text_dict

if not os.path.exists('./images'):
    os.makedirs('./images')
files = glob.glob('./invoice/*')
for file in files:
    generateImage(file)
imgs = glob.glob('./images/*.jpg')
for img in imgs:
    predict_data = get_crops_dtls(img)
As suggested, I changed the loop to: for sample in predictor.predict_raw([cropImg]), but it still gives the same error as before.
Ah, now I get it: put the lines at the bottom in an if __name__ == "__main__": block, otherwise the whole subprocess magic of calamari, tfaip and tensorflow is not going to work, producing the Broken pipe errors.
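For readers unfamiliar with the idiom, here is a minimal sketch using only the standard library, not calamari or tfaip. On Windows, multiprocessing starts child processes with the spawn method, and each child re-imports the main script. Any top-level code that starts a pool therefore runs again inside the child, which is what the RuntimeError in the log complains about. The names square and main are illustrative only.

```python
import multiprocessing as mp


def square(x):
    # Worker functions live at module level so child processes can import them.
    return x * x


def main():
    # Creating the pool starts child processes. With the spawn start method each
    # child re-imports this file, so this call must not run at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Only the original process has __name__ == "__main__"; spawned children
    # import the file under a different name and skip this block.
    print(main())
```

Applied to the script posted above, this would mean moving everything from the os.path.exists('./images') check down to the final loop over imgs into such a guarded block, while the function definitions stay at module level.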
Further suggestions:
- do not call Predictor.from_checkpoint for each and every line, this is going to be very slow. Instantiate the object once and then just throw all of the images at it
- if you are looking for a simple line segmentation algorithm, have a look at https://github.com/cisocrgroup/ocrd_cis/blob/master/ocrd_cis/ocropy/segment.py
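The first suggestion could be sketched roughly as follows. The helpers load_predictor and recognize_lines are hypothetical names, not part of calamari; the only calamari calls used are the ones already shown in this thread (Predictor.from_checkpoint, PredictorParams, predict_raw, and the outputs.sentence field). The sketch assumes that predict_raw yields results in the same order as the input images, which should be verified before relying on it.

```python
def load_predictor(checkpoint):
    """Load the model once; the import is deferred so this file imports cheaply."""
    from calamari_ocr.ocr.predict.predictor import Predictor, PredictorParams
    return Predictor.from_checkpoint(params=PredictorParams(), checkpoint=checkpoint)


def recognize_lines(predictor, line_images):
    """Run one already-loaded predictor over all line images in a single call."""
    sentences = []
    # predict_raw takes an iterable of images, so every crop goes in at once.
    # Assumption: results are yielded in input order.
    for sample in predictor.predict_raw(line_images):
        sentences.append(sample.outputs.sentence)
    return sentences


# Intended use, inside the guarded main block of the script:
#   predictor = load_predictor('./models/cal_model.ckpt')
#   texts = recognize_lines(predictor, list_of_crops)
```

Compared with the posted get_calamari_output, the checkpoint is read once per run instead of once per crop, and the pool of worker processes is set up once per call instead of once per image.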
@andbue thank you so much for your help. It worked for me with if __name__=="__main__":
But I am facing another issue: if the cropped image contains only one line, the text is extracted correctly, but whenever it contains multiple lines the output is a blank string. Any suggestion to resolve this without re-training the existing model? Thank you in advance.
Glad to hear that it worked for you!
Calamari is, as stated in the "About"-text, a "Line based ATR Engine", so it does not contain any code for image preprocessing, document analysis, or line segmentation. To segment paragraph blocks into lines, have a look at the ocropy segmenter I linked to earlier. A more complex alternative that also performs document layout analysis can be found at https://github.com/qurator-spk/eynollah.
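As a rough illustration of what splitting a block into lines means, here is a naive horizontal projection sketch. It is not the ocropy segmenter or eynollah, and it will fail on skewed or touching lines. It assumes a binarized block in which text pixels are non-zero (for example the output of an inverted threshold such as THRESH_BINARY_INV) and works on any sequence of pixel rows, including a 2-D NumPy array. The function name split_into_lines is hypothetical.

```python
def split_into_lines(binary_rows):
    """Return (start, end) row ranges that contain ink; end is exclusive."""
    spans = []
    start = None
    for y, row in enumerate(binary_rows):
        has_ink = any(px > 0 for px in row)
        if has_ink and start is None:
            start = y                  # first inked row of a new text line
        elif not has_ink and start is not None:
            spans.append((start, y))   # an empty row closes the current line
            start = None
    if start is not None:
        spans.append((start, y + 1))   # the last line reaches the bottom edge
    return spans


# Tiny example: two text lines separated by one empty row.
block = [[0, 0, 0],
         [0, 1, 0],
         [1, 1, 0],
         [0, 0, 0],
         [0, 0, 1]]
line_spans = split_into_lines(block)
```

Each span can then be used to slice the block into one image per line, and those line images are what a line-based engine like Calamari expects as input. With NumPy, the per-row check can be done faster with an any() reduction along axis 1.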