Receiving error "TypeError: can't pickle _thread.RLock objects"
Hi
I was excited to try out your code; I installed it on my Windows 10 machine (Ryzen 3700X CPU, Nvidia RTX 2070 Super GPU) under Anaconda (Python 3.6.15, TensorFlow 2.6.2, cudatoolkit 11.2.2), and it gets pretty far along before it crashes.
Here is my command line...
eynollah --image sn98062568_1933-11-18_ed-1_seq-3.png --out test1 --model models_eynollah --save_layout test1 --full-layout --enable-plotting --allow-enhancement --allow_scaling --log-level DEBUG
It generates sn98062568_1933-11-18_ed-1_seq-3_enhanced.png and sn98062568_1933-11-18_ed-1_seq-3_layout_main.png images that look reasonable. But here is the output stream just before and including the error...
14:32:25.982 INFO eynollah - detection of marginals took 4.2s
14:32:25.982 DEBUG eynollah - enter run_boxes_full_layout
14:32:26.780 DEBUG eynollah - enter extract_text_regions
14:32:26.894 DEBUG eynollah - enter start_new_session_and_model (model_dir=models_eynollah/model_3up_new_good_no_augmentation.h5)
14:32:28.952 DEBUG eynollah - enter do_prediction
14:32:28.954 DEBUG eynollah - Patch size: 896x896
14:32:32.797 DEBUG eynollah - enter do_prediction
14:32:32.799 DEBUG eynollah - Patch size: 896x896
14:32:41.277 DEBUG eynollah - exit extract_text_regions
14:32:42.255 DEBUG eynollah - enter extract_text_regions
14:32:42.256 DEBUG eynollah - enter start_new_session_and_model (model_dir=models_eynollah/model_no_patches_class0_30eopch.h5)
14:32:44.120 DEBUG eynollah - enter do_prediction
14:32:45.507 DEBUG eynollah - exit extract_text_regions
14:32:46.658 DEBUG eynollah - exit run_boxes_full_layout
14:33:52.914 DEBUG eynollah - enter get_slopes_and_deskew_new
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Steve\anaconda3\envs\qurator-spk\Scripts\eynollah.exe\__main__.py", line 7, in <module>
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 1053, in main
rv = self.invoke(ctx)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\qurator\eynollah\cli.py", line 151, in main
pcgts = eynollah.run()
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\qurator\eynollah\eynollah.py", line 2458, in run
slopes, all_found_texline_polygons, boxes_text, txt_con_org, contours_only_text_parent, all_box_coord, index_by_text_par_con = self.get_slopes_and_deskew_new(txt_con_org, contours_only_text_parent, textline_mask_tot_ea, image_page_rotated, boxes_text, slope_deskew)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\qurator\eynollah\eynollah.py", line 828, in get_slopes_and_deskew_new
processes[i].start()
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Do you have any idea of what the problem may be, and what I can do to fix it?
Thanks!
Dear @sjscotti,
Sorry to say this, but Windows is not supported by our team.
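For background: on Windows, Python's multiprocessing can only spawn worker processes (there is no fork), so everything a Process needs is pickled and sent to the child. If the worker's target drags in an object that holds a thread lock or an open TensorFlow session, pickling fails with exactly this TypeError, and the half-started child then dies with the EOFError seen in the other traceback. A minimal sketch (hypothetical names, not Eynollah's actual code) that reproduces it on Windows:
import threading
from multiprocessing import Process

class Worker:
    def __init__(self):
        # stands in for a lock hidden inside a logger, model or session object
        self.lock = threading.RLock()

    def run(self, i):
        print("processing", i)

if __name__ == "__main__":
    w = Worker()
    # With the spawn start method, the bound method w.run pulls the whole
    # Worker (including its RLock) into the pickle stream, raising
    # "TypeError: can't pickle _thread.RLock objects" on Python <= 3.7.
    p = Process(target=w.run, args=(0,))
    p.start()
    p.join()
The same code runs on Linux because the default start method there is fork, which shares the parent's memory instead of pickling it; that is one reason the behaviour differs on Windows.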
Hi
I eventually got it to run without errors (unfortunately, I don't remember what steps I took to get to that stage). It works with the environment below. (BTW, I also had to modify the path in setup.py: it appears Windows needed a direct path to ./qurator/eynollah/ocrd-tool.json in the with open statement.)
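In case it helps anyone else on Windows, the change was along these lines (a sketch from memory, not the exact upstream code):
import json
import os

# in setup.py: resolve ocrd-tool.json relative to setup.py itself instead of
# relying on the current working directory, which differs on Windows
here = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(here, 'qurator', 'eynollah', 'ocrd-tool.json'), 'r') as f:
    ocrd_tool = json.load(f)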
BTW, I like the quality of the results I am getting so far, but it takes 8 to 10 minutes to run on each of my images, which are pretty dense old newspaper pages with dimensions of ~5000 x 6700 pixels.
Thanks!
-Steve
Environment used...
(qurator) D:\qurator\eynollah>conda list
# packages in environment at C:\Users\Steve\anaconda3\envs\qurator:
#
# Name Version Build Channel
absl-py 1.0.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
atomicwrites 1.4.0 pypi_0 pypi
attrs 21.4.0 pypi_0 pypi
bagit 1.8.1 pypi_0 pypi
bagit-profile 1.3.1 pypi_0 pypi
ca-certificates 2022.5.18.1 h5b45459_0 conda-forge
cached-property 1.5.2 pypi_0 pypi
cachetools 5.1.0 pypi_0 pypi
certifi 2022.5.18.1 pypi_0 pypi
charset-normalizer 2.0.12 pypi_0 pypi
click 8.1.3 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
cudatoolkit 11.7.0 ha6f8bbd_10 conda-forge
cycler 0.11.0 pypi_0 pypi
deprecated 1.2.0 pypi_0 pypi
eynollah 0.0.11 pypi_0 pypi
flask 2.1.2 pypi_0 pypi
flatbuffers 1.12 pypi_0 pypi
fonttools 4.33.3 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
google-auth 2.6.6 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.46.3 pypi_0 pypi
h5py 3.6.0 pypi_0 pypi
idna 3.3 pypi_0 pypi
importlib-metadata 4.11.3 pypi_0 pypi
importlib-resources 5.7.1 pypi_0 pypi
imutils 0.5.4 pypi_0 pypi
itsdangerous 2.1.2 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
joblib 1.1.0 pypi_0 pypi
jsonschema 4.5.1 pypi_0 pypi
keras 2.9.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.2 pypi_0 pypi
libclang 14.0.1 pypi_0 pypi
lxml 4.8.0 pypi_0 pypi
markdown 3.3.7 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.5.2 pypi_0 pypi
numpy 1.21.6 pypi_0 pypi
oauthlib 3.2.0 pypi_0 pypi
ocrd 2.34.0 pypi_0 pypi
ocrd-modelfactory 2.34.0 pypi_0 pypi
ocrd-models 2.34.0 pypi_0 pypi
ocrd-utils 2.34.0 pypi_0 pypi
ocrd-validators 2.34.0 pypi_0 pypi
opencv-python-headless 4.5.5.64 pypi_0 pypi
openssl 3.0.3 h8ffe710_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pypi_0 pypi
pillow 9.1.1 pypi_0 pypi
pip 22.0.4 pyhd8ed1ab_0 conda-forge
protobuf 3.20.1 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
pyrsistent 0.18.1 pypi_0 pypi
python 3.7.12 h900ac77_100_cpython conda-forge
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.7 2_cp37m conda-forge
pyyaml 6.0 pypi_0 pypi
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.8 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
setuptools 62.3.2 py37h03978a9_0 conda-forge
shapely 1.8.2 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.38.5 h8ffe710_0 conda-forge
tensorboard 2.9.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.9.0 pypi_0 pypi
tensorflow-estimator 2.9.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.26.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
typing-extensions 4.2.0 pypi_0 pypi
ucrt 10.0.20348.0 h57928b3_0 conda-forge
urllib3 1.26.9 pypi_0 pypi
vc 14.2 hb210afc_6 conda-forge
vs2015_runtime 14.29.30037 h902a5da_6 conda-forge
werkzeug 2.1.2 pypi_0 pypi
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
wrapt 1.14.1 pypi_0 pypi
zipp 3.8.0 pypi_0 pypi
Wow, that's great to hear and thanks for sharing, Steve!
it takes 8 to 10 minutes to run on each of my images which are pretty dense old newspaper pages having dimensions ~5000 x 6700
For our use cases (large images of historical newspapers are a prominent goal here too), quality of results is paramount, even when we have to trade processing time for it. But we are aware of the issue. One reason is that we have not optimized the code for throughput at all, but the main reason is likely that multiple models need to be loaded into memory for each image/processing call. We are working on a "batch mode" that will keep all the models in memory for the entire run, which has been shown to result in a significant reduction of processing time in our tests - @vahidrezanezhad may be able to give some more pointers.
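Schematically, the difference is just hoisting the model loading out of the per-image loop (hypothetical file and function names, not the actual Eynollah code):
import glob
from tensorflow.keras.models import load_model

def process_image(path, models):
    # placeholder for the real per-image pipeline
    ...

# Instead of re-loading every .h5 model from disk for each image,
# load them once and reuse them for the whole run:
models = {
    name: load_model("models_eynollah/" + name + ".h5", compile=False)
    for name in ["model_region", "model_textline"]  # hypothetical model names
}

for path in glob.glob("input_images/*.jp2"):
    process_image(path, models)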
Also, in this branch, you can find a "light" version of Eynollah which gives more speed for a little less quality in the result. Perhaps worth a try.
Are you running on CPU or GPU? If you use our light version on GPU with directory input (-di; and of course you need many images in there for it to make sense!), the processing time per image will be drastically lower than 8-10 minutes.
Thanks for the speedy response! I am running on GPU (Nvidia GeForce RTX 2070 Super with 8GB of GPU memory). I'll have to give the light version with the -di flag a try to see how it performs. I am happy to find that the code accepts .jp2 image files as input, since that is how mine are stored :-)
I have the exact same GPU and in batch mode the processing time went down to ~2-3min per newspaper page (with similar dimensions).
I am happy to find that the code accepts .jp2 image files
That praise goes to pillow ;-)
Hi
I created a new Anaconda environment on my Windows machine and installed the eynollah_light branch in it. I was initially getting an error that was not easy to trace, but it turned out to occur on line 23 of eynollah.py: from keras import backend as K. After some additional debugging, it was fixed by downgrading to protobuf 3.20.1.
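That is, in the active environment:
pip install protobuf==3.20.1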
I was then able to run using this command...
eynollah --out test1 --model models_eynollah --save_layout test1 --full-layout --enable-plotting --allow-enhancement --allow_scaling --input_binary --log-level DEBUG --light_version --dir_in input_images
It then ran and created the xxx_enhanced.png and xxx_layout_main.png files for the first image. It also stayed below my 8GB of dedicated GPU memory.
But then it got to this step and did not progress past it before it hit an error...
19:24:13.953 DEBUG eynollah - enter get_slopes_and_deskew_new
GPU memory increased above my 8GB of dedicated GPU memory and the program started using shared GPU memory (note: Windows will use CPU memory as virtual GPU memory, if needed, up to half of the installed CPU memory), but it eventually gave an out-of-memory error as shown below...
Traceback (most recent call last):
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,1,1024,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node res5a_branch2a_6/random_uniform/RandomUniform}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\network.py", line 1334, in __setstate__
model = saving.unpickle_model(state)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\saving.py", line 604, in unpickle_model
return _deserialize_model(h5dict)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\saving.py", line 336, in _deserialize_model
K.batch_set_value(weight_value_tuples)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\backend\tensorflow_backend.py", line 2960, in batch_set_value
tf_keras_backend.batch_set_value(tuples)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3259, in batch_set_value
get_session().run(assign_ops, feed_dict=feed_dict)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\keras\backend.py", line 486, in get_session
_initialize_variables(session)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\keras\backend.py", line 910, in _initialize_variables
session.run(variables_module.variables_initializer(uninitialized_vars))
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,1,1024,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node res5a_branch2a_6/random_uniform/RandomUniform (defined at C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Original stack trace for 'res5a_branch2a_6/random_uniform/RandomUniform':
File "<string>", line 1, in <module>
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\network.py", line 1334, in __setstate__
model = saving.unpickle_model(state)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\saving.py", line 604, in unpickle_model
return _deserialize_model(h5dict)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\saving.py", line 274, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\saving.py", line 627, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
printable_module_name='layer')
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\utils\generic_utils.py", line 147, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\network.py", line 1075, in from_config
process_node(layer, node_data)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\network.py", line 1025, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\base_layer.py", line 463, in __call__
self.build(unpack_singleton(input_shapes))
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\layers\convolutional.py", line 141, in build
constraint=self.kernel_constraint)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\engine\base_layer.py", line 279, in add_weight
weight = K.variable(initializer(shape, dtype=dtype),
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\initializers.py", line 227, in __call__
dtype=dtype, seed=self.seed)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\keras\backend\tensorflow_backend.py", line 4357, in random_uniform
shape, minval=minval, maxval=maxval, dtype=dtype, seed=seed)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\keras\backend.py", line 5494, in random_uniform
shape, minval=minval, maxval=maxval, dtype=dtype, seed=seed)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\ops\random_ops.py", line 245, in random_uniform
rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed1, seed2=seed2)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\ops\gen_random_ops.py", line 822, in random_uniform
name=name)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "C:\Users\Steve\anaconda3\envs\eynollah_light\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
I saw there was another OOM error mentioned in issue #80, so maybe it is the same problem.
Thanks for reporting, there indeed appears to be memory leakage - we will have to investigate this further.
Could you please share the batch of images you used?
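In the meantime, one thing that may be worth experimenting with (assuming the light branch runs on TF 1.x, as the tensorflow_core paths in your traceback suggest) is enabling incremental GPU memory allocation, so TensorFlow does not grab all GPU memory up front:
import tensorflow as tf
from keras import backend as K

# TF 1.x API: allocate GPU memory on demand rather than reserving it all.
# This will not fix a genuine leak, but it can delay or avoid the OOM.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))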
Hi
Enclosed is a single .jp2 file (in a zip archive) that should give the error. Interestingly, I successfully ran eynollah_light with this command, using the --original method and handling a single image...
eynollah --image sn98062568_1926-09-25_ed-1_seq-1.jp2 --out test1 --model models_eynollah --save_layout test1 --full-layout --enable-plotting --allow-enhancement --allow_scaling --input_binary --log-level DEBUG --original
which took 244.5 seconds to run.
And I ran this command on the same file using the eynollah main branch, which took 277.9 seconds to run...
eynollah --image sn98062568_1926-09-25_ed-1_seq-1.jp2 --out test1 --model models_eynollah --save_layout test1 --full-layout --enable-plotting --allow-enhancement --allow_scaling --input_binary --log-level DEBUG
So the light version with the --original flag is a bit faster, even on a single file.
I took your attached image, made more copies of it, and put them all in a directory (26 copies of the same image). I ran eynollah with the options -fl -light -di, and it took 76.4 seconds per image. Just for information: the -light version does binarization by default, so -input_binary can be omitted with that version. I would appreciate it if you could provide me the full batch of images; my intention is to reproduce the same OOM error.
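For reference, spelled out with the long-form flags used earlier in this thread, the invocation was roughly:
eynollah --model models_eynollah --dir_in input_images --out test1 --full-layout --light_version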
Here are 14 .jp2 images for you to use for testing (limited to 14 by the 25 MB GitHub file size allowance). I have also been trying to track down where things get bogged down in the code on my Windows PC, and it appears that I have nearly reached my 8GB of dedicated GPU memory when the first call in a loop to processes[i].start() on line 1109 of eynollah.py is made. From that point, the loop executes extremely slowly and my 'virtual' GPU memory slowly increases until the OOM error occurs on the sixth time through the loop. This all occurs with the first image, so I think my problem is actually insufficient GPU memory when using the batch image input option. That makes sense given your comment that the batch image option keeps all the models in memory. When I use eynollah_light with the --light_version option for a single input file (using --image input), my GPU memory peaks at about 6.5GB, and an image that took 244.5 seconds to run using the --original method only took 204.1 seconds.
BTW, is there an option in eynollah to segment into words? If so, I'd like to try it with ABINet for the text recognition step.
is there an option in eynollah to segment into words
I'm afraid not, and there likely never will be. We only use text line recognizers like Calamari or recent versions of Tesseract these days. Both can segment lines into words starting from Eynollah output.
We have another project, OCR-D, where you can construct more complex workflows, including e.g. segmentation with Eynollah and subsequent text recognition. It requires some specific data formats (e.g. METS XML containers for image files), but we're always happy to help via our public chat at https://gitter.im/OCR-D/Lobby.
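For example, a segmentation-plus-recognition workflow could look roughly like this (run inside an existing OCR-D workspace; the fileGrp names are placeholders, and the parameter names should be checked against your installed processor versions):
ocrd process \
  "eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models ./models_eynollah" \
  "calamari-recognize -I OCR-D-SEG -O OCR-D-OCR -P checkpoint_dir ./calamari-model"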
Thanks! I have actually just finished trying the 3 Calamari models in OCR-D (i.e., running ocrd-calamari-recognize) after segmentation with ocrd-eynollah-segment for one of the images I sent you, and the results were not that good. I've tested ABINet for text recognition (using a Colab script in mmocr with the word segmentation capabilities they offer) on the same image, and it was excellent at recognizing words whenever the word segmentation captured a decent image of a word. But a number of words were not captured by the word segmentation methods in mmocr, so I am looking for alternatives. I'll post a question on the public chat you suggested and see what responses it brings.
Thanks again!
Hi
I'm sorry that I forgot to close this issue out previously, and I apologize that it strayed from the original issue to one involving eynollah_light (though I am still interested in the question of whether much more GPU memory is needed when using the batch option in eynollah_light). BTW, after segmentation with ocrd-eynollah-segment I used ocrd-tesserocr-recognize in OCR-D with the normal eng model, and I found that it not only does the word segmentation I had asked about, but that the results are much better than I got with the Calamari models.
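For anyone finding this later, my recognition step was roughly the following (parameter names from memory; check ocrd-tesserocr-recognize --help for the exact spelling in your version):
ocrd-tesserocr-recognize -I OCR-D-SEG -O OCR-D-OCR -P model eng -P segmentation_level word -P textequiv_level word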