config file not properly closed
epifanio opened this issue · 1 comments
Hi,
I may be misusing the confuse library, but I am running a function that looks like this:
import logging
import pathlib
import confuse
def get_logpath():
    try:
        config = confuse.Configuration("mmdtool", __name__)
        logfilepath = config["paths"]["logs"].get()
    except confuse.NotFoundError:
        # Fall back to a local directory when the key is missing.
        logfilepath = "./logs/"
    if not pathlib.Path(logfilepath).exists():
        pathlib.Path(logfilepath).mkdir(parents=True, exist_ok=True)
    return logfilepath
It works fine for a while, but as the number of processed files increases, when I run my script in parallel over thousands of records, at some point the parallel job breaks with the following error:
ubuntu@pycsw-prod:/mnt/csw/dev/py-mmd-tools/script$ python3 convert_all.py -i /mnt/csw/metadata/nbs -t /mnt/csw/dev/mmd/xslt/mmd-to-iso.xsl -o /mnt/csw/metadata/nbs_iso/
os.walk("/mnt/csw/metadata/nbs")
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/yaml_util.py", line 85, in load_yaml
OSError: [Errno 24] Too many open files: '/home/ubuntu/.config/mmdtool/config.yaml'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
File "/usr/local/lib/python3.8/dist-packages/parmap/parmap.py", line 104, in _func_star_single
File "convert_all.py", line 33, in writerecord
File "/mnt/csw/dev/py-mmd-tools/py_mmd_tools/mmd_to_csw_iso.py", line 40, in mmd_to_iso
File "/mnt/csw/dev/py-mmd-tools/py_mmd_tools/mmd_util.py", line 31, in setup_log
File "/mnt/csw/dev/py-mmd-tools/py_mmd_tools/mmd_util.py", line 21, in get_logpath
File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/core.py", line 558, in __init__
File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/core.py", line 600, in read
File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/core.py", line 574, in _add_user_source
File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/yaml_util.py", line 88, in load_yaml
confuse.exceptions.ConfigReadError: file /home/ubuntu/.config/mmdtool/config.yaml could not be read: [Errno 24] Too many open files: '/home/ubuntu/.config/mmdtool/config.yaml'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "convert_all.py", line 56, in <module>
main(metadata=args.input_dir, mmd2iso_xslt=args.input_xslt, outdir=args.output_dir)
File "convert_all.py", line 42, in main
y = parmap.map(writerecord, xmlfiles, mmd2iso_xslt=mmd2iso_xslt, outdir=outdir, pm_pbar=False)
File "/usr/local/lib/python3.8/dist-packages/parmap/parmap.py", line 304, in map
return _map_or_starmap(function, iterable, args, kwargs, "map")
File "/usr/local/lib/python3.8/dist-packages/parmap/parmap.py", line 248, in _map_or_starmap
output = result.get()
File "/usr/lib/python3.8/multiprocessing/pool.py", line 768, in get
raise self._value
confuse.exceptions.ConfigReadError: file file /home/ubuntu/.config/mmdtool/config.yaml could not be read: [Errno 24] Too many open files: '/home/ubuntu/.config/mmdtool/config.yaml' could not be read
I tried to replace my code with:
with confuse.Configuration("mmdtool", __name__) as config:
    logfilepath = config["paths"]["logs"].get()
hoping that the config file would get closed, but that didn't work: I got an AttributeError: __enter__
Hi! Here's the load_yaml function that shows up in your traceback:
Lines 85 to 86 in 0403614
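A minimal sketch of the kind of pattern such a loader typically relies on (an assumption for illustration, not the verbatim confuse source; `read_config_bytes` and the temporary file are hypothetical): a file opened in a `with` block is closed as soon as the block exits.

```python
import os
import tempfile


def read_config_bytes(filename):
    """Read a file inside a with-block; return its contents and the handle."""
    with open(filename, "rb") as f:
        data = f.read()
    # The with-block has exited here, so the descriptor is already released.
    return data, f


# Hypothetical stand-in for ~/.config/mmdtool/config.yaml.
fd, path = tempfile.mkstemp(suffix=".yaml")
os.write(fd, b"paths:\n  logs: ./logs/\n")
os.close(fd)

data, handle = read_config_bytes(path)
closed_after_read = handle.closed  # whether the file was closed on exit
os.remove(path)
```

If the handle reports closed here, a lingering descriptor is unlikely to come from the read itself.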
We are in fact closing the file after reading it. You mentioned that you are running this program many times in parallel:
when I run my script in parallel over thousands of records
So it seems likely to me that these thousands of parallel processes are simultaneously opening the same file—even if it will shortly be closed again by all of them.
Any chance you can instead find a way to load your config once and share it across all the processes?
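One possible way to do that, as a rough sketch rather than a definitive fix: resolve the log path once, cache it, and hand the resolved value to the workers. Everything here is an assumption for illustration: `load_log_path` is a hypothetical stand-in for the confuse lookup inside get_logpath, the `writerecord` signature is simplified, and the built-in `map` stands in for the parallel map call.

```python
import functools

# Counter used only to show how often the (stand-in) config read happens.
CALLS = {"loads": 0}


def load_log_path():
    """Hypothetical stand-in for the confuse lookup done in get_logpath()."""
    CALLS["loads"] += 1
    return "./logs/"


@functools.lru_cache(maxsize=None)
def get_logpath():
    # Cached: the configuration is read at most once per process.
    return load_log_path()


def writerecord(xmlfile, logfilepath):
    # The worker receives the resolved path instead of re-reading the config.
    return f"{xmlfile} -> {logfilepath}"


def main(xmlfiles):
    logfilepath = get_logpath()  # resolved once, before any work is dispatched
    worker = functools.partial(writerecord, logfilepath=logfilepath)
    # Built-in map used for the sketch; a parallel map would take its place.
    return list(map(worker, xmlfiles))


results = main(["a.xml", "b.xml"])
get_logpath()  # served from the cache, no second read
```

Passing the value as an argument keeps the workers independent of the config file, and the cache also covers any remaining direct calls to get_logpath within a single process.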