ENH: Value in `MDIO__IMPORT__CPU_COUNT` does not get used for ingestion in a particular scenario
Closed this issue · 1 comments
Issue
The value in the environment variable `MDIO__IMPORT__CPU_COUNT` should be used to limit the number of processes spawned by `ProcessPoolExecutor`. However, this value does not get used in one particular situation.
Of the following two scenarios, it gets used in scenario 1 but not in scenario 2.
Scenario 1:
Set the environment variable `MDIO__IMPORT__CPU_COUNT` and then run the script that invokes the `segy_to_mdio` function: works.
E.g. launch a pod with the environment variable `MDIO__IMPORT__CPU_COUNT` already set and then run the script.
Scenario 2:
Run the script, set the environment variable `MDIO__IMPORT__CPU_COUNT` inside that script, and then invoke the `segy_to_mdio` function: does not work.
E.g. launch a pod, use an argument passed to the script to set the environment variable `MDIO__IMPORT__CPU_COUNT` inside the script, and then invoke the `segy_to_mdio` function.
This happens because the `NUM_CPUS` value is assigned when the code is loaded into memory (at import time), before execution starts, which in practice requires the environment variable to be set before running the script.
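To illustrate the import-time behaviour described above, here is a minimal sketch. It does not use mdio itself: `fake_blocked_io` is a hypothetical stand-in module written to a temporary directory, and the fallback value `64` is an arbitrary assumption chosen so the result is easy to see.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

ENV_VAR = "MDIO__IMPORT__CPU_COUNT"

# Hypothetical stand-in for a library module that reads the variable
# once, at import time, into a module-level constant.
MODULE_SOURCE = (
    "import os\n"
    f"NUM_CPUS = int(os.getenv('{ENV_VAR}', '64'))\n"
)

tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "fake_blocked_io.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Start from a clean state: the variable is not set when the import happens.
os.environ.pop(ENV_VAR, None)
fake_blocked_io = importlib.import_module("fake_blocked_io")

# Scenario 2: the script sets the variable only after the import.
os.environ[ENV_VAR] = "2"

value_seen_by_module = fake_blocked_io.NUM_CPUS   # captured at import time
value_in_environment = int(os.environ[ENV_VAR])   # what the script intended

print(value_seen_by_module, value_in_environment)
```

The module-level constant keeps the value it had when the module was imported, so the later assignment to `os.environ` has no effect on it.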
Suggested solution
Re-read the value of `MDIO__IMPORT__CPU_COUNT` just before the following line and save it in `NUM_CPUS`. This will ensure that Scenario 2 also works.
mdio-python/src/mdio/segy/blocked_io.py
Line 122 in 1475793
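A rough sketch of what the suggestion could look like, assuming the worker count is passed to `ProcessPoolExecutor` as `max_workers`. The helper names `get_num_cpus` and `make_executor` are hypothetical and are not taken from the mdio code base; the fallback to `os.cpu_count()` is also an assumption.

```python
import os
from concurrent.futures import ProcessPoolExecutor

ENV_VAR = "MDIO__IMPORT__CPU_COUNT"


def get_num_cpus() -> int:
    """Read the worker limit at call time instead of at import time."""
    # os.cpu_count() can return None, so fall back to a single worker.
    default = os.cpu_count() or 1
    return int(os.getenv(ENV_VAR, str(default)))


def make_executor() -> ProcessPoolExecutor:
    """Build the pool with whatever limit is set at this moment."""
    return ProcessPoolExecutor(max_workers=get_num_cpus())
```

Because the environment is consulted on every call, a value set by the script at any point before ingestion starts would be honoured, which covers both scenarios.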
Hi Amit; thanks for sharing this. I'll try to find a cleaner way that satisfies both scenarios.
In the meantime, if you set it with `os.environ` before you import the mdio functions, it should still work for scenario 2. Can you please try and let me know?
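For reference, the workaround could be laid out roughly as follows. This is only a sketch: the `--cpus` argument and the `configure_cpu_count` helper are hypothetical, and the mdio import is left as a comment so the snippet runs without mdio installed.

```python
import argparse
import os

ENV_VAR = "MDIO__IMPORT__CPU_COUNT"


def configure_cpu_count(argv=None) -> None:
    """Copy a hypothetical --cpus script argument into the environment."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cpus", type=int, default=None)
    args, _ = parser.parse_known_args(argv)
    if args.cpus is not None:
        # Environment values must be strings.
        os.environ[ENV_VAR] = str(args.cpus)


def main(argv=None) -> None:
    configure_cpu_count(argv)
    # Import mdio only after the variable is set, so the value is
    # already in the environment when the module-level code runs:
    # from mdio import segy_to_mdio
    # segy_to_mdio(...)


if __name__ == "__main__":
    main()
```

This only helps if nothing else in the script has imported mdio earlier, since an already-imported module is not re-executed.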