Ordering of results in run_over_distribution() does not match ordering in configuration list
Calling a distributed case:
distrorun1 = DistributionRun(testconfig, numvalues=10000)
results1 = distrorun1.run_over_distribution(scendata, output_vars, max_workers=100)
where scendata is a list of configuration dictionaries and output_vars is a list of desired outputs.
The ordering of outputs in df = results1.timeseries() does not match that in distrorun1.cfgs, which makes any emulation impossible, since outputs can no longer be paired with the parameter sets that produced them. I suspect this traces back to the parallel utility in openscmrunner:
https://github.com/openscm/openscm-runner/blob/main/src/openscm_runner/adapters/utils/_parallel_process.py
which (I think) doesn't preserve the ordering of the original configuration vector in the output.
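To illustrate the kind of behaviour suspected here, a generic sketch (not the openscm-runner code, which I have not traced line by line): if results are collected as futures complete, their order follows finishing time rather than submission order. Tagging each task with its index lets the original order be restored. The function slow_square and the sleep times are made up for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(index, value):
    # Hypothetical task: earlier inputs sleep longer, so they tend to finish later
    time.sleep(0.05 * (3 - index))
    return index, value * value


values = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_square, i, v) for i, v in enumerate(values)]
    # as_completed yields futures in finishing order, not submission order
    completion_order = [f.result() for f in as_completed(futures)]

# Restore the input order using the index carried with each result
restored = [None] * len(values)
for index, result in completion_order:
    restored[index] = result

print("completion order:", completion_order)
print("restored order:  ", restored)
```

The completion order printed here depends on scheduling, but the restored list always lines up with the input list.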
Working on a simpler code which bypasses openscmrunner for this...
Alright - this seems to work ( https://github.com/ciceroOslo/ciceroscm/blob/calibration-workflow/notebooks/CSCM_calibrate.ipynb ), without calling the openscmrunner:
Firstly, a quick wrapper to jointly return results and config, and handle crashes:
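def get_results(cfg):
    # Run one configuration and return it together with its results
    try:
        cscm_dir._run({"results_as_dict": True},
                      pamset_udm=cfg['pamset_udm'],
                      pamset_emiconc=cfg['pamset_emiconc'])
        res = cscm_dir.results
    except Exception:
        # a crashed run is recorded as None instead of aborting the whole batch
        res = None
    return [cfg, res]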
then:
from concurrent.futures import ProcessPoolExecutor
from tqdm import tqdm

def run_parallel(cfgs, nworkers=4):
    results = len(cfgs) * [None]
    with ProcessPoolExecutor(nworkers) as exe:
        # execute tasks concurrently; exe.map yields results in input order
        pres = list(tqdm(exe.map(get_results, cfgs), total=len(cfgs)))
    for result in pres:
        # get the corresponding index of the config (assumed to run from 0 to len(cfgs)-1)
        ind = int(result[0]['Index'])
        # put it in the right element of the results vector
        results[ind] = result[1]
    return results
so, in use - you do:
distrorun1 = DistributionRun(testconfig, numvalues=10000)
results=run_parallel(distrorun1.cfgs,nworkers=100)
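A side note on the ordering, as a sketch rather than a tested claim about ciceroscm: Executor.map in concurrent.futures returns results in the order of its inputs, so the 'Index' lookup in run_parallel acts as an extra safety check rather than being strictly required. The example below uses a made-up fake_run in place of get_results, and a thread pool instead of a process pool only so that it runs anywhere without the model installed; the ordering guarantee of map is the same for both executors.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_run(cfg):
    # Hypothetical stand-in for get_results: later configs finish first
    time.sleep(0.01 * (5 - cfg["Index"]))
    if cfg["Index"] == 2:
        return [cfg, None]  # mimic a crashed run
    return [cfg, cfg["value"] * 10]


cfgs = [{"Index": i, "value": i} for i in range(5)]
with ThreadPoolExecutor(max_workers=5) as pool:
    # map returns results in input order, regardless of finishing order
    pairs = list(pool.map(fake_run, cfgs))

results = [res for _, res in pairs]
# Each returned config sits at the position of its own index
in_input_order = all(cfg["Index"] == i for i, (cfg, _) in enumerate(pairs))
print(results, in_input_order)
```

Keeping the config alongside each result, as get_results does, still makes it easy to verify the pairing afterwards.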
This is nice, I will take a look at implementing this on Monday, @benmsanderson. In the meantime, a quick review of #131 so we have a working version would be really great ;-)
Will try to look tomorrow. In the meantime, a simple parallel implementation without openscmrunner is here; it works fine on qbo (working on the emulation/optimization now): https://github.com/ciceroOslo/ciceroscm/blob/calibration-workflow/notebooks/calibration%20pipeline/1%20-%20run%20parallel%20PPE.ipynb