Inconsistent/repeated string variables in chain header when using campaign runs
jessmuir opened this issue · 4 comments
While taking a closer look at some chains I ran a couple of weeks ago, I'm seeing some odd behavior that may have to do with updates to how campaign runs handle variables. This behavior appears in chains I ran around the end of June, but not in ones using the same ini+yml file structure run at the end of May. (Credit to @xuod for catching this.)
As a quick sketch of my job structure, I have one yml file with runs like this:
```yaml
## file: main_campaign.yml
output_dir: "${OUTPUT_DIR}"
runs:
  # ----- baseline 3x2pt -----
  # Fiducial 3x2pt lcdm data run.
  - name: 32pt_lcdm_fidsimy6
    base: ini_files/main_32pt.ini
    env:
      DATA_VECTOR : sim_baseline_32_lcdm.fits
      SCALE_CUTS_FILE : y6-scales-ml6-3x2pt_4_4_v0.0.ini
  # ----- baseline shear+gammat -----
  - name: 22pt-sg_lcdm_fidsimy6
    parent: 32pt_lcdm_fidsimy6
    params:
      - DEFAULT.2PT_DATA_SETS = xip xim gammat
    pipeline:
      - del 2pt_gal
```
Then I have another file that includes the first one, and sets up chains with the same settings, but running on different (noisy) data vectors:
```yaml
## file: noisy_campaign.yml
output_dir: "${OUTPUT_DIR}"
include: main_campaign.yml
runs:
  - name: wnoise_22pt-sg_lcdm_fidsimy6_r01
    parent: 22pt-sg_lcdm_fidsimy6
    env:
      NOISEREAL : "01"
    params:
      - DEFAULT.2PT_FILE = data_vectors/wnoise_sim_baseline_32_lcdm_r${NOISEREAL}.fits
```
Relevant lines from the ini file used are:
```ini
[DEFAULT]
2PT_FILE = data_vectors/${DATA_VECTOR}
2PT_DATA_SETS = xip xim gammat wtheta
[fits_nz]
...
nz_file = %(2PT_FILE)s
...
[2pt_like]
...
data_file = %(2PT_FILE)s
data_sets = %(2PT_DATA_SETS)s
...
```
Now, the odd behavior.
Looking at the header of the chain run with this setup, the line `## 2pt_file = data_vectors/wnoise_sim_baseline_32_lcdm_r01.fits` (with the noisy data) appears in the `DEFAULT` section as expected, but the line `## 2pt_file = data_vectors/sim_baseline_32_lcdm.fits` (with the original noiseless DV) appears at the end of every other module section (runtime, 2pt_like, consistency, sampler settings, everything). Also, the `load_nz` and `2pt_like` module sections have lines like `## nz_file = data_vectors/sim_baseline_32_lcdm.fits` rather than the `%(2PT_FILE)s` cosmosis variables that remain in the headers of older chains (run in May).
In contrast, in the `2pt_like` header, the `## data_sets = %(2PT_DATA_SETS)s` line still uses the cosmosis variable. I'm not sure why these two variables (`2PT_DATA_SETS` vs `2PT_FILE`) would be treated differently, but I wonder if it has something to do with the variable update including an env variable. I'll tinker with this a bit using a test sampler and will share any additional insights I find, but ideas or suggestions are welcome!
The items in the `DEFAULT` section are accessible in every other section, so I guess it must be something to do with that. I can't really follow your example, as it's quite complicated; could you have a go at distilling it down to a trivial example?
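For readers less familiar with the `%(...)s` syntax: it is standard Python `configparser` interpolation, and this sketch assumes cosmosis ini files follow the same rules (not checked against the cosmosis source). Using only the standard library and a cut-down copy of the ini lines quoted earlier (with the fiducial filename written in place of `${DATA_VECTOR}`), it shows that every `[DEFAULT]` key can be read from any section, that option names come back lower-cased (consistent with `2pt_file` rather than `2PT_FILE` in the headers), and that the stored raw value of a reference stays `%(2PT_FILE)s` until it is interpolated.

```python
import configparser

# Cut-down ini from the issue; ${DATA_VECTOR} is replaced by the fiducial
# filename because configparser itself does not expand shell variables.
INI_TEXT = """
[DEFAULT]
2PT_FILE = data_vectors/sim_baseline_32_lcdm.fits
2PT_DATA_SETS = xip xim gammat wtheta

[fits_nz]
nz_file = %(2PT_FILE)s

[2pt_like]
data_file = %(2PT_FILE)s
data_sets = %(2PT_DATA_SETS)s
"""

cp = configparser.ConfigParser()  # default BasicInterpolation handles %(...)s
cp.read_string(INI_TEXT)

# A DEFAULT key read through an ordinary section; names are lower-cased.
print(cp.get("fits_nz", "2pt_file"))  # data_vectors/sim_baseline_32_lcdm.fits

# The same option, interpolated and raw.
print(cp.get("2pt_like", "data_file"))  # data_vectors/sim_baseline_32_lcdm.fits
print(cp.get("2pt_like", "data_file", raw=True))  # %(2PT_FILE)s
print(cp.get("2pt_like", "data_sets"))  # xip xim gammat wtheta
```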
Sure -- I'll do some experimentation to try and figure out what aspects of my setup are causing this and will get back to you with a simpler example.
OK, so at least some of the issue seems to be related to having a bash variable in one of the `DEFAULT` parameter values. To test this, I've played around with demo5 (just to produce some chain output that doesn't take too long). I modified the demo simply by adding a default section to the ini file:
```ini
[DEFAULT]
P1 = ${BASHVAR}
P2 = justastring
```
Also, to get a sense of which version of this the different modules were seeing, I added the line `newparam = %(P1)s` to the `[camb]` section.
Then I set up a campaign file that looks like this:
```yaml
output_dir: output
runs:
  # fiducial using demo5 ini
  - name: base
    base: demo5.ini
    env:
      # setting these here has the same effect as
      # defining bash variables in a job submission script
      BASHVAR : base_bash_var_value
  - name: changep2
    parent: base
    params:
      - DEFAULT.p2 = anotherstring
  - name: changep1
    parent: base
    params:
      - DEFAULT.P1 = altp1
  - name: changep1_bash
    parent: base
    env:
      BASHVAR : "alt_bash_var"
    params:
      - DEFAULT.P1 = test_${BASHVAR}
```
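To make the inheritance in this campaign file concrete, here is a small standard-library sketch of how a child run could pick up its parent's `env` and `params`. The merge rule (child `env` entries override the parent's, child `params` are applied after the parent's) and the helper names `resolve` and `parse_param` are hypothetical illustrations, not the cosmosis campaign implementation; `string.Template` stands in for whatever actually expands `${BASHVAR}`.

```python
from string import Template

# The runs from the campaign file above, written as plain Python dicts.
RUNS = {
    "base": {"base": "demo5.ini", "env": {"BASHVAR": "base_bash_var_value"}},
    "changep2": {"parent": "base", "params": ["DEFAULT.p2 = anotherstring"]},
    "changep1": {"parent": "base", "params": ["DEFAULT.P1 = altp1"]},
    "changep1_bash": {
        "parent": "base",
        "env": {"BASHVAR": "alt_bash_var"},
        "params": ["DEFAULT.P1 = test_${BASHVAR}"],
    },
}


def resolve(name):
    """Hypothetical inheritance rule: walk up the parent chain, letting the
    child's env entries override the parent's and appending its params."""
    run = RUNS[name]
    if "parent" in run:
        resolved = resolve(run["parent"])
    else:
        # Fresh containers, so resolving never mutates RUNS itself.
        resolved = {"base": run["base"], "env": {}, "params": []}
    resolved["env"].update(run.get("env", {}))
    resolved["params"].extend(run.get("params", []))
    return resolved


def parse_param(line, env):
    """Split 'SECTION.key = value' and expand ${VAR} from the run's env."""
    target, value = (part.strip() for part in line.split("=", 1))
    section, key = target.split(".", 1)
    return section, key, Template(value).safe_substitute(env)


run = resolve("changep1_bash")
for line in run["params"]:
    print(parse_param(line, run["env"]))  # ('DEFAULT', 'P1', 'test_alt_bash_var')
```

Under this assumed rule, `changep1_bash` ends up with `DEFAULT.P1 = test_alt_bash_var`, which is the value that appears in its chain header.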
Looking at the chain outputs, the base chain's header section for CAMB has the lines:
```
## [camb]
...
## newparam = base_bash_var_value
## p1 = base_bash_var_value
```
And that `p1 =` line shows up in the chain header section for every module. There isn't a similar `p2 =` line outside of the default section.
When I do the run `changep1`, which changes the default `P1` parameter to a string with no bash variable in it, the camb header output looks like this:
```
## [camb]
## newparam = %(P1)s
```
with no `## p1 =` line in that module header or any of the others. The default section has the line `## p1 = altp1`, which is the expected behavior.
When I do the run `changep1_bash`, which changes `P1` to a string with a bash variable in it and sets that bash variable to something, I get output similar to the base run: the camb header lines look like the following, and the `p1` line is repeated for every module listed in the header:
```
## [camb]
...
## newparam = test_alt_bash_var
## p1 = test_alt_bash_var
```
The main thing this replicates is that if a `DEFAULT` section parameter contains no bash variable, the `%()s` formatting of that variable in the other sections is retained; but if the default parameter contains a bash variable, the `%()s` entries are replaced by the variable's value, and the default parameter also shows up in every module's header.
This may not actually cause problems in and of itself, but it does seem to be a new behavior that I don't fully understand.
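The pattern described here can be reproduced with plain `configparser` under one hypothetical mechanism (an assumption, not taken from the cosmosis source): if the step that expands shell variables walks `items(section)` for every section, it sees the `DEFAULT` keys and the already-interpolated `%(P1)s`, and any value that still contains `$` is written back into that section as a literal, expanded string. Because `write()` only emits the keys a section owns, the copied `p1` then appears under every module, while a plain-string `P1` leaves `newparam = %(P1)s` untouched. The function names `expand_env` and `header_for` are made up for this sketch.

```python
import configparser
import io
import os

# Cut-down demo5-style ini from the comment above: P1 holds a shell variable.
INI_TEXT = """
[DEFAULT]
P1 = ${BASHVAR}
P2 = justastring

[camb]
newparam = %(P1)s
"""


def expand_env(cp):
    """HYPOTHETICAL env-var expansion step, not the real cosmosis code.

    items(section) includes DEFAULT keys, already %()s-interpolated, so any
    value that still contains '$' is copied into that section, expanded.
    """
    updates = []
    for section in [cp.default_section] + cp.sections():
        for key, value in cp.items(section):
            if "$" in value:
                updates.append((section, key, os.path.expandvars(value)))
    for section, key, value in updates:
        cp.set(section, key, value)


def header_for(p1_value):
    """Build the config with DEFAULT.P1 overridden, expand, and dump it."""
    cp = configparser.ConfigParser()
    cp.read_string(INI_TEXT)
    cp.set(cp.default_section, "P1", p1_value)  # like "DEFAULT.P1 = ..."
    expand_env(cp)
    out = io.StringIO()
    cp.write(out)  # write() emits only the keys each section owns
    return out.getvalue()


os.environ["BASHVAR"] = "base_bash_var_value"
print(header_for("${BASHVAR}"))  # mimics the base run
print(header_for("altp1"))  # mimics the changep1 run
```

With `BASHVAR` set, the first call prints `newparam = base_bash_var_value` and `p1 = base_bash_var_value` under `[camb]`; the second prints `newparam = %(P1)s` and no `p1` line outside `[DEFAULT]`, mirroring the base and `changep1` headers.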
More investigation is needed on my end, as this replicates some of the outputs but not the problematic behavior I was seeing with my chain. Effectively, what I was seeing (but haven't yet replicated in this simple setup) is something like this in my chain header, where the default section and what the modules seem to be seeing don't match:
```
## [DEFAULT]
## p1 = test_alt_bash_var
## [camb]
...
## newparam = base_bash_var_value
## p1 = base_bash_var_value
```
As an update, the behavior changes if I move the runs `changep1` and `changep1_bash` to a different yml file that includes the base run's yml file.
If I do this and then look at the chain header, the line `## p1 = base_bash_var_value` is printed for every module, whether the update runs change `P1` to a plain string or to a new string with a bash variable. So the default parameter set in the base run in the main yml file does not get updated in the module sections.
To be more specific, my `child.yml` file looks like this:
```yaml
output_dir: output
include: main.yml
runs:
  - name: childyml_changep1
    parent: base
    params:
      - DEFAULT.P1 = ch_altp1
  - name: childyml_changep1_bash
    parent: base
    env:
      BASHVAR : "childyml_alt_bash_var"
    params:
      - DEFAULT.P1 = test2_${BASHVAR}
```
The chain header does update the info in the default section, à la
```
## [DEFAULT]
## p1 = ch_altp1
## p2 = justastring
```
but in the other sections it looks like this:
```
## [camb]
...
## newparam = base_bash_var_value
## p1 = base_bash_var_value
```
where `base_bash_var_value` was the `p1` value set in the original base run in `main.yml`.
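One hypothetical explanation for the `child.yml` result, again sketched with plain `configparser`: if the child run starts from the parent's config *after* the parent's shell variables were expanded into each section (an assumption, not confirmed from the cosmosis source), then the section-level `p1` copy shadows `[DEFAULT]`, and `newparam` no longer contains `%(P1)s` to re-interpolate. Overriding `DEFAULT.P1` then changes only the `[DEFAULT]` block of the header.

```python
import configparser

# ASSUMED state of the parent ("base") run after its env vars were expanded
# into each section, matching the chain headers quoted above.
cp = configparser.ConfigParser()
cp.read_string("""
[DEFAULT]
P1 = base_bash_var_value
P2 = justastring

[camb]
newparam = base_bash_var_value
P1 = base_bash_var_value
""")

# The child run's override, "DEFAULT.P1 = ch_altp1".
cp.set(cp.default_section, "P1", "ch_altp1")

print(cp.get(cp.default_section, "p1"))  # ch_altp1 (DEFAULT is updated)
print(cp.get("camb", "p1"))  # base_bash_var_value (section copy wins)
print(cp.get("camb", "newparam"))  # base_bash_var_value (no %(P1)s left)
```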