Hintzelab/MABE

File overwriting

JorySchossau opened this issue · 4 comments

Began in #211, regarding allowing '-f' and '-s' to write files to outputDirectory instead of the mabe.exe directory.

Vinny:
Well, there is the potential for any user to unintentionally overwrite their current settings files with default settings files. But that's true even if we are using the output directory in the settings. I think we should either detect that the user is overwriting their settings files and warn them with a [y/N] sort of prompt, or just leave it as is.

In reply to Vinny:
As you point out, currently there is no concept of overwrite safety. If we had it, I think mabe would detect if output files (max.csv, etc.) already exist in outputDir and ask you to provide an override flag. The same could be true for settings and popload files. But then, because the command line is so long anyway, users would always pass the flag, making it useless. Alternatively, we could ask them to answer y/n/a for every file at runtime, which could lead to unfortunate HPCC runs and otherwise mildly annoying local runs. On top of all that, what kind of warning should the load/merge/save feature "-f -s" produce, given that it exists explicitly to overwrite files? Perhaps we could add a runtime overwrite prompt (y/n/a) that times out after 10 seconds and overwrites if no answer is given. Thoughts?
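To make the timed-prompt idea concrete, here is a minimal Python sketch (mabe itself is not written in Python, and none of these names exist in mabe). `ask_with_timeout` and `plan_overwrites` are hypothetical helpers. The sketch assumes that a timeout or a closed stdin (as on an HPCC batch job) means "overwrite", and that any answer other than y or a keeps the existing file.

```python
import queue
import threading


def ask_with_timeout(prompt, timeout_s=10.0, reader=input):
    """Return the user's answer, or None if none arrives within timeout_s."""
    answers = queue.Queue()

    def worker():
        try:
            answers.put(reader(prompt))
        except EOFError:
            # No stdin available (e.g. a batch job): treat like a timeout.
            answers.put(None)

    # Daemon thread so a blocked read never keeps the process alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return answers.get(timeout=timeout_s)
    except queue.Empty:
        return None


def plan_overwrites(existing_files, ask=ask_with_timeout):
    """Return the subset of existing_files that should be overwritten."""
    to_overwrite = []
    overwrite_all = False
    for name in existing_files:
        if overwrite_all:
            to_overwrite.append(name)
            continue
        answer = ask(f"Overwrite {name}? [y/n/a] ")
        if answer is None:
            # Assumption from the proposal: no answer means overwrite.
            to_overwrite.append(name)
            continue
        choice = answer.strip().lower()
        if choice == "a":
            overwrite_all = True
            to_overwrite.append(name)
        elif choice == "y":
            to_overwrite.append(name)
        # Any other answer keeps the existing file.
    return to_overwrite
```

Calling `plan_overwrites(["max.csv", "LOD_data.csv"])` would prompt once per file until the user answers a. Note that this does not answer the question of what "-f -s" should do, since those flags exist to overwrite.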

This issue is related to the concept of experiment management. A typical workflow would be:

  1. (build mabe) compile code and get executable mabe.
  2. generate settings files via '-s'.
  3. modify settings files manually, or through the command line (so mabe modifies the settings file).
  4. run mabe.

During this workflow, mabe doesn't really have a way of knowing whether a file is being overwritten intentionally. And if we loop back to step 1, the semantics change, i.e., we want files to be overwritten and, in fact, not even loaded, since they might no longer be valid.

I feel like the concept of "a complete experiment" should be distinct from "experimenting". What I mean is that the latter would be the workflow described above, where you can write to files and load from files arbitrarily and easily, which means mabe doesn't warn about overwriting or other potential user or end-user mistakes.
The former would be somewhat similar to what mq.py does, i.e., if mabe were passed the '-e' flag, say, then a new, uniquely named experiment directory would be created, and all the files to be loaded (settings files, organism files, tertiary data files like spatial maps, etc.) would be placed into a read-only subdir. Then mabe would run and save all output files to another subdir, and at the end of the run, the entire experiment directory would be made read-only. This means that the experiment is self-contained and can be shared with others without worrying about some parts of it being overwritten by another execution of mabe. If one needs the files from this experiment for another experiment, they can still be read and copied out, but overwrite safety is no longer a concern.
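A rough Python sketch of that experiment-directory idea, just to pin down the steps (this is not how mabe or mq.py actually does anything). The function names, the `exp_` prefix, and the `inputs`/`outputs` subdir names are all made up for illustration, and the permission handling assumes a POSIX filesystem.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path

# Read-only for everyone; directories keep the execute bit so they can be entered.
READ_ONLY_FILE = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
READ_ONLY_DIR = READ_ONLY_FILE | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def make_read_only(root):
    """Recursively remove write permission from root and everything below it."""
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), READ_ONLY_FILE)
        os.chmod(dirpath, READ_ONLY_DIR)


def create_experiment(parent, input_files):
    """Create a uniquely named experiment dir with read-only inputs.

    Returns (experiment_dir, outputs_dir); the run writes into outputs_dir.
    """
    exp_dir = Path(tempfile.mkdtemp(prefix="exp_", dir=parent))
    inputs = exp_dir / "inputs"
    outputs = exp_dir / "outputs"
    inputs.mkdir()
    outputs.mkdir()
    for src in input_files:
        shutil.copy2(src, inputs / Path(src).name)
    make_read_only(inputs)
    return exp_dir, outputs


def seal_experiment(exp_dir):
    """Call at the end of the run: the whole experiment becomes read-only."""
    make_read_only(exp_dir)
```

Usage would be: call `create_experiment(".", ["settings.cfg"])`, run with the returned outputs dir as the output directory, then call `seal_experiment` on the experiment dir. The files can still be read and copied out afterwards.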

The overwrite issue is even broader than just settings files, since, e.g., mabe writes to LOD_data.csv every time, and I have to remember not to overwrite this file every time I run mabe. My personal fix is to move the files around, rename them, make them read-only, etc. If mabe could do this with a couple of command line options, that would solve these problems to some extent.
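For the renaming part of that manual fix, something like the following Python sketch is what I have in mind. `back_up_existing` and the numbered-suffix naming scheme are assumptions for illustration, not an existing mabe option.

```python
from pathlib import Path


def back_up_existing(path):
    """Rename an existing file to the first free numbered name.

    LOD_data.csv becomes LOD_data.1.csv, or LOD_data.2.csv if that is taken,
    and so on. Returns the backup path, or None if there was nothing to move.
    """
    path = Path(path)
    if not path.exists():
        return None
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem}.{n}{path.suffix}")
        if not candidate.exists():
            break
        n += 1
    path.rename(candidate)
    return candidate
```

Calling `back_up_existing("LOD_data.csv")` before each run would leave earlier results in place under numbered names instead of overwriting them.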

Thoughts?

My workflow is to copy the executable and settings files and then modify the settings files in the new location when I want to run a new experiment. It seems to me that there are two modes we need to support: Experiment mode, where one is running experiments that will result in publication, and Development mode, where one is writing a new world/brain/etc. I'm not sure what the correct behavior of Experiment mode is, but in Development mode, I do not want to be encumbered by a plethora of directories and locked files.

That's an important question, deciding the correct behaviour of 'experiment' mode. At the moment, mabe supports only 'development' mode. So this issue is suggesting an additional feature, and not a change to existing features, correct?

Fixed by #220