verrou_dd usability issues
HadrienG2 opened this issue · 6 comments
Besides #6 , another issue which I have with verrou_dd is that it is a bit difficult to use correctly. Here are some suggestions of quality-of-life improvements:
-
verrou_dd
should accept relative paths to the run_script and cmp_script -
verrou_dd
should run the scripts from its initial working directory in order to remove the need for cumbersome absolute paths inside of said scripts => Probably not a good idea, as applications may leave state around in the working directory and we aim for independent runs. -
verrou_dd
could be better at detecting when the dd.sym cache must be invalidated. => Moved to #9 -
verrou_dd
's interface could be redesigned so that it is the one responsible for running verrou (or puts the proper valgrind command in an environment variable). This way, one would not need to repeat the valgrind command line in every run_script => Probably does not save enough characters to be useful, considering verrou_dd's requirements - As
verrou_dd
can take a lot of executions to converge, a way to state "my scripts are thread-safe, please fill up my idle CPU cores by testing multiple configurations in parallel" would be very nice. => In progress, parallel configuration testing is not there yet but parallel runs are in, see #8 - Error reporting could be better. For example, "internal error" is not a very explicit way to say that an inconsistency was detected in the dataset.
Note that these ideas are based on my experience with verrou v1.1.0, but looking at the upcoming release notes, it looks like some of them are already in the process of being addressed.
Thanks a lot for your insight. For future reference, here are a few thoughts/comments about the issues you mention.
-
should be fixed by 8155d3d.
-
probably won't be addressed. One of the reasons why is related to point 6.: running all scripts in the same initial directory would make it difficult for a lot of programs to meet the "independent runs" requirements.
-
good idea. Although not planned yet, it will probably be addressed in a future release. Do you have any specific idea / use case in mind? There are a few detectable reasons why the cache should be invalidated (such as changes in the run_script or cmp_script). But there are also a lot of undetectable reasons: bad luck in random perturbations that wrongly makes a configuration valid, changes in any program or data file used in the run_script of cmp_script, etc. I don't see any way to handle such cases without asking the user to manually empty the cache.
-
verrou_dd
should probably not be responsible for running verrou, since (i) the user should be able to specify any command-line option they want, and (ii) not all commands in the run_script should be executed within valgrind/verrou. If I understand your proposition correctly, exporting a$VERROU
environment variable containing the correct command to run would be doable, but this would always contain something likevalgrind --tool=verrou
, so that I'm not sure it would be very useful. -
good idea. It is already in the TODO list for a future release (probably the release after v2.0.0).
-
should be at least improved by 8155d3d (but there might still be some ambiguous error reporting).
Regarding point 5 (verrou_dd
parallelization), parallelization at the scale of the global DD algorithm is not yet available (but is on the TODO list as mentioned above). However, commit 536569f introduced a feature to enable parallelism at the scale of the test of a given configuration.
When the VERROU_DD_NUM_THREADS
environment variable is set, up to VERROU_DD_NUM_THREADS
tests are run in parallel to determine if a given configuration is stable or not. This may speed up the test for stable configurations (since VERROU_DD_NRUNS
runs must be performed in this case to validate the configuration). For unstable configurations, it does not save as much time: if a given configuration fails to validate the cmp_script at its first run, there would have been no need to launch many runs in parallel to get the same result.
Of course, if you have enough computational power, the optimal parallel setting is obtained when VERROU_DD_NUM_THREADS
=VERROU_DD_NRUNS
.
- Thanks a lot for this!
- Ah, yes, this is true, I did not think about the fact that many applications may leave cached state around in the working directory. Scratch this idea then.
- I can think of two paths towards improving the cache invalidation situation in the face of e.g. binary recompilation or configuration changes. One that is sane but unsatisfying, and one that is a little bit insane, but has more potential :)
- Sane-but-meh option: Invalidate the cache by default when a top-level verrou_dd command is run. If the command calls itself recursively, make sure that the recursive invocations do not invalidate the cache. This can be done by adding an optional "--reuse-cache" script option, which will also allow users who know what they are doing to reuse a cache from a previous run.
- Insane option: Well, in principle, one could strace run-script or cmp-script calls in order to tell which files the run_script and cmp_script recursively depend on, and then compute a hash of them to detect dependency changes... ^^
- You're right that this one was probably not such a good idea either.
- I have quickly played with VERROU_DD_NUM_THREADS on the long_name branch. At the time, I noticed two things:
- As you explain, the current implementation effectively puts a processing barrier after each configuration is tested. It would be nice to be able to run tests of multiple configurations in parallel, although I can imagine this will require quite a bit of changes in the task scheduling and logging infrastructure (on the scheduling side you need to schedule runs more lazily and deal with concurrent configs, on the printing side you would need advanced VT-100 hackery to preserve the current output).
- I observed a couple of transient issues where functions were mis-tagged as numerically unstable. At the time I attributed this to my code, but since then I have done a sequential run on the same program without issues. Maybe there are some race conditions around in the current task scheduling code, maybe there was a bug which has since been fixed, I will try investigate this further and tell you what I find.
- There is still an "internal error" message in the DDsym.allDeltaFailedMsg() function of verrou_dd. But I have not explored what error is being reported by this function yet, maybe that's an appropriate error message in this case.
These remaining two issues start to stray a bit apart from the original ergonomics focus, so maybe it would be better to split them into two new tickets. Shall I do that?