Parallel verrou_dd crashes with ValueError
HadrienG2 opened this issue · 3 comments
When I run verrou_dd in parallel on a test workload of mine, it systematically crashes on the second iteration with this kind of backtrace. Sequential runs work fine on the same workload.
$ VERROU_DD_NUM_THREADS=4 VERROU_DD_NRUNS=4 verrou_dd `pwd`/run.sh `pwd`/cmp.sh
[...]
dd (run #1): trying 6275 + 6275
/root/acts-core/build/IntegrationTests/dd.sym/ca2681d399ee504572a37d53b1416f6f --( run )->
Traceback (most recent call last):
  File "/usr/local/bin/verrou_dd", line 633, in <module>
    main(runScript, cmpScript, algoSearch=ddAlgo)
  File "/usr/local/bin/verrou_dd", line 605, in main
    (refSym, confSymsTab) = ddSym(run, compare)
  File "/usr/local/bin/verrou_dd", line 438, in ddSym
    conf = dd.ddmax(deltas)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 733, in ddmax
    return self.ddgen(c, 0, 1)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 607, in ddgen
    outcome = self._dd(c, n)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 670, in _dd
    (t, cs[i]) = self.test_mix(cs[i], c, self.REMOVE)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 580, in test_mix
    directionbar)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 384, in test_and_resolve
    t = self.test(csubr)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 313, in test
    outcome = self._test(c)
  File "/usr/local/bin/verrou_dd", line 409, in _test
    return vT.run()
  File "/usr/local/bin/verrou_dd", line 127, in run
    return self.runParMax(maxNbPROC)
  File "/usr/local/bin/verrou_dd", line 202, in runParMax
    run=self.pidRunTab.index(pid)
ValueError: 50 is not in list
My test workload is a bit complicated, but I have it inside a Docker container if that would be useful. Or maybe we can find a simpler reproducer.
This is somewhat related to #8, in the sense that if the eventual decision is to change the verrou_dd parallelization algorithm, it may not be worth spending too much energy on fixing the existing one.
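The last frame fails at `run=self.pidRunTab.index(pid)`, and `list.index()` raises `ValueError` whenever the value is absent. One plausible mechanism, which is an assumption and not confirmed in this thread, is that the scheduler reaps children with something like `os.wait()`, which returns *any* terminated child, including one the PID table does not track (for example a leftover from an earlier iteration). The minimal sketch below (Unix only) reproduces that failure mode deterministically; `demo_untracked_pid` and `pid_run_tab` are hypothetical names, not verrou_dd code.

```python
import os
import subprocess
import sys

def demo_untracked_pid():
    # Hypothetical stand-in for pidRunTab: the PIDs this scheduler launched.
    # The tracked child blocks on stdin, so it cannot exit until we let it.
    tracked = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=subprocess.PIPE)
    pid_run_tab = [tracked.pid]

    # A child the table knows nothing about; it exits immediately.
    stray = subprocess.Popen([sys.executable, "-c", "pass"])

    # os.wait() reaps whichever child has terminated, not a specific one.
    # Here that can only be the stray child.
    pid, _status = os.wait()
    try:
        pid_run_tab.index(pid)  # same lookup pattern as in the backtrace
        error = None
    except ValueError as err:
        error = err

    tracked.stdin.close()  # EOF lets the tracked child finish
    tracked.wait()
    stray.wait()  # already reaped; subprocess tolerates the ECHILD
    return pid == stray.pid, error

if __name__ == "__main__":
    print(demo_untracked_pid())
```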
I think you hold the high score with 12500 symbols.
It looks like a bug in the Python scheduler, which I want to rewrite in Python 3.
If you can keep this test workload for later, I'm interested.
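As an illustration only, here is a minimal Python 3 sketch of one way a rewritten scheduler could avoid PID bookkeeping altogether: each worker thread starts and waits on its own child through `subprocess.run`, and `ThreadPoolExecutor.map` returns results in run order. `run_parallel` and its parameters are hypothetical, not verrou_dd's interface; mapping something like `VERROU_DD_NUM_THREADS` to `max_workers` is likewise an assumption, and this is not the maintainer's stated design.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_parallel(commands, max_workers):
    """Run each command with at most max_workers running at once.

    Returns exit codes in the same order as `commands`, so each result is
    tied to its run index rather than to a PID returned by os.wait().
    """
    def run_one(cmd):
        # subprocess.run waits on this specific child only.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever the completion order.
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    cmds = [[sys.executable, "-c", f"import sys; sys.exit({k})"] for k in range(4)]
    print(run_parallel(cmds, max_workers=2))
```

Because every child is waited on by the thread that spawned it, a stray child elsewhere in the process cannot be mistaken for one of the runs.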
That's a C++ binary that uses Boost + Eigen and is built with -O0. I'm not surprised that the symbol table got crazy :) Please ping me when you are done with the Python 3 port, and it will be my pleasure to torture it as well.
Since v2.3.1, delta-debugging should be robust enough to handle this problem. If not, you can open a new issue.