edf-hpc/verrou

verrou_dd fails with parse errors in the valgrind output

HadrienG2 opened this issue · 3 comments

I am trying out verrou's delta-debugging feature, but have not managed to get it to work so far.

Command and associated output (the rm is only there to invalidate the cache):

$ rm -rf dd.sym/ && verrou_dd `pwd`/run.sh `pwd`/cmp.sh
/root/acts-core/build/IntegrationTests/dd.sym/d41d8cd98f00b204e9800998ecf8427e  --( run )->  FAIL(1)
Traceback (most recent call last):
  File "/usr/local/bin/verrou_dd", line 244, in <module>
    main(sys.argv[1], sys.argv[2])
  File "/usr/local/bin/verrou_dd", line 240, in main
    (refSym, confSyms) = ddSym(run, compare)
  File "/usr/local/bin/verrou_dd", line 144, in ddSym
    conf = dd.ddmax(deltas)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 724, in ddmax
    return self.ddgen(c, 0, 1)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 607, in ddgen
    outcome = self._dd(c, n)
  File "/usr/local/lib/python2.7/site-packages/valgrind/DD.py", line 617, in _dd
    assert self.test([]) == self.PASS
AssertionError

run_script argument:

#!/bin/bash
# verrou_dd passes the directory in which this run must store its results.
DIR="$1"
WORKDIR="/root/acts-core/build/IntegrationTests"
valgrind --tool=verrou --rounding-mode=random --demangle=no --exclude="$WORKDIR/libm.ex" "$WORKDIR/PropagationTests" > "${DIR}/results.dat"

cmp_script argument:

#!/bin/bash
REF="$1"
RUN="$2"
# diff exits with status 0 only if both runs produced identical results.
diff "${REF}/results.dat" "${RUN}/results.dat"
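With `diff`, any difference at all, down to the last printed digit, makes the comparison fail, assuming (as the use of `diff` here suggests) that verrou_dd treats exit status 0 as a passing run. If some numerical noise is acceptable under random rounding, a tolerance-based comparison is one alternative. The sketch below is hypothetical: it assumes `results.dat` contains only whitespace-separated numbers laid out identically in both runs, and the `REL_TOL` default of `1e-9` is illustrative, not a recommendation.

```shell
#!/bin/bash
# Hypothetical tolerance-based variant of cmp.sh (not shipped with verrou).
# Assumes results.dat holds only whitespace-separated numbers, laid out the
# same way in both runs. REL_TOL is an illustrative default, not a recommendation.
REL_TOL="${REL_TOL:-1e-9}"

cmp_results() {
  awk -v tol="$REL_TOL" '
    # First file (reference run): remember every field and the line layout.
    NR == FNR {
      for (i = 1; i <= NF; i++) ref[FNR, i] = $i
      nref[FNR] = NF
      nlines = FNR
      next
    }
    # Second file: compare field by field against the reference.
    {
      if (NF != nref[FNR]) { bad = 1; exit }
      for (i = 1; i <= NF; i++) {
        a = ref[FNR, i] + 0
        b = $i + 0
        # Relative tolerance, with an absolute floor for |a| < 1.
        scale = (a < 0) ? -a : a
        if (scale < 1) scale = 1
        d = (a > b) ? a - b : b - a
        if (d > tol * scale) { bad = 1; exit }
      }
      seen = FNR
    }
    # Also fail if the run produced fewer lines than the reference.
    END { status = (bad || seen != nlines) ? 1 : 0; exit status }
  ' "$1/results.dat" "$2/results.dat"
}

# Compare only when called the way verrou_dd calls it: cmp.sh REF_DIR RUN_DIR.
if [ "$#" -eq 2 ]; then
  cmp_results "$1" "$2"
fi
```

Saved as cmp.sh, it takes the same two arguments as the `diff` version; the threshold can be overridden with, e.g., `REL_TOL=1e-6 ./cmp.sh REF_DIR RUN_DIR`.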

My initial exclude rules (the "libm.ex" file):

__sin_fma       /lib64/libm-2.27.so
__cos_fma       /lib64/libm-2.27.so
__tan_fma       /lib64/libm-2.27.so
sincos  /lib64/libm-2.27.so

Contents of dd.sym/d41d8cd98f00b204e9800998ecf8427e/dd.run1/dd.run.err:

==2296== Loading exclusions list from `/root/acts-core/build/IntegrationTests/libm.ex'... OK.
==2296== Verrou, Check floating-point rounding errors
==2296== Copyright (C) 2014-2016, F. Fevotte & B. Lathuiliere.
==2296== Using Valgrind-3.13.0+verrou-1.1.0 and LibVEX; rerun with -h for copyright info
==2296== Command: /root/acts-core/build/IntegrationTests/PropagationTests
==2296== 
==2296== Loading exclusions list from `/root/acts-core/build/IntegrationTests/dd.sym/d41d8cd98f00b204e9800998ecf8427e/dd.exclude'... ERROR (parse)
==2296== First seed : 123030
==2296== Simulating RANDOM rounding mode
==2296== Instrumented operations :
==2296==        add : yes
==2296==        sub : yes
==2296==        mul : yes
==2296==        div : yes
==2296==        mAdd : yes
==2296==        mSub : yes
==2296==        cmp : no
==2296==        conv : no
==2296==        max : no
==2296==        min : no
==2296== Instrumented scalar operations : no
==2296== FATAL: in suppressions file "/usr/local/lib/valgrind/default.supp" near line 1:
==2296==    expected '{' or end-of-file
==2296== exiting now.

The contents of /root/acts-core/build/IntegrationTests/dd.sym/d41d8cd98f00b204e9800998ecf8427e/dd.exclude and /usr/local/lib/valgrind/default.supp are available at https://gist.github.com/HadrienG2/286e46a5f474ddcd73017e7815d19cf0 .
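Since the log reports `ERROR (parse)` while loading dd.exclude, one quick sanity check is to look for lines in an exclusion list that do not split into a function name and an object name. The helper below is hypothetical, not a verrou tool: it assumes each line must hold exactly those two fields separated by tabs (it deliberately accepts runs of several tabs), and it knows nothing about any other syntax verrou may accept.

```shell
#!/bin/bash
# Hypothetical helper (not part of verrou): flag exclusion-list lines that do
# not consist of a function name and an object name separated by tab(s).
check_exclusions() {
  awk -F'\t+' '
    NF != 2 || $1 == "" || $2 == "" {
      printf "%s:%d: malformed line: %s\n", FILENAME, FNR, $0
      bad = 1
    }
    END { status = bad ? 1 : 0; exit status }
  ' "$@"
}

# Check the files given on the command line, if any.
if [ "$#" -gt 0 ]; then
  check_exclusions "$@"
fi
```

For example, `check_exclusions libm.ex dd.sym/*/dd.exclude` prints each suspicious line with its file name and line number, and returns a non-zero status if any were found.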

My best guess so far is that Valgrind or Verrou is overwhelmed by the very long symbol names that g++'s name mangling produces, but that is only a guess.

Do you have any suggestions on where to start debugging this further?
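One place to start is the per-run Valgrind logs, which verrou_dd keeps in files such as dd.sym/<hash>/dd.run1/dd.run.err. The helper below is a hypothetical sketch that assumes this directory layout: it lists every run whose log contains a parse error or a FATAL message, which shows whether the failure affects every configuration or only some of them.

```shell
#!/bin/bash
# Hypothetical helper: report every verrou_dd run whose Valgrind log mentions
# a parse error or a fatal error. Assumes the dd.sym/<hash>/dd.runN/dd.run.err
# layout seen in this report.
scan_dd_logs() {
  local root="${1:-dd.sym}"
  find "$root" -type f -name dd.run.err | sort | while read -r log; do
    if grep -qE 'ERROR \(parse\)|FATAL' "$log"; then
      echo "== $log"
      grep -E 'ERROR \(parse\)|FATAL' "$log"
    fi
  done
}
```

Run it from the directory containing dd.sym, e.g. `scan_dd_logs dd.sym`.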

There is indeed a problem with long names: the long_name branch fixes a long-name bug that I introduced.
However, I'm not sure it will solve your problem. Your dd.exclude file looks really strange: some of the tabs between the symbol names and the object names have disappeared, and I have no idea how to reproduce that.
Nevertheless, I suggest using the format "* /lib64/libm-2.27.so" in libm.ex so that no libm symbol gets forgotten. dd.exclude shows that at least __dubsin_fma, _dubcos_fma and __docos_fma are missing from your list.
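A minimal way to produce that wildcard exclusion is to let printf write the tab separator rather than typing it by hand, since tabs are easily turned into spaces when copying from a web page. The function name is hypothetical, and the default library path is the one from this report; it will differ on other systems.

```shell
#!/bin/bash
# Hypothetical helper: write an exclusion list that skips every symbol of the
# given libm, using the "* <object>" form. printf emits a real tab character.
write_libm_exclusion() {
  local libm="${1:-/lib64/libm-2.27.so}"
  local out="${2:-libm.ex}"
  printf '%s\t%s\n' '*' "$libm" > "$out"
}
```

For example, `write_libm_exclusion /lib64/libm-2.27.so libm.ex` writes a one-line libm.ex suitable for `--exclude`.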

If neither the long_name branch nor the new libm.ex solves the problem, I will need more information to reproduce it.

I can confirm that the long_name branch resolves the problem for me :) I'm switching to it for now; please feel free to close this ticket once the branch has been merged.

The explicit libm.ex was just an experiment of mine to see which libm functions would actually become numerically unstable under verrou (as far as my tests can tell, of course).

This should be fixed in bfedcfb.