aoh/radamsa

unit tests sometimes have sporadic failures

brarcher opened this issue · 4 comments

It has been observed, at least on OSX, that some of the unit tests fail sporadically. Following are some example failures as output by tests/run:

-n o tests/ts1.sh: 
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\'v\217\347' and `\376"i\3317.\345\250V\333w>\346\311\203\034\316=\337~\233n\320\325\005\371\320Sp\301|\247"\036\024\221\247\016\213\222;\256=<c&\3224'.
-n  o tests/tr2.sh: 
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `6\030v\317\3133о\366|\263htv9\0173\340\2421\275F\a\232\360DJ\017\233\037:)\241\023\375\350.\272\r<6\201\002\330\203+\005\221#\355$\343\321F\357T\036\264g[>]\344\200Ę\265s\236\031E\302-\220ǰܺob<\210\004\32415\246\300{ˏ\030\270xrژ\335/\243_ވ8\255y\a\177\362\234!\251N\336\322\371\325p\024\f\241\353&#6\371\204\313\020V\031\311\210V\302\004\\\237\374\316\215!i\357s\231,P\373+\346\303\310tX\300\355\177\247R\347u:3bA-\2148\03114\361\271k\241\376\247/\033\271S\\|,\a#\200w\237\374\002\232!\024\316\346\371C\017\370\354˕\343\241\301\244\025\2763\000iÜP\340\021.\001\301\246\304\363\233\266\022!\030\232L\024\204\311K\030\340\3249.\310\354\a_\t\374{j.$0\021q\267\252<\021\023\260\301Z\235m\005\330H\342~\016\t\242\310\303Oڏ\210S\311\177\275\240\345AwQ g\334\370\302\336\021\207\r}`;8\326Ҵ\270.\363q6\325J,\234(\253QƼ\226V\310\301$W\231A\273\033\000\251\274ѥ\321\322\027\320\000뚦{@\277~-\205\343݅E\200\341\032\203\240\027\3338\366z\351CM6\177C\201\312(N\273\346\201d\200\032\177\371*\177\sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\rX\266\204(a) (b)' and `\rX\266\204'.

Some of the unit tests, when they fail, do not emit output that would help diagnose the failure. Here is an example of invoking tests/ab.sh directly:

$ rc=0
$ attempt=1
$ while [ $rc -eq 0 ]; do tests/ab.sh bin/radamsa ; rc=$?; echo $attempt; attempt=$((attempt+1)); done
1
2
3
4
5
6
7
8
9
10
11
12
$

After 12 attempts a failure was observed, but the reason for the failure was not emitted.

Presumably the expectation is that unit test results are consistent. If the current revision in git is under development and the sporadic failures are expected, or some cleanup is still underway, kindly ignore this. I was unable to determine whether release v0.4's unit tests encounter sporadic failures, as issue #5 affects the v0.4 release.

As a comparison, release v0.3 had consistently passing unit tests.

(As a side note, at least on OSX the built-in echo command in sh does not support the -n option. This is why "-n" is printed before each of the test names in tests/run. Consider reworking how echo is used in that script so that the -n option is unnecessary, if relevant.)
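A portable way to avoid `echo -n` entirely is POSIX `printf`, which never appends a newline unless one is written explicitly; a minimal sketch (the test name shown is illustrative):

```shell
# POSIX printf suppresses the trailing newline unless '\n' is given
# explicitly, so it works where sh's built-in echo lacks -n support.
printf '%s' " o tests/ab.sh: "
printf '%s\n' "ok"
```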

aoh commented

Hi,

Great! Build issues are very welcome. I'll fix this soonish when I have spare time. My *BSD buildbots are currently offline, so it may be that I haven't noticed some issues on BSDish platforms.

There have been many issues with the unit tests on OSX due to minor differences in command arguments and the like. It might make sense to also use the version of owl used for building for the sorting and echoing in the tests.

aoh commented

So the behavior is intended: radamsa is supposed to, with low probability, pad the input with random data if the input is very short. This is done to improve coverage of tests with tiny fixed inputs. The only issue is that OSX sort gets confused by non-textual data. This does not matter, because the tests are probabilistic and are expected to fail on occasion, in which case they are retried many times before tests/run considers them to have failed. Sort's stderr is now piped to /dev/null, so it doesn't get in the way.
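The locale workaround that sort itself suggests can be combined with that stderr redirect; a hedged sketch of what such a test pipeline might look like (the input here is illustrative, not radamsa output):

```shell
# LC_ALL=C forces byte-wise collation, so non-UTF-8 fuzz output no
# longer triggers "Illegal byte sequence"; stray warnings are dropped.
printf 'b\na\n' | LC_ALL=C sort 2>/dev/null
```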

Does the build work otherwise on OSX?

The only issue is that OSX sort gets confused by non-textual data.

Oh, I did not realize that the output was not representative of a failure. Sorry for the false alarm on those tests.

I've modified ab.sh as follows to determine why it sometimes fails:

# check bad string insertion happens as intended (more likely within quoted area)
mkdir -p tmp

echo '-----------------------------------------------------------------""---------------------------------------------------------------------------' \
   | "$@" -m ab -p od -n 20 > tmp/ab.sh.tmp
grep -q '^-*\".*%.*\"-*$' tmp/ab.sh.tmp
rc=$?
if [ $rc -ne 0 ]; then
   echo "Unexpected output:"
   cat tmp/ab.sh.tmp
   exit 1
fi

Attached is the output from one such failed test run.

ab.sh.tmp

This does not matter, because the tests are probabilistic and are expected to fail on occasion, in which case they are tried again many times before tests/run considers them to fail.

When building Radamsa, the unit tests run automatically. Since the unit tests may sporadically fail, could they be sectioned into their own make target (e.g. "make check") so that they do not run automatically? Otherwise, one may build Radamsa, see a failure, and wonder whether the sporadic failure indicates an issue with Radamsa on one's platform.

Sure, I can understand that Radamsa, being probabilistic, may not always fuzz as expected. However, would it be possible to modify the unit tests that fail with some probability so that they behave more deterministically?
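One way to make a probabilistic test behave more deterministically, in the spirit of what tests/run already does internally, is an explicit retry wrapper; a hypothetical sketch (the `retry` function and the attempt count are assumptions, not radamsa code):

```shell
# Hypothetical wrapper: a probabilistic test is only reported as a
# failure if it fails on all N independent attempts.
retry() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" && return 0
        i=$((i + 1))
    done
    return 1
}

# Usage (path is illustrative): retry 10 tests/ab.sh bin/radamsa
```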

Does the build work otherwise on OSX?

Seems to work so far. I've not tried the TCP stuff yet, which looks interesting.

aoh commented

Makes sense. There is now a separate test build target, which runs the tests when needed. I also added a thanks section to readme.md.