Reproducibility, associativity, and deep variability

Reproducibility: software experiments with damn simple associativity and deep variability

Do you agree that x+(y+z) == (x+y)+z? Well, let's see...
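As a minimal illustration of the question above, here is a Python sketch that checks the equality on one fixed triplet and then on randomly drawn triplets. The function names and the use of random.Random are assumptions made for this example; they are not taken from the repository's own Python variant.

```python
import random
from typing import Optional


def is_associative(x: float, y: float, z: float) -> bool:
    """Return True when both groupings produce exactly the same float."""
    return x + (y + z) == (x + y) + z


def proportion_associative(number: int, seed: Optional[int] = None) -> float:
    """Draw `number` random triplets in [0, 1) and return the share that passes."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(number):
        x, y, z = rng.random(), rng.random(), rng.random()
        if is_associative(x, y, z):
            ok += 1
    return ok / number


print(is_associative(0.1, 0.2, 0.3))  # False: 0.6 vs 0.6000000000000001
print(is_associative(1.0, 2.0, 3.0))  # True: small whole numbers add exactly
print(proportion_associative(1000, seed=42))
```

The fixed seed only makes this particular sketch repeatable; other languages, generators, or seeds can give a different proportion, which is the point of the experiments.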

Here are the current implementations:

configurable Python implementation with seed and number
configurable Java implementation with basic (float), double or math
configurable C implementation with an optional custom setting (-DCUSTOM) and a Windows or Linux target, both of which affect the random primitives. We compile with gcc, i686-w64-mingw32-gcc (when needed and possible), and clang
configurable Rust implementation with compile-time options (associativity, multiplication inverse with and without Pi) and run-time options with optional error margin over equality
LISP implementation
configurable JavaScript implementation with seed number (and actually the surprising global seed) and equality-check (associativity, multiplication inverse with and without Pi)
configurable Bash implementation with equality-check using -e (associativity, multiplication inverse with and without Pi)
configurable Swift implementation with seed, --number, and --equality-check
configurable OCaml implementation with seed (optional), --number, and --equality-check
configurable Julia implementation with seed (optional), --number, --equality-check, and strict-equality
configurable R implementation with seed (optional), number, and eq_check
configurable Go implementation with seed (optional), number, and equality-check
configurable Perl implementation with seed (optional), number, and equality-check
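Several variants above expose an equality check, and the Rust variant mentions an optional error margin over equality. How that margin is defined is not stated here, so the following Python sketch simply assumes a relative tolerance; the function name equal and the parameter error_margin are hypothetical.

```python
import math


def equal(a: float, b: float, error_margin: float = 0.0) -> bool:
    """Exact comparison when error_margin is 0, relative tolerance otherwise.

    Assumption: the margin is relative; an absolute margin is equally plausible.
    """
    if error_margin == 0.0:
        return a == b
    return math.isclose(a, b, rel_tol=error_margin)


left = 0.1 + (0.2 + 0.3)
right = (0.1 + 0.2) + 0.3
print(equal(left, right))        # False: strict equality fails
print(equal(left, right, 1e-9))  # True: the difference is far below the margin
```

With a margin, a triplet that fails strict equality can be counted as a success, so the reported proportions depend on this option as well.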

All implementations (except LISP, so far) support parameterization of the number of random generations. Executions are repeated 10 times by default, and the minimum, maximum, average, and standard deviation are reported.
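The repetition and reporting described above can be sketched in Python as follows. This is not the code of eval.py: the function names are hypothetical, and the choice of the sample standard deviation (statistics.stdev) rather than the population one is an assumption.

```python
import random
import statistics
from typing import Dict, Optional


def proportion_associative(number: int, rng: random.Random) -> float:
    """Share of `number` random triplets for which x+(y+z) == (x+y)+z."""
    ok = 0
    for _ in range(number):
        x, y, z = rng.random(), rng.random(), rng.random()
        ok += x + (y + z) == (x + y) + z
    return ok / number


def repeat(number: int = 1000, repetitions: int = 10,
           seed: Optional[int] = None) -> Dict[str, float]:
    """Run the experiment `repetitions` times and summarize the proportions."""
    rng = random.Random(seed)
    results = [proportion_associative(number, rng) for _ in range(repetitions)]
    return {
        "min": min(results),
        "max": max(results),
        "average": statistics.mean(results),
        "std": statistics.stdev(results),  # assumption: sample standard deviation
    }


print(repeat(number=1000, repetitions=10, seed=1))
```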

To execute all variants and gather the results into a CSV file:

export WINEDEBUG=-all; python eval.py > results.csv

You can then process the data, for example with rich results.csv. Note: eval.sh is deprecated and has been replaced by eval.py.

(Meta|Multi)morphic testing

It's also possible to perform a kind of metamorphic testing across variants (see multi_testing.py). By metamorphic testing, we mean here checking the two following (metamorphic) relations:

(MR1) whenever a triplet x, y, z fails to satisfy the equality (e.g., associativity) for a given variant (e.g., Python), this triplet should also fail for another variant (e.g., JavaScript)
(MR2) whenever a triplet x, y, z satisfies the equality (e.g., associativity) for a given variant (e.g., Python), this triplet should also succeed for another variant (e.g., JavaScript)

At the moment, we have extended the Python and JavaScript variants so that both support --check-case (for verifying a triplet w.r.t. an equality relation) and --failing-cases (resp. --success-cases) for synthesizing a set of triplets that fail (resp. succeed) to respect the equality relation (associativity, multiplication inverse, multiplication inverse with Pi). Hence, we can envision four scenarios:

the failing cases as generated by Python also fail in JavaScript
the failing cases as generated by JavaScript also fail in Python
the success cases as generated by Python also succeed in JavaScript
the success cases as generated by JavaScript also succeed in Python (cases are triplets)
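The four scenarios can be sketched in Python as follows. This is not multi_testing.py: instead of calling the Python and JavaScript programs with --failing-cases, --success-cases, and --check-case, the sketch models each variant as an in-process function. The two variants used here (double precision and an emulated single precision) are hypothetical stand-ins chosen so that the example is self-contained.

```python
import random
import struct
from typing import Callable, List, Tuple

Triplet = Tuple[float, float, float]
Check = Callable[[float, float, float], bool]


def to_f32(v: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def check_double(x: float, y: float, z: float) -> bool:
    return x + (y + z) == (x + y) + z


def check_single(x: float, y: float, z: float) -> bool:
    x, y, z = to_f32(x), to_f32(y), to_f32(z)
    return to_f32(x + to_f32(y + z)) == to_f32(to_f32(x + y) + z)


def synthesize(check: Check, want_success: bool, count: int, seed: int) -> List[Triplet]:
    """Generate `count` triplets that succeed (or fail) for the given variant."""
    rng = random.Random(seed)
    cases: List[Triplet] = []
    while len(cases) < count:
        t = (rng.random(), rng.random(), rng.random())
        if check(*t) == want_success:
            cases.append(t)
    return cases


def violations(cases: List[Triplet], other: Check, expected: bool) -> List[Triplet]:
    """Triplets whose outcome in the other variant differs from `expected`."""
    return [t for t in cases if other(*t) != expected]


variants = {"double": check_double, "single": check_single}
for source, target in (("double", "single"), ("single", "double")):
    for expected, label in ((False, "failing"), (True, "success")):
        cases = synthesize(variants[source], expected, 100, seed=0)
        bad = violations(cases, variants[target], expected)
        print(f"{label} cases from {source} checked in {target}: {len(bad)} violations")
```

A non-zero count means that MR1 (for failing cases) or MR2 (for success cases) does not hold between the two variants for the generated triplets.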

Resources

Perl

cpan install Getopt::Long enum

Brainfuck

https://esolangs.org/wiki/Random_Brainfuck
https://esolangs.org/wiki/Brainfuck_algorithms#x_.3D_pseudo-random_number
https://twitter.com/acherm/status/1634238174879703040

Scratch

Is x+(y+z) == (x+y)+z true in Scratch? Well, it depends on the upper bound used when randomly generating a value for y (and x and z), considering the example in scratch/testassoc.sb3. You can import it into Scratch using https://scratch.mit.edu/projects/editor/. There are surprising results when varying the upper bound for y:

with value 100000000000000000000000000000000000, ncorrect = ~730 (out of 1000);
with value 1e53, ncorrect = 1000 (out of 1000), so 100% (perfect);
with (large) values in between (play with the slider!), almost perfect (999 or 997 out of 1000) but not perfect;
with the specific value 1000000000000000000000000, ncorrect = 1000 (out of 1000), so 100% (perfect).

Note: for Scratch, it's hard to build a generator and systematize the exploration. At the moment, it is mostly for exploring what's going on and hopefully finding a comprehensive explanation.
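Since the exploration is hard to automate in Scratch itself, the effect of the upper bound can be probed outside of it. The Python sketch below is only a rough analogue: it assumes whole-number draws between 1 and the upper bound and uses Python's generator, so it does not reproduce Scratch's random primitive and its counts need not match the ones listed above. The function name ncorrect mirrors the counter mentioned above; everything else is hypothetical.

```python
import random


def ncorrect(upper: float, trials: int = 1000, seed: int = 0) -> int:
    """Count triplets of whole-number floats in [1, upper] that are associative."""
    rng = random.Random(seed)

    def draw() -> float:
        # Assumption: values are whole numbers, stored as double-precision floats.
        return float(round(rng.uniform(1, upper)))

    count = 0
    for _ in range(trials):
        x, y, z = draw(), draw(), draw()
        if x + (y + z) == (x + y) + z:
            count += 1
    return count


# Small whole numbers add exactly in double precision, so every triplet passes.
print(ncorrect(1000.0))  # 1000

# Upper bounds taken from the observations above; results are left for the reader to run.
for upper in (1000000000000000000000000, 100000000000000000000000000000000000, 1e53):
    print(upper, ncorrect(float(upper)))
```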

C

To cross-compile for Windows from Linux with i686-w64-mingw32-gcc, specific packages are needed (e.g., on Fedora, mingw64-gcc.x86_64). The combinations are roughly as follows (in fact, there are many more variation points and variants):

gcc -o testassoc-l testassoc.c
gcc -o testassoc-lc testassoc.c -DCUSTOM
i686-w64-mingw32-gcc -o testassoc-w testassoc.c -DWIN
i686-w64-mingw32-gcc -o testassoc-wc testassoc.c -DWIN -DCUSTOM
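The four commands above form a small matrix (target by custom setting). As a sketch, the same matrix can be generated in Python; the dictionary layout and function name are hypothetical, and clang is left out because no clang command is listed above.

```python
from itertools import product
from typing import Dict, List, Tuple

SOURCE = "testassoc.c"

# Output suffix -> (compiler, target-specific flags), as in the commands above.
TARGETS: Dict[str, Tuple[str, List[str]]] = {
    "l": ("gcc", []),
    "w": ("i686-w64-mingw32-gcc", ["-DWIN"]),
}


def build_commands() -> List[List[str]]:
    """Return one compiler command per (target, custom) combination."""
    commands = []
    for (suffix, (compiler, target_flags)), custom in product(TARGETS.items(), (False, True)):
        name = f"testassoc-{suffix}{'c' if custom else ''}"
        flags = target_flags + (["-DCUSTOM"] if custom else [])
        commands.append([compiler, "-o", name, SOURCE, *flags])
    return commands


for command in build_commands():
    print(" ".join(command))
```

Each command is a list of arguments, so it could be passed to subprocess.run to actually build the variant, provided the compilers are installed.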
