optimization exhibits non-deterministic behavior
Sometimes, the behavior of the optimization pipeline seems to be non-deterministic.
Example:
./build/bin/thorin -d mem -o - lit/mem/no_mem.thorin -VVVV
in
https://github.com/NeuralCoder3/thorin2/tree/ad_ptr_merge
702d848
The issue might be due to the add_mem optimization, the pipeline builder, or an underlying bug in thorin.
This behavior might also be a side effect of the earlier (not yet merged) changes to mem and clos conv, whose far-reaching impact has not shown up until now.
Yes, this is super annoying. Another source is this:
world.app(emit1(), emit2());
In C++, it's unspecified whether emit1() is evaluated first or second. This code can behave differently on different compilers and operating systems.
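A minimal sketch of the evaluation-order problem and one way to pin the order down. The World type below, with its emit and app members, is a hypothetical stand-in that just hands out increasing ids; it is not thorin's actual API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a world that hands out increasing gids.
struct World {
    int next_gid = 0;
    std::vector<std::string> log;  // records the order in which nodes were created

    int emit(const std::string& name) {
        log.push_back(name);
        return next_gid++;
    }

    // Stand-in for world.app(callee, arg): simply pairs the two ids.
    std::pair<int, int> app(int callee, int arg) { return {callee, arg}; }
};

// Order of the two emit calls is unspecified: either one may run first,
// so the gids assigned to "emit1" and "emit2" can swap between compilers.
std::pair<int, int> build_unsequenced(World& w) {
    return w.app(w.emit("emit1"), w.emit("emit2"));
}

// Each initialization is a full expression, so "emit1" is always created first.
std::pair<int, int> build_sequenced(World& w) {
    int callee = w.emit("emit1");
    int arg    = w.emit("emit2");
    return w.app(callee, arg);
}
```

With build_sequenced, the creation log is always emit1 followed by emit2, while build_unsequenced may produce either order depending on the compiler.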
I have implemented the --trace-gids switch that we could somehow use to test for this in our CI.
The issue happens only sometimes, with the same executable on the same computer under the same circumstances.
Therefore, timing issues or randomness might be the cause.
Probably related issue:
./build/bin/thorin -d matrix -d affine lit/matrix/mapReduce_mult.thorin -o - -VVVV
in matrix_dialect
f3a3def
sometimes generates thorin code and sometimes prints the following error:
:4294967295: error: cannot pass argument
'(__806508#2:(.Idx 3), ‹__806508#2:(.Idx 3); .Idx 4294967296›, 0)' of type
'[.Nat, «__806508#2:(.Idx 3); ★», .Nat]' to
'%mem.lea' of domain
'[n_834521: .Nat, _834535: «n_836768; ★», _834540: .Nat]'
which seems odd to me, as the arguments are of the form
(n, <n; T>, 0)
which should have the type
[n:.Nat, <<n; *>>, .Nat]
which should agree with lea.
I was fighting this issue in #184, as a Debug build produced different output than the Release one.
- 05e833b
  A few asserts created new Defs, resulting in slightly different behavior between Debug and Release builds. This commit fixes the issue.
- 2997a1d
  This one fixes a subtle problem when a Def coincidentally has the same name as an external Def.
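The Debug/Release difference caused by asserts can be illustrated with a small sketch. MiniWorld and both build functions are hypothetical and only model the pattern; how the commit above actually fixes it is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-world: every created node gets the next gid.
struct MiniWorld {
    std::vector<int> nodes;

    int make(int value) {
        nodes.push_back(value);
        return static_cast<int>(nodes.size()) - 1;  // gid of the new node
    }

    std::size_t size() const { return nodes.size(); }
};

// Problematic pattern: the checked expression itself creates a node.
// When NDEBUG is defined the assert expands to nothing, so the extra node
// is never created and every later gid shifts by one.
int build_with_side_effect(MiniWorld& w) {
    int a = w.make(1);
    assert(w.make(1) != a && "check creates a node");
    (void)a;
    return w.make(2);  // gid 2 with asserts enabled, gid 1 with NDEBUG
}

// Safer pattern: the assert only reads existing state,
// so the set of created nodes is the same in both build types.
int build_side_effect_free(MiniWorld& w) {
    std::size_t before = w.size();
    w.make(1);
    assert(w.size() == before + 1);
    (void)before;
    return w.make(2);  // gid 1 in both build types when starting from an empty world
}
```

Starting from an empty MiniWorld, build_side_effect_free returns the same gid regardless of whether asserts are compiled in, whereas build_with_side_effect does not.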
As mentioned above, --trace-gids and --reeval-breakpoints helped me track down the problem. We could probably write a test case with some non-trivial code, run it with --trace-gids, and double-check in our CI that all builds produce the same output.
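A rough sketch of the comparison step such a CI check would need, assuming the output of each build has already been captured as text. The function name first_difference is hypothetical, and nothing here depends on what --trace-gids actually prints.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Returns the 1-based number of the first line where the two streams differ,
// or 0 if they contain the same lines.
std::size_t first_difference(std::istream& a, std::istream& b) {
    std::string line_a, line_b;
    std::size_t line = 0;
    while (true) {
        bool ok_a = static_cast<bool>(std::getline(a, line_a));
        bool ok_b = static_cast<bool>(std::getline(b, line_b));
        ++line;
        if (!ok_a && !ok_b) return 0;                        // both ended together
        if (ok_a != ok_b || line_a != line_b) return line;   // length or content mismatch
    }
}

// Convenience overload for outputs that are already held in memory.
std::size_t first_difference(const std::string& a, const std::string& b) {
    std::istringstream sa(a), sb(b);
    return first_difference(sa, sb);
}
```

A CI job could run the same test input through each build (for example Debug and Release), feed the captured outputs to first_difference, and fail when the result is non-zero, reporting the line number as a starting point for debugging.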