ftsrg/gazer

Support large int literals in Theta backend

Opened this issue ยท 7 comments

Previously Theta could only parse 32-bit integer literals, but with 1.7 (see #29) it should not be a limitation. We should check if Gazer has to be modified internally to be able to produce larger literals.

Gazer converts integer type (short, long, signed, unsigned, etc.) variables in the given C code to int in the generated CFA. Theta 1.7 handles these as large int literals, so the new functionality seems to work without any internal change in Gazer.

There is a nuance after 1<<63ull: it does not fit the range of int64_t (as it is used in IntLiteralExpr).
So the only problem is constant unsigned long long values larger than that can fit in a signed long long.

I'd reopen this as a bug or create a new issue in Gazer, whichever is more appropriate :)

Reopening seems cleaner to me in this case.

Thanks @radl97 for pointing this out. What could you imagine as a solution? Replacing Gazer's internal representation with some bigint (e.g. from boost)? Furthermore, is this also an issue for the BMC backend?

LLVM has some solution, I think. For example APInteger maybe, which stores bitwidth with the value.

Note: I would skip Boost, as it is only used on the outer layers, mostly for filesystem handling.

Without giving much thought, I think BMC is affected when using Integer representations (instead of the default(?) bitvectors).

Disclaimer: Both my claims against BMC and Theta modules are untested, I've only seen how IntLitExpr stores integers, which led me to think the problem exists. Further investigation may show otherwise.

I gladly take up the quest, if you'd like :) (both verifying and solving the issue)

@radl97 if you feel like doing it, please go ahead. You can create a new branch and when you're done submit a PR. It would also be nice to come up with a simple test that fails currently, but will be fixed by the patch.

@radl97 Indeed, we currently store large int literals in int64_t. I think we can handle this either with llvm::APSInt (like APInt, but also stores sign information) or boost::multiprecision. Both solutions are fine by me.

Note that you probably will have to update the expression hashing mechanism for int literals in GazerContextImpl.h, otherwise you might get a good number of undebuggable template errors :) The default implementation for primitive literals uses llvm::hash_value, probably we will need a specialization for llvm::hash_value and/or expr_hasher (if deciding equilitiy is not trivial).