`Bits`' `__init__` changes `.length` after hashes are calculated
zwimer opened this issue · 6 comments
When creating a Bits
AST object, the __init__
function sets self.length
; in some cases this does not change anything, but in some cases the newly set length is different from the old length.
This occurs after Base
's __new__
runs, and thus it might change the internals of a base, after Base
calculates the hash of the object to store it inside the lookup cache.
Note that .length
is a HASHCONS
variable, it affects the hash.
Actually, this looks like an interaction with strings.py
's __init__
.
Note that here strings
edits kwargs
:
claripy/claripy/ast/strings.py
Line 31 in 696f1c0
Bits
' self.length = kwargs['length']
would be a no-op if kwargs
were not edited between Bits
__init__
and Base
__new__
.
Suggested solution:
- Refactor all String ops not to use bit lengths rather than byte lengths.
- Remove
kwargs['length'] *= 8
fromString
's__init__
(really this is part of 1) - Remove
self.length = kwargs['length']
fromBits
__init__
. If this is done without doing 1 & 2 then string support will break
This can be triggered from within Base
's new function via the concrete backend constructing an object, it being returned from new, then edited after __init__
runs.
Note: fixing it might require refactoring Base
's __new__
method as well. Specifically, Base
can call the concrete backend internally to construct an object. But Backend.call
only takes in op
and args
. For some ops, length
is not stored in either. For example, a StringV
might have args ('12', 2)
, but length 80000
- such as the result of an simplified IntToStr
operation (you can find this is test_backend_smt.py
).
Since Backend.call
cannot accept args beyond op
and args
, the StringV
to be constructed will have the size of .args[1] * 8
(the bit-length of the python str
held inside the StringV
) rather than the StringV
length of .length
.
Suggested fix: Make StringV
and any other ops that fall into this category hold both lengths or only the .length
length. Alternatively call
could take in kwargs
, but that feels like it is inviting abuse.
Proposed fixes:
- In
def StringV
inast/strings.py
, change the following line from/to:claripy/claripy/ast/strings.py
Line 154 in acc6e38
result = String("StringV", (value, length), length=8*length, **kwargs)
- Prevent
String.__init__
from changingkwargs['length']
. - Prevent
Bits.__init__
from changingself.length
.
How this works:
- The
String
constructor now takes in the byte length asarg[1]
, which means things like theBackend.call
can use the proper length rather than the length of the python string with implicit trailing zeros ignored. Additionally, thekwargs['length']
passed toBase.__new__
will be in bits from the start. - No longer interact to edit the length after it's been set in
Base.__new__
and hashed - Same for point 3.
What this will change:
my_string.length
will be in bits; for bytes, use my_string.str_length
or my_string.args[1]
(both should be byte length)
Additionally, StringS
must also 8*
length.