angr/claripy

`Bits`' `__init__` changes `.length` after hashes are calculated

zwimer opened this issue · 6 comments

When creating a Bits AST object, the __init__ function sets self.length; in some cases this does not change anything, but in some cases the newly set length is different from the old length.

This occurs after Base's __new__ runs, and thus it might change the internals of a base, after Base calculates the hash of the object to store it inside the lookup cache.

Note that .length is a HASHCONS variable, it affects the hash.

Actually, this looks like an interaction with strings.py's __init__.

Note that here strings edits kwargs:

kwargs['length'] *= 8

Bits' self.length = kwargs['length'] would be a no-op if kwargs were not edited between Bits __init__ and Base __new__.

Suggested solution:

  1. Refactor all String ops not to use bit lengths rather than byte lengths.
  2. Remove kwargs['length'] *= 8 from String's __init__ (really this is part of 1)
  3. Remove self.length = kwargs['length'] from Bits __init__. If this is done without doing 1 & 2 then string support will break

This can be triggered from within Base's new function via the concrete backend constructing an object, it being returned from new, then edited after __init__ runs.

Note: fixing it might require refactoring Base's __new__ method as well. Specifically, Base can call the concrete backend internally to construct an object. But Backend.call only takes in op and args. For some ops, length is not stored in either. For example, a StringV might have args ('12', 2), but length 80000- such as the result of an simplified IntToStr operation (you can find this is test_backend_smt.py).

Since Backend.call cannot accept args beyond op and args, the StringV to be constructed will have the size of .args[1] * 8 (the bit-length of the python str held inside the StringV) rather than the StringV length of .length.

Suggested fix: Make StringV and any other ops that fall into this category hold both lengths or only the .length length. Alternatively call could take in kwargs, but that feels like it is inviting abuse.

Proposed fixes:

  1. In def StringV in ast/strings.py, change the following line from/to:
    result = String("StringV", (value, len(value)), length=length, **kwargs)
result = String("StringV", (value, length), length=8*length, **kwargs)
  1. Prevent String.__init__ from changing kwargs['length'].
  2. Prevent Bits.__init__ from changing self.length.

How this works:

  1. The String constructor now takes in the byte length as arg[1], which means things like the Backend.call can use the proper length rather than the length of the python string with implicit trailing zeros ignored. Additionally, the kwargs['length'] passed to Base.__new__ will be in bits from the start.
  2. No longer interact to edit the length after it's been set in Base.__new__ and hashed
  3. Same for point 3.

What this will change:
my_string.length will be in bits; for bytes, use my_string.str_length or my_string.args[1] (both should be byte length)

Additionally, StringS must also 8* length.