angr/claripy

BVS sizing weirdness

Closed this issue · 11 comments

Hello!

I'm noticing some weird and potentially unwanted behavior when creating BVS. After you create a (x86) project and setup the state, if you run:
lorem_ipsum = "« Lorem ipsum dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. »« Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit, amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur? [33] At vero eos et accusamus et iusto odio dignissimos ducimus, qui blanditiis praesentium voluptatum deleniti atque corrupti, quos dolores et quas molestias excepturi sint, obcaecati cupiditate non provident, similique sunt in culpa, qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio, cumque nihil impedit, quo minus id, quod maxime placeat, facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet, ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat. »" print(len(lorem_ipsum)) # outputs 2195 state.regs.edx = state.solver.BVS(lorem_ipsum,64)

You will notice that edx is BV of size 32, however, the args contain a BVS of size 64 (which is not the actual size of the text):
image

Shouldn't claripy verify the size of the text to ensure that it matches what the user has input? Shouldn't it also not allow a BVS of size greater than 32?

Any clarifications would be greatly appreciated.

Thank you,
-Era

Hi,

A BVS' semantic value has nothing to do with its name. If you want a bitvector which has the value of some text, you should use BVV.

With respect to being able to store a 64-bit value to a 32-bit location, I believe the default behavior is to truncate the data for storage. I would need to think about whether it would be better to throw an exception. Feel free to make a pull request changing the behavior and we can see what breaks in CI.

I would need to think about whether it would be better to throw an exception.

I'm sure that will break existing code...

If I change it to BVV, I do get an error for size mismatch. Isn't this how BVS should also work? Or is size not checked because it's a symbol, so that could be complicated to determine?

The parameter you pass to the first argument of BVS is not a value, it's a name. It's purely used for identifying the symbol during debugging and has no bearing on how emulation treats it.

oh, right! I got confused about it. However, is it safe to have the name be so long? I would think that it would increase memory consumption over time and cause other issues...

And I'm still not sure why it would accept a bigger value into a register that has a smaller content. Is this intentional?

It is safe; the only overhead should be the time to serialize it into z3.

It's not intentional per se, it's just a consequence of the way we wrote the code a long time ago, and now we probably have other code that depends on it. Like I said, if you can make the change work with all our tests I'm happy to take it.

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.

This issue has been closed due to inactivity.