
discrepancy between sark.Line(ea=foo).is_tail and is_tail(idc.GetFlags(ea)), also for is_code



First off, Sark is what I've always wished the IDA API looked like. Thank you so much for reigniting the joy and excitement of binary analysis!! The graph API looks extremely powerful, and the concepts of Sark feel very well thought out. Hats off!

Now, for the issue I happened to stumble upon today, when exploring what the Sark world had to offer.

It seems there is currently a discrepancy between how IDA and Sark handle is_code and is_tail (I did not check is_data, but it would make sense to check that too as part of this issue).

The proof of concept is as follows:

[004785A4]    test    esi, esi
[004785A4]    test    esi, esi

I can't share this database, however, I hope this should be easy to reproduce. If not, just ping me and we'll do some troubleshooting together.

Oh, and for the record, this is on IDA 7.3, using the 6.x branch of Sark (downloaded today, so it should be the latest).


This definitely looks like a bug, I'll have to look into it.
Sorry for the delayed response...

Hi Tamir,

No worries, it's important to take time off from hobby projects too.

Hope you've had a good start to the new year :)

Happy coding!

Just looked into it again - I don't think I consider it a bug.
Sark's line normalizes the address, as it represents a line, not an address. It cannot be used to point to the middle of an instruction.
This means that any attempt to use it on such an address will fail.
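To make the normalization concrete, here is a minimal sketch that runs outside IDA. Everything in it is an assumption for illustration: `ITEMS` is a made-up toy database, `get_item_head` and `raw_is_tail` are toy stand-ins for the corresponding IDA lookups, and this `Line` class is a hypothetical reduction, not Sark's actual implementation. It only shows why a tail check evaluated after normalization can never report a tail.

```python
import bisect

# Toy stand-in for an IDA database: item start address -> item size in bytes.
# The addresses and sizes are made up for illustration.
ITEMS = {0x1000: 2, 0x1002: 5, 0x1007: 1}
_HEADS = sorted(ITEMS)


def get_item_head(ea):
    """Return the start address of the item that contains `ea` (toy version)."""
    index = bisect.bisect_right(_HEADS, ea) - 1
    if index < 0:
        raise ValueError("address is not inside any item")
    head = _HEADS[index]
    if ea >= head + ITEMS[head]:
        raise ValueError("address is not inside any item")
    return head


def raw_is_tail(ea):
    """True if `ea` lies inside an item but is not its first byte."""
    return get_item_head(ea) != ea


class Line:
    """Hypothetical line object that normalizes its address on construction."""

    def __init__(self, ea):
        # The requested address is replaced by the start of its line.
        self.ea = get_item_head(ea)

    @property
    def is_tail(self):
        # Evaluated on the normalized address, so it cannot report a tail.
        return raw_is_tail(self.ea)
```

In this model, `raw_is_tail(0x1003)` is true because `0x1003` sits inside the five-byte item at `0x1002`, while `Line(0x1003).is_tail` is false because the object already points at `0x1002`.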

Why is this needed?

> Just looked into it again - I don't think I consider it a bug.

Thanks for taking a look at this. I appreciate that Sark is trying to do the right thing by normalizing addresses. I must have missed this in the documentation.

> Sark's line normalizes the address, as it represents a line, not an address. It cannot be used to point to the middle of an instruction.

Given that Sark normalizes the address (to represent a valid line), it would make sense to remove the is_tail method from the Sark API, considering that is_tail is only ever used to report when an address is in the middle of an instruction, something that would be impossible with normalized addresses.

So to avoid confusion, I'd suggest removing the is_tail function from the Sark API.

At least, this would be my understanding, but I might have misunderstood the function of is_tail ^^

> Why is this needed?

I was using is_tail to determine if an address was the start of an instruction, or if it was just part of a multi-byte instruction.

It seems that you are correct.
As for the check you want to perform - while it will be costly using Sark, you can always compare the address to the address of the Line you build from it. But direct access will probably be faster.

> As for the check you want to perform - while it will be costly using Sark, you can always compare the address to the address of the Line you build from it.

Thanks, that work-around would suffice.

> It seems that you are correct.

Ok, so would you agree that removing is_tail from Sark is the way forward?

I am willing to add a deprecation warning, but since some code may already be using it, I'd rather not just break people's code.

But yeah, I agree that it should eventually be removed as it is misleading.
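A deprecation warning of the kind discussed above might look like the sketch below. It uses Python's standard `warnings` module; the `Line` class is a hypothetical, stripped-down stand-in rather than Sark's code, and both the message text and the hard-coded `False` return value are assumptions. A real change would presumably keep the property's existing return value and only add the warning.

```python
import warnings


class Line:
    """Hypothetical, stripped-down stand-in for a normalized line object."""

    def __init__(self, ea):
        # Assume `ea` has already been normalized to the start of a line.
        self.ea = ea

    @property
    def is_tail(self):
        # Warn callers without breaking them; stacklevel=2 attributes the
        # warning to the code that accessed the property.
        warnings.warn(
            "Line.is_tail is deprecated: a Line always points at the start "
            "of a line, so it cannot represent a tail address.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption for this sketch: a normalized address is never a tail.
        return False
```

Existing callers keep working and see the warning once they enable deprecation warnings, for example by running Python with `-W default`.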