rust-bakery/nom

unsigned integer underflow

mxyns opened this issue · 3 comments

mxyns commented

Hello,

While using nom I encountered an issue with this impl of Offset for [u8]

nom/src/traits.rs

Lines 599 to 606 in 8c68e22

impl Offset for [u8] {
fn offset(&self, second: &Self) -> usize {
let fst = self.as_ptr();
let snd = second.as_ptr();
snd as usize - fst as usize
}
}

The return value underflows when the first span is after the second in memory. This means the user always has to think about whether to use first.offset(second) or second.offset(first):

fn main() {

    let buf = [0,1,2,3,4,5,6,7,8,9];
    let mut first: LocatedSpan<&[u8]>= LocatedSpan::from(&buf[..]);
    let mut second = some_parser::read(first);

    let offset = first.offset(&second);
    println!("offset is negative: {}", offset);

    let offset = second.offset(&first);
    println!("offset is positive: {}", offset);
}

To me, an offset means it may be negative but maybe changing the return type of the offset method to isize may be problematic elsewhere, I don't know a lot about the internals of nom.

Maybe just doing an absolute value of the result could be enough? or let Offset have a signed version of the offset method?

Seeing the same issue. Also, I am not sure if "user error" is the right connotation here. The trait seems pretty broken if you ask me, as it appears to make the assumption that self and second are related, but that is not enforced or even documented anywhere to the degree I can tell. Okay, perhaps one can guess that relationship by what it does. And yet, this functionality is used in higher level APIs where this contract seems lost entirely.

E.g., this program panics to due over/underflow:

use nom::error::ErrorKind::Tag;
use nom::error::VerboseErrorKind::Nom;
use nom::error::convert_error;
use nom::error::VerboseError;

fn main() {
        let input = [31, 139, 8, 8, 85, 135, 48, 102, 2, 255, 108, 105, 98];
        let err = VerboseError {
            errors: vec![(String::from_utf8_lossy(&input), Nom(Tag))],
        };
        let _x = convert_error(String::from_utf8_lossy(&input), err);
}

the problem isn't really obvious at all. And in a general setting, neither may the fix be (on the user's end).

Geal commented

@mxyns @danielocfb as you saw, Offset assumes that the argument is part of the same slice and that should be documented. I looked at the change in blazesym is why the lossy conversion step would be needed in the first place? Because convert_error was not available for &[u8] inputs?

I looked at the change in blazesym is why the lossy conversion step would be needed in the first place? Because convert_error was not available for &[u8] inputs?

Yes, that is the reason. The function assumes something that derefs into str. So I we need some kind of conversion first.

Btw., I accidentally commented what is not the most fitting issue, I think #1619 actually covers this very problem already.