m4rw3r/chomp

Returning the entire slice matched by a chain of parsers

Closed this issue · 3 comments

Is there a clean way to use the parse! macro and return the entire slice that was matched? Currently, I do something like this:

// An identifier is an alphanumeric string that doesn't start with a digit.
fn identifier<I: U8Input>(i: I) -> SimpleResult<I, ()> {
    parse!{i;
        satisfy(is_alpha);
        take_while(is_alphanumeric);

        ret ()
    }
}

// An alias definition is two identifiers separated by an equals sign, e.g. "foo=bar".
fn alias<I: U8Input>(i: I) -> SimpleResult<I, (I::Buffer, I::Buffer)> {
    parse!{i;
        let (left, _)  = matched_by(identifier);
                        token(b'=');
        let (right, _) = matched_by(identifier);

        ret (left, right)
    }
}

It would be nicer if alias didn't have to use matched_by and could just say let left = identifier(). Does chomp provide a good way of doing this?

There are two solutions to this: a) move matched_by into identifier, or b) use a stateful closure in identifier.

The first method is more flexible and should result in almost the exact same code unless the backtracking operation of the Input is expensive (it is free on slices, and it is not supposed to be expensive in general):

fn identifier<I: U8Input>(i: I) -> SimpleResult<I, I::Buffer> {
    matched_by(i, parser!{
        satisfy(is_alpha);
        skip_while(is_alphanumeric)
    }).map(|(b, _)| b)
}

Note the change from parse! to parser!: parser!{...} is just shorthand for |i| parse!{i; ...}, which is convenient for making local parsers. I also changed take_while to skip_while, since take_while produces a result that would go unused here. That is not a problem for slice inputs (or buffered slices like chomp::buffer), but an owned input type could incur some overhead allocating a Buffer that is never used. The ret is not needed in this case, since map just takes the buffer out of the tuple.
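To illustrate why backtracking is free on slices, here is a minimal std-only sketch of the same idea. It does not use chomp: a "parser" here is just a function from a byte slice to Option<(rest, value)>, and the names matched_by, identifier_unit and identifier are hypothetical stand-ins modelled on the snippet above, not chomp's actual signatures.

```rust
// Std-only sketch, NOT chomp's API: a parser is a function
// `&[u8] -> Option<(remaining input, value)>`.
fn is_alpha(c: u8) -> bool {
    c.is_ascii_alphabetic()
}

fn is_alphanumeric(c: u8) -> bool {
    c.is_ascii_alphanumeric()
}

/// Runs `p` and returns the slice it consumed alongside its value.
/// Assumes `p` returns a suffix of `input`, so "backtracking" is only
/// a length subtraction and a re-slice of the original input.
fn matched_by<'a, T, P>(input: &'a [u8], p: P) -> Option<(&'a [u8], (&'a [u8], T))>
where
    P: FnOnce(&'a [u8]) -> Option<(&'a [u8], T)>,
{
    let (rest, value) = p(input)?;
    let consumed = input.len() - rest.len();
    Some((rest, (&input[..consumed], value)))
}

/// One letter followed by any number of alphanumerics; yields no value.
fn identifier_unit(input: &[u8]) -> Option<(&[u8], ())> {
    let (&first, mut rest) = input.split_first()?;
    if !is_alpha(first) {
        return None;
    }
    while let Some((&c, tail)) = rest.split_first() {
        if !is_alphanumeric(c) {
            break;
        }
        rest = tail;
    }
    Some((rest, ()))
}

/// Same grammar, but returns the matched slice instead of `()`.
fn identifier(input: &[u8]) -> Option<(&[u8], &[u8])> {
    matched_by(input, identifier_unit).map(|(rest, (matched, _))| (rest, matched))
}
```

With this sketch, identifier(b"foo1=bar") yields the remaining input "=bar" and the matched slice "foo1", while an input starting with a digit yields None.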

The stateful closure is straightforward too, though not as clean. It could be more useful in certain situations, since it avoids the backtracking that matched_by needs:

fn identifier<I: U8Input>(i: I) -> SimpleResult<I, I::Buffer> {
    let mut first = true;

    take_while1(i, |c| if first { first = false; is_alpha(c) } else { is_alphanumeric(c) })
}

EDIT: Fixed typo take_while -> take_while1
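For anyone who wants to try the stateful-closure idea without pulling in chomp, here is a rough std-only sketch on plain byte slices. The take_while1 below is a hypothetical stand-in written for this example (it returns the remaining input and the matched slice), not chomp's function; the key point is only that the predicate is FnMut, so it can flip first after seeing the first byte.

```rust
/// Std-only stand-in for a `take_while1`-style combinator (NOT chomp's):
/// matches one or more leading bytes accepted by `pred` and returns
/// `(remaining input, matched slice)`, or `None` if nothing matched.
fn take_while1<F>(input: &[u8], mut pred: F) -> Option<(&[u8], &[u8])>
where
    F: FnMut(u8) -> bool, // FnMut so the predicate may carry state
{
    let n = input.iter().take_while(|&&c| pred(c)).count();
    if n == 0 {
        None
    } else {
        Some((&input[n..], &input[..n]))
    }
}

/// Identifier via a stateful predicate: the first byte must be a letter,
/// every following byte may be a letter or a digit.
fn identifier_stateful(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let mut first = true;

    take_while1(input, |c| {
        if first {
            first = false;
            c.is_ascii_alphabetic()
        } else {
            c.is_ascii_alphanumeric()
        }
    })
}
```

Because the first byte failing the predicate leaves zero matched bytes, the "at least one" requirement is what makes a leading digit a parse failure rather than an empty match, which is why take_while1 is needed instead of take_while.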

Hope this helps!

Thank you, that helps a lot! That's the solution I was looking for with matched_by, I just didn't know how to put the pieces together.

Awesome! :)