I want to implement `separated_tuple` for 8.0
BGR360 opened this issue · 7 comments
I'm using nom
for Advent of Code 2023 and a pattern I keep using over and over again is "whitespace-separated tuple".
let parser = tuple((
thing1,
char(' '),
thing2,
char(' '),
thing3,
)).map(|(thing1, _, thing2, _, thing3)| (thing1, thing2, thing3));
I ended up implementing a separated_tuple
combinator so that code can be much cleaner:
let parser = separated_tuple(space1, (thing1, thing2, thing3));
Here is my ugly, undocumented implementation that I'm using for AoC.
I think the implementation is complex enough and useful enough to warrant inclusion in nom
rather than in a third party utility library like nom-supreme
.
I would like to contribute a cleaner, more user-ready implementation and submit a pull request.
However, after cloning the repo, I noticed that an 8.0.0 is in the works with lots of changes. So my question to you is:
how would a separated_tuple
combinator best fit into 8.0.0?
If you give me an idea for how to implement it in the 8.0 paradigm, I will happily send out a PR.
For myself, I find there are a lot of unpredicatble ways I want to skip fields in a tuple. Having something like https://docs.rs/combine/latest/combine/macro.struct_parser.html that also applies to tuples could help and be a more general form of this proposed combinator.
Hello, @BGR360!
I don't think there's much need to come up with a separated_tuple
function specifically, we can use separated_list
instead of tuple
.
fn separated_tuple(s: &str) -> IResult<&str, Vec<&str>> {
map(
separated_list1(
multispace1,
alt((tag("thing1"), tag("thing2"), tag("thing3"))),
),
|x| x,
)(s)
}
It's easy to def a function with same functions while no extra _
.
FULL CODES BELOW:
use nom::{
branch::alt, bytes::complete::tag, character::complete::multispace1, combinator::map,
multi::separated_list1, IResult,
};
fn separated_tuple(s: &str) -> IResult<&str, Vec<&str>> {
map(
separated_list1(
multispace1,
alt((tag("thing1"), tag("thing2"), tag("thing3"))),
),
|x| x,
)(s)
}
fn main() {
assert_eq!(
separated_tuple("thing1 thing2 thing3").unwrap().1,
vec!["thing1", "thing2", "thing3"]
);
}
@coalooball That will not work when the tuple elements are not of the same type. Plus, even if they were the same type, this doesn't provide any type-safe guarantee that I'm parsing exactly N things.
What are your thoughts on the more general idea of
let (i, (value1, value2, value3)) = seq!(
thing1,
_: space1,
thing2,
_: space1,
thing3,
).parse(i)?;
@coalooball That will not work when the tuple elements are not of the same type. Plus, even if they were the same type, this doesn't provide any type-safe guarantee that I'm parsing exactly N things.
Regarding the first point, the method I provided allows heterogeneous things:
fn separated_tuple(s: &str) -> IResult<&str, Vec<&str>> {
map(
separated_list1(
multispace1,
alt((
tag("thing1"),
delimited(tag("\""), alphanumeric1, tag("\"")),
tag_no_case("thing3"),
)),
),
|x| x,
)(s)
}
assert_eq!(
separated_tuple("THING3 thing2 \"thing1\"").unwrap().1,
vec!["THING3", "thing2", "thing1"]
);
Concerning the second point, I don t understand type-safe guarantee
refers to.
What are your thoughts on the more general idea of
let (i, (value1, value2, value3)) = seq!( thing1, _: space1, thing2, _: space1, thing3, ).parse(i)?;
I prefer this:
let (i, parsers:iter) = seq!(seperated_parser, permutational_parsers:iter).parse(i)?;
Regarding the first point, the method I provided allows heterogeneous things
@coalooball all of those parsers share the same return type, &str
. IMO, the much more common use case is parsers that have different return types:
fn thing1(input: &str) -> IResult<&str, Thing1> {...}
fn thing2(input: &str) -> IResult<&str, Thing2> {...}
fn thing3(input: &str) -> IResult<&str, Thing3> {...}
It's not possible to use separated_list
to parse a sequence of parsers with heterogeneous outputs.
Concerning the second point, I don t understand type-safe guarantee refers to.
The output of tuple((thing1, thing2, thing3))
is a tuple, (Thing1, Thing2, Thing3)
. Rust's type system guarantees that that output has exactly three elements, which is what I want. And nom
will automatically produce parse errors if the input has more or fewer than 3 elements.
The output of separated_list(alt((a, b, c)))
is Vec<T>
, which could be any length. I would have to perform additional error checking on that vector to make sure it's exactly three elements long.
What are your thoughts on the more general idea of
@epage A macro that supports discarding arbitrary elements in a sequence is definitely more flexible than my idea of separated_tuple
, which assumes parsing exactly one instance of the same parser between each element in the sequence. I generally prefer a non-macro solution though.
But IMO my idea aligns with the nom
ideals. separated_list
also assumes exactly one instance of the same parser between each sequence element.