Non `Copy` `Token`s?
m4rw3r opened this issue · 2 comments
As the title says, is there any way we can remove the `Copy` requirement on the `Token` associated type?
The main drawback of removing it is that most filter functions will need to operate on borrows (`satisfy`, `take_while`, and so on), which will introduce some extra typing (one or two extra `&` or `*` per filter).
The benefit of removing `Copy` would be that we could easily do multi-stage parsing (commonly a lexer and a parser in separate parts) solely in Chomp, relying on the previous parser to construct a new token stream; this stream could then refer to the initially parsed data all the way through (depending, of course, on the lifetime requirements of the source itself).
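The multi-stage idea could look roughly like this. Below is a plain-Rust sketch with hypothetical `Tok`, `lex`, and `sum_nums` names (not Chomp's actual API): a lexer stage produces tokens that borrow slices of the source, and a second stage consumes them while the borrows keep the original data reachable. Note that `Tok::Ident(&str)` happens to be `Copy`, but a token holding, say, a `String` would not be, which is exactly what the current `Copy` bound rules out.

```rust
// Hypothetical sketch, not Chomp's API: two-stage parsing where tokens
// borrow from the original input.
#[derive(Debug, Clone, PartialEq)]
enum Tok<'a> {
    Ident(&'a str),
    Num(u32),
}

// Stage 1: lexer turns the source text into a token stream.
fn lex(src: &str) -> Vec<Tok<'_>> {
    src.split_whitespace()
        .map(|w| match w.parse::<u32>() {
            Ok(n) => Tok::Num(n),
            Err(_) => Tok::Ident(w),
        })
        .collect()
}

// Stage 2: parser consumes the token stream, still referring to `src`
// through the borrows inside `Tok::Ident`.
fn sum_nums(toks: &[Tok<'_>]) -> u32 {
    toks.iter()
        .filter_map(|t| match t {
            Tok::Num(n) => Some(*n),
            _ => None,
        })
        .sum()
}

fn main() {
    let src = String::from("foo 1 bar 2");
    let toks = lex(&src);
    assert_eq!(toks[0], Tok::Ident("foo"));
    assert_eq!(sum_nums(&toks), 3);
    println!("ok");
}
```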
This would be a breaking change, probably `1.0.0` and/or `0.4.0`.
Notes:
- Removing the `Copy` requirement requires either a `Clone` bound or having `peek` return a reference. The latter prevents us from using `&str` as an input, since the character iterator provides us with `char`, which we cannot return as a reference because we have nothing to borrow from. The former is not a fantastic idea either, since it would cause `clone` to be called for anything using `peek` (which includes `satisfy`, `satisfy_with`, `token`, `not_token`, `peek`, `peek_next`, `eof`), plus of course the `Clone` requirement itself.
- `ascii` module predicates, and any predicates operating on actual tokens (very common when working with small token types like `u8` and `char`), do not work as a slot-in for any predicate; they must operate on references, or the value has to be dereferenced before calling the predicate:

  ```rust
  take_while(i, |&c| is_whitespace(c));
  satisfy(i, |c| *c == b'a')
  // vs
  take_while(i, is_whitespace);
  satisfy(i, |c| c == b'a')
  ```
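The peek-by-reference problem for `&str` can be sketched outside of Chomp. The `PeekInput` type below is made up for this illustration: `str::Chars` yields `char` *by value*, so a `peek(&self) -> Option<&char>` has nothing in the source to borrow from; the only way to hand an item back more than once is to store and clone it, which is exactly the `Clone` cost described above.

```rust
// Hypothetical peekable input wrapper (not Chomp's real Input trait).
struct PeekInput<I: Iterator> {
    iter: I,
    lookahead: Option<I::Item>,
}

impl<I: Iterator> PeekInput<I>
where
    I::Item: Clone,
{
    fn new(iter: I) -> Self {
        PeekInput { iter, lookahead: None }
    }

    // `peek` must return an owned (cloned) item: for `str::Chars` there
    // is no `&char` inside the source `&str` to hand out.
    fn peek(&mut self) -> Option<I::Item> {
        if self.lookahead.is_none() {
            self.lookahead = self.iter.next();
        }
        self.lookahead.clone()
    }

    fn next(&mut self) -> Option<I::Item> {
        self.lookahead.take().or_else(|| self.iter.next())
    }
}

fn main() {
    let mut input = PeekInput::new("ab".chars());
    assert_eq!(input.peek(), Some('a'));
    assert_eq!(input.next(), Some('a')); // the peeked item is consumed once
    assert_eq!(input.next(), Some('b'));
    println!("ok");
}
```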
If we do not remove the `Copy` requirement, multi-stage parsers have to use the bound `B: Copy + Buffer` to work properly, which imposes restrictions on the underlying `Input`. It is also possible to perform the conversion of a buffer into the actual type in the lexer, but this might be a bit too early in the parsing chain for some formats.
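The "convert in the lexer" option can be sketched as follows (hypothetical `Tok`/`lex`/`eval` names, not Chomp's API): numeric text is parsed into a `u32` at lex time, so the token is `Copy` and carries no borrow of the input. The tradeoff is that the raw text is gone after lexing, which may be too early for formats that need it later (e.g. arbitrary-precision numbers or error spans).

```rust
// Hypothetical sketch: eager conversion in the lexer keeps tokens Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(u32), // converted eagerly; no reference back into the source
    Plus,
}

fn lex(src: &str) -> Vec<Tok> {
    src.split_whitespace()
        .map(|w| match w {
            "+" => Tok::Plus,
            n => Tok::Num(n.parse().expect("lexer: bad number")),
        })
        .collect()
}

// The second stage never touches the original `&str` at all.
fn eval(toks: &[Tok]) -> u32 {
    toks.iter()
        .filter_map(|t| match t {
            Tok::Num(n) => Some(*n),
            Tok::Plus => None,
        })
        .sum()
}

fn main() {
    assert_eq!(eval(&lex("1 + 2 + 3")), 6);
    println!("ok");
}
```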
Getting some feedback on such a fundamental change as this would be great.
More:
- Lifetimes are an issue here too, since borrowing a token prevents the following parsers from parsing the input until the reference has been dropped. E.g. if `Token` isn't `Copy`, things like this would be forbidden:

  ```rust
  peek_next(i).then(|i, c| if c == Foo { any(i) } else { i.ret(Bar) })
  ```

  This is a common pattern internally (even if it is just inlined), which means that any peek needs to return by value, necessitating at least `Clone`.
- Adding the `Copy` bound on a returned buffer prevents pretty much all container types from being used in tokens in a multi-stage parser. Referring to slices of the input is feasible, and satisfies `Buffer + Copy`, but it restricts the resulting parser to only work on slices.
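A quick compile-time illustration of which token payloads satisfy a `Copy` bound (the `requires_copy` helper is made up for this sketch): slices of the input pass, container types do not.

```rust
// Hypothetical helper: a generic function acting as a Copy bound check.
fn requires_copy<T: Copy>(t: T) -> T {
    t
}

fn main() {
    // Slices of the input are `Copy`: they satisfy `Buffer + Copy`-style
    // bounds, but tie the parser to slice inputs.
    let src = "hello world";
    let tok: &str = &src[0..5];
    assert_eq!(requires_copy(tok), "hello");

    // Container tokens are not `Copy`; the lines below would fail to
    // compile, which is the restriction described above:
    // let owned: String = src[0..5].to_string();
    // requires_copy(owned); // error[E0277]: `String: Copy` is not satisfied
    println!("ok");
}
```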