whatwg/url

Skipping an item while iterating is undefined

SuperSonicHub1 opened this issue · 1 comments

The URL Standard's percent-decode algorithm, when handling percent-encoded bytes, uses an undefined byte sequence operation, "skip", on L430. It appears to take a variable number of items from an iterator and immediately discard them, as in the following Python code:

from typing import Iterator

def skip(iterator: Iterator, n: int) -> None:
    # Advance the iterator by n items, discarding each one.
    for _ in range(n):
        next(iterator)

It would seem that the byte sequence input holds a reference to the iterator instance, as implied by the wording "Skip the next two bytes in <var>input</var>."

As the creation of an iterator is currently implicit in the Infra Standard, I believe explicitly defining a generic iterator data structure, implementing the "skip" method for it, and changing all "iterate" and "for each" definitions to use iterators would be the best course of action. If this were to go through, the URL Standard would then have to wait for whatwg/infra#571 to be resolved.

An initial concern with the approach described above is the edge case of handling a skip on an empty iterator: would we throw an exception or have skipping be a no-op?
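To make the two options concrete, here is a minimal Python sketch. The names `skip_strict` and `skip_lenient` are hypothetical and do not come from the Infra or URL Standards; they only illustrate the two behaviors in question.

```python
from itertools import islice
from typing import Iterator


def skip_strict(iterator: Iterator, n: int) -> None:
    """Discard exactly n items; raise StopIteration if fewer remain."""
    for _ in range(n):
        next(iterator)  # raises StopIteration once the iterator is exhausted


def skip_lenient(iterator: Iterator, n: int) -> None:
    """Discard up to n items; silently do nothing once the iterator is empty."""
    for _ in islice(iterator, n):
        pass
```

With an empty iterator, `skip_strict(iter([]), 2)` raises `StopIteration`, while `skip_lenient(iter([]), 2)` returns without error. Which of the two a spec-level "skip" should mirror is exactly the open question.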

On the other hand, this one step in the URL Standard is the only place where this method is used; if "skip" were to be left undefined, it would then be on the URL Standard to revise the percent-encoding algorithm.

Copy of whatwg/infra#572

In this case, "skip" can never be "called" on an empty iterator, since step 2.3.3 only occurs if byte is 0x25 (%) and the next two bytes after byte in input match /[0-9a-fA-F]/. As such, I believe there's no ambiguity.
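A rough Python sketch of that reasoning, assuming the percent-decode steps work as described in this thread (the function name `percent_decode` and the index-based loop are my own illustration, not the standard's wording):

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def percent_decode(data: bytes) -> bytes:
    """Decode %XX sequences; leave any malformed '%' untouched."""
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        following = data[i + 1:i + 3]  # at most two bytes; shorter near the end
        if (
            byte == 0x25  # '%'
            and len(following) == 2
            and all(b in HEX_DIGITS for b in following)
        ):
            # Both following bytes exist and are hex digits, so the
            # skip below can never run past the end of the input.
            output.append(int(following.decode("ascii"), 16))
            i += 2  # "Skip the next two bytes in input"
        else:
            output.append(byte)
        i += 1
    return bytes(output)
```

Because the skip only happens inside the branch that has already checked for two hex digits, inputs such as `b"100%"` or `b"%4"` fall through to the `else` branch and are copied unchanged.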