python-hyper/rfc3986

reg-name RE is not sufficient

ViktorHaag opened this issue · 4 comments

Your pattern for the reg-name component is::

reg_name = '[\w\d.]+'

However, this is not sufficient, I believe -- the Python regex tokens \w match any "word" characters which notably include "alpha" characters, plus numbers, plus the underscore, but not the full range of characters permitted in reg-names from RFC 3986 section 3.2.2 (despite the comment in your source). The spec specifies this formation for reg-name::

reg-name    = *( unreserved / pct-encoded / sub-delims )

Your reg_name particle should get redefined to support the fuller character set as per the RFC.

I have created a replacement definition for reg_name:

reg_name = '(({0})*|[{1}]*)'.format(
    '%[0-9A-Fa-f]{2}',
    important_characters['re_sub_delimiters'] +
    important_characters['re_unreserved']
    )

I have tested this locally with your tests, but don't have access to submit a pull request with this change.

(Edit: Actually, it appears that a reg-name can be empty, so the def'n should probably use "*" rather than "+" to aggregate unreserved and sub-delimiter chars just like pct-encoded characters.)

I have tested this locally with your tests, but don't have access to submit a pull request with this change.

I'm not sure why you don't have access. If you are not capable of submitting the change yourself via pull request, can you provide a name and email address. I'll commit the changes in a way such that you are the author and I am merely committing the changes on your behalf.

Further, is there a URI that you used to discover this? If so, can you provide it (or something similar) so I can write a test to verify this behaviour? Thank you in advance.

Sure! I will try again to make a pull, and if unable, I will provide a name and email address. The URI I was using was in a test fixture for another project I'm working on that had a '-' in it. I then did ad-hoc testing of your library with a second made up URL with a hyphen ('http://http-bin.org/') and this was tested not to be valid either.

Forgot to fork before clone and branch; apologies. PR has been submitted and is going through travis now.