Function `normalize_url` doubly encodes query string in some cases.
Simon-Will opened this issue · 0 comments
The function normalize_url does not correctly handle some query strings. From what I understand, the function is supposed to rearrange the query parameters so that URLs with different order of query parameters match.
However, the combination of urlencode
and URL.with_query
doubly encodes some characters. I think that it's exactly those characters that can legally occur in a query string, but not in any URL part, such as :
and /
. urlencode
of course will encode them and URL.with_query
will then pick up the %
that the urlencode
unnecessarily produced.
I propose to just drop the parse_qsl
and urlencode
resulting in the simplified_normalize_url
in the following example code:
from yarl import URL
from aioresponses.compat import normalize_url
# %3A = : (Does NOT need to be encoded in query string)
# %2F = / (Does NOT need to be encoded in query string)
# %26 = & (MUST be encoded in query string)
url_str = 'https://example.com/?var=foo%3Abar%2Fbaz%26gaz'
def simplified_normalize_url(url):
url = URL(url)
return url.with_query(sorted(url.query.items()))
print(f'Before: {url_str}')
print(f'After normalize_url: {normalize_url(url_str)}')
print(f'After simplified_normalize_url: {simplified_normalize_url(url_str)}')
Running it on my machine (Xubuntu 22.04.1 with Python 3.9.6) produces the following output:
Before: https://example.com/?var=foo%3Abar%2Fbaz%26gaz
After normalize_url: https://example.com/?var=foo%253Abar%252Fbaz%2526gaz
After simplified_normalize_url: https://example.com/?var=foo:bar/baz%26gaz
I can create a small pull request with tests for this tomorrow.