pnuckowski/aioresponses

Function `normalize_url` doubly encodes query string in some cases.

Simon-Will opened this issue · 0 comments

The function normalize_url does not correctly handle some query strings. From what I understand, the function is supposed to rearrange the query parameters so that URLs with different order of query parameters match.

However, the combination of urlencode and URL.with_query doubly encodes some characters. I think that it's exactly those characters that can legally occur in a query string, but not in any URL part, such as : and /. urlencode of course will encode them and URL.with_query will then pick up the % that the urlencode unnecessarily produced.

I propose to just drop the parse_qsl and urlencode resulting in the simplified_normalize_url in the following example code:

from yarl import URL
from aioresponses.compat import normalize_url

# %3A = :  (Does NOT need to be encoded in query string)
# %2F = /  (Does NOT need to be encoded in query string)
# %26 = &  (MUST be encoded in query string)
url_str = 'https://example.com/?var=foo%3Abar%2Fbaz%26gaz'


def simplified_normalize_url(url):
    url = URL(url)
    return url.with_query(sorted(url.query.items()))


print(f'Before: {url_str}')
print(f'After normalize_url: {normalize_url(url_str)}')
print(f'After simplified_normalize_url: {simplified_normalize_url(url_str)}')

Running it on my machine (Xubuntu 22.04.1 with Python 3.9.6) produces the following output:

Before: https://example.com/?var=foo%3Abar%2Fbaz%26gaz
After normalize_url: https://example.com/?var=foo%253Abar%252Fbaz%2526gaz
After simplified_normalize_url: https://example.com/?var=foo:bar/baz%26gaz

I can create a small pull request with tests for this tomorrow.