cpp-netlib/uri

Make `uri_builder::append_query()` more robust against misuse

Closed this issue · 5 comments

The current interface

template <typename Source>
uri_builder &append_query(const Source &query)

expects query to be an unencoded key/value pair like key=value, which is then percent-encoded internally. However, characters in /.@&%;= are excluded from encoding which leads to broken URI's.

Example:

uri_builder ub(...);
ub.append_query("q=" + get_unsafe_data());

If get_unsafe_data() returns a string that contains characters like % or =, these characters are never percent-encoded. In the former case, a URI decoder will usually expect a percent-encoded octet when reading a % character. In the latter case, a URI decoder may try to split the string again in a key and value part.

To fix the % issue, this character could simply be removed from the exclude set. On the other hand, = cannot be removed from the exclude set, because it is needed literally as a separator between the key and value part.

Proposed solution:

I think the basic design problem here is that the key and value parts are not treated separately. An interface which does that already exists in the form of uri_builder::append_query_key_value_pair (this function has been recently updated because it suffered from similar encoding issues).

uri_builder::append_query(input) should reuse uri_builder::append_query_key_value_pair(key, value) internally by splitting input at the first = character -- the left part goes as key, the right part goes as value. If no = character is found, input is passed as key leaving value empty. This way everything gets percent-encoded properly.

I can prepare a PR if you like.

Maybe, the split on the first = idea is not so good, if a user really wants to append just a key without a value. In this case the entire input must be encoded, even the = sign.

I think the contract/documentation must be clarified here. Does this function expect a single unencoded value or an unencoded key/value pair? Was the first case originally intended? If so, we just need to fix the exclude set of characters and improve the docstring.

Perhaps I should deprecate and remove "append_query" altogether, or, use it an alias for append_query_key_value_pair(key, nullopt), while allowing an optional value parameter in that function. I think I'd prefer to remove it.

I'd prefer to remove it too - no alias. I think it's better to have only one function to do this kind of thing. Keeping both puts an unnecessary burden to the user to find out the difference. And regarding breaking client code, I think this change is small and simple to adapt.

Maybe you could add append_query_key(input) as a replacement for append_query, which is an alias for append_query_key_value_pair(key, nullopt). This way it is more obvious to the user that input will be percent-encoded as a whole and not split into a key/value pair.

Alternative naming that came into my mind:

  • append_query_parameter(name)
  • append_query_parameter(name, value)

@mtrenkmann if you could take a look at this review, I'd appreciate it: #126