cpp-netlib/uri

Percent-decoding does not accept non-ASCII octets


When building a network::uri object with network::uri_builder, query parameters that contain non-ASCII multibyte characters (e.g. UTF-8) are percent-encoded as expected. For example, http://example.com/?q=법정동 becomes http://example.com/?q=%EB%B2%95%EC%A0%95%EB%8F%99.
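To make the expected encoding concrete, here is a minimal standalone sketch of octet-level percent-encoding. It is not the library's implementation: `percent_encode_octets` is a hypothetical helper, and it assumes that everything outside the RFC 3986 "unreserved" set gets encoded, which is stricter than what a query component actually requires.

```cpp
#include <string>

// Hypothetical helper (not part of cpp-netlib/uri): percent-encodes every
// octet that is not an RFC 3986 "unreserved" character. The input is treated
// as raw bytes, so a UTF-8 string is encoded one octet at a time.
std::string percent_encode_octets(const std::string& in) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                            c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += kHex[c >> 4];    // high nibble
      out += kHex[c & 0x0F];  // low nibble
    }
  }
  return out;
}
```

Each of the three Hangul syllables in 법정동 occupies three UTF-8 octets, which is why the encoded parameter consists of nine %XX triplets.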

In the other direction, however, applying network::uri::decode to the encoded query parameter throws a percent_decoding_error exception. I think this behavior is incorrect: according to RFC 3986, section 2.5, percent-encoding and decoding operate at the octet level and should otherwise be agnostic about character encodings.

Suggested fix in network/uri/detail/decode.hpp:

-  if (h0 >= '8') {
-    // unable to decode characters outside the ASCII character set.
-    throw percent_decoding_error(uri_error::conversion_failed);
-  }
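The removed check appears to reject any triplet whose first hex digit is '8' or above, that is, any octet from 0x80 upward. As an illustration of the behavior I would expect once it is gone, here is a standalone sketch of an octet-level decoder. It is not the code in decode.hpp: `hex_value` and `percent_decode_octets` are hypothetical names, and the use of std::invalid_argument stands in for the library's own percent_decoding_error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if `c` is not
// a hex digit.
int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Hypothetical helper: turns every %XX triplet into one octet (0x00 to 0xFF)
// and copies all other characters through unchanged. No character encoding
// is assumed or validated.
std::string percent_decode_octets(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] != '%') {
      out += in[i];
      continue;
    }
    if (i + 2 >= in.size()) {
      throw std::invalid_argument("truncated percent-encoded triplet");
    }
    const int hi = hex_value(in[i + 1]);
    const int lo = hex_value(in[i + 2]);
    if (hi < 0 || lo < 0) {
      throw std::invalid_argument("invalid hex digit in percent-encoded triplet");
    }
    out += static_cast<char>((hi << 4) | lo);
    i += 2;  // skip the two hex digits just consumed
  }
  return out;
}
```

The only remaining failure modes in this sketch are truncated triplets and invalid hex digits. Whether the resulting bytes form valid UTF-8 is left to the caller.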

Unit tests for reproduction:

  • Percent-encoding a UTF-8 query parameter works
  • Percent-decoding a UTF-8 query parameter does not work

TEST(UriBuilderTest, PercentEncodingAcceptsNonAsciiOctets) {
  const std::string decoded = u8"법정동";
  const std::string encoded = "%EB%B2%95%EC%A0%95%EB%8F%99";

  network::uri_builder ub(network::uri("http://example.com"));
  ASSERT_NO_THROW(ub.append_query_key_value_pair("q", decoded));

  const network::uri uri = ub.uri();
  ASSERT_EQ(network::string_view(encoded), uri.query_begin()->second);
}

TEST(UriDecodeTest, PercentDecodingAcceptsNonAsciiOctets) {
  const std::string decoded = u8"법정동";
  const std::string encoded = "%EB%B2%95%EC%A0%95%EB%8F%99";

  std::string output;
  ASSERT_NO_THROW(network::uri::decode(encoded.begin(), encoded.end(),
                                       std::back_inserter(output)));
  ASSERT_EQ(decoded, output);
}

Output:

[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from UriBuilderTest
[ RUN      ] UriBuilderTest.PercentEncodingAcceptsNonAsciiOctets
[       OK ] UriBuilderTest.PercentEncodingAcceptsNonAsciiOctets (0 ms)
[----------] 1 test from UriBuilderTest (1 ms total)

[----------] 1 test from UriDecodeTest
[ RUN      ] UriDecodeTest.PercentDecodingAcceptsNonAsciiOctets
src/uri_test.cc:53: Failure
Expected: network::uri::decode(encoded.begin(), encoded.end(), std::back_inserter(output)) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] UriDecodeTest.PercentDecodingAcceptsNonAsciiOctets (0 ms)
[----------] 1 test from UriDecodeTest (0 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (1 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] UriDecodeTest.PercentDecodingAcceptsNonAsciiOctets

 1 FAILED TEST