nodejs/node

URL parameters containing Russian/Chinese language characters are not escaped properly

czardoz opened this issue · 10 comments

  • Version: All versions, tested on v6.3.1
  • Platform: All platforms, tested on Ubuntu x86_64 and Windows x86_64
  • Subsystem: http, querystring

Node is not able to properly send query string parameters containing Russian language characters.
Example:

var http = require('http');

http.get('http://httpbin.org/get?search=обязательный&a=b c', function (response) {
    // Continuously update stream with data
    var body = '';
    response.on('data', function (d) {
        body += d;
    });
    response.on('end', function () {
        console.log(body);
    });
});

Results in the output:

{
  "args": {
    "a": "b c", 
    "search": ">1O70B5;L=K9"
  }, 
  "headers": {
    "Host": "httpbin.org"
  }, 
  "origin": "106.51.38.69", 
  "url": "http://httpbin.org/get?search=>1O70B5%3BL=K9&a=b c"
}

Note that the space between b and c was correctly escaped by Node, but the first parameter ("search") sent garbage. Ideally, that should be escaped as well. Chrome XMLHttpRequest handles the encoding correctly.
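The garbled value looks consistent with each character being cut down to the low byte of its UTF-16 code unit (for example, "о" is U+043E and 0x3E is ">"). The following is a minimal sketch of that reading, not a statement about where in Node the truncation happens. It assumes a Node version where the 'latin1' encoding name is available (older releases call it 'binary').

```javascript
// Sketch: reproduce the garbled parameter by keeping only the low byte of
// each UTF-16 code unit, which is what the 'latin1' encoding does on write.
const original = 'обязательный';
const truncated = Buffer.from(original, 'latin1').toString('latin1');
console.log(truncated); // '>1O70B5;L=K9', the same garbage httpbin echoed back

// What the parameter should look like when sent as percent-encoded UTF-8.
const utf8Escaped = encodeURIComponent(original);
console.log(utf8Escaped);
```

If this reading is right, the fix amounts to percent-encoding the UTF-8 bytes before the request line is written.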

Oh, I remember this; there's an older issue for it. IIRC it's not allowed in the spec and is a potential security issue if we do.

Edit: hmmm, maybe I'm remembering #1693 ... this could be different.

cc @nodejs/http probably

This is one of several issues we have in the current url.parse implementation relating to extended character support. The new WHATWG URL parser should handle this without a problem, but it's going to take a bit to work that in still. I'll investigate to see how difficult it would be to fix this in the current impl.

I think the problem lies here:

QueryString.escape = function(str) {

specifically in the multi-byte handling.

In the meantime, is there any workaround for this?

Well, you can always manually encode query parameters.

@vkurchatkin That results in double-encoding of parameters. Unfortunately, there does not seem to be a way to turn off encoding in the http.request() API.

@czardoz I'm 98% sure you can work around that by using a { host: '...', path: '...' } options object.

Something like this:

var http = require('http');

var options = {
  host: 'httpbin.org',
  path: `/get?search=${encodeURIComponent('обязательный')}&a=b%20c`
};

http.get(options, function (response) {
    // Continuously update stream with data
    var body = '';
    response.on('data', function (d) {
        body += d;
    });
    response.on('end', function () {
        console.log(body);
    });
});

Just to confirm, the new WHATWG URL API does handle this correctly:

> process.versions
{ http_parser: '2.7.0',
  node: '7.0.0',
  v8: '5.4.500.36',
  uv: '1.9.1',
  zlib: '1.2.8',
  ares: '1.10.1-DEV',
  icu: '57.1',
  modules: '51',
  openssl: '1.0.2j' }
> new url.URL('http://httpbin.org/get?search=обязательный&a=b c')
URL {
  href: 'http://httpbin.org/get?search=%D0%BE%D0%B1%D1%8F%D0%B7%D0%B0%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9&a=b%20c',
  protocol: 'http:',
  hostname: 'httpbin.org',
  pathname: '/get',
  search: '?search=%D0%BE%D0%B1%D1%8F%D0%B7%D0%B0%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9&a=b%20c'
}
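Building on the REPL output above, one possible way to combine the WHATWG parser with the options-object workaround is to let the parser do the escaping and hand its result to http.get. This is only a sketch: toRequestOptions is a hypothetical helper, not part of Node, and it assumes a Node version that exports URL from the url module.

```javascript
const { URL } = require('url');

// Hypothetical helper: parse with the WHATWG URL class and build the
// { host, path } options object from the already-escaped pieces.
function toRequestOptions(rawUrl) {
  const parsed = new URL(rawUrl);
  return {
    host: parsed.hostname,
    // pathname and search are percent-encoded by the parser
    path: parsed.pathname + parsed.search
  };
}

const options = toRequestOptions('http://httpbin.org/get?search=обязательный&a=b c');
console.log(options.path);
```

The resulting options object can then be passed to http.get(options, callback) in place of the raw URL string.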

Until the bug is fixed in Node.js core, you may be able to blindly toss the URL into the https://www.npmjs.com/package/encodeurl module to get it encoded (and without encountering double-encoding).
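To illustrate what "without double-encoding" means, here is a rough sketch using only built-ins. It is not the encodeurl module's implementation; encodeUrlOnce is a hypothetical name, and the approach (leave existing %XX escapes alone, run encodeURI on everything else) is an assumption about one reasonable way to get that behavior.

```javascript
// Hypothetical helper: percent-encode a URL while leaving sequences that are
// already valid %XX escapes untouched, so repeated calls do not re-encode.
function encodeUrlOnce(url) {
  return url
    // The capturing group keeps the existing escapes at odd indices.
    .split(/(%[0-9A-Fa-f]{2})/)
    .map((part, index) => (index % 2 === 1 ? part : encodeURI(part)))
    .join('');
}

const once = encodeUrlOnce('http://httpbin.org/get?search=обязательный&a=b c');
const twice = encodeUrlOnce(once);
console.log(once);
console.log(once === twice); // true: the second pass changes nothing
```

Plain encodeURI would turn each existing "%" into "%25" on the second pass, which is the double-encoding problem mentioned earlier in the thread.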

Trott commented

This issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.