Mach 1.0 truncating UTF characters
Closed this issue · 11 comments
I'm getting a bunch of garbage characters in my app. I reduced it down to the following test:
var mach = require("mach");
var stack = new mach.stack();
stack.use(mach.charset, "utf-8");
stack.route(
"*",
function (connection) {
connection.html("<!DOCTYPE html><body>This should be a triangle: \"▼\"</body>");
}
);
mach.serve(stack, 8089);
Here's the hex result, as viewed through Charles:
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Date: Fri, 14 Nov 2014 21:36:45 GMT
Connection: keep-alive
Transfer-Encoding: chunked
00000000 3c 21 44 4f 43 54 59 50 45 20 68 74 6d 6c 3e 3c <!DOCTYPE html><
00000010 62 6f 64 79 3e 54 68 69 73 20 73 68 6f 75 6c 64 body>This should
00000020 20 62 65 20 61 20 74 72 69 61 6e 67 6c 65 3a 20 be a triangle:
00000030 22 bc 22 3c 2f 62 6f 64 79 3e " "</body>
As you can see, 0x25bc is being returned as 0xbc. In the browser, it is rendered as char code 0xbc if I use no charset and 0xfffd if I do. (0xfffd is literally "replacement character" - e.g. the browser doesn't know WTF it's supposed to be).
I've never used Charles to convert an HTTP response to hex before, but based on what I'm seeing, it looks like only the last two characters of a four-character hex code are being returned:
expected | received |
---|---|
0x 25 bc | 0x bc |
0x 11 cd | 0x cd |
0x 12 34 | 0x 34 |
Any idea what would cause this? I'm certain it worked before the 1.0 push.
Hmm, this could be due to the fact that we're now using bodec to encode binary data instead of node's Buffer
module directly. I did this because I wanted a way to do binary data in the browser without importing a ton of code that we don't need (ala Browserify's Buffer
shim).
Maybe this has something to do with that?
@appsforartists Are you still seeing this issue? If so, I'd love to get a failing test case for it and squash this bug.
Yeah I am. I just installed a fresh copy of all my npm
dependencies (including mach@1.0.2
) on the server, and I still see it. Oddly, I don't always see it on my local machine. Could be worth investigating if there are any differences between the two. I'll let you know.
I looked into it. The npm
stuff is all identical. The difference is that on the dev server, I inline the scripts (so they are passed through mach
). On my local machine, they are served by Webpack, which serves them correctly.
FWIW, here's the result of npm list
:
├─┬ mach@1.0.2
│ ├── bodec@0.1.0 (git://github.com/mjackson/bodec#1c31ee1311969f4471e3582cdcefe1a93b711288)
│ ├─┬ bufferedstream@3.0.7
│ │ ├── bodec@0.1.0 (git://github.com/mjackson/bodec#1c31ee1311969f4471e3582cdcefe1a93b711288)
│ │ └── events@1.0.2
│ ├── describe-property@1.0.1
│ ├── object-assign@2.0.0
│ ├── qs@2.3.3
│ ├── redis@0.11.0
│ ├── strftime@0.8.2
│ └── when@3.6.4
(FWIW, bodec
is now tagged at 1.0.0, so you can update your dependency.)
I am having a similar problem. If I read a JSON file with unicode characters, parse it on the server, then send it to the browser with conn.json()
, the browser cannot parse the response. If I serve the file up directly with mach.file
, however, everything works fine.
Here is a tiny app that demonstrates the problem: https://dl.dropboxusercontent.com/u/22414/mach-unicode-testcase.zip
Just npm install
then browse to http://localhost:5000/index.html
EDIT: To clarify, in the first case the response is delivered to the browser with HTTP status 200, but the body can't be parsed with JSON.parse()
.
@appsforartists @nicholascloud thanks for the report. i'll look into this today.
@appsforartists @nicholascloud Just FYI, this is fixed and published in version 1.1.0. The issue was with how I was treating strings in the BufferedStream library as binary by default, instead of utf-8. Silly me.
Thanks for your help!
Thanks @mjackson. Might I recommend augmenting your Message
unit tests with non-ASCII characters to prevent regressions? 😃
@appsforartists excellent idea.
@mjackson Excellent ❗