keith-hall/http-request-response-syntax

CRLF line ends cause issues

scop opened this issue · 4 comments

scop commented

HTTP uses CRLF line ends, but it appears this syntax works properly only with LF only (at least for me on Linux).

One example reproducer:

curl -i https://www.iki.fi | bat -l http

...results in the body not highlighted as HTML, whereas with

curl -i https://www.iki.fi | sed -e 's/\r//' | bat -l http

...it does.

Not sure if it trips up anything else besides the content type/embed detection.

I tried simply replacing \n with \r?\n in the syntax, but it didn't help.

Thanks for reporting. I've pushed a commit (556aa14) which adds a variable called end_of_headers. Changing this to (?:^$\n?) (i.e. making the \n optional) fixes the problem.

But this is really a problem with bat/syntect. Sublime Text normalizes line endings to \n so \r would never be encountered by plugin code or a syntax definition. syntect provides a LinesWithEndings iterator, but this isn't used by bat and doesn't strip the carriage returns anyway.
There are cases where it is useful that bat doesn't strip carriage returns, like for --show-all highlighting. So I guess the quick fix is to update the HTTP Request/Response syntax definition as mentioned above, but I feel that the underlying issue in bat and syntect should also be resolved...

scop commented

Thanks for taking a look. However the recipe doesn't seem to fix it for me for the test case in the initial comment with bat.

$ grep end_of_headers: assets/syntaxes/02_Extra/http-request-response/http-request-response.sublime-syntax
  end_of_headers: (?:^$\n?)
$ assets/create.sh ; cargo install --path .
[...]
$ curl -i https://www.iki.fi | bat -l http  # HTML does not get highlighted properly

You're right, my apologies - I made a local copy of the seded version and the original and apparently tested against the one with carriage returns removed by mistake.

The correct fix is to use content_type_sep: (?=;|\r?$) and end_of_headers: (?:^\r?$\n) variables. The ? after \n seems to make no difference. As \r is part of the HTTP RFC, let's just fix it here for now and worry about upstream fixes later :)
I've pushed a commit with this fix 93b9326.

scop commented

Thanks! That works.