emancu/toml-rb

Invalid parsing of TOML string with \\n

rsim opened this issue · 8 comments

rsim commented

TOML String "\\n" is parsed as \ and new line and not as \ and n.

>> TOML::Document.parse("\"C:\\\\new\"", root: :string).value
=> "C:\\\new"
>> puts TOML::Document.parse("\"C:\\\\new\"", root: :string).value
C:\
ew

The same issue is with "\\r", "\\t" and other sequences. It is because BasicString.decode_special_char method is called as a first and is replacing the special sequences which start with \ ignoring that there was another \ before that.

Currently also test_special_characters test is not quite correct. Instead of

    match = TOML::Document.parse('"C:\\Documents\\virus.exe"', root: :string)
    assert_equal('C:\\Documents\\virus.exe', match.value)

it should be

    match = TOML::Document.parse('"C:\\\\Documents\\\\virus.exe"', root: :string)
    assert_equal('C:\\Documents\\virus.exe', match.value)

as double backslashes in Ruby single quotes returns just one backslash.

rsim commented

Hi! I wanted to remind about this issue. Did you manage to have a look on this and how to fix?

Just experienced the problem when dumping Hash to TOML and then parsing it you get a different result:

>> TOML.parse(TOML.dump(path: "C:\\temp"))
=> {"path"=>"C:\\\temp"}

Hey Raimonds, I started digging on this issue but I had some personal
issues that I had to attend. I'll get back to this on Monday, and talk to
you on irc too.
I'm so sorry for leave this issue incomplete yet. I'll take care asap.

On Wednesday, 17 February 2016, Raimonds Simanovskis <
notifications@github.com> wrote:

Hi! I wanted to remind about this issue. Did you manage to have a look on
this and how to fix?

Just experienced the problem when dumping Hash to TOML and then parsing it
you get a different result:

TOML.parse(TOML.dump(path: "C:\temp"))
=> {"path"=>"C:\temp"}


Reply to this email directly or view it on GitHub
#78 (comment).

Emiliano Mancuso http://about.me/emancu

rsim commented

To fix this TOML::BasicString::transformed_escaped_chars should be changed to use just one gsub

str.gsub(/\\(u[\da-f]{4}|.)/) do |m|
  ...
end

and then all escaped sequences will be processed sequentially and correctly.

BTW according to https://github.com/toml-lang/toml#string also \UXXXXXXXX sequence should be processed. So it would be even better to use /\\(u[\da-fA-F]{4}|U[\da-fA-F]{6}|.)/

And "All other escape sequences not listed above are reserved and, if used, TOML should produce an error" - so if there is invalid escaped sequence then error should be raised.

@rsim I did a huge progress related to this issue.
But I'm kind of blocked because I have a doubt, regarding to TOML specification:

All other escape sequences not listed above are reserved and, if used, TOML should produce an error.

If someone uses \u or \h should raise an error?
In this case, I wonder what should happen with this test.

rsim commented

According to the specification, invalid escape sequences are syntax errors.
See also, for example, Rust implementation where there is a test for invalid sequence https://github.com/alexcrichton/toml-rs/blob/master/tests/invalid/string-bad-escape.toml

So I think that TOML::ParseError should be raised if invalid escape sequence is used.

Hey @Timcolonel could you explain why lines 9 and 11 of this file are not unicode chars?

I don't really remember how unicode was working but those test were coming from this: https://github.com/BurntSushi/toml-test/blob/master/tests/valid/string-escapes.toml
and I it was supposed the get the same content as the json.

The Toml implementation in rust however doesn't have those 4 lines at the end of this test file
https://github.com/alexcrichton/toml-rs/blob/master/tests/valid/string-escapes.toml

Thank you for your quick response, and the links !

On Tuesday, 23 February 2016, Timothee Guerin notifications@github.com
wrote:

I don't really remember how unicode was working but those test were coming
from this:
https://github.com/BurntSushi/toml-test/blob/master/tests/valid/string-escapes.toml
and I it was supposed the get the same content as the json.

The Toml implementation in rust however doesn't have those 4 lines at the
end of this test file

https://github.com/alexcrichton/toml-rs/blob/master/tests/valid/string-escapes.toml


Reply to this email directly or view it on GitHub
#78 (comment).

Emiliano Mancuso http://about.me/emancu