whitequark/parser

Offsets with \r\n in source

When you parse source that contains \r\n, each occurrence is automatically converted to \n, as per:

@source = input.gsub("\r\n".freeze, "\n".freeze).freeze

The issue is that this can really throw off source locations. For example:

Parser::CurrentRuby.parse("1\r\n2\r\n3").children[2].loc
# => #<Parser::Source::Map::Operator:0x000000010b47c3d0 @expression=#<Parser::Source::Range (string) 4...5>, @node=s(:int, 3), @operator=nil>

This says the source range is 4...5. That is correct for the converted string, but in the original input offset 4 is the second \r character; the 3 actually sits at 6...7.
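To make the mismatch concrete without involving the parser at all, here is a minimal sketch in plain Ruby. It only mirrors the gsub quoted above; the variable names are mine, and the range is the one reported in the output above.

```ruby
# The input as it exists on disk, with CRLF line endings.
original = "1\r\n2\r\n3"

# The same normalization the buffer applies internally.
converted = original.gsub("\r\n", "\n")

# The range reported for the `3` node.
range = 4...5

in_converted = converted[range] # the character the parser actually saw
in_original  = original[range]  # the character at that offset in the real file
real_offset  = original.index("3")

puts "converted[4...5] = #{in_converted.inspect}"
puts "original[4...5]  = #{in_original.inspect}"
puts "offset of 3 in the original = #{real_offset}"
```

Each \r\n before a node shifts its offset by one, so the drift grows with the number of preceding lines.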

For prism's purposes it's okay if the locations are different; I'll just make it skip the location comparison for files that contain \r\n. My issue is that I use the same source buffer to parse with both parsers (https://github.com/ruby/prism/blob/90d570aa50bfff43c66e5f6c600370a61c091329/test/prism/ruby/parser_test.rb#L188-L208), but by that point the buffer has already modified the source internally, and there is no way to retrieve the original.

For a solution, I'm wondering if either:

(a) The \r\n gsub can be removed (and therefore have the parser instead of the buffer replace \r\n when necessary)
(b) The Buffer class could support an auto_clrf parameter (or something) that would disable that behavior
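As a rough illustration of what (b) could look like, here is a sketch using a stand-in class. SketchBuffer, the normalize_crlf keyword, and raw_source are all hypothetical names for this example; none of them is part of Parser::Source::Buffer's actual API.

```ruby
# Hypothetical stand-in for a source buffer, for illustration only.
class SketchBuffer
  attr_reader :source, :raw_source

  # normalize_crlf is an assumed option name; it defaults to the
  # current behavior (convert \r\n to \n).
  def initialize(input, normalize_crlf: true)
    # Keep the untouched input so callers can always get it back.
    @raw_source = input.dup.freeze
    @source = (normalize_crlf ? input.gsub("\r\n", "\n") : input.dup).freeze
  end
end

default_buffer  = SketchBuffer.new("1\r\n2\r\n3")
verbatim_buffer = SketchBuffer.new("1\r\n2\r\n3", normalize_crlf: false)

puts default_buffer.source.inspect
puts verbatim_buffer.source.inspect
puts default_buffer.raw_source.inspect
```

Keeping the raw input around would also cover my use case on its own, since I could hand the original string to the second parser even when normalization stays enabled.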