seattlerb/ruby_parser

Various complicated syntax deviations

kddnewton opened this issue · 3 comments

👋 Hey there! In writing the prism translator, I've come across a couple of things that look like they might be bugs in ruby_parser. I wanted to flag them in case you wanted to fix them. I don't imagine they're common at all. They're listed below.

These raise errors:

alias %s[abc] %s[xyz]

foo class Bar baz do end end
foo module Bar baz do end end

def f x:-a; end
def f x:+a; end
def foo x:%(xx); end

while class Foo; tap do end; end; break; end
while class Foo a = tap do end; end; break; end
while class << self; tap do end; end; break; end
while class << self; a = tap do end; end; break; end

class if true; Object end::Kernel; end
class while true; break Object end::Kernel; end

not()

These should be flip-flops:

not foo .. bar
not (foo .. bar)
!(foo .. bar)

This should be a match node:

!/wat/

These should be arrays not array patterns:

1 in %w[]
1 in %i[]
1 in %W[]
1 in %I[]

These should introduce local variables:

/(?<match>bar)/ =~ 'bar'; match

[1, 2] => a, b; a
[1, 2] in a, b; a

{a: 1} => a:; a
{a: 1} in a:; a

{key: :value} => key: value; value
{key: :value} in key: value; value

1 => [a]; a
1 in [a]; a

This one I'm a little unsure of, but I think that the following should be s(:and, s(:and, s(:true), s(:call, s(:false), :!)), s(:true)) and not s(:and, s(:true), s(:and, s(:call, s(:false), :!), s(:true))):

true and
not false and
true

This one is dropping a value from the rescued expression:

foo, bar = 1, 2 rescue nil

This heredoc is counting an escaped newline as if it were a real newline:

p <<~"E"
  x\n   y
E

This heredoc should have not dedent:

<<~EOF
  a
#{1}
  a
EOF

This heredoc should only dedent by 2:

p <<~"E"
    x
  #{"  y"}
E

This should be a constant declaration, not a call:

foo::A += m foo

These should have UTF-8 as the encoding of the string, not ASCII-8BIT:

s = <<eos
a\xE9b
eos

s = <<-EOS
a\247b
cöd
EOS

This heredoc should be beforeafter\r\n not before\nafter\n (there's a \r\n after the ):

str = <<-XXX
before\
after
XXX

These are all horrible heredoc behavior that I'm sorry to even bring up:

<<A+%
A


<<A+%r
A


<<A+%q
A


<<A+%Q
A


<<A+%s
A


<<A+%x
A


pp <<-A.gsub(/b\
a
A
b/, "")

pp <<-A, "d\
c
A
d"

pp <<-A, %q[f\
e
A
f]

pp <<-A, %Q[h\
g
A
h]

pp <<-A, %w[j\
i
A
j]

pp <<-A, %W[l\
k
A
l]

pp <<-A, %i[n\
m
A
n]

pp <<-A, %I[p\
o
A
p]

<<A; /\
A
(?<a>)/ =~ ''

<<A; :'a
A
b'

<<A; :"a
A
b"

This has a line number that is very odd (the xstring is saying it's starting on line 2, maybe the \\ isn't incrementing a line count?):

:"a\\
b"
`  x
#{foo}
#`

And then just to mention, I basically turned off support for numbered parameters, e.g., marking the following as a call instead of a local variable read:

-> { _1 }

Very curious what the two of you think of the ordering of nodes for nested ors and ands (e.g. 1 or 2 or 3).

As noted above, RP puts the shallower nesting on the left, while Prism puts the deeper nesting on the left. I don't think this makes a difference in interpretation (a depth-first, left-to-right visitor will process nodes in the same order). But the s-exps are different, which means for Brakeman the "fingerprints" for warnings will be different.

So should Prism match what RP does? Or should RP change?

For that, I'm okay changing Prism to match the RP AST when it does the translation, but I will keep the AST to have the same structure for other use cases (it matches the ASTs for parse.y and whitequark/parser).

Cool - I took a cut at it last night but didn't quite get it there 😄