mbj/unparser

Insufficient parenthesizing

akimd opened this issue ยท 13 comments

akimd commented

This issue is very very close to #131, and it is very serious problem in my production (it turns out, contrary to what I said in #131, that we do depend on unparser on such code).

The test case is very similar, but different: return true ? 1 : 2 is pretty-printed as

return if true
  1
else
  2
end

which is again wrong, since in that case return if true is a postfix condition.

$ unparser -e 'return true ? 1 : 2' --verbose
warning: parser/current is loading parser/ruby27, which recognizes
warning: 2.7.2-compliant syntax, but you are running 2.7.1.
warning: please see https://github.com/whitequark/parser#compatibility-with-ruby-mri.
(string):3:1: error: unexpected token kELSE
(string):3: else
(string):3: ^~~~
(string)
Original-Source:
return true ? 1 : 2
Generated-Source:
return if true
  1
else
  2
end
Original-Node:
(return
  (if
    (true)
    (int 1)
    (int 2)))
Generated-Node:
#<Parser::SyntaxError: unexpected token kELSE>
/opt/local/lib/ruby2.7/gems/2.7.0/gems/parser-2.7.2.0/lib/parser/diagnostic/engine.rb:72:in `process'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/parser-2.7.2.0/lib/parser/base.rb:285:in `on_error'
(eval):3:in `_racc_do_parse_c'
(eval):3:in `do_parse'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/parser-2.7.2.0/lib/parser/base.rb:189:in `parse'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser.rb:98:in `block in parse_either'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/mprelude-0.1.0/lib/mprelude.rb:85:in `wrap_error'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser.rb:97:in `parse_either'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/mprelude-0.1.0/lib/mprelude.rb:168:in `bind'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/validation.rb:63:in `from_string'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:40:in `validation'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:143:in `public_send'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:143:in `process_target'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:133:in `block in exit_status'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:132:in `each'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:132:in `exit_status'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/lib/unparser/cli.rb:64:in `run'
/opt/local/lib/ruby2.7/gems/2.7.0/gems/unparser-0.5.3/bin/unparser:10:in `<top (required)>'
/opt/local/bin/unparser:23:in `load'
/opt/local/bin/unparser:23:in `<main>'
Error: (string)

Cheers!

mbj commented

@akimd Valid issue, given you ran it through the unparser self check that fails. Also interesting neither rubyspec nor parsers test suite hit this construct.

This means I need an even wider ruby corpus to test against. Can you recommend a public one? Or can you contact me in private (if you happen to have a large code base you may be able to share for this propose).

Its likely that unparser actually has to learn to emit these ifs ifs as ternary, because only they can produce that exact AST, given its an argument to return. Adding parenthesis may produce a begin node which would invalidate the "exact" AST guarantee.

Unparsers internal changes for 2.7 support (they where more a re-write) make it more easy to direct the code emitting to specific forms.

I cannot guarantee I can get to this issue very soon, while mutant is primarily funding this OSS project, it does not feed me well enough at this point, I've to priorize my consulting work.

There are some possible workarounds I'm happy to discuss (also contact me in private) on this issue, it depends on the actual use case.

akimd commented

Hi!
Thanks for the fast answer.

This means I need an even wider ruby corpus to test against. Can you recommend a public one?

I'm afraid I am not well versed enough for this. In my case we run source to source transformation on the AST and pretty-print it at the end.

Or can you contact me in private (if you happen to have a large code base you may be able to share for this propose).

I'm sorry I have nothing to offer here :( And I do understand your concern. What I can do is submit such test cases when we encounter them. Might not protect from lurking issues, but should protect from regressions.

Its likely that unparser actually has to learn to emit these ifs ifs as ternary, because only they can produce that exact AST, given its an argument to return. Adding parenthesis may produce a begin node which would invalidate the "exact" AST guarantee.

Bummer... That's a dirty catch 22. Never though about that constraint.

I cannot guarantee I can get to this issue very soon, while mutant is primarily funding this OSS project, it does not feed me well enough at this point, I've to priorize my consulting work.

Ok. We have reverted to older versions of parser/unparser that did not exhibit this issue.

There are some possible workarounds I'm happy to discuss (also contact me in private) on this issue, it depends on the actual use case.

I can see that forcing the parens would help (return (1 ? 2 : 3)). I'll see our team if this is applicable to us.

Cheers!

mbj commented

We run source to source transformation on the AST and pretty-print it at the end.

2 things here:

  • Its not really pretty printing, its unparsing. Unparsers output is "far from pretty" but could be made so (I've no reason to spend time on that metric ATM). Better not call it pretty printing if it can create so ugly source ;)
  • Its important to realize that unparser "only" has the guarantee: It produces source that if parsed via parser produces the SAME ast if parsed again.

The latter has an important consequence: If you transform AST, make sure that you know what parser would produce.

Example:

ruby-parse -e 'foo while foo = bar'
(while
  (lvasgn :foo
    (send nil :bar))
  (send nil :foo))

the foo body of the while is a send.

ruby-parse -e 'foo = 1; foo while foo = bar'
(begin
  (lvasgn :foo
    (int 1))
  (while
    (lvasgn :foo
      (send nil :bar))
    (lvar :foo)))

The foo of the while body is an lvar read!

This is because the local variable scope is different. If you are doing AST transforms, make sure the AST you generate expresses the semantics you desire and captures the effects like the lvar scope changes.

Also its important to keep in mind there is a wide range of AST you may generate, that parser would never return from parsing. Unparser will, in the current versions: NOT prevent you from unparsing the into (potentially) valid source, that fails the guarantee.

In the next version of unparser, I plan a new API that would unparse than parse again, to make sure it filters out these cases where artificially constructed AST is passed that "cannot" be generated by parser.

For some private tooling that does source transform (smelling like the ones you indirectly describe) I use such APIs already to prevent this case.

akimd commented
  • Its not really pretty printing, its unparsing. Unparsers output is "far from pretty" but could be made so (I've no reason to spend time on that metric ATM). Better not call it pretty printing if it can create so ugly source ;)

I see very well what you mean here. Indeed, unparser does not have any support of boxing, or managing with etc., so it is rather a crude pretty-printer. Yet, it indents. That's were I usually cut the line. The intended reader is clearly a human, if it were solely for the sake of the machine, it could be much more terse.

  • Its important to realize that unparser "only" has the guarantee: It produces source that if parsed via parser produces the SAME ast if parsed again.

Yup, I'm also well aware of this. And I do understand that shaking the AST does not guarantee in anyway that it is still valid.

But we pay attention to that. Besides, our transformations are mostly related to sanity-checks, there's nothing really deep, AFAIK.

the foo body of the while is a send.
[...]
The foo of the while body is an lvar read!

Yup, already found about that, thanks :)

Cheers!

mbj commented

So yes, (return (if ...)) can only be returned from parser on a return with a ternary expression. Unparser has to learn about this and emit the if as ternary.

bundle exec unparser -e 'return(true ? 1 : 2)' --verbose
mutant-dev@mbj ~/devel/unparser (fix/return-parens) $ bundle exec unparser -e 'return(true ? 1 : 2)' --verbose
(string)
Original-Source:
return(true ? 1 : 2)
Generated-Source:
return (if true
  1
else
  2
end)
Original-Node:
(return
  (begin
    (if
      (true)
      (int 1)
      (int 2))))
Generated-Node:
(return
  (begin
    (if
      (true)
      (int 1)
      (int 2))))
Success: (string)

This is a different AST, hence unparer cannot emit parens to unparse return true ? 1 : 2.

mbj commented

Also affected:

  • next
  • break
mbj commented

Also note this one:

bundle exec unparser -e 'return 1, true ? 1 : 2' --verbose
(string)
Original-Source:
return 1, true ? 1 : 2
Generated-Source:
return 1, if true
  1
else
  2
end
Original-Node:
(return
  (int 1)
  (if
    (true)
    (int 1)
    (int 2)))
Generated-Node:
(return
  (int 1)
  (if
    (true)
    (int 1)
    (int 2)))
Success: (string)

So "only" if its a single argument it needs to be ternary. Love you Ruby.

mbj commented

@akimd I fixed this one. And released at 0.5.4. Please consider my GH sponsor button.

akimd commented

You have been fast, thanks a lot @mbj!

We will try it today, and report our mileage.

Many thanks for your time.

mbj commented

You have been fast, thanks a lot @mbj!

Only possible due the huge amount of time invested on refactorings before. Making a better test suite, CI setup, internal code structure etc.

mbj commented

We will try it today, and report our mileage.

Ping. I want to get mentally pas this one ;)

akimd commented

I have just been confirmed that with the bug fix, our problem was solved. Thanks!

mbj commented

Thanks, I love closing open mental loops. Cheers.