tree-sitter/tree-sitter-php

Incorrect precedence between string concatenation and addition/subtraction

Sjord opened this issue · 6 comments

Sjord commented
<?php
echo $a . $c + $b;

This is interpreted by PHP as ($a . $c) + $b, but by treesitter as $a . ($c + $b).

Actual:

(program
  (php_tag)
  (expression_statement
    (binary_expression
      (variable_name
        (name))
      (binary_expression
        (variable_name
          (name))
        (variable_name
          (name))))))

Expected:

(program
  (php_tag)
  (expression_statement
    (binary_expression
        (binary_expression
          (variable_name
            (name))
          (variable_name
            (name)))
      (variable_name
        (name)))))

According to the documentation, +,  -, and  . have equal precedence. But in grammar.js, . has lower precedence than + and -.

This is a breaking change in PHP 8: https://wiki.php.net/rfc/concatenation_precedence

To my knowledge we're unable to support both the old and the new way simultaneously. Thus, I'm inclined to say it's not a bug.

If you have an idea of how to support both, I'll be happy to support it.

Also see discussion at php-parser: nikic/PHP-Parser#783

It seems the only way to handle this is introducing separate grammars for PHP 7 and 8.

Too be honest, that might be more than I'm currently willing to invest in working on - unless we find a nice way of sharing everything except the diverging parts.

@maxbrunsfeld any thoughts on this?

I don't think we should create separate grammars. Maybe we should just imitate PHP 8, since it looks like it came out in 2019. But if you think that older versions of PHP are still more widely used, then maybe it's good to stick with those.

The behaviour in question here was deprecated in PHP 7.4 (2019) and changed in PHP 8 (2020).

PHP 7.3 will be EOL by December this year, thus there will be no versions of PHP not either asking the user to use parentheses or directly interpreting the code as our grammar currently does.

However, I fear there is a lot of code out there not using the supported runtimes... Regardless, most of the community is, to my knowledge (https://packagist.org/php-statistics) on PHP 7.4 or higher, and therefore I think we should keep the updated precedence.

@Sjord Could you provide a compelling argument for why this should adhere to the old precedence? (Btw: thank you very much for the detailed bug reports)

Sjord commented

I didn't know about the PHP8 change when I filed this bug. I think using the PHP8 precedence is fine. However, then the precedence of concatenation is still incorrect.

In PHP7, the order is this:

  • TIMES
  • PLUS / CONCAT
  • SHIFT
  • INEQUALITY

In PHP8:

  • TIMES
  • PLUS
  • SHIFT
  • CONCAT
  • INEQUALITY

In grammar.js:

  • TIMES
  • PLUS
  • CONCAT
  • SHIFT
  • INEQUALITY

So in PHP8, the bug would be between shift and concatenation.

Good catch! Double checking the precedence in the parser is on my to-do, but I'll fix this as soon as I'm back home from vacation.

Thanks again for detecting and reporting. Please keep them coming 🙂