Parse-bug
codebear opened this issue · 3 comments
I isolated a fairly simple piece of straightforward, valid PHP code that is not parsed correctly:
function x($a, $b) {
echo(" '$b' ");
echo " '$a' ";
}
I tried tweaking it some but couldn't get a grasp on exactly where it fails.
The problem lies in the ' being placed directly after the variable name, combined with a later string that also contains a '. The following example is the minimal reproduction I could identify:
<?php
"$b'";
"'";
The fix would be to ensure that encapsed_string_chars can start with a '.
The problem only occurs with a variable named b directly followed by '. Thus the string rule kicks in for some reason. (For compatibility we've kept the b'' string syntax around, which was introduced with the PHP 6 plans.)
If this is enough direction for you to figure it out, a PR would be very welcome. Otherwise, I'll try to find some time to look into it next week
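For reference, here is a small sketch of what PHP itself does with the minimal reproduction, which is presumably the behaviour the parser should mirror. The value assigned to $b is an assumption made purely for illustration; the original snippet never assigns it.

```php
<?php
// Assumed value, only so the interpolation result is visible.
$b = 'x';

// Inside double quotes, $b is an interpolated variable and the single quote
// that follows is ordinary string content, not the start of a b'' literal.
$first = "$b'";
$second = "'";

var_dump($first);   // string(2) "x'"
var_dump($second);  // string(1) "'"
```

Both statements are complete double-quoted strings, so nothing from the first line should leak into the second.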
Aha, that makes a lot more sense. I was playing around with different variables, sometimes b, a or z, and the problem kept moving. I couldn't get any determinism out of it.
I did a brief analysis of the native PHP lexer in the context of a different project, and was wondering what the b'' syntax was all about, and why it was still around in the PHP 8 source tree.
I might make an attempt on this, but I've not looked at the source code yet :-)
The information about the b'' syntax is well hidden in both the PHP documentation and the language specification.
The optional b-prefix is reserved for future use in dealing with so-called binary strings. For now, a single-quoted-string-literal with a b-prefix is equivalent to one without.
PHP specification - lexical structure
(binary) casting and b prefix exists for forward support. Note that the (binary) cast is essentially the same as (string), but it should not be relied upon.
PHP: Type juggling
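To make the quoted passages concrete, here is a minimal sketch of the b-prefix equivalence as I understand it from those quotes. The variable names and the sample text are placeholders of my own.

```php
<?php
// A b-prefixed literal is, for now, equivalent to the same literal without the prefix.
$plainSingle    = 'abc\n';   // single-quoted: backslash-n stays two characters
$prefixedSingle = b'abc\n';

$plainDouble    = "abc\n";   // double-quoted: \n becomes a newline
$prefixedDouble = b"abc\n";

var_dump($plainSingle === $prefixedSingle);  // bool(true)
var_dump($plainDouble === $prefixedDouble);  // bool(true)
var_dump(gettype($prefixedSingle));          // string(6) "string"
```

So the prefix changes nothing at runtime, which is why it only matters at the lexer and grammar level.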
It's no wonder that we mere mortals are puzzled by this: the current Mr. PHP himself wondered about the very same thing back in 2011.
It will be interesting to see whether the syntax is eventually deprecated, or whether we get UTF-8 as the default some time in the future. There has been some heavy refactoring of PHP strings lately, but none of it touches this aspect. Previously, there have been at least two attempts (Binary String Deprecation, Deprecations for PHP 7.2) at getting it deprecated, but so far none has received the required votes.
Great that you might take a stab at it. Just ping me if you need any further assistance.
PS: Great to see another Norwegian in here 😄