laminas/laminas-cache

Require `mbstring` with v3

boesing opened this issue · 6 comments

Feature Request

Q A
New Feature yes
BC Break yes

Summary

As this library supports PSR-6 and PSR-16, it also needs ways to handle multibyte strings which are allowed as per these recommendations.
To fix #154 properly, a multibyte check needs to be done when a key exceeds a certain limit.
This is actually only the case for the redis adapter, so until v3, the redis adapter will decrease its maximum key length to 65534 as 65535 is the maximum number to be used as a specifier for pcre2.

https://3v4l.org/GURWn

Actually you may not need the mbstring extension:

if (strlen($string) != strlen(utf8_decode($string)))
{
    echo 'string is unicode';
}

@ghostwriter As written in Slack, this would then require ext-xml.

@boesing right.

also came up with this:

if (strlen($string) !== strlen(html_entity_decode($string, ENT_QUOTES , 'ISO-8859-1')))
{
    echo 'string is unicode';
}
// another method

html_entity_decode is an internal function. https://3v4l.org/UZOM1

var_dump(in_array('html_entity_decode', get_defined_functions()['internal'])); // bool(true)

The main thing I dont like here is, that strlen is slow depending on the key length. Here, we have to do it twice.

There were performance checks in #154 (comment) regarding performance and thus I'd like to avoid doing it twice. Thus, requiring mbstring seems the most proper way of veryfing the length of a string with a minimal footprint.

Closing this with the rationale, that allowing keys longer than 65534 will lead to performance problems when validating cache keys. Another reason is, that mb_strlen does not return the byte length but the string length.
UTF-8 characters which consume 4-bytes should be treated as 4 characters but with mb_strlen they're treated as 1 character.

good point.

For the record I was trying to find a solution to the particular issue. I’m not against adding mbstring.