Require `mbstring` with v3
boesing opened this issue · 6 comments
Feature Request
Q | A |
---|---|
New Feature | yes |
BC Break | yes |
Summary
As this library supports PSR-6
and PSR-16
, it also needs ways to handle multibyte strings which are allowed as per these recommendations.
To fix #154 properly, a multibyte check needs to be done when a key exceeds a certain limit.
This is actually only the case for the redis adapter, so until v3, the redis adapter will decrease its maximum key length to 65534
as 65535
is the maximum number to be used as a specifier for pcre2
.
Actually you may not need the mbstring
extension:
if (strlen($string) != strlen(utf8_decode($string)))
{
echo 'string is unicode';
}
@ghostwriter As written in Slack, this would then require ext-xml
.
@boesing right.
also came up with this:
if (strlen($string) !== strlen(html_entity_decode($string, ENT_QUOTES , 'ISO-8859-1')))
{
echo 'string is unicode';
}
// another method
html_entity_decode
is an internal function. https://3v4l.org/UZOM1
var_dump(in_array('html_entity_decode', get_defined_functions()['internal'])); // bool(true)
The main thing I dont like here is, that strlen
is slow depending on the key length. Here, we have to do it twice.
There were performance checks in #154 (comment) regarding performance and thus I'd like to avoid doing it twice. Thus, requiring mbstring seems the most proper way of veryfing the length of a string with a minimal footprint.
Closing this with the rationale, that allowing keys longer than 65534
will lead to performance problems when validating cache keys. Another reason is, that mb_strlen
does not return the byte length but the string length.
UTF-8 characters which consume 4-bytes should be treated as 4 characters
but with mb_strlen
they're treated as 1 character
.
good point.
For the record I was trying to find a solution to the particular issue. I’m not against adding mbstring
.