Enhancement: avoid `isSanitized` feature from breaking HTTP body JSON API params, in some rare scenario of params includes UTF8 chars
Opened this issue · 0 comments
rizdaprasetya commented
Affected Lines
Known sample, but recommended to search more comprehensively.
midtrans-php/Midtrans/Sanitizer.php
Line 141 in 4929d87
midtrans-php/Midtrans/Sanitizer.php
Line 124 in 4929d87
https://github.com/search?q=repo%3AMidtrans%2Fmidtrans-php%20substr&type=code
Findings:
Reproducible evidence:
<?php
$anotherSampleStr = "Q-古典-吾家有喜A-半圆_ Q-古典-吾家有喜A-半圆 定制水晶绒【咨询客服】";
echo "// Sample \$str is a long utf-8 based string of chars: \n";
$str = "制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制";
var_dump($str);
echo "// Length shorten using regular `substr(\$str,0,50)`, unexpectedly last chars is improperly truncated in the middle bytes and became invalid UTF-8 char: \n";
var_dump(substr($str,0,50));
echo "// Then JSON encoded, **UNEXPECTEDLY** produce empty JSON: \n";
var_dump(json_encode( substr($str,0,50) ));
echo "\n\n ---- Compared to: ---- \n\n";
echo "// Length shorten using specific `mb_substr(\$str,0,50)`: \n";
var_dump(mb_substr($str,0,50));
echo "- Then JSON encoded, **EXPECTED** succeed, BUT encoding changed: \n";
var_dump(json_encode( mb_substr($str,0,50) ));
echo "\n ---- \n";
echo "// Further, if JSON encoded using `JSON_UNESCAPED_UNICODE` param, expected succeed w/ perserved encoding: \n";
var_dump(json_encode( mb_substr($str,0,50),JSON_UNESCAPED_UNICODE ));
// tips: qucik run it using e.g. https://replit.com/@raizerde/Utf8EncodingSubstringIssue
?>
Issue Desc
When:
- sanitization in which merchant's input params being sanitized, including being
substr
-ed, - then in the very rare scenario of the params includes UTF8 string of chars. (rare case because UTF8 usually used for non-latin char based country, which merchant is rarely from).
Unexpected:
- the string's last char will be prematurely truncated, which unexpectedly result in an invalid UTF8 char.
- then due to this problematic char will cause the string & params fail to be json_encode-ed, unexpectedly will result as
false
/ empty JSON. - then it will cause empty HTTP request body to be sent to the API, & unexpected API response of e.g.
Midtrans API is returning API error. HTTP status code: 500 API response: {"status_code":"500","status_message":"Sorry. Our system is recovering from unexpected issues. Please retry.","id":"c99ccd6b-3174-4b25-907a-bb8e6a1d586d"}
Further: Seems PHP does not natively support UTF-8 char encoding (Chinese chars are UTF-8, which is more bytes than regular ASCII which PHP support).
Expected:
substr
&json_encode
should success produce proper JSON, not empty.
Possible Solution:
[TBD] ... Probably use mb_substr
to replace substr
usage, but need further check for any possible downsides (e.g. apparently not all PHP server env enable MB extension, as it is an optional non-default extension).
- Then maybe add warning about PHP limitation in using UTF-8 or other non ASCII string encoding.
Possible Workaround
- Merchant can change config
\Midtrans\Config::$isSanitized = true;
to\Midtrans\Config::$isSanitized = false;
- as the issue happens during Sanitization flow, disabling it will avoid the problematic flow. Although there will be downside.