Commas/quotes in data not properly escaped in the CSV files, breaking columns
james-cnz opened this issue ยท 5 comments
I'm using Ad-hoc database queries version 3.9 for Moodle 3.5+ (2020062800).
I found this issue when trying to query data from the old H5P plugin (mod_hvp).
It's JSON with embedded HTML that contains (backslash escaped?) double quotes.
To replicate, create an old type H5P (Interactive Content) activity. Start with some container like a Column or Accordion.
Add some text, and change it's font size, then find the H5P activity's CMID, and run the query:
SELECT json_content
FROM prefix_hvp AS hvp
JOIN prefix_course_modules AS cm
ON cm.instance = hvp.id AND cm.module = (SELECT id FROM prefix_modules WHERE name='hvp')
WHERE cm.id = :cmid
I think this is a minimal example:
SELECT '"\"' AS Backslash, '","' AS Comma
It requires the back-slash to break, and the double quotes. If you do '.' instead of '\' it is fine.
So, this is the fault of PHP. By default PHP has non-standard behaviour when reading CSV files. See the bit about the escape argument here: https://www.php.net/manual/en/function.fgetcsv.php
And, it is only possible to disable that from PHP 7.4 onwards.
OK. I guess I will fix it properly from PHP 7.4 onewards, and for older PHP, I will hack it by passing a character that should never appear. How about "\v"?
I am afraid we are in the middle of testing some other changes to this report at the moment, so I can't easily release a new version now. The necessary fix is in report/customsql/view.php. Change the line
while ($row = fgetcsv($handle)) {
replace that with
$disablestupidphpescaping = '';
if (!check_php_version('7.4')) {
\\ This argument of fgetcsv cannot be unset in PHP < 7.4, so substitute a character which is unlikely to ever appear.
$disablestupidphpescaping = "\v";
}
while ($row = fgetcsv($handle, 0, ',', '"', $disablestupidphpescaping)) {
I should be able to release a version including this fix in a few weeks.
That seems to fix it, thanks. I don't really know much about control codes, but I don't think I've ever come across "\v" in use, so I'd guess it's probably a good choice.
Re the escape character, how about "\x10"--data link escape? Wikipedia describes it as a "transmission control character", so I guess it shouldn't ever appear in stored data, and being an escape character, it seems kind of fitting. https://en.wikipedia.org/wiki/Control_character
Fix for this has now been pushed.