moodleou/moodle-report_customsql

Commas/quotes in data not properly escaped in the CSV files, breaking columns

james-cnz opened this issue ยท 5 comments

I'm using Ad-hoc database queries version 3.9 for Moodle 3.5+ (2020062800).
I found this issue when trying to query data from the old H5P plugin (mod_hvp).
It's JSON with embedded HTML that contains (backslash escaped?) double quotes.
To replicate, create an old type H5P (Interactive Content) activity. Start with some container like a Column or Accordion.
Add some text, and change it's font size, then find the H5P activity's CMID, and run the query:

SELECT  json_content
FROM prefix_hvp AS hvp
JOIN prefix_course_modules AS cm
ON cm.instance = hvp.id AND cm.module = (SELECT id FROM prefix_modules WHERE name='hvp')
WHERE cm.id = :cmid

I think this is a minimal example:

SELECT '"\"' AS Backslash, '","' AS Comma

It requires the back-slash to break, and the double quotes. If you do '.' instead of '\' it is fine.

So, this is the fault of PHP. By default PHP has non-standard behaviour when reading CSV files. See the bit about the escape argument here: https://www.php.net/manual/en/function.fgetcsv.php

And, it is only possible to disable that from PHP 7.4 onwards.

OK. I guess I will fix it properly from PHP 7.4 onewards, and for older PHP, I will hack it by passing a character that should never appear. How about "\v"?

I am afraid we are in the middle of testing some other changes to this report at the moment, so I can't easily release a new version now. The necessary fix is in report/customsql/view.php. Change the line

while ($row = fgetcsv($handle)) {

replace that with

$disablestupidphpescaping = '';
if (!check_php_version('7.4')) {
    \\ This argument of fgetcsv cannot be unset in PHP < 7.4, so substitute a character which is unlikely to ever appear.
    $disablestupidphpescaping = "\v";
}
while ($row = fgetcsv($handle, 0, ',', '"', $disablestupidphpescaping)) {

I should be able to release a version including this fix in a few weeks.

That seems to fix it, thanks. I don't really know much about control codes, but I don't think I've ever come across "\v" in use, so I'd guess it's probably a good choice.

Re the escape character, how about "\x10"--data link escape? Wikipedia describes it as a "transmission control character", so I guess it shouldn't ever appear in stored data, and being an escape character, it seems kind of fitting. https://en.wikipedia.org/wiki/Control_character

Fix for this has now been pushed.