Additional double quote escaping issue
uahim opened this issue · 1 comments
[edited text and headline]
I am not quite getting the --disable-double-double-quoting and --disable-escaped-double-quoting options. My simple database has e.g.
McGuiness Flint;When I'm Dead And Gone/Lazy Afternoon;7";1970
And when I run any query, the 7"
is displayed as "7"""
q -d";" -E"UTF-8" -D" - " "select * from records.csv"
(My version is 2.0.9 on Windows)
Tried wrapping the database and escaping the double quote like this
"McGuiness Flint";"When I'm Dead And Gone/Lazy Afternoon";"7\"";"1970"
to no avail. Tried both aforementioned options including a combination of them but no luck either.
What am I doing wrong?
Thanks
Hi, sorry for the late reply,
The reason you're getting this is because the default output-wrapping mode for q is minimal
(there are none
, nonnumeric
and all
as well).
If you add -W none
to the command, the output will be as you expect it to be:
# Added `-W none` as a parameter to the command you sent
$ q -d";" -E"UTF-8" -D" - " -W none "select * from records.csv"
McGuiness Flint - When I'm Dead And Gone/Lazy Afternoon - 7" - 1970
$
I'll try to explain why.
minimal
output-wrapping mode means that whenever there's a string which contains real double-quotes, it will be automatically wrapped as a double-quoted string (e.g. if the string contains blah"blah
it will be wrapped to become "blah"blah"
. However, doing this will lead to an error when reading the file, as no program will be able to understand that the original "
in the middle is part of the string (it will think that the string ends after the first blah
).
This is why the csv standard says that whenever there's a double-quote inside a string, it needs to be ""escaped"" by a double-double-quote (e.g. ""
). This will allow systems to differentiate between the end of a string and just a double-quote character.
Because q's default output-wrapping mode is minimal
, this is exactly what happens to your 7"
string - it is both wrapped in double quotes, and the double-quote inside the string is escaped, leading to the seemingly-odd value "7"""
.
However, this is a legitimate csv value which can be read properly later on. Let's push the output of the command to another q command and see what happens. I'm piping the output to another q command which just takes column 3 and prints it. I'm using -W none
on that second q command so we can see the "bare value" of the column.
# First part is identical to the command you sent.
$ q -d";" -E"UTF-8" -D" - " "select * from records.csv" -D "," | q -d ',' "select c3 from -" -W none
7"
$
This is of course just a test command to see that the output csv can be read again properly. In your case, adding -W none
to your original command would create the output you expect. Note that there are some tools and systems which create/read csv which do not adhere fully to those standards, so if you want to export/import data to other systems or tools, make sure to use the proper -w/-W
flags that will match the relevant tool.
Hope this helps, please write to me with any further issues or questions.
Harel