gregrahn/join-order-benchmark

Malformed CSVs

chsalgado opened this issue · 1 comments

I downloaded the CSV tarball. Uploading it to SQL Azure with bcp proved really hard because the CSVs are malformed.

Sample CSV row in aka_name.csv
220222,538021,""Borolas", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060

CSV expects quotation marks inside a quoted field to be escaped as "", not left as a bare ". The row should read:
220222,538021,"""Borolas"", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060

Hey @chsalgado, we had CSV problems as well (see #11), but the mentioned row looks fine to me:

$ grep '^220222' aka_name.csv
220222,538021,"\"Borolas\", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060

Maybe your terminal does not show the escape character?
It's still cumbersome as most software expects quotes to be escaped as "", but the given files should be importable to most systems if you set the escape character correctly.

In case you cannot change the escape symbol, this (rather hacky) command might help you (no guarantees):
for csv_file in *.csv; do echo "$csv_file"; sed -i'' -e 's/\\\\\\"/MARKER1/g;s/\\\\"/MARKER2/g;s/\\"/""/g;s/MARKER1/\\\\""/g;s/MARKER2/\\\\"/g' "$csv_file"; done
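
If a scripting language is available, a less fragile alternative to the sed one-liner might be to parse each file with the backslash set as the escape character and write it back out with doubled quotes. The following is only a minimal sketch using Python's standard csv module. The function name convert_backslash_csv is made up for illustration, and the sketch assumes the files are UTF-8, comma-delimited, and use \" only inside quoted fields. I have not run it against the full dataset.

```python
import csv
import io


def convert_backslash_csv(text: str) -> str:
    """Re-encode CSV text that escapes quotes as \\" into the "" convention.

    Hypothetical helper for illustration; assumes comma-delimited input
    where a backslash escapes the following character.
    """
    # Read side: treat the backslash as the escape character and do not
    # interpret "" as an escaped quote.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        escapechar="\\",
        doublequote=False,
    )
    # Write side: csv.writer doubles embedded quotes by default
    # (doublequote=True) and quotes only the fields that need it.
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    import sys

    # Usage (assumed): python convert.py aka_name.csv > aka_name.fixed.csv
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        sys.stdout.write(convert_backslash_csv(f.read()))
```

Note that this reads a whole file into memory and rewrites every field, so quoting of fields that did not need it may change. Check a few rows (for example the aka_name.csv row above) before importing.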