faradayio/scrubcsv

doesn't fix or detect line-broken headers

seamusabshere opened this issue · 4 comments

$ scrubcsv broken-headers.csv
2 rows (0 bad) in 0.00 seconds, 923.89 KiB/sec
"Date
Received",Customer Last Name,Customer First Name,Site Address ,Zip Code,"Customer
Phone #",Customer Email
2/2/11,xxx,xxx,xxx,xxx,xxx,xxx

broken-headers.csv.zip

emk commented

That's... just... ugh.

Thank you for the bug report!

emk commented

OK, I talked to @seamusabshere about this, and we decided that the best fix here would be a CLI option that applies s/^\s+//; s/\s+$//; s/\s+/ /g to all cells, assuming \s matches newlines, too.
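As a rough illustration only (the actual scrubcsv implementation is in Rust, so this is not its code), here is what that transform would do to the broken header cell from the report, sketched in Ruby, whose regex engine does let \s match newlines:

cell = "Date\nReceived"
normalized = cell.sub(/\A\s+/, "").sub(/\s+\z/, "").gsub(/\s+/, " ")
puts normalized   # prints "Date Received"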

emk commented

For now, the recommended workaround is to feed everything through scrubcsv to normalize it, and then use a short Ruby program to normalize the data in the CSV cells.
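A minimal sketch of what such a normalization script could look like (hypothetical, not shipped with scrubcsv; the file name normalize_cells.rb is made up here):

#!/usr/bin/env ruby
# Read CSV on stdin, collapse whitespace runs (including embedded newlines)
# inside every cell to a single space, trim the ends, and write CSV to stdout.
require "csv"

out = CSV.new($stdout)
CSV.new($stdin).each do |row|
  out << row.map { |cell| cell.nil? ? cell : cell.gsub(/\s+/, " ").strip }
end

Used roughly as: scrubcsv broken-headers.csv | ruby normalize_cells.rb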

this now works:

$ scrubcsv broken-headers.csv --replace-newlines
2 rows (0 bad) in 0.00 seconds, 686.40 KiB/sec
Date Received,Customer Last Name,Customer First Name,Site Address ,Zip Code,Customer Phone #,Customer Email
2/2/11,xxx,xxx,xxx,xxx,xxx,xxx