alan-turing-institute/CleverCSV

Python csv does a better job with escape characters and quotes than CleverCSV

Closed this issue · 1 comment

Hello,
First of all, thank you for CleverCSV. I use it mainly as a replacement for Python's csv module. However, I noticed that Python's csv does a better job of handling escape characters and quotes.

Please consider the following:

import clevercsv
import csv
from io import StringIO

data = """sku,features,attributes
22221,"[{""key"":""heel_height"",""value"":""Ulttra High (4\\""+)""}]","11,room"
"""

print("Python csv")

stream = StringIO(data)

reader = csv.reader(stream, delimiter=',', quotechar='"', escapechar='\\')

for row in reader:
    print(row)

# ---------------

print("clever csv")

stream = StringIO(data)

for row in clevercsv.reader(stream, delimiter=',', quotechar='"', escapechar='\\'):
    print(row)

This will print:

Python csv
['sku', 'features', 'attributes']
['22221', '[{"key":"heel_height","value":"Ulttra High (4"+)""}]"', '11,room']
clever csv
['sku', 'features', 'attributes']
['22221', '"[{"key":"heel_height","value":"Ulttra High (4""+)""}]","11', 'room"\n']

CleverCSV splits the line in the wrong place. It also converts the \" into "", which is not correct.

Hi @seperman, thanks for opening this issue. I've taken a look, and I think that in this particular case the problem is with the dialect, not the parsing. If we don't set the escape character, the text is parsed correctly, and I've added a test in 4b4082a to illustrate this. Please reopen the issue if that doesn't resolve your problem.
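
The dialect point can be checked with a small sketch. It uses the standard library's csv module rather than CleverCSV so that it runs without extra dependencies; that substitution is an assumption of this sketch and is not the test added in 4b4082a. The escapechar argument is dropped, so the backslash is treated as literal text and only the doubled quotes act as escapes. The variable names are illustrative.

```python
import csv
import json
from io import StringIO

# Same sample as in the report: quotes inside the quoted field are doubled,
# and the backslash belongs to the embedded JSON, not to the CSV layer.
data = """sku,features,attributes
22221,"[{""key"":""heel_height"",""value"":""Ulttra High (4\\""+)""}]","11,room"
"""

# No escapechar is passed, so the backslash is kept as an ordinary character
# and "" inside a quoted field is read as a single quote character.
rows = list(csv.reader(StringIO(data), delimiter=',', quotechar='"'))
header, record = rows

# The features column should now be valid JSON; decoding it is a quick check
# that the field was split and unquoted as intended.
features = json.loads(record[1])

print(record)
print(features[0]["value"])
```

With this dialect the features column decodes as JSON and the third column stays intact as a single field, 11,room.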