GBIF DWCA data with double-quote characters is not processed correctly
cjgrady commented
Describe the bug
If the occurrence data in a GBIF DWCA contains a double-quote character, the file is not parsed correctly and an IndexError can be raised.
To Reproduce
from lmpy.point import PointDwcaReader

# Path to a DWCA whose occurrence data has a double quote in a field
fn = 'some file with a double-quote in a field'
with PointDwcaReader(fn) as reader:
    for points in reader:
        pass
Expected behavior
GBIF DWCA occurrence files are not quoted (the meta.xml file specifies fieldsEnclosedBy=""), so a double-quote character should be read as ordinary text rather than as the start or end of a quoted field.
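The effect of the default quote character can be reproduced with Python's csv module alone. This is a minimal sketch, not lmpy code: the column names and rows are made up, and reading the file as tab-delimited text with `csv.reader` is an assumption about how the occurrence file is consumed.

```python
import csv
import io

# Made-up tab-delimited occurrence rows. The catalog number in the second
# data row starts with a double quote that is never closed.
DATA = (
    "gbifID\tcatalogNumber\tdecimalLatitude\tdecimalLongitude\n"
    "1\tABC-1\t10.5\t-20.25\n"
    '2\t"XYZ-2\t11.0\t-21.0\n'
    "3\tDEF-3\t12.5\t-22.75\n"
)


def parse(text, **options):
    """Read tab-delimited text with csv.reader and the given options."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **options))


# Default options: '"' is the quote character, so the unmatched quote opens
# a quoted field that absorbs the rest of the input, delimiters and newlines
# included.
default_rows = parse(DATA)

# Quoting disabled: '"' is kept as an ordinary character in the field.
unquoted_rows = parse(DATA, quotechar=None, quoting=csv.QUOTE_NONE)
```

With the default options the second data row collapses into two fields and the third row disappears into it, so code that indexes rows by column position (for example `default_rows[2][2]`) hits an IndexError. With quoting disabled, every row keeps its four columns and the catalog number is read as `"XYZ-2`.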
The problem... and solution
csv.reader uses '"' as its quotechar by default, and I had also set the default value of fieldsEnclosedBy to '"', so a double quote in the data was treated as a quote character. The fix is to make the default an empty string and to set quotechar to None if fieldsEnclosedBy is still an empty string after the metadata has been processed.
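The fix could look roughly like the sketch below. `csv_options_from_meta` is a hypothetical helper, not lmpy's actual implementation: it reads the `core` element of a meta.xml string with `xml.etree.ElementTree`, defaults `fieldsEnclosedBy` to an empty string, and only hands a quote character to `csv.reader` when the metadata supplies one. It passes `quoting=csv.QUOTE_NONE` explicitly alongside `quotechar=None` so the unquoted mode does not rely on the csv module choosing a default quoting mode. The comma fallback for `fieldsTerminatedBy` and the small escape table are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text (meta.xml) elements.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"

# Assumption: meta.xml writes escapes such as \t literally (backslash + letter).
ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r"}


def _unescape(value):
    """Turn a literal escape sequence from meta.xml into the real character."""
    return ESCAPES.get(value, value)


def csv_options_from_meta(meta_xml):
    """Hypothetical helper: build csv.reader keyword arguments from meta.xml."""
    core = ET.fromstring(meta_xml).find(f"{DWC_TEXT_NS}core")
    if core is None:
        raise ValueError("meta.xml has no <core> element")

    # Assumed fallback when fieldsTerminatedBy is missing.
    delimiter = _unescape(core.get("fieldsTerminatedBy", ","))
    # Default to '' (fields not enclosed) instead of '"'.
    enclosed_by = _unescape(core.get("fieldsEnclosedBy", ""))

    if enclosed_by == "":
        # Still empty after reading the metadata: no quote character, so a
        # double quote is read as ordinary text.
        return {"delimiter": delimiter, "quotechar": None,
                "quoting": csv.QUOTE_NONE}
    return {"delimiter": delimiter, "quotechar": enclosed_by}


# Abridged meta.xml core element with an empty fieldsEnclosedBy.
META = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core fieldsTerminatedBy="\t" linesTerminatedBy="\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1">
  </core>
</archive>"""

options = csv_options_from_meta(META)
rows = list(csv.reader(io.StringIO('1\t"XYZ-2\t11.0\n'), **options))
```

Here the double quote survives as part of the second field. If a meta.xml does declare an enclosing character, for example `fieldsEnclosedBy="&quot;"`, the helper passes it through as `quotechar` and normal CSV quoting applies.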