specifysystems/lmpy

GBIF DWCA data with double quote characters not processed correctly


Describe the bug

If a GBIF DWCA contains a double-quote character in its occurrence data, the file is not processed correctly and an IndexError can be raised.

To Reproduce

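from lmpy.point import PointDwcaReader

# Path to a GBIF DWCA whose occurrence data contains a double-quote character
fn = 'some file with a double-quote in a field'
with PointDwcaReader(fn) as reader:
    for points in reader:
        pass  # Reading the points may raise IndexError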

Expected behavior

GBIF DWCA occurrence files are not quoted (the meta.xml file specifies fieldsEnclosedBy=""), so a double-quote character should be read as ordinary field content, not as a quote delimiter.

The problem... and solution

csv.reader uses '"' as its default quotechar, and I had also set the default value of fieldsEnclosedBy to '"'. The fix is to default fieldsEnclosedBy to an empty string and, if it is still an empty string after the metadata has been processed, set quotechar to None.
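
For illustration only, here is a minimal sketch of the behavior using the standard library. It is not the lmpy implementation: `read_rows`, its `fields_enclosed_by` parameter, and the sample data are hypothetical names made up for this example. Note that Python's csv module only accepts quotechar=None when quoting is also set to csv.QUOTE_NONE.

```python
import csv
import io

# Hypothetical tab-delimited occurrence data; row 1 has a stray leading double quote.
SAMPLE = (
    'id\tlocality\tspecies\n'
    '1\t"unnamed road\tQuercus alba\n'
    '2\tridge top\tAcer rubrum\n'
)


def read_rows(text, fields_enclosed_by=''):
    """Parse tab-delimited text (hypothetical helper, not part of lmpy).

    An empty fields_enclosed_by disables quoting entirely, so a
    double quote is kept as ordinary field content.
    """
    if fields_enclosed_by == '':
        # csv requires QUOTE_NONE whenever quotechar is None.
        kwargs = {'quotechar': None, 'quoting': csv.QUOTE_NONE}
    else:
        kwargs = {'quotechar': fields_enclosed_by}
    return list(csv.reader(io.StringIO(text), delimiter='\t', **kwargs))


unquoted_rows = read_rows(SAMPLE)                      # quoting disabled
quoted_rows = read_rows(SAMPLE, fields_enclosed_by='"')  # old default behavior
```

With quoting disabled, every row keeps its three fields and the double quote stays in the locality value. With '"' as the quote character, the stray quote opens a quoted field that swallows the following delimiters and newline, so later code that indexes into the row by column position can fail.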