gbif/dwca-io

ArchiveFactory uses CSVReaderFactory instead of TabularDataFileReader


The recent fixes for #26, made in gbif-common, were implemented and tested against TabularDataFileReader. However, dwca-io references a different class, CSVReaderFactory, which does not appear to have been patched. As a result, the silent failure when parsing fields that contain a newline inside quotes ("newline in quoted" fields) still appears to be present.

Was the intention to deprecate and replace CSVReader/CSVReaderFactory, which only split the input on line breaks? TabularDataFileReader uses jackson-csv and follows the RFC 4180 rules, so it parses CSV files correctly.
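A minimal, stdlib-only Java sketch can show why line splitting fails on these fields. It is not the actual CSVReader or jackson-csv code. The class and method names (`QuotedNewlineDemo`, `splitByLines`, `parseRfc4180`) are hypothetical. `splitByLines` approximates a reader that treats every line break as a record boundary. `parseRfc4180` is a small hand-written quote-aware parser that follows the RFC 4180 quoting rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuotedNewlineDemo {

    // Naive approach (hypothetical, approximating "line splitting"): every
    // line break ends a record and quotes are ignored entirely.
    static List<List<String>> splitByLines(String text) {
        List<List<String>> records = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            records.add(Arrays.asList(line.split(",", -1)));
        }
        return records;
    }

    // Quote-aware parser following RFC 4180: a field wrapped in double quotes
    // may contain commas and line breaks, and "" inside it is an escaped quote.
    static List<List<String>> parseRfc4180(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean recordStarted = false; // true once the current record has content
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            recordStarted = true;
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // commas and newlines are literal here
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n' || c == '\r') {
                // Unquoted line break ends the record; \r\n counts as one break.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                recordStarted = false;
            } else {
                field.append(c);
            }
            i++;
        }
        // Flush the last record when the input does not end with a line break.
        if (recordStarted) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,remarks\n1,\"first line\nsecond line\"\n2,plain\n";
        System.out.println("line splitting: " + splitByLines(csv).size() + " records");
        System.out.println("RFC 4180: " + parseRfc4180(csv).size() + " records");
    }
}
```

For the sample input, line splitting produces 4 records: the quoted remark is cut in two, and the fragment `second line"` becomes a one-field "record". The quote-aware parser produces 3 records (the header plus two data rows). Depending on how a reader handles short rows, that broken fragment can be dropped or misaligned without any error, which would be consistent with the silent failure described above.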

Sorry for the delay in verifying this. The ALA has been upgrading its infrastructure to Java 8, and I didn't get around to testing the fix until now.

The intention is indeed to deprecate CSVReader/CSVReaderFactory.
Some methods in the Archive class will also be deprecated in favour of org.gbif.dwc.DwcFiles.

Thanks, I wasn't aware of DwcFiles. When I get a chance, I will try it on the archive we have been having issues with and get back to you.