Data-Liberation-Front/csvlint.io

Support for UTF-8 special characters

Jamie-Atkinson opened this issue · 3 comments

Expected Behaviour

When uploading data for checking, I have rows that look like this:

9107,McKee’s,11 Fairhill,,,Maghera,BT46 5AY,Northern Ireland,Processing Plant (Meat) Cutting Plant (Red) Mince Meat Establishment Meat Preparation Establishment,,CP (Cutting Plant),,,,MM (Mince Meat Establishment) MP (Meat Preparation Establishment),PP (Processing Plant),,,,,,,,,,,,Section: VI (PP) Section: I (CP) Section: V (MM) Section: V (MP),,Bovine Ovine Porcine,,,,Yes,,,Yes,Yes,,,,,,,,,Yes,,,,,,,,Yes,Yes,,,,Yes,,,,,,,Food Standards Agency,,,,

I would expect them to be returned as sent, apart from any formatting fixes and the added double quotes ("").

Current Behaviour (for problems)

Currently that row from a dataset returns:

"9107","McKee???s","11 Fairhill","","","Maghera","BT46 5AY","Northern Ireland","Processing Plant (Meat) Cutting Plant (Red) Mince Meat Establishment Meat Preparation Establishment","","CP (Cutting Plant)","","","","MM (Mince Meat Establishment) MP (Meat Preparation Establishment)","PP (Processing Plant)","","","","","","","","","","","","Section: VI (PP) Section: I (CP) Section: V (MM) Section: V (MP)","","Bovine Ovine Porcine","","","","Yes","","","Yes","Yes","","","","","","","","","Yes","","","","","","","","Yes","Yes","","","","Yes","","","","","","","Food Standards Agency","","","",""

Please note that McKee’s has turned into McKee???s. I believe this is due to a lack of UTF-8 support within the CSVlint application.
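
The three question marks are consistent with the right single quotation mark (U+2019) occupying three bytes in UTF-8 (E2 80 99), with each byte replaced separately by a step that only accepts ASCII. The Python sketch below reproduces the symptom under that assumption. It does not use csvlint's code, and the actual cause inside the application is unconfirmed.

```python
# Hypothetical reproduction of the symptom; this is not csvlint's code path.
original = "McKee\u2019s"  # U+2019 RIGHT SINGLE QUOTATION MARK

# The apostrophe becomes three bytes when encoded as UTF-8.
utf8_bytes = original.encode("utf-8")

# Assumption: some step decodes the bytes as ASCII and substitutes
# a placeholder for every byte it cannot interpret.
decoded = utf8_bytes.decode("ascii", errors="replace")
garbled = decoded.replace("\ufffd", "?")

print(utf8_bytes.hex(" "))
print(garbled)
```

If this is what happens, any other non-ASCII character would be affected in the same way, with one question mark per UTF-8 byte.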

Steps to Reproduce (for problems)

  1. Download github.txt and convert the .txt file back to .csv (GitHub would not accept a .csv upload).
  2. Submit the data to the csvlint app.
  3. Download the standardised version.
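
To see which characters are at risk before comparing the original with the standardised download, a small helper along these lines could list the non-ASCII characters in a row. The function name `non_ascii_chars` is hypothetical and the sample string is a shortened version of the row above.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Shortened sample row; the apostrophe in "McKee’s" is U+2019.
print(non_ascii_chars("9107,McKee\u2019s,11 Fairhill"))
```

Running the same helper over the downloaded row should return an empty list if the characters were replaced with plain question marks.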

Your Environment

Google Chrome: Version 81.0.4044.129 (Official Build) (64-bit)
Windows 10 laptop
Atom for opening and inspecting files

Is it possible to look at getting UTF-8 support added to csvlint?

Many thanks

Jamie

This may be connected to #267.

Thanks @Jamie-Atkinson for the detailed report. We're just getting this application back into regular maintenance, so hopefully we'll be able to look at this before too long.

Super, thanks Floppy / all