davedelong/CHCSVParser

Apostrophe in CSV file

rrosendahl opened this issue · 3 comments

Maybe I'm missing something, but I'm parsing a CSV file (from Excel). Everything seems to work well, including fields that contain commas and are wrapped in "...", until the parser comes across the first apostrophe (in a field that is not inside "..."). The parser stops as if it were looking for a second apostrophe, which doesn't exist in the file. How do I solve this? (Excel doesn't see the need to wrap cells containing apostrophes in "..." when exporting.)

Okay, interesting. I did some more digging and found that the issue occurs with certain characters:

  • 8217 (decimal), i.e. U+2019, the right single quotation mark - which looks like an apostrophe, well, almost
  • ė - French characters.

Not sure if this is an encoding issue or something related.
BTW: how can one specify the encoding when using
arrayWithContentsOfDelimitedURL:(NSURL *)fileURL options:(CHCSVParserOptions)options delimiter:(unichar)delimiter ?
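
One way to check whether this really is an encoding issue is to look at the raw bytes before the parser sees them. The following is only a rough sketch using plain Foundation, assuming ARC; FirstNonASCIIOffset and IsValidUTF8 are hypothetical helper names, not part of CHCSVParser, and the sample bytes stand in for the contents of the real file.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of CHCSVParser): returns the offset of the
// first byte >= 0x80, or NSNotFound if the data is pure ASCII.
static NSUInteger FirstNonASCIIOffset(NSData *data) {
    const unsigned char *bytes = data.bytes;
    for (NSUInteger i = 0; i < data.length; i++) {
        if (bytes[i] >= 0x80) {
            return i;
        }
    }
    return NSNotFound;
}

// Hypothetical helper: strict check, since NSString returns nil when the
// bytes are not valid for the requested encoding.
static BOOL IsValidUTF8(NSData *data) {
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] != nil;
}

int main(void) {
    @autoreleasepool {
        // Stand-in for [NSData dataWithContentsOfFile:...]:
        // the text "it?s,ok" where ? is the single byte 0x92.
        const unsigned char sample[] = {'i', 't', 0x92, 's', ',', 'o', 'k'};
        NSData *data = [NSData dataWithBytes:sample length:sizeof(sample)];

        NSUInteger offset = FirstNonASCIIOffset(data);
        if (offset != NSNotFound) {
            const unsigned char *bytes = data.bytes;
            NSLog(@"first non-ASCII byte 0x%02X at offset %lu",
                  bytes[offset], (unsigned long)offset);
        }
        NSLog(@"valid UTF-8: %@", IsValidUTF8(data) ? @"yes" : @"no");
    }
    return 0;
}
```

If the file is not valid UTF-8 and the first non-ASCII byte sits where parsing stops, the problem is probably the file's encoding rather than the apostrophe as such.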

Is there a fix or hack to avoid this? I have a .csv that I can't parse beyond the first apostrophe when I use:

rows = [NSMutableArray arrayWithContentsOfCSVURL:pathToFile];

Thanks!

This is how I managed to force the encoding. It's a bit awkward, but it works:

// filename is an absolute path; fileURLWithPath: builds the file:// URL
// and percent-escapes spaces and other special characters in the path
NSURL *fileURL = [NSURL fileURLWithPath:filename];
NSInputStream *stream = [NSInputStream inputStreamWithURL:fileURL];
// Passed by reference to the parser; set to the encoding to force
NSStringEncoding encoding = NSUTF8StringEncoding;
CHCSVParser *p = [[CHCSVParser alloc] initWithInputStream:stream usedEncoding:&encoding delimiter:'\t'];
p.delegate = self;
[p parse];

However, I haven't yet been able to fix the input syntax problems while parsing: it fails at the first (unexpected) Unicode character. I'm still trying, but I'm considering going back to the old parser.
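
A possible workaround, offered as a guess and not as a confirmed fix: if the file was exported in a legacy single-byte encoding (Windows-1252 is assumed here, where byte 0x92 is the curly apostrophe U+2019), forcing NSUTF8StringEncoding would be consistent with a failure at the first such character. The sketch below decodes the bytes with a fallback and re-encodes them as UTF-8 before they reach the parser. DecodeCSVData is a hypothetical helper, not part of CHCSVParser; it assumes ARC, and the sample bytes stand in for the real file.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of CHCSVParser).
// Tries strict UTF-8 first, then falls back to Windows-1252 (an assumption
// about how the file was exported). Reports the encoding that was used.
static NSString *DecodeCSVData(NSData *data, NSStringEncoding *usedEncoding) {
    NSStringEncoding encoding = NSUTF8StringEncoding;
    NSString *text = [[NSString alloc] initWithData:data encoding:encoding];
    if (text == nil) {
        encoding = NSWindowsCP1252StringEncoding;
        text = [[NSString alloc] initWithData:data encoding:encoding];
    }
    if (usedEncoding != NULL) {
        *usedEncoding = encoding;
    }
    return text;
}

int main(void) {
    @autoreleasepool {
        // Stand-in for [NSData dataWithContentsOfFile:filename]:
        // "it's" written with a Windows-1252 curly apostrophe (byte 0x92).
        const unsigned char sample[] = {'i', 't', 0x92, 's'};
        NSData *raw = [NSData dataWithBytes:sample length:sizeof(sample)];

        NSStringEncoding used = 0;
        NSString *text = DecodeCSVData(raw, &used);
        NSLog(@"decoded \"%@\", code point at index 2: %d",
              text, (int)[text characterAtIndex:2]);

        // Re-encode as UTF-8 so the parser can safely be told NSUTF8StringEncoding.
        NSData *utf8Data = [text dataUsingEncoding:NSUTF8StringEncoding];
        NSInputStream *stream = [NSInputStream inputStreamWithData:utf8Data];
        (void)stream;
        // With CHCSVParser linked, this stream would be passed to
        // initWithInputStream:usedEncoding:delimiter: as in the snippet above.
    }
    return 0;
}
```

The fallback order matters: valid UTF-8 is tried first because almost any byte sequence decodes "successfully" as Windows-1252, so trying it first would mangle files that were already UTF-8.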