Unicode separated values (USV) don't work
bakul opened this issue · 7 comments
From https://github.com/sixarm/usv, in USV:
Fields are separated by ␟ = U+241F = Symbol for Unit Separator, and
Records are separated by ␞ = U+241E = Symbol for Record Separator.
From https://news.ycombinator.com/item?id=31360327
$ cat t.usv && echo
id␟name␟age␞1␟Bob "Billy" Smith␟42␞2␟Jane
Brown␟37
$ goawk -F␟ -vRS=␞ -vOFS=, '{ print $1, $2, $3 }' t.usv
id,name,age
1,Bob "Billy" Smith,42
2,Jane
Brown,37
This works in goawk, gawk and mawk, but not in awk. The USV separators are indeed kind of hard to see.
USV is not supported in OTA (the One True AWK).
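For reference, the split that the awk command above is expected to perform can be sketched outside awk. This is a minimal Python sketch, not code from any awk implementation or from the USV project: the helper name `parse_usv` is hypothetical, and it assumes the input is already decoded text with no escaping rules beyond the two separators.

```python
# Hypothetical helper for illustration; not part of awk, goawk, or the USV project.
UNIT_SEP = "\u241f"    # ␟ separates fields (plays the role of FS)
RECORD_SEP = "\u241e"  # ␞ separates records (plays the role of RS)


def parse_usv(text):
    """Split USV text into a list of records, each a list of fields."""
    records = text.split(RECORD_SEP)
    # A trailing record separator would otherwise yield one empty final record.
    if records and records[-1] == "":
        records.pop()
    return [record.split(UNIT_SEP) for record in records]


# Same content as t.usv above, including the newline inside the last name.
sample = 'id␟name␟age␞1␟Bob "Billy" Smith␟42␞2␟Jane\nBrown␟37'

rows = parse_usv(sample)
for row in rows:
    # Join with commas, like OFS=, in the awk command.
    print(",".join(row))
```

Because the newline inside "Jane Brown" is ordinary field data here, the last record prints across two lines, matching the goawk output above.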
Shouldn't the user be allowed to pick any regexp as a field and any char/string as a record separator? Now that awk is extended to Unicode, I don't see why the above shouldn't be possible.
I did some experimentation, and there's a general problem here with using Unicode characters as RS and, apparently, as FS. Something broke at some point, since I had previously done some (minimal) testing using Unicode as RS. The code is somewhat fragile, unfortunately. I am reopening this issue, but I don't know when it will be solved.
This has been fixed. Thank you, @arnoldrobbins!
@benhoyt Please update your forum post to say that this issue is fixed.
@arnoldrobbins Unfortunately one can't edit or even reply to HN comments after a certain amount of time, and for that forum post that time has elapsed. Thanks for the fix though!