[ BUG ] running with separator ¶
greyHairChooseLife opened this issue · 1 comment
greyHairChooseLife commented
Hi, there.
$ tw --separator '¶' ./my.csv
$ tw --infer-schema safe --separator '¶' ./my.csv
$ tw --infer-schema no --separator '¶' ./my.csv
These all return the same error:
Error: ComputeError(ErrString("could not parse `1009880005252�` as dtype `str` at column '�' (column number 1)\n\nThe current offset in the file is 154 bytes.\n\nYou might want to try:\n- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),\n- specifying correct dtype with the `dtypes` argument\n- setting `ignore_errors` to `True`,\n- adding `1009880005252�` to the `null_values` list.\n\nOriginal error: ```invalid utf-8 sequence```"))
It says invalid UTF-8, but the file is valid UTF-8. Any clue, please?
Regards
shshemi commented
Hi,
Thank you for reporting this.
The problem is that the underlying CSV library (Polars) assumes that the separator and quote characters are ASCII. The pilcrow character is therefore cast to a u8, which turns it into a single byte that is not a valid UTF-8 character on its own.
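To illustrate the mechanism, here is a minimal Rust sketch. It is not the actual Polars or tw code. The helper names `separator_as_byte` and `split_fields` and the sample line are assumptions made up for this example. The pilcrow is U+00B6, which UTF-8 encodes as the two bytes 0xC2 0xB6; casting the char to u8 keeps only 0xB6, so splitting on that byte leaves a dangling 0xC2 at the end of the preceding field.

```rust
/// Hypothetical helper: truncate a separator character to one byte,
/// which is what an `as u8` cast does for any char above U+00FF or,
/// as here, for a non-ASCII char whose UTF-8 form is multi-byte.
fn separator_as_byte(sep: char) -> u8 {
    sep as u8
}

/// Hypothetical helper: split a raw line on a single separator byte
/// and try to decode every resulting field as UTF-8.
fn split_fields(line: &str, sep: u8) -> Vec<Result<String, std::str::Utf8Error>> {
    line.as_bytes()
        .split(|&b| b == sep)
        .map(|field| std::str::from_utf8(field).map(str::to_owned))
        .collect()
}

fn main() {
    // Sample data invented for this sketch; the real file is not shown in the issue.
    let line = "1009880005252¶foo";
    let sep = separator_as_byte('¶');
    println!("separator byte: {:#04x}", sep); // 0xb6

    // The first field keeps the leading 0xC2 byte of the pilcrow,
    // so decoding it fails; the second field decodes normally.
    for field in split_fields(line, sep) {
        println!("{:?}", field);
    }
}
```

Under these assumptions, the first field fails to decode because it ends in an incomplete UTF-8 sequence, which is consistent with the `1009880005252�` value shown in the error above.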
A more user-friendly message will be shown in the next version.
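As a rough idea of what such a check could look like, here is a sketch that rejects non-ASCII separators up front. The function `parse_separator` and its error wording are hypothetical and are not taken from tw.

```rust
/// Hypothetical helper: accept a separator only if it is exactly one
/// ASCII character, and return it as the single byte a CSV reader expects.
fn parse_separator(arg: &str) -> Result<u8, String> {
    let mut chars = arg.chars();
    match (chars.next(), chars.next()) {
        // Exactly one character, and it fits in a single ASCII byte.
        (Some(c), None) if c.is_ascii() => Ok(c as u8),
        // Exactly one character, but it needs more than one byte in UTF-8.
        (Some(c), None) => Err(format!(
            "separator {c:?} (U+{:04X}) is not ASCII; only single-byte separators are supported",
            c as u32
        )),
        // Empty string or more than one character.
        _ => Err(format!("separator must be exactly one character, got {arg:?}")),
    }
}

fn main() {
    println!("{:?}", parse_separator(","));
    println!("{:?}", parse_separator("¶"));
}
```

With a check like this, the pilcrow would be rejected before any parsing starts, instead of surfacing later as an invalid UTF-8 error.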
Let me know if I can help with anything else.
Bests