Write Data.ByteString.Char8 is probably wrong

Question

Zane-XY opened this issue 11 years ago · 2 comments

I'm opening this issue for discussion. I agree that it's wrong to use Char8 to represent non-ASCII text, but this argument is misleading:

The short answer is 'any time you write Data.ByteString.Char8 in your code, your code is probably wrong'.

There are a lot of cases, especially when handling a large amount of string data, where you know the input is ASCII. For example, if you have a dictionary that contains 10,000 words, there's nothing wrong with using Char8. It should be used with caution, but not "any time, probably wrong".

Answer 1 · 2014-04-23T08:52:06.000Z

The text already mentions that HTTP headers are guaranteed ASCII. Apart from a handful of network/systems programming examples, I don't know of any cases in which non-ASCII data is actually invalid.

In particular, a dictionary with 10,000 words can contain words from many languages, very few of which use only ASCII (and even in English some people like to use diacritics from time to time, e.g. "naïve"). Reading it as a ByteString and decoding it to Text is not much more complicated, and it preserves all the data.
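To make the difference concrete, here is a minimal sketch, assuming the input is UTF-8 and that the bytestring and text packages are available. The names naiveUtf8, viaChar8, decoded, truncated and readWords are made up for illustration and do not come from the text under discussion.

```haskell
import qualified Data.ByteString          as BS
import qualified Data.ByteString.Char8    as BC
import qualified Data.Text                as T
import qualified Data.Text.Encoding       as TE
import           Data.Text.Encoding.Error (UnicodeException)

-- "naïve" encoded as UTF-8: the ï occupies two bytes (0xC3 0xAF).
naiveUtf8 :: BS.ByteString
naiveUtf8 = BS.pack [0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65]

-- Char8 maps every byte to one Char, so the two bytes of ï
-- come out as two separate characters (6 Chars in total).
viaChar8 :: String
viaChar8 = BC.unpack naiveUtf8

-- Decoding as UTF-8 yields the intended 5 characters, or a Left
-- value if the bytes are not valid UTF-8.
decoded :: Either UnicodeException T.Text
decoded = TE.decodeUtf8' naiveUtf8

-- In the other direction, Char8.pack keeps only the low 8 bits of
-- each code point: U+03BB (Greek lambda) becomes the single byte 0xBB.
truncated :: BS.ByteString
truncated = BC.pack "\955"

-- Hypothetical helper: read a word list as raw bytes, decode it as
-- UTF-8, and split it into lines (assumes one word per line).
readWords :: FilePath -> IO (Either UnicodeException [T.Text])
readWords path = fmap T.lines . TE.decodeUtf8' <$> BS.readFile path
```

In GHCi, `length viaChar8` gives 6, `T.length <$> decoded` gives `Right 5`, and `BS.unpack truncated` gives `[187]`. For input that really is pure ASCII the two approaches agree, so the decoding step mainly costs an extra function call.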

Answer 2 · 2014-04-23T09:12:36.000Z

Basically, what I mean is the case where the author has full control over the input source, for example when the dictionary is generated by the author or simply cannot contain other characters. In that case, making everything UTF-8 compatible is kind of like premature optimization. Anyway, I'll close this. Thanks for replying.