utf-8 encoding with BOM
TheFeelipe opened this issue · 4 comments
There is a problem with the glz::read_file_json function when reading JSON files saved as UTF-8 with a BOM.
Apparently it does not recognize the leading bytes that mark the file encoding. Is there an existing way in the library to handle this?
error: expected_brace (5)
JSON requires UTF-8, so a BOM shouldn't be needed. The BOM is also not valid JSON when parsing into an object, so a file that starts with one is technically invalid JSON.
From section 8.1 of the RFC:
Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text. In the interests of
interoperability, implementations that parse JSON texts MAY ignore
the presence of a byte order mark rather than treating it as an
error.
Here is a helpful discussion on the topic:
JSON Specification and the usage of BOM
Since the specification says that the BOM may be ignored, I can add a compile-time option that skips a leading BOM instead of reporting an error.
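For illustration only, here is a minimal sketch of what such a compile-time switch could look like. The name `prepare_input` and the `SkipBom` template parameter are hypothetical and are not part of Glaze. The sketch only shows the idea of dropping the three UTF-8 BOM bytes (EF BB BF) before parsing when the option is enabled.

```cpp
#include <string_view>

// Hypothetical helper, NOT a Glaze API: returns the JSON text with a leading
// UTF-8 BOM (bytes EF BB BF) removed when SkipBom is true. When SkipBom is
// false the input is returned unchanged, so a BOM would still reach the parser.
template <bool SkipBom>
constexpr std::string_view prepare_input(std::string_view json) noexcept {
    if constexpr (SkipBom) {
        constexpr std::string_view bom = "\xEF\xBB\xBF";
        if (json.substr(0, bom.size()) == bom) {
            json.remove_prefix(bom.size());
        }
    }
    return json;
}
```

Because the check sits behind `if constexpr`, callers that leave the option off would pay nothing for it.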
I'll keep this issue alive until that feature has been added.
Thanks for reporting this!
As I think about this more I don't really like the idea of supporting something that doesn't round-trip.
@TheFeelipe, I think it might be best to write your own function that reads the file into a std::string and discards the BOM.
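A rough sketch of such a function, using only the standard library, might look like the following. The name `read_file_without_bom` is made up for this example and is not something Glaze provides. It assumes the file is UTF-8 and only handles the UTF-8 BOM (EF BB BF), not UTF-16 or UTF-32 marks.

```cpp
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical helper (not part of Glaze): reads the whole file into a string
// and drops a leading UTF-8 BOM (bytes EF BB BF) if one is present.
// Returns std::nullopt when the file cannot be opened.
inline std::optional<std::string> read_file_without_bom(const std::string& path) {
    // Binary mode so the bytes are read exactly as stored on disk.
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return std::nullopt;
    }

    std::string buffer((std::istreambuf_iterator<char>(file)),
                       std::istreambuf_iterator<char>());

    static constexpr char bom[] = "\xEF\xBB\xBF";
    // compare() is safe on buffers shorter than three bytes.
    if (buffer.compare(0, 3, bom) == 0) {
        buffer.erase(0, 3);
    }
    return buffer;
}
```

The returned string can then be handed to a buffer-based JSON read instead of glz::read_file_json, assuming the Glaze version in use offers one.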
If you can argue for why the BOM might be a good idea in some cases then I might reconsider. But, right now I'm thinking of avoiding this in Glaze because it isn't really something I want to encourage.
For some reason, accented characters and the like come out broken with any encoding other than UTF-8 with BOM or ANSI. For Unicode, UTF-8 with BOM is the more viable choice, at least on Windows with Visual Studio 2022, where even .cpp files need to be saved as UTF-8 with BOM.
As for writing my own function to discard the BOM, yes, that may be more viable.
I'm closing this, but let me know if you ever think Glaze ought to add some features around BOM handling.