UTF-8 Safe Mode
dsheets opened this issue · 2 comments
dsheets commented
Right now, sexplib uses String.escaped for serializing strings. If those strings contain high bytes that are not part of UTF-8 encoded sequences, they will be output as-is. This results in behavior like:
# Format.printf "%a@." Sexp.pp_mach (Sexp.of_string "(String\"\247\")");;
(String �)
When sexplib is used for logging and debugging, this can cause issues wherever valid UTF-8 text is expected. Perhaps the function used to escape strings could be parameterized? It would be really nice to output UTF-8-safe strings efficiently, that is, in a single pass rather than generating the buffer, iterating over it to check for non-UTF-8 bytes, and copying the result into another buffer.
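To make the request concrete, here is a rough sketch of the kind of single-pass escaping meant above. It is not sexplib code: `add_escaped_byte`, `add_utf8_safe`, and `utf8_safe` are hypothetical names, the `\ddd` decimal escape format is an assumption modeled on OCaml's lexical conventions, and it relies on `String.get_utf_8_uchar`, which needs OCaml 4.14 or later. Valid multi-byte UTF-8 sequences are copied through untouched; any byte that is not part of one is escaped.

```ocaml
(* Sketch only, not sexplib's implementation. Requires OCaml >= 4.14. *)

(* Write one byte as a three-digit decimal escape, e.g. \247. *)
let add_escaped_byte (buf : Buffer.t) (c : char) : unit =
  Buffer.add_string buf (Printf.sprintf "\\%03d" (Char.code c))

(* Append [s] to [buf] in a single pass, keeping valid UTF-8 sequences
   and escaping every byte that does not belong to one. *)
let add_utf8_safe (buf : Buffer.t) (s : string) : unit =
  let len = String.length s in
  let rec loop i =
    if i < len then begin
      let c = s.[i] in
      if Char.code c < 0x80 then begin
        (* ASCII: escape quotes, backslashes, and control characters. *)
        (match c with
         | '"' | '\\' -> Buffer.add_char buf '\\'; Buffer.add_char buf c
         | ' ' .. '~' -> Buffer.add_char buf c
         | _ -> add_escaped_byte buf c);
        loop (i + 1)
      end else begin
        (* High byte: try to decode a UTF-8 sequence starting at [i]. *)
        let d = String.get_utf_8_uchar s i in
        let n = Uchar.utf_decode_length d in
        if Uchar.utf_decode_is_valid d then Buffer.add_substring buf s i n
        else
          for j = i to i + n - 1 do add_escaped_byte buf s.[j] done;
        loop (i + n)
      end
    end
  in
  loop 0

(* Convenience wrapper for trying it out. *)
let utf8_safe (s : string) : string =
  let buf = Buffer.create (String.length s) in
  add_utf8_safe buf s;
  Buffer.contents buf
```

Writing straight into a caller-supplied `Buffer.t` is what avoids the second pass and the extra copy; a parameterized printer could accept a function of this shape in place of the default escaper.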
Deleted user commented
Escaping all non-ASCII characters seems like a good default. I submitted a change internally; it should be ready for the next release.
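For illustration only, a default along these lines might look like the sketch below. This is not the internal change itself: `escape_non_ascii` is a hypothetical name, and the choice of `\ddd` decimal escapes is an assumption.

```ocaml
(* Sketch only: escape every byte outside printable ASCII, plus '"'
   and '\\', so the output is pure ASCII and therefore valid UTF-8. *)
let escape_non_ascii (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      | '\n' -> Buffer.add_string buf "\\n"
      | '\t' -> Buffer.add_string buf "\\t"
      | '\r' -> Buffer.add_string buf "\\r"
      | ' ' .. '~' -> Buffer.add_char buf c
      | _ -> Buffer.add_string buf (Printf.sprintf "\\%03d" (Char.code c)))
    s;
  Buffer.contents buf
```

The trade-off is that legitimate UTF-8 text also comes out escaped, but the result is always safe to embed in logs.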
dsheets commented
Great! Thanks!