ahrefs/atd

atdgen-ocaml: UTF-8 vs. byte-array strings


(I have the impression this should be an FAQ, but I could not find any discussion of it.)

Atdgen maps ATD “strings” both to JSON strings, which are supposed to be valid Unicode (UTF-8 in practice), and directly to OCaml string values, which can be arbitrary byte arrays.
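A minimal sketch of why the two targets disagree: an OCaml string is just a sequence of bytes, so a value like "ab\xff" is perfectly legal OCaml even though the byte 0xFF can never occur in UTF-8. The writer below, `json_string_literal`, is a hypothetical, simplified stand-in (not atdgen's or Yojson's actual code) that, like many byte-oriented serializers, escapes only what JSON syntax requires and copies every other byte through unchanged, so it can produce output that is not valid JSON text.

```ocaml
(* Hypothetical, simplified JSON string writer; NOT atdgen's or Yojson's code.
   It escapes only what JSON syntax requires and copies every other byte
   unchanged, so it happily emits bytes that are not valid UTF-8. *)
let json_string_literal (s : string) : string =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '"';
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      | '\n' -> Buffer.add_string buf "\\n"
      (* other control characters must be escaped in JSON *)
      | c when Char.code c < 0x20 ->
          Buffer.add_string buf (Printf.sprintf "\\u%04x" (Char.code c))
      (* everything else, including bytes >= 0x80, is copied as-is *)
      | c -> Buffer.add_char buf c)
    s;
  Buffer.add_char buf '"';
  Buffer.contents buf

let () =
  (* An OCaml string is a byte array: "é" in UTF-8 occupies two bytes. *)
  Printf.printf "%d\n" (String.length "\xc3\xa9");
  (* "\xff" is a legal OCaml string but not valid UTF-8; the writer still
     wraps it in quotes as if it were a well-formed JSON string. *)
  print_endline (String.escaped (json_string_literal "ab\xff"))
```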

Should Mod_j functions have an option to fail early when an input string is not valid UTF-8? (I guess that would mean having default, or first-class, validator entries? -j-pp seems to work in only one direction.)
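Failing early could look roughly like the following stdlib-only sketch. The names `first_invalid_utf8`, `check_utf8` and `Invalid_utf8` are hypothetical and not part of atdgen; the assumption is that such a check would run on each decoded string (for example, in a hand-written wrapper around a Mod_j reader) and report the byte offset of the first ill-formed sequence. It follows the well-formed byte sequences of RFC 3629, rejecting overlong forms, UTF-16 surrogates and code points above U+10FFFF. OCaml 4.14 and later also provide `String.is_valid_utf_8` in the standard library.

```ocaml
(* Hypothetical helpers, not part of atdgen. *)
exception Invalid_utf8 of int (* byte offset of the first bad sequence *)

(* Returns [None] if [s] is well-formed UTF-8 (RFC 3629), otherwise
   [Some i] where [i] is the offset of the first ill-formed sequence. *)
let first_invalid_utf8 (s : string) : int option =
  let len = String.length s in
  let byte i = Char.code s.[i] in
  (* byte [i] exists and is a continuation byte 10xxxxxx *)
  let cont i = i < len && byte i land 0xC0 = 0x80 in
  (* byte [i] exists and lies in [lo, hi] *)
  let in_range i lo hi = i < len && byte i >= lo && byte i <= hi in
  let rec loop i =
    if i >= len then None
    else
      let b = byte i in
      if b < 0x80 then loop (i + 1)
      else if b >= 0xC2 && b <= 0xDF then
        if cont (i + 1) then loop (i + 2) else Some i
      else if b = 0xE0 then (* reject overlong 3-byte forms *)
        if in_range (i + 1) 0xA0 0xBF && cont (i + 2) then loop (i + 3) else Some i
      else if (b >= 0xE1 && b <= 0xEC) || b = 0xEE || b = 0xEF then
        if cont (i + 1) && cont (i + 2) then loop (i + 3) else Some i
      else if b = 0xED then (* reject surrogates U+D800..U+DFFF *)
        if in_range (i + 1) 0x80 0x9F && cont (i + 2) then loop (i + 3) else Some i
      else if b = 0xF0 then (* reject overlong 4-byte forms *)
        if in_range (i + 1) 0x90 0xBF && cont (i + 2) && cont (i + 3)
        then loop (i + 4) else Some i
      else if b >= 0xF1 && b <= 0xF3 then
        if cont (i + 1) && cont (i + 2) && cont (i + 3) then loop (i + 4) else Some i
      else if b = 0xF4 then (* reject code points above U+10FFFF *)
        if in_range (i + 1) 0x80 0x8F && cont (i + 2) && cont (i + 3)
        then loop (i + 4) else Some i
      else Some i (* 0x80..0xC1 and 0xF5..0xFF never start a sequence *)
  in
  loop 0

(* Fail-early check: returns [s] unchanged or raises [Invalid_utf8]. *)
let check_utf8 (s : string) : string =
  match first_invalid_utf8 s with
  | None -> s
  | Some i -> raise (Invalid_utf8 i)
```

Calling `check_utf8` on a field right after decoding turns a silent pass-through into an immediate, localized error, at the cost of one extra pass over each string.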

Does it make sense to add a byte-array core type to ATD?
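If such a type existed, its JSON side would need an encoding that stays valid UTF-8 for arbitrary bytes. Purely as an assumption (ATD defines no such type, and base64 would be an equally plausible choice), the sketch below uses lowercase hexadecimal, which is pure ASCII; `hex_of_bytes` and `bytes_of_hex` are hypothetical names.

```ocaml
(* Hypothetical encoding for a byte-array type; ATD has no such type.
   Hex output is pure ASCII, hence always valid UTF-8 inside a JSON string. *)
let hex_of_bytes (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c))) s;
  Buffer.contents buf

(* Value of a single hex digit; rejects anything else. *)
let hex_digit (c : char) : int =
  match c with
  | '0' .. '9' -> Char.code c - Char.code '0'
  | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
  | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
  | _ -> invalid_arg "bytes_of_hex: not a hex digit"

(* Inverse of [hex_of_bytes]: two hex digits per byte. *)
let bytes_of_hex (h : string) : string =
  let n = String.length h in
  if n mod 2 <> 0 then invalid_arg "bytes_of_hex: odd length";
  String.init (n / 2) (fun i ->
      Char.chr ((hex_digit h.[2 * i] * 16) + hex_digit h.[(2 * i) + 1]))
```

The trade-off is size: hex doubles the payload, whereas keeping raw bytes in a plain ATD string avoids that overhead but gives up the UTF-8 guarantee.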

Many tools simply don't care; should this at least be documented properly somewhere?

Right now the ATD definition doc just says “Sequence of bytes or characters” …