tarantool/avro-schema

Support UTF-8 in record names, field names and enums

Totktonada opened this issue · 0 comments

  1. Should we check UTF-8 validity?
    • I think yes, because it seems that there is no way to ban certain
      symbols in an encoding-unaware way.
    • But once we have checked that the input is valid UTF-8, we can still
      use the built-in regexps (this avoids rewriting much of the internals).
  2. Should we check for some symbols like period or zero byte?
    • The period at least; see, for example, fullname (frontend.lua).
  3. How should this feature coexist with the utf8_enums flag?
    • I think we should just keep that flag and prefer the new behaviour when
      both flags are provided. But removing it is unlikely to hurt anyone.
  4. Should we use tarantool facilities for identifiers?
    • Zero-cost way: don't use tarantool identifiers and don't perform any
      validity check.
    • Use tarantool identifiers. This seems to be the better way. There are two
      possible approaches (both require the new utf8 module):
      • Add the forbidden symbols to identifier_check* and expose identifier.c
        to Lua (add it to the utf8 module).
      • Expose identifier.c to Lua (add it to the utf8 module) and traverse
        the identifier with utf8.next, looking for forbidden symbols.
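
A rough sketch of the check order discussed in items 1 and 2: validate the encoding first, then walk the code points looking for forbidden symbols. It is written in Python purely for illustration (avro-schema itself is Lua), and the `check_name` helper and the `FORBIDDEN` set are hypothetical names, not part of avro-schema or tarantool. It does not reproduce whatever additional rules identifier.c applies.

```python
# Hypothetical set of forbidden symbols: the period and the zero byte
# mentioned in item 2. The real list is still an open question.
FORBIDDEN = {".", "\x00"}


def check_name(raw):
    """Return None if `raw` (bytes) is acceptable, else an error string."""
    # Step 1: encoding-aware validation. Strict decoding rejects
    # truncated sequences, overlong forms and invalid lead bytes.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid UTF-8"
    # Step 2: only now is it safe to look at individual symbols,
    # since every element of `text` is a whole code point.
    for ch in text:
        if ch in FORBIDDEN:
            return "forbidden symbol U+%04X" % ord(ch)
    return None
```

For example, `check_name("имя".encode("utf-8"))` returns `None`, while `check_name(b"a.b")` reports the period. Doing the symbol check on decoded code points rather than raw bytes is what makes the ban encoding-aware.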

Blocked by: tarantool/tarantool#3405

The feature is to be enabled under a flag, to preserve compatibility with the spec.