Support UTF-8 in record names, field names and enums
Totktonada opened this issue · 0 comments
Totktonada commented
- Should we check UTF-8 validity?
  - I think yes, because it seems there is no way to ban certain
    symbols in an encoding-unaware way.
  - Once we have checked that a name is valid UTF-8, we can still use the
    built-in regexps, which avoids rewriting much of the internals.
- Should we check for some symbols, like the period or the zero byte?
  - The period at least; see, for example, fullname in frontend.lua.
- How should this feature be combined with the utf8_enums flag?
  - I think we should just keep this flag and prefer the new behaviour when both
    flags are provided, although deleting it is unlikely to hurt anyone.
- Use tarantool facilities for identifiers?
  - No-cost way: don't use tarantool identifiers and don't perform any validity
    check.
  - Use tarantool identifiers. This seems to be a good way. There are two
    possible approaches (both require a new utf8 module):
    - Add the forbidden symbols to identifier_check* and expose identifier.c
      to Lua (add it to the utf8 module).
    - Expose identifier.c to Lua (add it to the utf8 module) and traverse the
      identifier with utf8.next to look for forbidden symbols.
Blocked by: tarantool/tarantool#3405
The feature should be enabled under a flag, to keep compatibility with the spec by default.
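
To make the questions above concrete, here is a rough sketch of the check order being discussed: validate UTF-8 first, then walk the code points looking for forbidden symbols, all behind a flag. It is written in Python only for illustration and is not the Lua implementation. The names `check_name`, `allow_utf8`, `FORBIDDEN` and `ASCII_NAME` are hypothetical, and the ASCII-only rule used when the flag is off is an assumption about the current behaviour.

```python
import re

# Hypothetical set of forbidden symbols; the period and the zero byte
# are the two candidates mentioned in the discussion.
FORBIDDEN = {".", "\x00"}

# Assumed ASCII-only rule applied when the flag is off:
# letters, digits and underscore, with no leading digit.
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_name(raw: bytes, allow_utf8: bool = False) -> bool:
    """Return True if raw is acceptable as a name (sketch only)."""
    try:
        # Step 1: reject byte sequences that are not valid UTF-8.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if not allow_utf8:
        # Flag is off: fall back to the assumed ASCII-only rule.
        return ASCII_NAME.fullmatch(name) is not None
    if not name:
        return False
    # Step 2: traverse the code points and reject forbidden symbols.
    return not any(ch in FORBIDDEN for ch in name)
```

With the flag on, `check_name("имя".encode("utf-8"), allow_utf8=True)` passes, while `b"a.b"` and `b"a\x00b"` are rejected because of the period and the zero byte. Decoding first is what makes the second step safe: the traversal sees whole code points, not raw bytes.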