ocaml-community/sedlex

Usage of lexers / regexps defined in an external module

ELLIOTTCABLE opened this issue · 3 comments

Given that the lids are declared outside of any syntax extension, I assumed that something like this would work:

(* uAX31.ml *)
let joining_type_transparent = [%sedlex.regexp?
  0x00AD (* 'SOFT HYPHEN' *)
| 0x0300 .. 0x036F (* 'COMBINING GRAVE ACCENT' - 'COMBINING LATIN SMALL LETTER X' *)
| 0x0483 .. 0x0489 (* 'COMBINING CYRILLIC TITLO' - 'COMBINING CYRILLIC MILLIONS SIGN' *)
]

(* lexer.ml *)
let transparent = [%sedlex.regexp? UAX31.joining_type_transparent | ...]

match%sedlex s with
| transparent -> (* ... *)
| _ -> ()

Is there a way to refer to lexer definitions in another file? Or do I need to concatenate all my lexer code into a single file at build-time?

No. The syntax extension is invoked on one file at a time, and it has no way of storing the regexp declarations it sees so that a later invocation on another file can reload them.

(BTW, the "one file for the lexer" design may be a bit flawed, but flex, ocamllex, etc., all follow that same model. The one parsing tool in our ecosystem I can think of that doesn't do that is Menhir, which allows you to split a grammar into many files.)
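For what it's worth, the limitation only applies to names bound with [%sedlex.regexp? ...]; ordinary OCaml values are visible across modules as usual. One possible fallback, not something suggested in this thread, is to expose the character class as a plain predicate on code points and call it from the other file. The sketch below is plain OCaml and does not use sedlex at all; the names is_joining_type_transparent and skip_transparent are hypothetical, and the two files are modelled as a nested module and a top-level function so the snippet is self-contained.

```ocaml
(* Hypothetical stand-in for uAX31.ml: the class as an ordinary function
   rather than a sedlex regexp, so other modules can refer to it. *)
module UAX31 = struct
  (* True for the code points listed in the question's regexp. *)
  let is_joining_type_transparent (cp : int) : bool =
    cp = 0x00AD
    || (0x0300 <= cp && cp <= 0x036F)
    || (0x0483 <= cp && cp <= 0x0489)
end

(* Hypothetical stand-in for lexer.ml: count how many leading code points
   of the input belong to the class defined in the other module. *)
let skip_transparent (cps : int list) : int =
  let rec go n = function
    | cp :: rest when UAX31.is_joining_type_transparent cp -> go (n + 1) rest
    | _ -> n
  in
  go 0 cps
```

The obvious cost is that a predicate like this cannot be combined with other sedlex regexps inside a match%sedlex pattern, so it only helps where the class is checked on its own.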