smolkaj/nice-parser

Why the usage of `cppo`?

ELLIOTTCABLE opened this issue · 5 comments

(Related: #2; is this the “hack” described there?)

So, I'm relatively new to OCaml (hence using this project to try and understand parsing in OCaml! 🤣); but I don't understand why CPP-style source-transformation is necessary to reach Menhir's generated “token definitions”:

#include "Tokens.ml"
  [@@deriving show, enumerate]

module Sedlexing = LexBuffer
open LexBuffer
(* ... so on, so forth ... *)

Can't we use some sort of built-in language mechanism to include the symbols from the generated Tokens module into both Lexer.mli and the (eventual) Lexer.ml?

This has been a show-stopper for me; as a newbie, I still really depend on Merlin to understand what's going on and write code — and the CPPO invocation at the top of these files completely breaks Merlin, right now. But if there's no other way to do this … )'=

(It looks like somebody's trying to build support for it, ocaml/merlin#548, but that may be a long way off? I'm kinda an outsider, so I'm not sure. See also: let-def/merlin-extend#7.)

I need to include the type definition syntactically, not just semantically, so that [@@deriving show, enumerate] will work. The reason is that the deriving mechanism is implemented as a syntax-to-syntax transformation.
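
To illustrate the syntactic-vs-semantic point, here is a minimal sketch in plain OCaml with no ppx involved. The `Tokens` module and its constructors are hypothetical stand-ins for the Menhir output, and `show_token` / `all_of_token` are written by hand as an approximation of what a deriver would generate (the exact generated names and output format are assumptions):

```ocaml
(* Hypothetical stand-in for the Menhir-generated Tokens module. *)
module Tokens = struct
  type token = PLUS | MINUS | EOF
end

(* A bare alias such as [type token = Tokens.token] carries no constructors
   in its syntax, so a syntax-level deriver would have nothing to work with.
   Re-exporting the type with its constructors spelled out makes the
   definition syntactically visible at this point. *)
type token = Tokens.token = PLUS | MINUS | EOF

(* Hand-written approximations of what [@@deriving show, enumerate]
   would be expected to produce from the definition above. *)
let show_token = function
  | PLUS -> "PLUS"
  | MINUS -> "MINUS"
  | EOF -> "EOF"

let all_of_token = [ PLUS; MINUS; EOF ]
```

The textual include gets the same effect without repeating the constructor list by hand, which is why the type definition has to be pasted in as source text.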

Just to be clear, this is definitely a hack, as pointed out in #2. Maybe we can convince the menhir author to provide a to_string function, so we don't have to derive it.

Well, pending menhir#6, I have an equally-janky-but-in-a-different-way solution.

This completely discards any external dependencies, instead using jbuilder's promotion functionality and cross-platform pseudo-scripting to merge Tokens.ml into the Lexer module files:

(rule
 ((targets (Lexer.sedlex.ml))
  (deps    (Lexer.sedlex.body.ml Tokens.ml))
  (action  (with-stdout-to Lexer.sedlex.ml
    (progn (cat Tokens.ml)
           (echo "  [@@deriving show, enumerate]")
           (cat Lexer.sedlex.body.ml))))))
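
For anyone who wants to see what the three steps of that rule amount to, here is a rough string-level sketch in OCaml. The `assemble` function and the sample file contents are made up for illustration, and the trailing newline after the deriving line is my own addition for readability:

```ocaml
(* The attribute line that the rule echoes between the two files.
   The trailing newline is an assumption made here for readability. *)
let deriving_line = "  [@@deriving show, enumerate]\n"

(* [assemble ~tokens ~body] mirrors the rule's three steps:
   the contents of Tokens.ml, then the deriving attribute,
   then the contents of the lexer body file. *)
let assemble ~tokens ~body = tokens ^ deriving_line ^ body

let () =
  (* Hypothetical file contents, for illustration only. *)
  let tokens = "type token =\n  | PLUS\n  | EOF\n" in
  let body = "module Sedlexing = LexBuffer\nopen LexBuffer\n" in
  print_string (assemble ~tokens ~body)
```

Because the attribute lands directly after the last constructor of the type definition, it attaches to that definition, just as it does with the `#include` version.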

It's … not pretty, but it works.

With that in place, I removed the inclusion-headers from the Lexer files (in the source-tree); and if I want to iterate on the Lexer files, and keep the various OCaml tooling (like Merlin) working, then I just add open Tokens at the top of the Lexer file I'm working on: boom, no errors.

Ideally, I'd like to commit open Tokens into the file and have it preprocessed out before building, in favour of the actual source-text inclusion that cppo and/or my jbuilder configuration perform. But I couldn't figure out a way to do that without either (1) depending on external tooling (which takes me right back to cppo), or (2) writing a ppx-rewriter to replace an open statement with the verbatim content of the mentioned module (which is a bit beyond my skills).

(I didn't bother to open a pull request, because this depends on unpublished jbuilder functionality. cppo is probably a net win compared to this if you don't need Merlin.)

Hope this helps somebody! <3

Looks like we can use ppx_import to solve this in a cleaner way soon: ocaml-ppx/ppx_import#26