lezer-parser/lezer

Consider making node names exports rather than strings

marijnh opened this issue · 5 comments

This would make using them less error-prone and allow autocompletion and such.

It does make some things that are currently allowed, such as multiple productions mapping to the same node name, problematic.

Also interfaces like NodeType.match, which map from node names (and optionally space-separated lists of them) to other values become a lot more awkward, because you'd need to import each node and put it between square brackets in the object literal, and can't list multiple types in a single property in a straightforward way anymore. ({[`${Identifier} ${Number}`]: x} doesn't really roll of the tongue).

Group names would then probably also need to become numbers, for consistency.

I agree, however I think it would be easier/better to use Typescript types instead of making them exports. I.e.

type NodeType = "Expression" | "Declaration" | ...;

I think the best way to do this is:

  1. Parameterise the parser over the node type, i.e. instead of:
export interface SyntaxNode {
  /// The name of the node (`.type.name`).
  name: string

Do

export interface SyntaxNode<T extends string> {
  /// The name of the node (`.type.name`).
  name: T

Admittedly you have to add it in quite a few places but it would be worth it.

  1. Change nodeNames from a string to an array of strings. I'm not sure why it uses a string in the first place really. I doubt that it is faster since the only thing you do with node names is compare them so surely it is easier for V8 to deal with separate string constants than arbitrary substrings of a bigger string? In other words, instead of:

    nodeNames: "⚠ JsonText True False Null Number String } { Object Property PropertyName ] [ Array",

Do

  nodeNames: ["⚠", "JsonText", "True", "False", "Null", "Number", "String", "}",  "{", "Object", "Property", "PropertyName",  "]",  "[",  "Array"],

Then either have it generate a Typescript .d.ts file, or better yet add support for proper Typescript output. If you do then you can do this little trick:

const nodeNames = ["⚠", "JsonText", "True", "False", "Null", "Number", "String", "}",  "{", "Object", "Property", "PropertyName",  "]",  "[",  "Array"] as const;
type NodeName = typeof nodeNames[number];

It infers NodeName as "⚠" | "JsonText" | "True" | "False".

Here's a branch where I added the node names type as a generic parameter: https://github.com/Timmmm/lezer-tree/blob/parameterised_node_names/src/tree.ts

It does add a ton of <NN extends string = string> everywhere which is a bit verbose. I'm not sure of any simpler way that is still correct though. Also I had to use any in a couple of places but nothing too bad.

That's definitely not what I had in mind, though. I don't really see many ways in which this would be usable or helpful. What's the idea behind it?

It would do exactly what you said:

This would make using them less error-prone and allow autocompletion and such.

Is that not what this issue is about?

Not really. I was considering exporting the node IDs as bindings. I have no intention to type-parameterize half the library on node name strings.

It's rarely a good idea to jump in and start a big refactor of someone else's library without first discussing the direction you're going in.