strblr/pegase

Creating an AlternativeParser from string injected into template literal


There seems to be a slightly unexpected behavior in how peg interprets a string passed into a template literal that already has alternation formatting applied to it:

const alternatives = ['a', 'b', 'c'];
const asAlternative = alternatives.map(a => `"${a}"`).join(' | ');
const normal = peg`"a" | "b" | "c"`;
const injection = peg`${asAlternative}`;

In the above snippet, normal should generate an AlternativeParser, but injection instead generates a LiteralParser:

console.log(normal) =>

AlternativeParser {
  defaultOptions: {},
  parsers: [
    LiteralParser {
      defaultOptions: {},
      literal: 'a',
      emit: true,
      expected: [Object]
    },
    LiteralParser {
      defaultOptions: {},
      literal: 'b',
      emit: true,
      expected: [Object]
    },
    LiteralParser {
      defaultOptions: {},
      literal: 'c',
      emit: true,
      expected: [Object]
    }
  ]
}

console.log(injection) => 

LiteralParser {
  defaultOptions: {},
  literal: '"a" | "b" | "c"',
  emit: false,
  expected: { type: 'LITERAL', literal: '"a" | "b" | "c"' }
}

As the logged output for injection shows, the string value passed into the template literal appears to be a valid alternative expression for a PEG. Is there a mechanism (or a potential roadmap item) to support a use case along these lines?

(Obvious caveat: there could be some oddity in how template literals work, of which I am ignorant, regarding embedding string fragments in the fashion suggested above. I cannot rule out user error, so preemptive apologies should that be the case.)
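The template literal behavior in question can be seen with a plain tag function. The minimal sketch below uses a hypothetical `inspect` tag (not part of Pegase) that simply returns what any tag, peg included, is handed. The template's literal text arrives in the first argument, while each `${…}` value arrives separately, so the injected alternation reaches the tag only as an opaque value and never as part of the template source.

```javascript
// Hypothetical tag, for illustration only: it returns exactly what a template
// tag receives when a tagged template literal is evaluated.
function inspect(strings, ...values) {
  return { strings: [...strings], values };
}

const alternatives = ["a", "b", "c"];
const asAlternative = alternatives.map(a => `"${a}"`).join(" | ");

// The grammar text is part of the template itself:
// strings holds one chunk, '"a" | "b" | "c"', and values is empty.
const normal = inspect`"a" | "b" | "c"`;

// The grammar text is an interpolated value:
// strings holds two empty chunks, and values holds '"a" | "b" | "c"'.
const injection = inspect`${asAlternative}`;
```

So nothing odd is happening on the JavaScript side: the tag has to decide what to do with the value, which is the question the rest of this thread addresses.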

You're raising an excellent point. As currently implemented, template tag arguments that appear at a "parser position" in the grammar (as opposed to, for example, a directive argument or a repetition quantifier, where template arguments can also appear) are cast into a Parser using the plugin chain. By default, the chain contains a single plugin called defaultPlugin, which casts numbers and strings into LiteralParser and RegExp into RegexParser:

https://github.com/ostrebler/pegase/blob/0aedbad40acc439d71537b51703b47e225f8bc0e/src/utility.ts#L446-L453
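The casting step can be modeled roughly as follows. This is a simplified sketch, not Pegase's code (the link above points to the real implementation). The parser classes are reduced to plain objects, and the names `defaultPlugin.cast` and `castArgument` are assumptions made for illustration.

```javascript
// Simplified model of casting template arguments through a plugin chain.
// Plain objects stand in for LiteralParser/RegexParser; names are hypothetical.
const defaultPlugin = {
  cast(arg) {
    if (typeof arg === "string" || typeof arg === "number")
      return { type: "LITERAL", literal: String(arg), emit: false };
    if (arg instanceof RegExp) return { type: "REGEX", regex: arg };
    return undefined; // not handled by this plugin
  },
};

// Try each plugin in order; the first one that returns a parser wins.
function castArgument(arg, plugins = [defaultPlugin]) {
  for (const plugin of plugins) {
    const parser = plugin.cast?.(arg);
    if (parser !== undefined) return parser;
  }
  throw new Error(`Couldn't cast template argument: ${String(arg)}`);
}

// The injected alternation becomes a single non-emitting literal,
// which matches the LiteralParser logged in the issue.
const casted = castArgument('"a" | "b" | "c"');
```

Because a string is treated as a literal before any grammar parsing happens, the alternation syntax inside it is never interpreted.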

But that leaves out the possibility of programmatically computing a grammar string and injecting it. I agree this should be possible.

A simple way would be to add an overload signature to peg so that it also accepts a string in a plain function call:

peg` some peg `;
peg("some peg"); // also valid

This would solve your problem by allowing you to do the following:

const alternatives = ['a', 'b', 'c'];
const asAlternative = peg(alternatives.map(a => `"${a}"`).join(' | '));
const injection = peg`${asAlternative}`;
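Telling the two call forms apart is straightforward, because a template strings array is a real array with an extra `raw` property. The sketch below only normalizes the input into a common shape and parses nothing; `toGrammarSource` is a hypothetical helper, and the actual overload added to peg may be implemented differently.

```javascript
// Hypothetical normalizer for a function usable both as a template tag and
// as a plain call. It reduces both forms to { chunks, args } and stops there.
function toGrammarSource(first, ...args) {
  if (typeof first === "string")
    return { chunks: [first], args: [] }; // called as fn("some peg")
  if (Array.isArray(first) && Array.isArray(first.raw))
    return { chunks: [...first], args }; // called as fn`some peg`
  throw new TypeError("Expected a string or a template strings array");
}

const fromCall = toGrammarSource('"a" | "b" | "c"');
const fromTag = toGrammarSource`"a" | "b" | "c"`;
```

With the string form, the computed text is the grammar source itself, which is why `peg(asAlternative)` yields an alternation rather than a literal.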

It could also become the default behavior of the casting done in defaultPlugin, so that injected strings are read as grammar. String literals that should act as LiteralParser could then be injected with the following syntax instead:

const foo = "foo", bar = "bar";
peg` "${foo}" | '${bar}' `

This has two advantages: it's semantically more expressive (one instantly knows foo and bar are string literals), and it allows emissive parsing of external string literals by using double quotes.
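The proposed default can be pictured as splicing string arguments into the grammar source before it is parsed. This is a minimal sketch under that assumption. The `splice` tag is hypothetical, handles only string arguments, and does not escape quote characters, so a value containing `"` or `'` would break the resulting grammar text.

```javascript
// Hypothetical model of the proposal: string arguments are spliced into the
// grammar source instead of being cast to literals. No escaping is done.
function splice(strings, ...values) {
  return strings.reduce((source, chunk, i) => {
    const value = i < values.length ? values[i] : "";
    if (typeof value !== "string")
      throw new TypeError("This sketch only splices string arguments");
    return source + chunk + value;
  }, "");
}

const foo = "foo", bar = "bar";
// Quotes written around the injection mark foo and bar as string literals.
const quoted = splice` "${foo}" | '${bar}' `;

// A bare injection is read as grammar, covering the use case in this issue.
const alternation = ["a", "b", "c"].map(a => `"${a}"`).join(" | ");
const injected = splice`${alternation}`;
```

Under this model, the quote characters in the template decide whether a value is a literal or grammar, which is the expressiveness argument made above.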

I'll think it through and come up with a solution quickly (edit: done in v0.3.15, you can now call peg directly with a string argument). Please keep in mind that this library is still under development and that this is a pre-release. I'm currently experimenting with another parser generation system that uses the Function constructor. If it turns out to be significantly faster (which I hope it will), this will change a lot of things.
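For readers unfamiliar with the technique mentioned here, the following is a minimal sketch of generating a parser with the Function constructor, assuming the goal is to compile an ordered choice of literals into a single function. It is not Pegase's generator. `compileAlternation` and its return convention (end position on success, -1 on failure) are made up for illustration.

```javascript
// Illustrative sketch: the matcher is emitted as JavaScript source text and
// compiled once with the Function constructor. Not Pegase's actual generator.
function compileAlternation(literals) {
  const branches = literals
    .map(lit => {
      const quoted = JSON.stringify(lit); // embed the literal as a JS string
      return `if (input.startsWith(${quoted}, pos)) return pos + ${lit.length};`;
    })
    .join("\n");
  // The compiled function returns the end position of the first literal that
  // matches at `pos` (ordered choice, as in PEG), or -1 if none matches.
  return new Function("input", "pos", `${branches}\nreturn -1;`);
}

const matchABC = compileAlternation(["a", "b", "c"]);
```

The idea is that code generation happens once, up front, so each parse runs plain compiled code instead of walking a tree of parser objects. Whether that is actually faster is what the experiment described above would have to measure.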

Let me know if you need more help.

Thanks for the quick response and version update. I can confirm that the string parameter overload works as expected. This actually resolves the use case I was interested in, so this can be closed if you like.

@joshuabowers Quick update if you're actively using Pegase: I just released v0.4.0. I re-extracted the logging logic (warnings and failures) into a separate Logger class. That's because I'm planning on externalizing visitor logic into a proper Visitor class, and a single Logger must be easy to share between the parser and multiple visitors.

Basically, it just means you have to rename result.log() to result.logger.toString(), where result is a parse result. The warnings and failures arrays are now on result.logger.
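The point of sharing one logger can be shown with a toy model. The `Logger` below only mirrors the shape described in this comment (`warnings` and `failures` arrays plus `toString()`). Its methods, message format, the result object, and the `visit` helper are all assumptions, not Pegase's actual classes.

```javascript
// Toy logger mirroring the described shape. Not Pegase's real Logger.
class Logger {
  constructor() {
    this.warnings = [];
    this.failures = [];
  }
  warn(message) { this.warnings.push(message); }
  fail(message) { this.failures.push(message); }
  // Printable report: all warnings first, then all failures.
  toString() {
    return [
      ...this.warnings.map(m => `(Warning) ${m}`),
      ...this.failures.map(m => `(Failure) ${m}`),
    ].join("\n");
  }
}

// Hypothetical parse result and visitor sharing a single logger instance,
// so everything either of them reports ends up in one place.
const logger = new Logger();
const result = { success: true, logger };
function visit(node, logger) {
  if (node.deprecated) logger.warn(`Deprecated node "${node.name}"`);
}

logger.fail('Expected "c"');
visit({ name: "x", deprecated: true }, logger);

// Migration: result.log() becomes result.logger.toString().
const report = result.logger.toString();
```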

I'm close to a release of version 1 btw.

@ostrebler thanks for the notification; I'll keep those changes in mind.