gustavoaca1997/Caribay

How to give names to parsed expressions?

Hi! Thank you for the great project. Is there a way to give names to captured groups? I noticed that captures end up at integer indices in the returned tables. This makes post-processing very difficult for large grammars: one has to remember the correct index for each capture, and the resulting code is hard to read.

Is there a way to name parsed expressions so that the result shows up as { ["value"] = 3, ["pos"] = 1, ["tag"] = INT } instead of { [1] = 3, ["pos"] = 1, ["tag"] = INT }? Thanks!

As a bonus, a way to discard matched patterns (i.e. not insert them into the returned table) would be another great usability improvement.


As a small example to clarify, here is the current code, with its result below.

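local generator = require"caribay.generator"

-- Recursively serialize a value (tables included) into a readable string.
local function dump(o)
   if type(o) == 'table' then
      local s = '{ '
      for k, v in pairs(o) do
         -- Quote non-numeric keys; tostring guards against non-string keys.
         if type(k) ~= 'number' then k = '"'..tostring(k)..'"' end
         s = s .. '['..k..'] = ' .. dump(v) .. ', '
      end
      return s .. '}'
   else
      return tostring(o)
   end
end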

local grammar = [[
    grammar <- value*
    fragment value <- FLOAT / INT / STRING
    INT <- ("+" / "-")? [0-9]+
    FLOAT <- ("+" / "-")? [0-9]+ '.' [0-9]+
    STRING <- '"' [^\"]+ '"'
]]

local match = generator.gen(grammar)
print(dump(match([[3 4.5 "Hey!" 9]])))

Result:

{ [1] = { [1] = 3, ["pos"] = 1, ["tag"] = INT, },
  [2] = { [1] = 4.5, ["pos"] = 3, ["tag"] = FLOAT, },
  [3] = { [1] = "Hey!", ["pos"] = 7, ["tag"] = STRING, },
  [4] = { [1] = 9, ["pos"] = 14, ["tag"] = INT, }, ["pos"] = 1, ["tag"] = grammar, }

It would be great if the grammar could be specified as:

local grammar = [[
    grammar <- value*
    fragment value <- FLOAT / INT / STRING
    INT <- :value: ("+" / "-")? [0-9]+
    FLOAT <- :value: ("+" / "-")? [0-9]+ '.' [0-9]+
    STRING <- :string: '"' [^\"]+ '"'
]]

and the result would then be:

{ [1] = { ["value"] = 3, ["pos"] = 1, ["tag"] = INT, },
  [2] = { ["value"] = 4.5, ["pos"] = 3, ["tag"] = FLOAT, },
  [3] = { ["string"] = "Hey!", ["pos"] = 7, ["tag"] = STRING, },
  [4] = { ["value"] = 9, ["pos"] = 14, ["tag"] = INT, }, ["pos"] = 1, ["tag"] = grammar, }

It would make post-processing after parsing much, much easier!
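
Until something like this exists in the grammar syntax, the renaming can be approximated with a pass over the returned table in plain Lua. The following is only a minimal sketch of that workaround, not a Caribay feature: `FIELD_NAMES` and `name_captures` are hypothetical names, and it assumes that `tag` is a string and that each leaf node holds a single string capture at index 1, as the result above suggests. The `ast` table is a hand-written stand-in for what `match(...)` returns, so the snippet runs without Caribay installed.

```lua
-- Hypothetical mapping from a node's tag to the field name wanted for its
-- first capture (assumption: tags are plain strings).
local FIELD_NAMES = { INT = "value", FLOAT = "value", STRING = "string" }

-- Hypothetical helper: walks the AST in place and moves node[1] to a named
-- field when the node's tag is listed in FIELD_NAMES and it has one capture.
local function name_captures(node)
    if type(node) ~= "table" then return node end
    -- Visit children first; non-table captures are returned unchanged.
    for _, child in ipairs(node) do
        name_captures(child)
    end
    local name = FIELD_NAMES[node.tag]
    if name and node[1] ~= nil and node[2] == nil then
        node[name] = node[1]
        node[1] = nil
    end
    return node
end

-- Hand-written stand-in for the table returned by match(...) in the example
-- (assumption: captures are the matched substrings).
local ast = {
    tag = "grammar", pos = 1,
    { tag = "INT",    pos = 1,  "3" },
    { tag = "FLOAT",  pos = 3,  "4.5" },
    { tag = "STRING", pos = 7,  '"Hey!"' },
    { tag = "INT",    pos = 14, "9" },
}

name_captures(ast)
print(ast[1].value, ast[3].string)
```

This keeps the field names in one table next to the grammar, but they still have to be maintained separately from the rules, which is why syntax like `:value:` inside the grammar itself would be preferable.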