How to give names to parsed expressions?
Hi! Thank you for the great project. Is there a way to give names to captured groups? I noticed that captures are stored at integer indices of the returned tables. This makes post-processing difficult for large grammars: one has to remember the correct index, and the resulting code is hard to read.
Is there a way to give names to parsed expressions, so that the result shows up as
{ ["value"] = 3, ["pos"] = 1, ["tag"] = INT }
instead of
{ [1] = 3, ["pos"] = 1, ["tag"] = INT }
? Thanks!
As a bonus, a way to discard matched patterns (i.e. not insert them into the returned table) would be another great usability improvement.
As a small example to clarify: this is the current code with its result below.
local generator = require"caribay.generator"
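-- Serialize a value into a readable string, recursing into tables.
function dump(o)
  if type(o) == 'table' then
    local s = '{ '
    for k, v in pairs(o) do
      -- Quote non-numeric keys so they print as ["key"].
      if type(k) ~= 'number' then k = '"'..k..'"' end
      s = s .. '['..k..'] = ' .. dump(v) .. ', '
    end
    return s .. '}'
  else
    return tostring(o)
  end
end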
local grammar = [[
grammar <- value*
fragment value <- FLOAT / INT / STRING
INT <- ("+" / "-")? [0-9]+
FLOAT <- ("+" / "-")? [0-9]+ '.' [0-9]+
STRING <- '"' [^\"]+ '"'
]]
local match = generator.gen(grammar)
print(dump(match([[3 4.5 "Hey!" 9]])))
Result:
{ [1] = { [1] = 3, ["pos"] = 1, ["tag"] = INT, },
[2] = { [1] = 4.5, ["pos"] = 3, ["tag"] = FLOAT, },
[3] = { [1] = "Hey!", ["pos"] = 7, ["tag"] = STRING, },
[4] = { [1] = 9, ["pos"] = 14, ["tag"] = INT, }, ["pos"] = 1, ["tag"] = grammar, }
It would be great if the grammar could be specified as:
local grammar = [[
grammar <- value*
fragment value <- FLOAT / INT / STRING
INT <- :value: ("+" / "-")? [0-9]+
FLOAT <- :value: ("+" / "-")? [0-9]+ '.' [0-9]+
STRING <- :string: '"' [^\"]+ '"'
]]
and the result would then be:
{ [1] = { ["value"] = 3, ["pos"] = 1, ["tag"] = INT, },
[2] = { ["value"] = 4.5, ["pos"] = 3, ["tag"] = FLOAT, },
[3] = { ["string"] = "Hey!", ["pos"] = 7, ["tag"] = STRING, },
[4] = { ["value"] = 9, ["pos"] = 14, ["tag"] = INT, }, ["pos"] = 1, ["tag"] = grammar, }
This would make post-processing after parsing much easier!
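Until something like this exists in the grammar syntax, a post-processing pass over the returned table could approximate both requests. The following is only a rough sketch: `rename_captures`, `names` and `drop` are hypothetical names and not part of caribay. It assumes nodes have the shape shown in the result above (a `tag`, a `pos`, and captures at integer indices), and the sample table is built by hand rather than produced by the generator.

```lua
-- Hypothetical helper, NOT part of caribay.
-- names: maps a node tag to the field name for its single capture.
-- drop:  set of tags whose nodes are left out of the result.
local function rename_captures(node, names, drop)
  if type(node) ~= "table" then return node end
  local out = { tag = node.tag, pos = node.pos }
  local n = 0
  for _, child in ipairs(node) do
    local discard = type(child) == "table" and drop[child.tag]
    if not discard then
      n = n + 1
      out[n] = rename_captures(child, names, drop)
    end
  end
  -- Move a lone, non-table capture from index 1 to its named field.
  local name = names[node.tag]
  if name and n == 1 and type(out[1]) ~= "table" then
    out[name] = out[1]
    out[1] = nil
  end
  return out
end

-- Hand-built table mimicking the result shown above (assumed shape).
local ast = {
  tag = "grammar", pos = 1,
  { tag = "INT",    pos = 1,  "3" },
  { tag = "FLOAT",  pos = 3,  "4.5" },
  { tag = "STRING", pos = 7,  "Hey!" },
  { tag = "INT",    pos = 14, "9" },
}

local names = { INT = "value", FLOAT = "value", STRING = "string" }
local named = rename_captures(ast, names, {})
local without_strings = rename_captures(ast, names, { STRING = true })
print(named[1].value, named[3].string, #without_strings)
```

This works, but the mapping from tags to field names lives outside the grammar and has to be kept in sync with it by hand, which is why naming captures directly in the grammar would be preferable.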