Is it possible to use a G4 grammar from Instaparse? (Clojure grammar)
timothypratley opened this issue · 4 comments
I think G4 is the ANTLR 4 grammar format. I'm not sure why they use that format, but it does not seem to be natively compatible with Instaparse. At least, I tried loading this grammar:
https://github.com/antlr/grammars-v4/blob/master/clojure/Clojure.g4
And got an error.
I think I have to translate G4 to EBNF?
Are there any tools or examples I could draw on here?
My apologies if there is a better forum to ask this question in!
I am not familiar with G4, so unfortunately I don't have any advice to offer on how to do the translation.
No worries! On closer inspection, I think the only differences are:
- G4 has comments as:
/* multi-line */
and
// single line
- fragment <-- I'm not sure exactly what this is, but it seems to be a way to specify part of a rule for reuse.
So I suspect translating them is pretty easy. I'll report back with more details if I can get it working.
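The two differences listed above could be handled mechanically before the remaining hand translation. The following is a minimal sketch, not a complete converter: the function names `strip_g4_comments` and `drop_fragment_keyword` and the sample text are hypothetical, and it assumes that comment markers only need protecting inside single-quoted literals (a quote character inside a `[...]` character set could still confuse it). ANTLR-specific constructs such as `~` negation and `'0'..'9'` ranges would still have to be rewritten by hand as Instaparse regexes.

```python
import re

# Quoted literals are matched first so that comment markers inside them
# (for example '//' used as a token) are left untouched.
_TOKEN = re.compile(
    r"""
    (?P<literal>'(?:\\.|[^'\\])*')   # ANTLR single-quoted literal
  | (?P<block>/\*.*?\*/)             # block comment, may span lines
  | (?P<line>//[^\r\n]*)             # line comment, up to end of line
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_g4_comments(text: str) -> str:
    """Remove block and line comments, keeping quoted literals as they are."""
    def keep_literals(match: re.Match) -> str:
        return match.group(0) if match.group("literal") is not None else ""
    return _TOKEN.sub(keep_literals, text)


def drop_fragment_keyword(text: str) -> str:
    """Turn 'fragment NAME: ...' into an ordinary 'NAME: ...' rule."""
    return re.sub(r"(?m)^[ \t]*fragment[ \t]+", "", text)


# Hypothetical input, written only to exercise the two functions.
sample = """\
/* header
   comment */
file: form* ; // top-level rule
SLASH: '//' ;
fragment HEXD: [0-9a-fA-F] ;
"""

cleaned = drop_fragment_keyword(strip_g4_comments(sample))
print(cleaned)
```

Dropping the `fragment` keyword keeps the helper rule as a normal rule. In the Instaparse version it can then be wrapped in angle brackets (as in `<HEXD>`) so that it does not add its own node to the parse tree.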
Just for reference, this is what I came up with:
file: form * ;
<form>: literal | list | vector | map | reader_macro;
<forms>: form * ;
list: <'('> forms <')'> ;
vector: <'['> forms <']'> ;
map: <'{'> (form form)* <'}'> ;
set: <'#{'> forms <'}'> ;
reader_macro
: lambda
| meta_data
| regex
| var_quote
| host_expr
| set
| tag
| discard
| dispatch
| deref
| quote
| backtick
| unquote
| unquote_splicing
| gensym
;
quote: <'\''> form ;
backtick: <'`'> form ;
unquote: <'~'> form ;
unquote_splicing: <'~@'> form ;
tag: <'^'> form form ;
deref: <'@'> form ;
gensym: SYMBOL <'#'> ;
lambda: <'#('> form* <')'> ;
meta_data: <'#^'> (map form | form) ;
var_quote: <'#\''> symbol ;
host_expr: <'#+'> form form ;
discard: <'#_'> form ;
dispatch: <'#'> symbol form ;
regex: <'#'> string ;
literal: string | number | character | nil | BOOLEAN | keyword | symbol | param_name ;
string: STRING;
hex: HEX;
bin: BIN;
bign: BIGN;
number: FLOAT | hex | bin | bign | LONG ;
character : named_char | u_hex_quad | any_char ;
named_char: CHAR_NAMED ;
any_char: CHAR_ANY ;
u_hex_quad: CHAR_U ;
nil: NIL;
keyword: macro_keyword | simple_keyword;
<simple_keyword>: ':' symbol;
<macro_keyword>: ':' ':' symbol;
symbol: ns_symbol | simple_sym;
<simple_sym>: SYMBOL;
<ns_symbol>: NS_SYMBOL;
param_name: PARAM_NAME;
<STRING> : <'"'> #"([^\"\\]|\\.)*" <'"'>;
<FLOAT>
: '-'? #"[0-9]+" FLOAT_TAIL
| '-'? 'Infinity'
| '-'? 'NaN'
;
<FLOAT_TAIL>: FLOAT_DECIMAL FLOAT_EXP | FLOAT_DECIMAL | FLOAT_EXP ;
<FLOAT_DECIMAL>: '.' #"[0-9]+" ;
<FLOAT_EXP>: #"[eE]" '-'? #"[0-9]+" ;
<HEXD>: #"[0-9a-fA-F]" ;
<HEX>: '0' #"[xX]" HEXD+ ;
<BIN>: '0' #"[bB][10]+" ;
<LONG>: '-'? #"[0-9]+[lL]?";
<BIGN>: '-'? #"[0-9]+[nN]";
<CHAR_U> : '\\' 'u'#"[0-9D-Fd-f]" HEXD HEXD HEXD ;
<CHAR_NAMED>: '\\' ( 'newline' | 'return' | 'space' | 'tab' | 'formfeed' | 'backspace' ) ;
<CHAR_ANY>: '\\' #"." ;
<NIL> : 'nil';
<BOOLEAN> : 'true' | 'false' ;
<SYMBOL>: '.' | '/' | NAME ;
<NS_SYMBOL>: NAME '/' SYMBOL ;
<PARAM_NAME>: '%' (#"[1-9][0-9]*"|'&')? ;
<NAME>: SYMBOL_HEAD SYMBOL_REST* (':' SYMBOL_REST+)* ;
<SYMBOL_HEAD>: #"[^0-9\^`\\\"#~@:/%\()\[\]{} \n\r\t,]" ;
<SYMBOL_REST>: SYMBOL_HEAD | #"[0-9]" | '.' ;
<COMMENT>: ';' #"[^\r\n]*" ;
It's not quite right but I'm going to come back to it later.
This version seems to be mostly working :)
https://github.com/timothypratley/clojure-ebnf-grammar/blob/master/resources/clojure.ebnf