Error parsing v6 addresses in short format
mkorkalo opened this issue · 2 comments
mkorkalo commented
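This rule is valid in Suricata, but fails to parse here. The fully expanded address parses fine; the same address written in short (::) form raises an error: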
>>> rules = parsuricata.parse_rules('alert ip $HOME_NET any -> [2a00:1450:4010:0c0e:0000:0000:0000:005e] any (msg:"msg";)')
>>> rules = parsuricata.parse_rules('alert ip $HOME_NET any -> [2a00:1450:4010:0c0e::005e] any (msg:"msg";)')
Traceback (most recent call last):
  File ".../lark/parsers/lalr_parser.py", line 126, in feed_token
    action, arg = states[state][token.type]
KeyError: '__ANON_6'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../parsuricata/__init__.py", line 9, in parse_rules
    return parser.parse(source)
  File ".../lark/lark.py", line 581, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
  File ".../lark/parser_frontends.py", line 106, in parse
    return self.parser.parse(stream, chosen_start, **kw)
  File ".../lark/parsers/lalr_parser.py", line 41, in parse
    return self.parser.parse(lexer, start)
  File ".../lark/parsers/lalr_parser.py", line 171, in parse
    return self.parse_from_state(parser_state)
  File ".../lark/parsers/lalr_parser.py", line 188, in parse_from_state
    raise e
  File ".../lark/parsers/lalr_parser.py", line 179, in parse_from_state
    state.feed_token(token)
  File ".../lark/parsers/lalr_parser.py", line 129, in feed_token
    raise UnexpectedToken(token, expected, state=self, interactive_parser=None)
lark.exceptions.UnexpectedToken: Unexpected token Token('__ANON_6', '005') at line 1, column 49.
Expected one of:
	* RSQB
	* COMMA
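Both rules name the same host: "::" stands in for the run of all-zero groups. As a quick sanity check, Python's standard-library ipaddress module (used here only for illustration; parsuricata is not assumed to use it) treats the two spellings as equal:

```python
import ipaddress

# The two spellings from the rules above: fully expanded vs. "::"-compressed.
expanded = ipaddress.ip_address("2a00:1450:4010:0c0e:0000:0000:0000:005e")
compressed = ipaddress.ip_address("2a00:1450:4010:0c0e::005e")

print(expanded == compressed)  # True
print(compressed.exploded)     # 2a00:1450:4010:0c0e:0000:0000:0000:005e
print(expanded.compressed)     # 2a00:1450:4010:c0e::5e
```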
theY4Kman commented
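Hmm, yeah, it looks like the popular IPv6 regex from this StackOverflow answer does not handle compressed addresses well. The regex is one big alternation, and Python's regex engine takes the first alternative that succeeds rather than the longest one, so ([0-9a-fA-F]{1,4}:){1,7}: matches 2a00:1450:4010:0c0e:: and stops there, leaving 005e to be lexed as a separate token (the '005' at column 49 in the traceback).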
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
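The failure can be reproduced without parsuricata or lark by running the pattern above through Python's re module directly. This sketch assumes the lexer tries a terminal at the current position roughly the way re.match does (anchored at the start, not at the end); that assumption fits the column number in the traceback.

```python
import re

# The IPv6 pattern quoted above, copied verbatim and split at "|" boundaries
# only for readability (adjacent string literals are concatenated).
SO_IPV6 = (
    r"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|"
    r"([0-9a-fA-F]{1,4}:){1,7}:|"
    r"([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|"
    r"([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|"
    r"([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|"
    r"([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|"
    r"([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|"
    r"[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|"
    r":((:[0-9a-fA-F]{1,4}){1,7}|:)|"
    r"fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|"
    r"::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|"
    r"([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))"
)

pattern = re.compile(SO_IPV6)
addr = "2a00:1450:4010:0c0e::005e"

# re.match is not anchored at the end, so the first alternative that
# succeeds wins: here the "{1,7}:" branch, which stops right after "::".
prefix = pattern.match(addr).group(0)
print(prefix)  # 2a00:1450:4010:0c0e::

# Requiring a full-string match forces the engine to keep trying
# alternatives, and the same pattern then accepts the whole address.
print(pattern.fullmatch(addr) is not None)  # True
```

The matched prefix is 21 characters long. The address starts at column 28 of the rule, so the next token begins at column 49, which is where the traceback reports Token('__ANON_6', '005').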
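But a comment on the answer provides a reordered regex that does appear to work well. It tries the alternatives that match groups after the :: before ([0-9a-fA-F]{1,4}:){1,7}:, so the address from this report is consumed whole: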
(fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])\.{3,3})(25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])\.{3,3})(25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])|:((:[0-9a-fA-F]{1,4}){1,7}|:))
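The same kind of check, run against the reordered pattern from the comment (again copied verbatim and split only for readability), shows both spellings from the report being matched in full by an unanchored re.match:

```python
import re

# The reordered IPv6 pattern from the StackOverflow comment, copied verbatim.
COMMENT_IPV6 = (
    r"(fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|"
    r"([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|"
    r"([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|"
    r"[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|"
    r"([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|"
    r"([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|"
    r"([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|"
    r"([0-9a-fA-F]{1,4}:){1,7}:|"
    r"([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|"
    r"([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])\.{3,3})(25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])|"
    r"::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])\.{3,3})(25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])|"
    r":((:[0-9a-fA-F]{1,4}){1,7}|:))"
)

pattern = re.compile(COMMENT_IPV6)

# Both spellings from the bug report: an unanchored match now consumes
# the whole address instead of stopping at "::".
reported = (
    "2a00:1450:4010:0c0e:0000:0000:0000:005e",
    "2a00:1450:4010:0c0e::005e",
)
for addr in reported:
    print(addr, pattern.match(addr).group(0) == addr)  # ... True

# Alternation order still decides what an unanchored match returns:
# here the "{1,2}...{1,5}" branch wins and stops one group early.
print(pattern.match("1::2:3:4:5:6:7").group(0))  # 1::2:3:4:5:6
```

1::2:3:4:5:6:7 is a valid address (Python's ipaddress module accepts it), yet the match stops at 1::2:3:4:5:6, so running a pattern like this over several compressed shapes, not just the one from this report, seems worthwhile before relying on it.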
I'll swap them out and push out a release.
theY4Kman commented
Okie dokes, fixed and released in version 0.3.3
Thanks for taking the time out of your day to report this <3