theY4Kman/parsuricata

Error parsing v6 addresses in short format

mkorkalo opened this issue · 2 comments

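This rule is valid in Suricata, but parsuricata fails to parse it. The fully expanded address parses fine; the shortened (::) form of the same address does not.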

>>> rules = parsuricata.parse_rules('alert ip $HOME_NET any -> [2a00:1450:4010:0c0e:0000:0000:0000:005e] any (msg:"msg";)')
>>> rules = parsuricata.parse_rules('alert ip $HOME_NET any -> [2a00:1450:4010:0c0e::005e] any (msg:"msg";)')
Traceback (most recent call last):
  File ".../lark/parsers/lalr_parser.py", line 126, in feed_token
    action, arg = states[state][token.type]
KeyError: '__ANON_6'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../parsuricata/__init__.py", line 9, in parse_rules
    return parser.parse(source)
  File ".../lark/lark.py", line 581, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
  File ".../lark/parser_frontends.py", line 106, in parse
    return self.parser.parse(stream, chosen_start, **kw)
  File ".../lark/parsers/lalr_parser.py", line 41, in parse
    return self.parser.parse(lexer, start)
  File ".../lark/parsers/lalr_parser.py", line 171, in parse
    return self.parse_from_state(parser_state)
  File ".../lark/parsers/lalr_parser.py", line 188, in parse_from_state
    raise e
  File ".../lark/parsers/lalr_parser.py", line 179, in parse_from_state
    state.feed_token(token)
  File ".../lark/parsers/lalr_parser.py", line 129, in feed_token
    raise UnexpectedToken(token, expected, state=self, interactive_parser=None)
lark.exceptions.UnexpectedToken: Unexpected token Token('__ANON_6', '005') at line 1, column 49.
Expected one of:
	* RSQB
	* COMMA

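Hmm, yeah, it looks like the popular IPv6 regex from this Stack Overflow answer does not handle shortened (::) addresses well. It matches only up to the double colon and leaves the trailing group behind, which is why the parser trips over the leftover '005' at column 49: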

(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
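
My reading of why this fails, as a minimal sketch: Python's re module takes the first alternative that matches rather than the longest one, and in the regex above the "groups followed by ::" alternative comes before the alternatives that allow groups after the ::. The snippet below uses only two alternatives lifted from that regex; the variable names are made up for illustration, and this is not the parsuricata grammar itself.

```python
import re

H = r"[0-9a-fA-F]{1,4}"

# Two alternatives taken from the regex above (names are illustrative only).
trailing_colons = rf"({H}:){{1,7}}:"               # e.g. "2a00:1450::"
groups_after = rf"({H}:){{1,4}}(:{H}){{1,3}}"      # e.g. "2a00:1450::5e"

# Same alternatives, opposite order.
short_first = re.compile(rf"({trailing_colons}|{groups_after})")
long_first = re.compile(rf"({groups_after}|{trailing_colons})")

addr = "2a00:1450:4010:0c0e::005e"

# The first matching alternative wins, so this stops right after "::".
print(short_first.match(addr).group(0))

# With the more specific alternative first, the whole address is consumed.
print(long_first.match(addr).group(0))
```

With the short alternative first, the match ends at the double colon, which lines up with the unexpected '005' token at column 49 in the traceback.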

But a comment on that answer provides a reordered regex that does appear to handle them:

(fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])\.{3,3})(25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])\.{3,3})(25[0-5]|2[0-4][0-9]|1{0,1}[0-9]{0,1}[0-9])|:((:[0-9a-fA-F]{1,4}){1,7}|:))

I'll swap them out and push out a release.

Okie dokes, fixed and released in version 0.3.3

Thanks for taking the time out of your day to report this <3