Problem with parsing literal string regex
Closed this issue · 6 comments
funzoneq commented
require 'toml'
stream = <<-EOS
# <28>Jan 14 14:49:55 6.14.2-4550-TEST craftd[1264]: Minor alarm set, BGP Routing Protocol usage requires a license
[UdpInput]
address = ":514"
decoder = "syslog_transform_decoder"
test = 'liter\a/l'
[syslog_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)'
timestamp_layout = "Jan _2 15:04:05"
EOS
TOML.parse(stream)
Gives the following error:
TOML::ParseError: Failed to parse input on line 9 at offset 14
match_regex = '^<(?P<Pri>d+)>(?P<Timestamp>w{3} +d+ +d+:d+:d+) (?P<Hostname>[^ ]+) (?P<Process>[w/]+)[(?P<Pid>d+)]: +(?P<Message>[^
^
from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml/parser.rb:15:in `rescue in initialize'
from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml/parser.rb:11:in `initialize'
from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml.rb:30:in `new'
from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml.rb:30:in `parse'
from (irb):78
from /usr/bin/irb:12:in `<main>'
emancu commented
@funzoneq Thanks for reporting this.
I found the error: it is caused by the \n in your regular expression, which is not escaped.
I'm not sure yet whether this is a bug, so give me a little time to look into it.
FYI, there is a #toml-rb channel on freenode where you can find me.
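The \n mentioned above can be inspected in plain Ruby, without the TOML gem. A minimal sketch using only the last group of the reported regex (the variable name fragment is illustrative). It relies on the fact that an unquoted heredoc processes backslash escapes like a double-quoted string, and that TOML literal strings ('...') must fit on a single line:

```ruby
# A fragment of the reported config inside an unquoted <<-EOS heredoc.
# Unquoted heredocs process backslash escapes like double-quoted strings.
fragment = <<-EOS
match_regex = '(?P<Message>[^\n]+)'
EOS

# The backslash + "n" pair is gone...
puts fragment.include?("\\n")   # => false
# ...replaced by a real newline, so the single-line TOML literal
# string '...' is now split across two lines.
puts fragment.lines.length      # => 2
puts fragment
```

The last line prints the string broken after [^, which matches where the parse error above stops.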
emancu commented
If you read it from a file, it works.
So it is probably an issue with, or some weird behavior of, Ruby's strings.
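Reading from a file working is consistent with the string-escape explanation: File.read returns the bytes on disk as-is, with no escape processing. A minimal sketch using a temporary file (the "decoder" and ".toml" name parts are arbitrary):

```ruby
require "tempfile"

# %q{...} behaves like a single-quoted string, so "\n" stays as
# two characters (backslash, n) when written to disk.
toml_line = %q{match_regex = '(?P<Message>[^\n]+)'} + "\n"

# Write the line to a temporary file and read it back.
from_file = Tempfile.create(["decoder", ".toml"]) do |f|
  f.write(toml_line)
  f.flush
  File.read(f.path)
end

puts from_file.include?("\\n")  # => true: the backslash survived
puts from_file.lines.length     # => 1: the literal string stays on one line
```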
emancu commented
Could you tell me the hash you expect? (Only the regular expression is needed.)
funzoneq commented
heka = { "syslog_transform_decoder" => { "match_regex" => '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)' }}
would output:
{
"syslog_transform_decoder"=> {
"match_regex"=>"^<(?P<Pri>\\d+)>(?P<Timestamp>\\w{3}\\s+\\d+\\s+\\d+:\\d+:\\d+) (?P<Hostname>[^\\s]+) (?P<Process>[\\w\\/]+)\\[(?P<Pid>\\d+)\\]:\\s+(?P<Message>[^\\n]+)"
}
}
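The doubled backslashes in that output are only how Ruby's inspect (which irb and p use to print a Hash) displays a single backslash; the string itself holds one. A minimal sketch with the Pri group of the regex:

```ruby
# Single-quoted Ruby strings keep "\d" as two characters: \ and d.
pri = '(?P<Pri>\d+)'

puts pri.length               # => 12
puts pri.chars.count("\\")    # => 1: only one backslash is stored
puts pri                      # prints (?P<Pri>\d+)
p pri                         # prints "(?P<Pri>\\d+)", the inspect form
```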
emancu commented
@funzoneq The issue is that you were using an unquoted <<-EOS heredoc, which processes backslash escapes the way a double-quoted string does.
Look at this example:
irb> a = %q(match_regex = '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)')
=> "match_regex = '^<(?P<Pri>\\d+)>(?P<Timestamp>\\w{3}\\s+\\d+\\s+\\d+:\\d+:\\d+) (?P<Hostname>[^\\s]+) (?P<Process>[\\w\\/]+)\\[(?P<Pid>\\d+)\\]:\\s+(?P<Message>[^\\n]+)'"
irb> TOML.parse a
=> {"match_regex"=>
"^<(?P<Pri>\\d+)>(?P<Timestamp>\\w{3}\\s+\\d+\\s+\\d+:\\d+:\\d+) (?P<Hostname>[^\\s]+) (?P<Process>[\\w\\/]+)\\[(?P<Pid>\\d+)\\]:\\s+(?P<Message>[^\\n]+)"}
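For the original multi-line config, the heredoc counterpart of %q(...) is a single-quoted heredoc, <<-'EOS', which performs no escape processing at all. A minimal sketch of that fix; the TOML.parse call assumes the toml-rb 0.3.x API used in the report (require 'toml'), so it is left commented out and the check below uses only core Ruby:

```ruby
# Quoting the heredoc terminator ('EOS') turns off escape processing,
# so every backslash in the regex reaches the TOML parser unchanged.
stream = <<-'EOS'
[syslog_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)'
timestamp_layout = "Jan _2 15:04:05"
EOS

regex_line = stream.lines.find { |l| l.start_with?("match_regex") }
puts regex_line.include?('[^\n]+')   # => true: backslash + n survived
puts stream.lines.length             # => 4: the regex did not wrap

# With toml-rb installed (0.3.x API, as in the report):
#   require 'toml'
#   TOML.parse(stream)["syslog_transform_decoder"]["match_regex"]
```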
funzoneq commented
Ok, weird. Thanks for the clarification.