logstash-plugins/logstash-input-file

input fails to extract lines with custom delimiter

kares opened this issue · 1 comments

kares commented

when a custom delimiter is used instead of "\n", e.g. delimiter => "</Some>" (assuming "multi-line" content arrives in a single line),
the plugin fails to split lines properly. this mostly happens when the plugin manages to buffer up more than one "line":
"<Some><Content1>...</Content1></Some> <Some><Content2>...</Content2>"

and boils down to:

>> line = "<Some><Content1>...</Content1></Some> <Some><Content2>...</Content2>"
>> tokenizer = FileWatch::BufferedTokenizer.new("</Some>")

>> tokenizer.extract(line)
=> ["<Some><Content1>...</Content1>"]
>> tokenizer.extract(line)
=> [" <Some><Content2>...</Content2><Some><Content1>...</Content1>"]

which ends up not extracting the 2 lines properly.

kares commented

we should also use this opportunity to drop FileWatch::BufferedTokenizer, which is actually part of LS rather than the plugin!
BufferedTokenizer calls Ruby's split method through a Java API. that method writes the back-ref $~, which means it needs proper framing; otherwise things might fail or overwrite $~ in unexpected ways. regexp-less line indexing would do the job.
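
a rough sketch of what regexp-less indexing could look like, for illustration only. the class name IndexTokenizer and its methods are hypothetical and are not the FileWatch or LS implementation. it relies on String#index with a String argument, which does a plain substring search, so no regexp matching is involved:

```ruby
# Hypothetical sketch: a buffered tokenizer that finds delimiters with
# String#index (plain substring search) instead of a regexp-based split.
class IndexTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = String.new
  end

  # Appends data to the internal buffer and returns every complete token.
  # Anything after the last delimiter stays buffered for the next call.
  def extract(data)
    @buffer << data
    tokens = []
    start = 0
    while (pos = @buffer.index(@delimiter, start))
      tokens << @buffer[start...pos]
      start = pos + @delimiter.length
    end
    # Keep only the unterminated remainder (possibly empty).
    @buffer = @buffer[start..-1] || String.new
    tokens
  end

  # Returns whatever is still buffered and resets the buffer.
  def flush
    rest = @buffer
    @buffer = String.new
    rest
  end
end
```

since the whole buffer is searched on every call, a delimiter that arrives split across two reads is still found once the second half is appended. re-scanning the buffer from the start is the simple choice here; a real implementation would likely remember how far it has already searched.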