logstash-plugins/logstash-filter-grok

Regression on grok is case of match failure

andsel opened this issue · 4 comments

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version) 8.4.0+
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker) tar
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes) command line
  4. How was the Logstash Plugin installed : already present in the distribution

JVM (e.g. java -version):

If the affected version of Logstash is 7.9 (or earlier), or if it is NOT using the bundled JDK or using the 'no-jdk' version in 7.10 (or higher), please provide the following information:

  1. JVM version (java -version)
  2. JVM installation source (e.g. from the Operating System's package manager, from source, etc).
  3. Value of the JAVA_HOME environment variable if set.

OS version (uname -a if on a Unix-like system):

Description of the problem including expected versus actual behavior:

Steps to reproduce:

  1. Copy the following script into the Logstash installation root folder
reproducer
# encoding: utf-8

# Script to test file input plugin in Logstash environment
# 
# To run, copy script to Logstash folder and run:
# 
# bin/ruby grok_regression_reproducer.rb

################### Setup Logstash and Log4j ##################
require_relative "lib/bootstrap/environment"
LogStash::Bundler.setup!({:without => [:build, :development]})
require "logstash-core"
require "logstash/environment"
require "securerandom"
require "grok-pure"

log4j_test_path = "/tmp/log4j_test_log"
java.lang.System.setProperty("ls.logs", log4j_test_path)
java.lang.System.setProperty("ls.log.format", "plain")
java.lang.System.setProperty("ls.log.level", "info")
java.lang.System.setProperty("ls.pipeline.separate_logs", "false")
LogStash::Logging::Logger::reconfigure(URI.encode(::File.join(LogStash::Environment::LOGSTASH_HOME, "config", "log4j2.properties")))
include LogStash::Util::Loggable

logger.info(">>> Grok setup")

vendored_gems_path = "/path_to_logstash_root/vendor/bundle/jruby/2.6.0/gems" # with Logstash >= 8.4
#vendored_gems_path = "/path_to_logstash_root/vendor/bundle/jruby/2.5.0/gems" # with Logstash 8.3
logstash_patterns_core = "logstash-patterns-core-4.3.4"

pattern = "%{GREEDYDATA}(Cannot|Failed) login (?<user>\S+) %{GREEDYDATA}"
grok = Grok.new
grok.logger = logger
grok.add_patterns_from_file("#{vendored_gems_path}/#{logstash_patterns_core}/patterns/ecs-v1/grok-patterns")
grok.compile(pattern, true)

logger.info(">>> Grok instantiated and start processing")

event = LogStash::Event.new()
rand_str = SecureRandom.alphanumeric(5000)
msg = "foo bar blubb cannot login i123456 #{rand_str}"
event.set("message", msg)
#10000.times do
100.times do
grok.execute(msg)
end

logger.info(">>> Grok terminated")
  1. run using the Logstash bundled Ruby bin/ruby grok_regression_reproducer.rb
  2. In Logstash 8.3 it takes some seconds for 100 iterations while a minute on Logstash 8.4.0+. With Logstash 8.3 JRuby is 9.2.20 while with 8.4.0 is 9.3.6.
    In both versions logstash-filter-grok and jls-grok remains the same, 4.4.2 and 0.11.5 respectively.

Provide logs (if relevant):

Logstash 8.3

[2022-11-22T15:57:50,368][INFO ][object                   ] >>> Grok setup
[2022-11-22T15:57:50,380][INFO ][object                   ] >>> Grok instantiated and start processing
[2022-11-22T15:57:50,473][INFO ][object                   ] >>> Grok terminated

Logstash 8.4

[2022-11-23T14:30:45,675][INFO ][object                   ] >>> Grok setup
[2022-11-23T14:30:45,788][INFO ][object                   ] >>> Grok instantiated and start processing
[2022-11-23T14:32:22,792][INFO ][object                   ] >>> Grok terminated

Would really be interested on what is causing this.

It's reproducible in plain JRuby code:

# encoding: utf-8

# 
# bundle install && bundle exec ruby grok_regression_reproducer.rb
#
# with JRuby 9.2.20 very fast but with 9.3.4+ is slow

require "securerandom"

puts ">>> Grok setup, using JRuby #{JRUBY_VERSION}"

regexp = Regexp.new("(?:.*)(Cannot|Failed) login (?<user>S+) (?:.*)", Regexp::MULTILINE)

puts ">>> Grok instantiated and start processing"
start_time = Time.now

rand_str = SecureRandom.alphanumeric(5000)
msg = "foo bar blubb cannot login i123456 #{rand_str}"
#10000.times do
100.times do
  regexp.match(msg)
end
stop_time = Time.now

execution_time = stop_time - start_time
puts ">>> Grok terminated, executes in #{execution_time} sec"

Console output:

andrea@kalimera: grok_regression_test (main) $ ruby regexp_regression_reproducer.rb
>>> Grok setup, using JRuby 9.2.20.1
>>> Grok instantiated and start processing
>>> Grok terminated, executes in 0.19011 sec
andrea@kalimera: grok_regression_test (main) $ rvm use jruby-9.3.4.0
Using /usr/share/rvm/gems/jruby-9.3.4.0
andrea@kalimera: grok_regression_test (main) $ ruby regexp_regression_reproducer.rb
>>> Grok setup, using JRuby 9.3.4.0
>>> Grok instantiated and start processing
>>> Grok terminated, executes in 51.010906000000006 sec

Create the issue jruby/jruby#7484 in JRuby repository

andsel commented

Regression will be fixed with JRuby 9.3.10.0