Regression on grok is case of match failure
andsel opened this issue · 4 comments
andsel commented
Logstash information:
Please include the following information:
- Logstash version (e.g.
bin/logstash --version
)8.4.0+
- Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker)
tar
- How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes)
command line
- How was the Logstash Plugin installed : already present in the distribution
JVM (e.g. java -version
):
If the affected version of Logstash is 7.9 (or earlier), or if it is NOT using the bundled JDK or using the 'no-jdk' version in 7.10 (or higher), please provide the following information:
- JVM version (
java -version
) - JVM installation source (e.g. from the Operating System's package manager, from source, etc).
- Value of the
JAVA_HOME
environment variable if set.
OS version (uname -a
if on a Unix-like system):
Description of the problem including expected versus actual behavior:
Steps to reproduce:
- Copy the following script into the Logstash installation root folder
reproducer
# encoding: utf-8
# Script to test file input plugin in Logstash environment
#
# To run, copy script to Logstash folder and run:
#
# bin/ruby grok_regression_reproducer.rb
################### Setup Logstash and Log4j ##################
require_relative "lib/bootstrap/environment"
LogStash::Bundler.setup!({:without => [:build, :development]})
require "logstash-core"
require "logstash/environment"
require "securerandom"
require "grok-pure"
log4j_test_path = "/tmp/log4j_test_log"
java.lang.System.setProperty("ls.logs", log4j_test_path)
java.lang.System.setProperty("ls.log.format", "plain")
java.lang.System.setProperty("ls.log.level", "info")
java.lang.System.setProperty("ls.pipeline.separate_logs", "false")
LogStash::Logging::Logger::reconfigure(URI.encode(::File.join(LogStash::Environment::LOGSTASH_HOME, "config", "log4j2.properties")))
include LogStash::Util::Loggable
logger.info(">>> Grok setup")
vendored_gems_path = "/path_to_logstash_root/vendor/bundle/jruby/2.6.0/gems" # with Logstash >= 8.4
#vendored_gems_path = "/path_to_logstash_root/vendor/bundle/jruby/2.5.0/gems" # with Logstash 8.3
logstash_patterns_core = "logstash-patterns-core-4.3.4"
pattern = "%{GREEDYDATA}(Cannot|Failed) login (?<user>\S+) %{GREEDYDATA}"
grok = Grok.new
grok.logger = logger
grok.add_patterns_from_file("#{vendored_gems_path}/#{logstash_patterns_core}/patterns/ecs-v1/grok-patterns")
grok.compile(pattern, true)
logger.info(">>> Grok instantiated and start processing")
event = LogStash::Event.new()
rand_str = SecureRandom.alphanumeric(5000)
msg = "foo bar blubb cannot login i123456 #{rand_str}"
event.set("message", msg)
#10000.times do
100.times do
grok.execute(msg)
end
logger.info(">>> Grok terminated")
- run using the Logstash bundled Ruby
bin/ruby grok_regression_reproducer.rb
- In Logstash
8.3
it takes some seconds for 100 iterations while a minute on Logstash8.4.0+
. With Logstash8.3
JRuby is9.2.20
while with8.4.0
is9.3.6
.
In both versionslogstash-filter-grok
andjls-grok
remains the same,4.4.2
and0.11.5
respectively.
Provide logs (if relevant):
Logstash 8.3
[2022-11-22T15:57:50,368][INFO ][object ] >>> Grok setup
[2022-11-22T15:57:50,380][INFO ][object ] >>> Grok instantiated and start processing
[2022-11-22T15:57:50,473][INFO ][object ] >>> Grok terminated
Logstash 8.4
[2022-11-23T14:30:45,675][INFO ][object ] >>> Grok setup
[2022-11-23T14:30:45,788][INFO ][object ] >>> Grok instantiated and start processing
[2022-11-23T14:32:22,792][INFO ][object ] >>> Grok terminated
rpasche commented
Would really be interested on what is causing this.
andsel commented
It's reproducible in plain JRuby code:
# encoding: utf-8
#
# bundle install && bundle exec ruby grok_regression_reproducer.rb
#
# with JRuby 9.2.20 very fast but with 9.3.4+ is slow
require "securerandom"
puts ">>> Grok setup, using JRuby #{JRUBY_VERSION}"
regexp = Regexp.new("(?:.*)(Cannot|Failed) login (?<user>S+) (?:.*)", Regexp::MULTILINE)
puts ">>> Grok instantiated and start processing"
start_time = Time.now
rand_str = SecureRandom.alphanumeric(5000)
msg = "foo bar blubb cannot login i123456 #{rand_str}"
#10000.times do
100.times do
regexp.match(msg)
end
stop_time = Time.now
execution_time = stop_time - start_time
puts ">>> Grok terminated, executes in #{execution_time} sec"
Console output:
andrea@kalimera: grok_regression_test (main) $ ruby regexp_regression_reproducer.rb
>>> Grok setup, using JRuby 9.2.20.1
>>> Grok instantiated and start processing
>>> Grok terminated, executes in 0.19011 sec
andrea@kalimera: grok_regression_test (main) $ rvm use jruby-9.3.4.0
Using /usr/share/rvm/gems/jruby-9.3.4.0
andrea@kalimera: grok_regression_test (main) $ ruby regexp_regression_reproducer.rb
>>> Grok setup, using JRuby 9.3.4.0
>>> Grok instantiated and start processing
>>> Grok terminated, executes in 51.010906000000006 sec
andsel commented
Create the issue jruby/jruby#7484 in JRuby repository
andsel commented
Regression will be fixed with JRuby 9.3.10.0