Excessive memory usage regression
Closed this issue · 2 comments
Hi,
I upgraded from OpenBSD 7.4 to OpenBSD 7.5, which uses onetrueawk, and noticed a regression.
I have a script that uses awk to process mail and httpd logs, and it now uses far more memory than it did with the previous version.
After some bisecting I found this:
Commit 9e254e5 from 16 Nov 2023 seems to be OK.
Commit 345f907 from 20 Nov 2023 and everything after it seems to show this behaviour.
Below is a small, simplified test script that should (hopefully) reproduce the behaviour on your machine as well.
Script:
AWK="$HOME/tmp/onetrueawk/awk/a.out"

# generate test input (~138MB), simulates a log file.
generate() {
	i=0
	while :; do
		echo '*.codemadness.org 127.0.0.1 - - [14/Apr/2024:00:00:43 +0200] "GET / HTTP/1.1" 200 6 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"'
		i=$((i + 1))
		test "$i" = "696969" && break
	done > input.txt
}

run() {
	LC_ALL=C $AWK '/"POST /' < input.txt
}

generate
run
Thank you,
hi hiltjo, thanks for spotting this.
I tracked the issue to a change we made to the regular expression engine to deal with a pathological case where gototab was blowing up. we now resize that table when needed. in some cases (e.g. /"POST / but not /POST /) this is causing the excessive memory use and slowdown you observe. I know both examples run without any issues in earlier versions. I will look into this.
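For anyone who wants to compare the two patterns mentioned above side by side, here is a minimal sketch. The helper name time_pattern and the AWK fallback to the system awk are assumptions made for illustration; they are not part of the original report. It only measures wall-clock seconds, not memory, so treat it as a rough check.

```sh
#!/bin/sh
# Hypothetical helper, not from the original report.
# AWK defaults to the system awk; point it at the build under test,
# e.g. AWK="$HOME/tmp/onetrueawk/awk/a.out".
AWK="${AWK:-awk}"

# time_pattern PATTERN FILE
# Runs the awk pattern against FILE the same way the repro script does
# and prints "<matching-line-count> <elapsed-seconds>".
time_pattern() {
	start=$(date +%s)
	# wc -l may pad its output with spaces on some systems; strip them.
	count=$(LC_ALL=C "$AWK" "$1" < "$2" | wc -l | tr -d ' ')
	end=$(date +%s)
	echo "$count $((end - start))"
}
```

With the input.txt from the script above, time_pattern '/"POST /' input.txt exercises the slow case and time_pattern '/POST /' input.txt the one reported as unaffected. Both should print a match count of 0 for that input, so any difference in elapsed time comes from the regular expression engine.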
fixed, thanks Arnold.