onetrueawk/awk

Excessive memory usage regression


Hi,

I upgraded from OpenBSD 7.4 to OpenBSD 7.5, which uses onetrueawk, and noticed a regression.
I have a script which uses awk to process mail and httpd logs, and it now uses far more memory than it did with the previous version.

From some bisecting I found this:

Commit 9e254e5 from 16 Nov 2023 seems to be OK.
Commit 345f907 from 20 Nov 2023 and everything after it seem to have this behaviour.

Below is a small simplified test script to (hopefully) reproduce the behaviour on your machine also.

Script:

AWK="$HOME/tmp/onetrueawk/awk/a.out"

# generate test input (~138MB), simulates a log file.
generate() {
	i=0
	while :; do
		echo '*.codemadness.org 127.0.0.1 - - [14/Apr/2024:00:00:43 +0200] "GET / HTTP/1.1" 200 6 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"'
		i=$((i + 1))
		test "$i" = "696969" && break
	done > input.txt
}

run() {
	LC_ALL=C "$AWK" '/"POST /' < input.txt
}

generate
run
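
To turn "excessive memory usage" into a pass/fail check, the same command could be run under a lowered memory limit. This is only a sketch: `run_capped` is a hypothetical helper that is not part of the original script, the 256 MiB cap is an assumed value that would need tuning, and it relies on `ulimit -d` (data-segment size in KiB), which may behave differently between systems.

```shell
# Hypothetical helper, not part of the original report.
# run_capped LIMIT_KB CMD [ARGS...]
# Runs CMD in a subshell whose data-segment limit is lowered to LIMIT_KB
# (ulimit -d, in KiB), so the limit applies to that one command only.
# Returns CMD's exit status, or 125 if the limit could not be set.
run_capped() {
	limit_kb=$1
	shift
	(
		ulimit -d "$limit_kb" || exit 125
		"$@"
	)
}

# Example with an assumed cap of 256 MiB; AWK and input.txt are the
# names used in the script above:
# run_capped 262144 env LC_ALL=C "$AWK" '/"POST /' input.txt
```

With a suitable cap, a build without the regression should exit normally, while an affected build should fail once its allocations exceed the limit. The 125 exit code is the value `git bisect run` treats as "skip", so the helper could also serve as part of a bisect check.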

Thank you,

hi hiltjo, thanks for spotting this.
I tracked the issue to a change we made to the regular expression engine to deal with a pathological case where gototab was blowing up. We now resize that table when needed. In some cases (e.g. /"POST / but not /POST /) this causes the excessive memory use and slowdown you observe. I know both examples run without any issues in earlier versions. I will look into this.

fixed, thanks Arnold.