Severe performance regression
Closed this issue · 3 comments
For some very rough benchmarking purposes I've just tried cmark and md2html from MD4C project to process the benchinput.md
as produced by cmark's make bench
.
On my windows machine, cmark (current head) seems to take horrendous 40+ secs, while MD4C is below 1 sec still with a safe margin. cmark used to be about 2 times slower than MD4C so the current factor of 60 is quite significant.
/src/cmark @master $ time -p /prj/md4c/build32-release/md2html/md2html.exe ./bench/benchinput.md >/dev/null
real 0.73
user 0.00
sys 0.01
/src/cmark @master $ time -p cmark ./bench/benchinput.md >/dev/null
real 44.02
user 0.01
sys 0.00
/src/cmark @master $
P.S. Don't mind the zeroes for sys/user. Likely because it's the MSYS2 environment on Windows.
P.P.S. I haven't used cmark for year or two so the regression may be there for quite some time already.
Sorry for the noise, it's caused by something in the MSYS2 environment. When run outside of it, such a slowdown doesn't occur.
(Reopening due the following findings. But feel free to close if you see it as irrelevant.)
I was interested in what's behind the heavy slowdown under MSYS2. Turns out there's exactly one single culprit, namely calling printf()
on this line:
Line 72 in 76cbc2d
What's even crazier, it seems that under hood the printf()
(from C:\WINDOWS\SysWOW64\msvcrt.dll
which is default target runtime lib for mingw64 toolchain) internally calls putc()
for every single character. Depending on details how the other side of the pipe (shell) reads from the stdout, it may also explain the difference between bash in MSYS2 and CMD.COM.
Replacing the printf()
with fwrite()
makes the 40+ secs down to ~1.4 sec.
Interesting. This seems a perfectly good reason to use fwrite
here.