commonmark/cmark

Severe performance regression

Closed this issue · 3 comments

mity commented

For some very rough benchmarking purposes I've just tried cmark and md2html from MD4C project to process the benchinput.md as produced by cmark's make bench.

On my windows machine, cmark (current head) seems to take horrendous 40+ secs, while MD4C is below 1 sec still with a safe margin. cmark used to be about 2 times slower than MD4C so the current factor of 60 is quite significant.

/src/cmark @master $ time -p /prj/md4c/build32-release/md2html/md2html.exe ./bench/benchinput.md >/dev/null
real 0.73
user 0.00
sys 0.01
/src/cmark @master $ time -p cmark ./bench/benchinput.md >/dev/null
real 44.02
user 0.01
sys 0.00
/src/cmark @master $

P.S. Don't mind the zeroes for sys/user. Likely because it's the MSYS2 environment on Windows.

P.P.S. I haven't used cmark for year or two so the regression may be there for quite some time already.

mity commented

Sorry for the noise, it's caused by something in the MSYS2 environment. When run outside of it, such a slowdown doesn't occur.

mity commented

(Reopening due the following findings. But feel free to close if you see it as irrelevant.)

I was interested in what's behind the heavy slowdown under MSYS2. Turns out there's exactly one single culprit, namely calling printf() on this line:

printf("%s", result);

What's even crazier, it seems that under hood the printf() (from C:\WINDOWS\SysWOW64\msvcrt.dll which is default target runtime lib for mingw64 toolchain) internally calls putc() for every single character. Depending on details how the other side of the pipe (shell) reads from the stdout, it may also explain the difference between bash in MSYS2 and CMD.COM.

Replacing the printf() with fwrite() makes the 40+ secs down to ~1.4 sec.

jgm commented

Interesting. This seems a perfectly good reason to use fwrite here.