Memory usage compared to gcc compiler
mingodad opened this issue · 13 comments
Looking for a ready to use C
peg grammar I found this one https://github.com/pointlander/peg/blob/master/grammars/c/c.peg and needed to make some changes (see attached) to build and parse the generated C
file using packcc
then I did a comparison of time and memory usage against gcc
compiling it and got the result shown bellow, the generated parser compiled with -O2
uses 12x more memory than gcc
compiling without optimization and 6.6x more when compiling with -O2
.
packcc -l -o c99-mouse c99-mouse.peg
gcc -E c99-mouse.c > c99-mouse.pp.c
/usr/bin/time gcc -g -o c99-mouse c99-mouse.c
0.55user 0.02system 0:00.58elapsed 100%CPU (0avgtext+0avgdata 60484maxresident)k
0inputs+568outputs (0major+16781minor)pagefaults 0swaps
/usr/bin/time gcc -g -O2 -o c99-mouse c99-mouse.c
2.79user 0.07system 0:02.86elapsed 99%CPU (0avgtext+0avgdata 110656maxresident)k
0inputs+1304outputs (0major+36607minor)pagefaults 0swaps
/usr/bin/time ./c99-mouse c99-mouse.pp.c
1.16user 0.21system 0:01.38elapsed 99%CPU (0avgtext+0avgdata 740176maxresident)k
0inputs+0outputs (0major+184749minor)pagefaults 0swaps
And for comparison with https://github.com/edubart/lpegrex/blob/main/parsers/c11.lua that outputs an AST
, the one generated with packcc
only checks the syntax and uses 7.7x memory and 2x the time.
/usr/bin/time lua examples/c11-ast.lua c99-mouse.pp.c > /dev/null
0.46user 0.04system 0:00.50elapsed 99%CPU (0avgtext+0avgdata 96116maxresident)k
0inputs+0outputs (0major+25617minor)pagefaults 0swaps
And here is the same using https://github.com/soasme/PeppaPEG with a slightly modified grammar (attached),
notice that PeppaPEG
also is generating a parse tree
, it uses as much memory as Lua
but is slower.
/usr/bin/time ./cli parse -G c99-mouse.peg -e TranslationUnit c99-mouse.pp.c > /dev/null
4.69user 0.02system 0:04.71elapsed 100%CPU (0avgtext+0avgdata 97920maxresident)k
0inputs+0outputs (0major+24145minor)pagefaults 0swaps
I did some measurements on how/when the memory is allocated (see attached) and here a sample of the summaries:
Header (FULL)
line of call on c99-mouse.c | total malloced + total reallocated | total malloced | total reallocated
Allocation by calls
330 2096896 0 2096896
466 87390512 87390512 0
502 17548048 0 17548048
525 269296480 269296480 0
614 215437184 215437184 0
695 143300672 0 143300672
718 22290240 22290240 0
749 16776976 0 16776976
818 80788944 80788944 0
842 2032 0 2032
857 152 152 0
Header (FIRST 50 lines)
line on parsed file c99-mouse.pp.c | total malloced + total reallocated | total malloced | total reallocated
Allocation by line
0 0 0 0
1 1656 1112 544
2 1512 1112 400
3 1784 1112 672
4 1304 1112 192
5 2264 1112 1152
6 1368 1112 256
7 3288 1112 2176
8 672 608 64
9 336 304 32
10 336 304 32
11 592 304 288
12 336 304 32
13 336 304 32
14 904 808 96
15 1240 1112 128
16 1752 1112 640
17 5336 1112 4224
18 1240 1112 128
19 1240 1112 128
20 1240 1112 128
21 2776 1112 1664
22 9432 1112 8320
23 1240 1112 128
24 1240 1112 128
25 1240 1112 128
26 1240 1112 128
27 1240 1112 128
28 1240 1112 128
29 1240 1112 128
30 1240 1112 128
31 3288 1112 2176
32 17624 1112 16512
33 1240 1112 128
34 1240 1112 128
35 672 608 64
36 336 304 32
37 336 304 32
38 1360 304 1056
39 336 304 32
40 904 808 96
41 1240 1112 128
42 672 608 64
43 904 808 96
44 20456 16072 4384
45 31944 26424 5520
46 704 608 96
47 904 808 96
48 1304 1112 192
49 1240 1112 128
50 1368 1112 256
...
And peg
from Piumatra (https://github.com/gpakosz/peg.git) is the winner (see attached):
./peg c99-mouse.peg > c99-mouse-peg.c
gcc -O2 -o c99-mouse c99-mouse.c
/usr/bin/time cat c99-mouse.pp.c | ./c99-mouse
0.00user 0.00system 0:00.11elapsed 4%CPU (0avgtext+0avgdata 2196maxresident)k
0inputs+0outputs (0major+113minor)pagefaults 0swaps
yy 1
It is not very fair to compare the result of generic, multi-purpose packrat parser-generator to hand-written, single-purpose parser, that is optimized by some of the worlds best developers over last 35 years...
I don't see any issue here, packcc never claimed to be memory efficient. The main goal is to be easy to work with and generate easily understandable code.
Thanks for your interesting report.
PackCC requires more memory to support left-recursive grammars, which cannot be handled by most PEG parser generators.
By the way,
/usr/bin/time cat c99-mouse.pp.c | ./c99-mouse
should be
cat c99-mouse.pp.c | /usr/bin/time ./c99-mouse
Thank you for pointing out my mistake, here is again the output for the peg from Piumarta:
/usr/bin/time ./c99-mouse c99-mouse.pp.c
yy 1 : 15790
0.27user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 2228maxresident)k
0inputs+0outputs (0major+217minor)pagefaults 0swaps
And here is the attached files used to produce the output shown above:
piumatra-peg-0.18.zip
I'm making some changes to peg/leg here https://github.com/mingodad/peg
Also c99-mouse.peg
doesn't use left recursion so probably some savings can be achieved for such cases because packcc
already know that left recursion
machinery is not necessary.
That makes sense. Left-recursive grammars can be detected before generating parsers.
To tell the truth, I'm not so interested in a parser generator without left-recursive grammar support, but I add it to my todo list because of repeated requests.
There was a similar comment: #46 (comment) .
And here is the test with https://github.com/yhirose/cpp-peglib tool peglint
() with and without producing an AST (parser tree):
/usr/bin/time peglint c99-mouse2.peg c99-mouse.peg.pp.c
1.35user 0.00system 0:01.35elapsed 99%CPU (0avgtext+0avgdata 4588maxresident)k
0inputs+0outputs (0major+359minor)pagefaults 0swaps
mingo@mingo-X550VX:~/dev/c/A_grammars/cpp-peglib/example$ /usr/bin/time peglint --packrat c99-mouse2.peg c99-mouse.peg.pp.c
0.35user 0.01system 0:00.37elapsed 99%CPU (0avgtext+0avgdata 40580maxresident)k
0inputs+0outputs (0major+9363minor)pagefaults 0swaps
mingo@mingo-X550VX:~/dev/c/A_grammars/cpp-peglib/example$ /usr/bin/time peglint --ast c99-mouse2.peg c99-mouse.peg.pp.c > /dev/null
2.33user 0.05system 0:02.39elapsed 99%CPU (0avgtext+0avgdata 133792maxresident)k
0inputs+0outputs (0major+36504minor)pagefaults 0swaps
mingo@mingo-X550VX:~/dev/c/A_grammars/cpp-peglib/example$ /usr/bin/time peglint --packrat --ast c99-mouse2.peg c99-mouse.peg.pp.c > /dev/null
0.74user 0.05system 0:00.80elapsed 100%CPU (0avgtext+0avgdata 182072maxresident)k
0inputs+0outputs (0major+48993minor)pagefaults 0swaps
Hello @mingodad
I have stumbled into this issue after a long time and realized that you might be having same problem as I had some time ago. I was using packcc to generate Kotlin parser, but it was slow and took a lot of memory. This have led me to start working on pegof , which can optimize the grammar before it is compiled.
I have just tried with the grammar from your original post and here is the result:
lines | bytes | rules | terms | duration | memory | |
---|---|---|---|---|---|---|
original | 10091 | 427671 | 195 | 865 | 9836 | 578432 |
optimized by pegof | 8639 | 370298 | 35 | 909 | 3126 | 198656 |
difference in percent | 85% | 86% | 17% | 105% | 31% | 34% |
You'll be probably most interested in the last two columns. Duration here is how long it took to process c99-mouse.pp.c
10 times, expressed in milliseconds. Memory is peak resident memory (as reported by GNU time) during those ten runs.
I'm not sure if you're still interested in this after two years (sorry for disturbing you if that is the case), but it might be useful to you in some future project or to anyone who finds this issue when looking for solution to some similar problem.
Thank you !
Even after two years it's still good news !
I'm looking at it now.