Nested templates: performance issue
Closed this issue · 6 comments
We were aware of the fact that nesting templates in each other might have a small performance issue, but in my last commit I could set up some 3-4 levels of sub-templates, which showed decent differences - although almost the same was run:
> system.time(rapport.html("nortest", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report7b9eb363.html with xdg-open...
user system elapsed
0.497 0.020 0.514
> system.time(rapport.html("descriptives-univar-numeric", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report54c5ab21.html with xdg-open...
user system elapsed
1.326 0.050 1.382
> system.time(rapport.html("descriptives-univar", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report2dd50213.html with xdg-open...
user system elapsed
2.730 0.030 2.778
> system.time(rapport.html("descriptives-multivar.tpl", ius2008, vars = c("age")))
Trying to open /tmp/RtmpMXGtSg/R-report248d48fe.html with xdg-open...
user system elapsed
5.512 0.114 5.651
TODO: investigate which lapply
loop is the devil one - which I can handle tomorrow.
I have run some profiling, IMHO this scary performance issue is based on print
and nothing to do with my lapply
loops in rapport
.
Compare the result with printing:
> system.time(evals('(runif(100000)^pi)'))
user system elapsed
0.977 0.010 0.994
> system.time(eval(parse(text='(runif(100000)^pi)')))
user system elapsed
0.014 0.000 0.012
With tests when printing is not done:
> system.time(evals('x <- runif(100000)^pi'))
user system elapsed
0.017 0.003 0.021
> system.time(eval(parse(text='x <- runif(100000)^pi')))
user system elapsed
0.013 0.000 0.013
Plain demo:
> system.time(evals('1:100000'))
user system elapsed
0.557 0.000 0.563
> system.time(eval(parse(text='1:100000')))
user system elapsed
0.000 0.000 0.001
Just for reference:
> system.time(rapport("nortest", ius2008, var = "age"))
user system elapsed
0.430 0.010 0.453
> system.time(eval(parse(text='rapport("nortest", ius2008, var = "age")')))
user system elapsed
0.420 0.014 0.435
> system.time(evaluate('rapport("nortest", ius2008, var = "age")'))
user system elapsed
0.460 0.003 0.465
> system.time(evals('rapport("nortest", ius2008, var = "age")'))
user system elapsed
0.874 0.013 0.900
Which seems OK. So where is the issue in nested templates if not in print method? :)
We really have some overhead :)
Some test templates I have made:
- test.tpl
<!--head
Title: TEST
Author: TEST
Description: TEST
var | numeric | TEST | TEST
head-->
<%
rp.hist(var)
%>
- test2.tpl
<!--head
Title: TEST
Author: TEST
Description: TEST
var | numeric | TEST | TEST
head-->
<%
rapport('test.tpl', data=rp.data, var=rp.name(var))
%>
- test3.tpl
<!--head
Title: TEST
Author: TEST
Description: TEST
var | numeric | TEST | TEST
head-->
<%
rapport('test2.tpl', data=rp.data, var=rp.name(var))
%>
Results: all commands does quite the same as you may see
> system.time(rp.hist(mtcars$hp))
user system elapsed
0.007 0.000 0.006
> system.time(rapport('test.tpl', data=mtcars, var='hp'))
user system elapsed
0.126 0.010 0.137
> system.time(rapport('test2.tpl', data=mtcars, var='hp'))
user system elapsed
0.200 0.010 0.211
> system.time(rapport('test3.tpl', data=mtcars, var='hp'))
user system elapsed
0.417 0.000 0.427
It is quite straightforward that each rapport
call doubles the execution time as everything is run twice - so this is not a print
, lapply
or any other technical problem after all, there is no esoteric phenomena. If I would not placed the command that outputs on the last line of each chunk, than it would triple the times at each rapport
call...
The question is if we could modify the code to do not run rapport
twice, like not checking for output and skip evaluate
, as it would definitely print out something anyway!
I do some test runs tomorrow and will commit soon. Stay tuned :)
Some performance tweaks by last commit:
> system.time(eval(parse(text='(runif(100000)^pi)')))
user system elapsed
0.010 0.000 0.012
> system.time(eval.msgs('(runif(100000)^pi)'))
user system elapsed
0.160 0.000 0.162
> system.time(evals('(runif(100000)^pi)'))
user system elapsed
1.243 0.000 1.255
> system.time(evals('(runif(100000)^pi)', check.output = FALSE))
user system elapsed
0.154 0.000 0.153
The last commit had about 10 chars change in rapport
based on evals
new parameter and strict
templates.
Performance boost (compare to the above results!):
> system.time(rp.hist(mtcars$hp))
user system elapsed
0.004 0.000 0.003
> system.time(rapport('test.tpl', data=mtcars, var='hp'))
user system elapsed
0.017 0.000 0.015
> system.time(rapport('test2.tpl', data=mtcars, var='hp'))
user system elapsed
0.023 0.000 0.023
> system.time(rapport('test3.tpl', data=mtcars, var='hp'))
user system elapsed
0.033 0.000 0.034