Rapporter/rapport

Nested templates: performance issue

Closed this issue · 6 comments

We were aware of the fact that nesting templates in each other might have a small performance issue, but in my last commit I could set up some 3-4 levels of sub-templates, which showed decent differences - although almost the same was run:

> system.time(rapport.html("nortest", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report7b9eb363.html with xdg-open...
   user  system elapsed 
  0.497   0.020   0.514 

> system.time(rapport.html("descriptives-univar-numeric", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report54c5ab21.html with xdg-open...
   user  system elapsed 
  1.326   0.050   1.382 

> system.time(rapport.html("descriptives-univar", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report2dd50213.html with xdg-open...
   user  system elapsed 
  2.730   0.030   2.778 

> system.time(rapport.html("descriptives-multivar.tpl", ius2008, vars = c("age")))
Trying to open /tmp/RtmpMXGtSg/R-report248d48fe.html with xdg-open...
   user  system elapsed 
  5.512   0.114   5.651 

TODO: investigate which lapply loop is the devil one - which I can handle tomorrow.

I have run some profiling, IMHO this scary performance issue is based on print and nothing to do with my lapply loops in rapport.

Compare the result with printing:

> system.time(evals('(runif(100000)^pi)'))
   user  system elapsed 
  0.977   0.010   0.994 
> system.time(eval(parse(text='(runif(100000)^pi)')))
   user  system elapsed 
  0.014   0.000   0.012 

With tests when printing is not done:

> system.time(evals('x <- runif(100000)^pi'))
   user  system elapsed 
  0.017   0.003   0.021 
> system.time(eval(parse(text='x <- runif(100000)^pi')))
   user  system elapsed 
  0.013   0.000   0.013 

Plain demo:

> system.time(evals('1:100000'))
   user  system elapsed 
  0.557   0.000   0.563 
> system.time(eval(parse(text='1:100000')))
   user  system elapsed 
  0.000   0.000   0.001 

Just for reference:

> system.time(rapport("nortest", ius2008, var = "age"))
   user  system elapsed 
  0.430   0.010   0.453 
> system.time(eval(parse(text='rapport("nortest", ius2008, var = "age")')))
   user  system elapsed 
  0.420   0.014   0.435 
> system.time(evaluate('rapport("nortest", ius2008, var = "age")'))
   user  system elapsed 
  0.460   0.003   0.465 
> system.time(evals('rapport("nortest", ius2008, var = "age")'))
   user  system elapsed 
  0.874   0.013   0.900 

Which seems OK. So where is the issue in nested templates if not in print method? :)

We really have some overhead :)

Some test templates I have made:

  • test.tpl
<!--head
Title:        TEST
Author:       TEST
Description:  TEST
var | numeric   | TEST  | TEST
head-->
<%
rp.hist(var)
%>
  • test2.tpl

<!--head
Title:        TEST
Author:       TEST
Description:  TEST
var | numeric   | TEST  | TEST
head-->

<%
rapport('test.tpl', data=rp.data, var=rp.name(var))
%>
  • test3.tpl
<!--head
Title:        TEST
Author:       TEST
Description:  TEST
var | numeric   | TEST  | TEST
head-->

<%
rapport('test2.tpl', data=rp.data, var=rp.name(var))
%>

Results: all commands does quite the same as you may see

> system.time(rp.hist(mtcars$hp))
   user  system elapsed 
  0.007   0.000   0.006 
> system.time(rapport('test.tpl', data=mtcars, var='hp'))
   user  system elapsed 
  0.126   0.010   0.137 
> system.time(rapport('test2.tpl', data=mtcars, var='hp'))
   user  system elapsed 
  0.200   0.010   0.211 
> system.time(rapport('test3.tpl', data=mtcars, var='hp'))
   user  system elapsed 
  0.417   0.000   0.427 

It is quite straightforward that each rapport call doubles the execution time as everything is run twice - so this is not a print, lapply or any other technical problem after all, there is no esoteric phenomena. If I would not placed the command that outputs on the last line of each chunk, than it would triple the times at each rapport call...

The question is if we could modify the code to do not run rapport twice, like not checking for output and skip evaluate, as it would definitely print out something anyway!

I do some test runs tomorrow and will commit soon. Stay tuned :)

Some performance tweaks by last commit:

> system.time(eval(parse(text='(runif(100000)^pi)')))
   user  system elapsed 
  0.010   0.000   0.012 
> system.time(eval.msgs('(runif(100000)^pi)'))
   user  system elapsed 
  0.160   0.000   0.162 
> system.time(evals('(runif(100000)^pi)'))
   user  system elapsed 
  1.243   0.000   1.255 
> system.time(evals('(runif(100000)^pi)', check.output = FALSE))
   user  system elapsed 
  0.154   0.000   0.153 

The last commit had about 10 chars change in rapport based on evals new parameter and strict templates.

Performance boost (compare to the above results!):

> system.time(rp.hist(mtcars$hp))
   user  system elapsed 
  0.004   0.000   0.003 
> system.time(rapport('test.tpl', data=mtcars, var='hp'))
   user  system elapsed 
  0.017   0.000   0.015 
> system.time(rapport('test2.tpl', data=mtcars, var='hp'))
   user  system elapsed 
  0.023   0.000   0.023 
> system.time(rapport('test3.tpl', data=mtcars, var='hp'))
   user  system elapsed 
  0.033   0.000   0.034

Awesome! Great job @daroczig!!! Thanx!