tpq/propr

`updateCutoffs.propr` speedup

Closed this issue · 2 comments

Hey Thom...I've got another nice speedup of updateCutoffs.propr. It's simple...in fact, I probably should've seen it years ago.

You just need to set p = 0 in the calls to propr that get called for each permutation inside the updateCutoffs.propr function.

I did some profiling, saw that the resampling that happens inside propr was taking the most time during the updateCutoffs function, then doubled checked the code to ensure that the propr permutations are not necessary in this context, and then switched it to generate no permutations from within updateCutoffs.

Here's the patch

$ git diff R/2-proprCall.R
diff --git a/R/2-proprCall.R b/R/2-proprCall.R
index 242b6a2..145b769 100755
--- a/R/2-proprCall.R
+++ b/R/2-proprCall.R
@@ -169,7 +169,7 @@ updateCutoffs.propr <- function(object, cutoff, ncores){
   getFdrRandcounts <- function(ct.k){
 
     pr.k <- suppressMessages(
-      propr::propr(ct.k, object@metric, ivar = object@ivar, alpha = object@alpha))
+      propr::propr(ct.k, object@metric, ivar = object@ivar, alpha = object@alpha, p = 0))
 
     # Vector of propr scores for each pair of taxa.
     pkt <- pr.k@results$propr
@@ -237,7 +237,7 @@ updateCutoffs.propr <- function(object, cutoff, ncores){
       # Calculate propr exactly based on @metric, @ivar, and @alpha
       ct.k <- object@permutes[[k]]
       pr.k <- suppressMessages(
-        propr(ct.k, object@metric, ivar = object@ivar, alpha = object@alpha))
+        propr(ct.k, object@metric, ivar = object@ivar, alpha = object@alpha, p = 0))
       pkt <- pr.k@results$propr
 
       # Find number of permuted theta less than cutoff

If you want, I could open a pull request, or, if it is more convenient for you, here it is as a patch file: 2-proprCall.R.patch.txt

Timing

The data is 25 samples and varying number of taxa, ran with 1, 2, and 4 cores.

A simple change, but with nice results...

original

with_speed_up

Here's the benchmark code to generate the figures if you'd like to reproduce. (txt extension is so it can upload to GitHub)

bench.R.txt

Oh, and the test suite still passes after the patch (on my local machine at least).

tpq commented

Ahhhh brilliant!! Yes, this makes sense to me. The reason behind storing the index in the @permutes slot is so you only have to call the random seed once & still get the same result from updateCutoffs() thereafter. I cannot think of any reason why you would need to fill the @permutes slot from within updateCutoffs(). Great stuff! I will accept the pull request if you can submit. Also please add yourself as a package contributor in the DESCRIPTION so you list it on your CV :)