lcolladotor/derfinder

Speed up p-value calculation

lcolladotor opened this issue · 4 comments

It would be useful to speed up the p-value calculation step.

One quick solution would be to round the areas a bit, then find the p-values for the unique areas and then assign them to each candidate DER.

> length(fullRegions$area)
[1] 689543
## Not many unique areas
> length(unique(fullRegions$area))
[1] 689535
## By rounding to 2 digits the areas greatly simplify
> length(unique(round(fullRegions$area, 2)))
[1] 151811

I seemed to have much faster code for this now. I'll make a couple tests before pushing the new code to the repo

ok great. would be interested to make sure we have some tests to make sure
the p-values don't change much.

On Tue, Jun 9, 2015 at 12:21 PM Leonardo Collado-Torres <
notifications@github.com> wrote:

I seemed to have much faster code for this now. I'll make a couple tests
before pushing the new code to the repo


Reply to this email directly or view it on GitHub
#29 (comment)
.

  Absolute difference (secs) Absolute difference (mins)
1                    32520.7                   542.0117
  Absolute difference (hours) original / new new / original
1                    9.033528       25055.47   3.991144e-05

Tested with Brainspan data, got exact same results in a few seconds compared to the 9 hours from before. The old version is 25k times slower than the new version with this data. I suspect that new version barely increases in run time as the number of values (both observed and null) increases. There are details about the speed of findInterval() in its help page.

In conclusion, this new function is great and I'll add it soon. Probably worth pushing to release branch, or could be a motivation to have users switch to devel. Although, commits to release branch are only supposed to be bug fixes.

See derMisc repo for details.

wow that is awesome!

On Thu, Jun 11, 2015 at 4:26 PM Leonardo Collado-Torres <
notifications@github.com> wrote:

Absolute difference (secs) Absolute difference (mins)1 32520.7 542.0117
Absolute difference (hours) original / new new / original1 9.033528 25055.47 3.991144e-05

Tested with Brainspan data, got exact same results in a few seconds
compared to the 9 hours from before. The old version is 25k times slower
than the new version with this data. I suspect that new version barely
increases in run time as the number of values (both observed and null)
increases. There are details about the speed of findInterval() in its
help page.

In conclusion, this new function is great and I'll add it soon. Probably
worth pushing to release branch, or could be a motivation to have users
switch to devel. Although, commits to release branch are only supposed to
be bug fixes.

See derMisc repo for details.


Reply to this email directly or view it on GitHub
#29 (comment)
.