Need function to compare models
Closed this issue · 5 comments
Would be nice to have a function that we can give two models and it outputs a p-value. @BTopcuoglu had something like this in her mBio paper repository. Here's what @courtneyarmour has modified for use with mikropml...
library(tidyverse)
library(mosaic)
# function to calculate pvalues
perm_p_value_cond <- function(data, cond_name_1, cond_name_2){
# subset results to select AUC and condition columns and
# filter to only the two conditions of interest
test_subsample <- data %>%
select(AUC,condition) %>%
filter((condition == cond_name_1 | condition == cond_name_2))
# quantify the absolute value of the difference in mean AUC between the two conditions (observed difference)
obs <- abs(diff(mosaic::mean(AUC ~ condition, data = test_subsample)))
# use mosaic to shuffle condition labels and calculate difference in mean AUC - repeat 10,000 times
auc.null <- do(10000) * diff(mosaic::mean(AUC ~ shuffle(condition), data = test_subsample))
n <- length(auc.null[,1]) #number of shuffled calculations
# r = #replications at least as extreme as observed effect
r <- sum(abs(auc.null[,1]) >= obs) #count number of shuffled differences as big or bigger than observed difference
# compute Monte Carlo p-value with correction (Davison & Hinkley, 1997)
p.value=(r+1)/(n+1)
return(p.value)
}
# example input data
data <- read_csv("./example_data.csv")
# conditions to compare
conditions <- c("conditionA","conditionB","conditionC")
# generate table of all pairwise comparisons
p.table.condition <- NULL
p.table.condition <- expand_grid(x=1:length(conditions),
y=1:length(conditions),
) %>%
filter(x < y) %>%
mutate(condition1 = conditions[x], condition2 = conditions[y]) %>%
select(-x, -y) %>%
group_by(condition1, condition2) %>%
summarize(p_value = perm_p_value_cond(data,condition1,condition2),
.groups="drop")
write_csv(p.table.condition,"./pvalues_by_condition.csv")
perm_p_value_cond
is probably close to what we need. Is this in scope for mikropml?
Great, we should be able to adapt @courtneyarmour's code and incorporate it in the package. It sounds in-scope to me!
Do you want to take a stab at this @courtneyarmour? There are a few things that should probably be changed in this code before incorporation into mikropml:
- If you write a few unit tests with input and expected output, that would make it easier for anyone to work on this.
- It'd be ideal if it didn't add a new dependency (
mosaic
). Inget_feature_importance()
, we implemented a permutation test withfuture.apply
functions, so this could use a similar approach to allow for parallel processing. - Need to use
package::function()
syntax when referring to functions except for those in mikropml or base R. - No magic numbers. There should probably be a parameter called
nperms
with a default value of10000
. - We mostly adhere to the tidyverse styleguide, so:
- Underscores (
_
) are preferred over dots (.
) to separate words in variable names. - Variable names should be descriptive. Only in rare cases are single-letter variable names okay.
- Function names should start with a verb.
- Use tidyverse functions instead of base R when working with data frames.
- Avoid using reserved words like
data
.
- Underscores (
sounds good, I'll give it a go
@kelly-sovacool should this go in a new R script or is there an existing script it can be added to?
@courtneyarmour I would probably put it in a new R script. Maybe compare_models.R
? They're just named & organized topically.