gadget-framework/mfdb

mfdb_sample_count does not return bootstrapped samples

Closed this issue · 3 comments

I am attempting to set up a bootstrapped gadget model; however, I cannot seem to export samples from bootstrapped areas using mfdb_sample_count. I am following code already set up here. Below is a minimal reproducible example that shows the same behavior.

Basically, I am setting up n resampled areacells in a list and then trying to export them using mfdb_sample_count, which should return a list of n data.frames. However, it only returns a list of 1. Am I thinking about this correctly or not?

Thanks for any help.

Minimal Reproducible Example

library(mfdb)

# setup a simple database and populate with data
mdb <- mfdb('zebrafish')

year <- 1:10
month <- 1
area <- 1:10

base_data <- expand.grid(year = year, 
                         month = month, 
                         areacell = area)
count <- sample(1:1000, 100, replace = TRUE)
data <- cbind(base_data, count)
mfdb_import_area(mdb, data.frame(name = 1:10))

# solved: must also import division for bootstrapping to work
# these two lines were not included originally
divList <- as.list(1:10); names(divList) <- 1:10
mfdb_import_division(mdb, divList)

mfdb_import_survey(mdb, data, data_source = 'test')

# attempt to retrieve samples
defaults <- list(
    areacell = mfdb_group(`1` = 1:10),
    timestep = mfdb_timestep_quarterly,
    year = 1:10
)
n <- 10 # number of bootstrap samples to retrieve
export_test <- mfdb_sample_count(mdb, 
                                 cols=NULL, 
                                 params = c(list(), defaults))

defaults <- within(defaults,
                   {areacell = mfdb_bootstrap_group(n, 
                                                   defaults$areacell,
                                                   seed = 270)})

bootstrap_test <- mfdb_sample_count(mdb,
                                    cols=NULL,
                                    params = c(list(), defaults))

length(export_test) == length(bootstrap_test)
# > TRUE  (result before the division import above was added: both lists had length 1)

# clean up: disconnect, then drop the test schema
mfdb_disconnect(mdb)
mfdb('zebrafish', destroy_schema = TRUE)
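
For reference, the resampling that the bootstrap call above is expected to perform can be illustrated without a database. The following is a base-R sketch only, not mfdb's actual implementation: it assumes that each bootstrap replicate draws the ten areas with replacement, and the names `areas` and `resamples` are made up for the illustration.

```r
# Conceptual illustration only; this does not call mfdb.
# Assumption: a bootstrap replicate is a draw, with replacement,
# of as many areas as the original group contains.
areas <- 1:10
n <- 10  # number of bootstrap replicates

set.seed(270)
resamples <- replicate(
    n,
    sample(areas, size = length(areas), replace = TRUE),
    simplify = FALSE
)

length(resamples)  # one list element per replicate
resamples[[1]]     # areas drawn for the first replicate
```

If the query works as intended, length(bootstrap_test) should likewise equal n rather than 1.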

Okay, I believe this has to do with the distinction between column names in the data_in argument for mfdb_import_survey. For example, if I rename areacell to area in the defaults list, mfdb_sample_count returns a list with the expected number of bootstrap samples. However, every data.frame in that list has 0 columns and 0 rows. And if I instead rename the column in the data passed to mfdb_import_survey from areacell to area, I get the following error:

Error in sanitise_col(mdb, data_in, "areacell", lookup = "areacell") : 
  Input data is missing 'areacell'. Columns available: year,month,area,count

How can I import data_in with a column name of area instead of areacell?

I think I've solved my own problem, thanks to help from @bthe. The areas must also be imported as divisions with mfdb_import_division, as in the following two lines.

divList <- as.list(1:10); names(divList) <- 1:10
mfdb_import_division(mdb, divList)

See updated code in the original post.
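
The two lines above build a named list in which each name is a division and each value holds the areacells belonging to it; here every areacell gets a division of its own. Below is a small base-R sketch of the same structure. It uses `setNames` purely as a stylistic alternative and leaves the database call commented out, so it runs without a connection.

```r
# Build the same one-areacell-per-division mapping as divList above.
# setNames() is a base-R (stats) helper; names are given as character
# strings, which is what names<- would coerce 1:10 to anyway.
divList <- setNames(as.list(1:10), as.character(1:10))

str(divList[1:2])  # inspect the first two divisions
names(divList)     # the division names

# With a live connection the list would then be passed on, as in the thread:
# mfdb_import_division(mdb, divList)
```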

Yeah, area -> divisions is an odd special case, picked up from DST^2. Glad you got it sorted though, sorry I didn't get here in time.