flr/FLCore

parallelization of FLR

marchtaylor opened this issue · 5 comments

Some colleagues and I are having difficulty setting up parallelization with FLR. Specifically, we are trying to apply a genetic algorithm (GA::ga), which I believe uses the parallel package. As long as we leave parallel=FALSE in that function, it runs without errors. With parallelization, we receive the following error:

 Error in { : 
  task 1 failed - "error in evaluating the argument 'j' in selecting a method for function '[<-': Error in Summary.factor(1L, na.rm = FALSE) : 
  'min' not meaningful for factors
Calls: ac -> Summary.factor
"

This code is not part of our "fitness" function, nor do I see it within ga, so my guess is that it may be somewhere in FLCore. I had a previous problem with parallelization using the parallel package, which I believe involved conflicting iter() functions. I solved this by using snow for parallelization. I am not sure if this is a related issue, but was hoping someone might have some ideas.

Cheers

Hi,

'[<-' is overloaded for FLQuant, and it appears to be called with a factor for the year (j) dimension.

Can you please give us some code, for example based on ple4, so I can track it down?
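
The base-R behaviour behind the error message can be reproduced without FLCore. The sketch below is an assumption about the mechanism, not a confirmed diagnosis of this issue: it shows that min() fails on an unordered factor and that converting through character first avoids the error. The variable names yr and yr_num are hypothetical.

```r
# Hypothetical year index stored as a factor, e.g. a data.frame column
# built with stringsAsFactors = TRUE (the default before R 4.0.0).
yr <- factor(c("2006", "2007", "2008"))

# Summary functions such as min() are not defined for unordered factors,
# so this call raises the "not meaningful for factors" error.
msg <- tryCatch(min(yr), error = function(e) conditionMessage(e))
print(msg)

# Converting via character first recovers the numeric years.
# (as.numeric(yr) alone would return the internal codes 1, 2, 3.)
yr_num <- as.numeric(as.character(yr))
print(min(yr_num))
```

If a factor of this kind reaches a year subscript, checking is.factor() on the index before subsetting is one way to catch it early.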

Hi Iago,
Thanks for your quick response. By making a minimal example, we realized that the problem is somewhere in our "fitness" function, so it's not an FLR-specific issue. We will go back through our code and try to figure out the issue. Here is the minimal example in case anyone is interested:

# required packages
library(FLCore)
library(FLash)
library(FLAssess)
library(ggplotFL)
library(GA)


# load data
data(ple4)
plot(ple4)


# SRR
ple4SR <- as.FLSR(ple4)
model(ple4SR) <- bevholt
ple4SR <- fmle(ple4SR) 
plot(ple4SR)


# fitness function
stffun <- function(Ftarget){
  nyear <- 10
  # Make the control object
  ctrl_target <- data.frame(
    year = seq(from = range(ple4)["maxyear"], length.out = nyear),
    quantity = "f",
    val = Ftarget,
    max = NA, min = NA
  )
  ctrl_obj <- fwdControl(ctrl_target)
  ple4_stf <- stf(ple4, nyear)
  ple4_stf <- fwd(ple4_stf, ctrl = ctrl_obj, sr = ple4SR)
  # plot(ple4_stf)
  # fitness "score" = total harvest in final year
  return(sum(ple4_stf@catch[, ac(range(ple4)["maxyear"] + nyear - 1)]))
}


# Genetic algorithm search for Fmsy ---------------------------------------

# works without parallelization
system.time(
res1 <- ga(
  type = "real-valued",
  fitness = stffun,
  min = 0.05,
  max = 1.5,
  popSize = 20,
  maxiter = 5,
  seed = 1,
  parallel = FALSE
)
)
res1@fitnessValue
res1@solution # best F
plot(res1)

# also works in parallel
system.time(
res2 <- ga(
  type = "real-valued",
  fitness = stffun,
  min = 0.05,
  max = 1.5,
  popSize = 20,
  maxiter = 5,
  seed = 1,
  parallel = TRUE # or a number of cores, e.g. 4
)
)
res2@fitnessValue
res2@solution # best F
plot(res2)

OK, did it have to do with a factor being used for subsetting? Glad to hear it works now.

We still do not know where the error is coming from, but it is deeper in our model than in the functions called by this basic example. I will update here if I find any issues with FLR-related functions.
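
One way to narrow down an error like this is to wrap the fitness function so that failures are recorded together with the input that caused them. The following is a minimal base-R sketch under stated assumptions: trace_fitness and toy_fitness are hypothetical names, the fallback score of -Inf is an arbitrary choice that may not suit every optimiser, and the recorded failures are only visible in a serial run (parallel = FALSE), because parallel workers run in separate processes.

```r
# Hypothetical wrapper: returns a fitness function that records any input
# for which `fitness` throws, instead of aborting the whole run.
trace_fitness <- function(fitness, fallback = -Inf) {
  failures <- list()
  wrapped <- function(x, ...) {
    tryCatch(
      fitness(x, ...),
      error = function(e) {
        # Store the offending input and the error message for later review.
        failures[[length(failures) + 1L]] <<- list(
          input = x,
          message = conditionMessage(e)
        )
        fallback
      }
    )
  }
  list(fitness = wrapped, failures = function() failures)
}

# Toy stand-in for a real fitness function: fails for inputs above 1.
toy_fitness <- function(x) {
  if (x > 1) stop("bad input")
  -(x - 0.5)^2
}

traced <- trace_fitness(toy_fitness)
scores <- vapply(c(0.2, 0.5, 1.2), traced$fitness, numeric(1))
failed <- traced$failures()
print(scores)
print(failed)
```

With a real model, traced$fitness would be passed as the fitness argument in place of the original function, and traced$failures() inspected after the run.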

Perfect. Let me know and I will reopen it and investigate.