Factors and data.frames in R 4.0
I hit a problem when testing the code in Kéry & Royle (2016) Applied Hierarchical Modeling vol 1 section 10.9 p.592ff with R 4.0.1 RC. It uses unmarkedFrameOccu and occu, but the issue may apply to other unmarkedFrame* functions.
Everything seemed to work fine (though I haven't checked results vs 3.6.3) until I passed a fitted model to AICcmodavg::mb.gof.test, when I got the error: no applicable method for 'droplevels' applied to an object of class "character".
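For readers who want to see where that message comes from, here is a minimal base-R sketch. It is not the AICcmodavg code; it only shows that droplevels has no method for character vectors, so any column that used to arrive as a factor and now arrives as character would trigger the same error.

```r
# A character vector stands in for a covariate that was not converted to a factor.
x <- c("1", "2", "3")

# droplevels() dispatches on class; there is no method for "character",
# so the call fails. Capture the message instead of stopping.
msg <- tryCatch(droplevels(x), error = function(e) conditionMessage(e))
print(msg)

# The same call works once the vector is a factor: unused levels are dropped.
f <- factor(x)[1:2]
levels(droplevels(f))
```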
The list for obsCovs includes time, a 3-column character matrix. Looking at the summary of the umf object, this is converted to a factor in R 3.6.3 but is still character in R 4.0.
I think the issue is the change in the default for stringsAsFactors in data.frame from TRUE to FALSE as of R 4.0.
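A small base-R sketch of that change, independent of unmarked: the same data.frame call gives a character column under the R 4.0 default and a factor column when stringsAsFactors = TRUE is passed explicitly. The variable names are only for illustration.

```r
# Default behaviour: character in R >= 4.0.0, factor in earlier versions.
df_default <- data.frame(time = c("1", "2", "3"))
class(df_default$time)

# Explicit argument: a factor regardless of the R version.
df_factor <- data.frame(time = c("1", "2", "3"), stringsAsFactors = TRUE)
class(df_factor$time)
```

Passing the argument explicitly avoids depending on either the version default or the global option.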
Setting options(stringsAsFactors = TRUE) fixes the problem, but elicits a grumpy warning and is not a long-term solution. mb.gof.test then works, but only with parallel=FALSE; presumably the option would have to be set on the workers for it to work in parallel.
It probably only needs the addition of stringsAsFactors = TRUE in calls to data.frame. I looked at the source code but couldn't find my way around, so I will not do a pull request.
Thanks, Mike
I'm using unmarked 1.0.0 and AICcmodavg 2.2-2.
This is a bug in AICcmodavg. Here's the line that needs the adjustment:
https://github.com/cran/AICcmodavg/blob/9bdb2199725f0cf50f2adba09e0a6d265615f1f3/R/mb.gof.test.R#L46
I emailed a fix to Marc a few weeks ago (the package doesn't have a public repository). Also, I'm not sure if AHM uses the MB chi-square test for Royle-Nichols models, but the current function gives incorrect results. I sent a fix for that too.
Also potentially relevant:
https://groups.google.com/forum/#!topic/unmarked/x5fxjSRDb1Y
I wouldn't be surprised if unmarked has some stringsAsFactors=FALSE issues lurking, but I haven't found them yet.
Thanks Ken. Sorry I hadn't checked the unmarked forum. But there's still a difference with R 4.0. A toy example, check output for "Observation-level covariates":
getRversion()
[1] ‘4.0.1’
library(unmarked)
set.seed(2020)
y <- matrix(rbinom(30, 1, 0.3), ncol=3)
time <- matrix(as.character(1:3), nrow=10, ncol = 3, byrow = TRUE)
summary(unmarkedFrameOccu(y = y, obsCovs = list(time = time)))
unmarkedFrame Object
10 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 7
Tabulation of y observations:
0 1
22 8
Observation-level covariates:
time
Length:30
Class :character
Mode :character
options(stringsAsFactors = TRUE) # gives dire warning
summary(unmarkedFrameOccu(y = y, obsCovs = list(time = time)))
unmarkedFrame Object
... [same output omitted]...
Observation-level covariates:
time
1:10
2:10
3:10
The last output is what you get with R 3.6.
The model fitting still works, presumably because the coercion is done later, maybe by model.matrix. So not serious, but the new summary output for obsCovs is not very useful.
Regards, Mike
Makes sense. I'll work on something to fix summary methods for this situation. I see the need to make sure things are backwards compatible here. I also think it would be good to encourage users to explicitly specify variables as factors outside of unmarked, rather than relying on the automatic conversion of characters to factors. That seems to me to be more in the spirit of the changes made in 4.0. This probably means changing some of the example code and maybe vignettes.
"explicitly specify variables as factors outside of unmarked ..."
Do you have a neat way to do this? I don't. I can't put a factor into a matrix, and if I construct the necessary data frame it still gets converted back to character.
getRversion()
[1] ‘4.0.1’
library(unmarked)
set.seed(2020)
y <- matrix(rbinom(30, 1, 0.3), ncol=3)
time <- matrix(as.character(1:3), nrow=10, ncol = 3, byrow = TRUE)
str(t1 <- factor(time)) # now a vector
str(t2 <- matrix(t1, ncol=3)) # back to character again
t3 <- data.frame(T1 = factor(rep(1, 10), levels=(c("1", "2", "3"))),
T2 = factor(rep(2, 10), levels=(c("1", "2", "3"))),
T3 = factor(rep(3, 10), levels=(c("1", "2", "3"))))
str(t3) # ok, try this
head(t3)
summary(unmarkedFrameOccu(y = y, obsCovs = list(time = t3)))
unmarkedFrame Object
10 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 7
Tabulation of y observations:
0 1
22 8
Observation-level covariates:
time
Length:30
Class :character # !!!!!
Mode :character
I'm guessing that at some point the matrices/data frames input to unmarkedFrameOccu are converted to vectors then passed to cbind or equivalent. Converting my data frame of factors to a vector converts them to character.
Regards, Mike
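The coercion described above can be reproduced in base R without unmarked. This sketch does not show what unmarkedFrameOccu does internally (that remains a guess); it only shows that the usual ways of flattening a factor into a matrix or vector return character.

```r
lev <- c("1", "2", "3")
f <- factor(lev)

# matrix() strips the factor class and stores the labels as character.
m <- matrix(f, ncol = 3)
typeof(m)

# A data frame of factors becomes a character matrix under as.matrix().
t3 <- data.frame(T1 = factor(rep(1, 10), levels = lev),
                 T2 = factor(rep(2, 10), levels = lev),
                 T3 = factor(rep(3, 10), levels = lev))
typeof(as.matrix(t3))

# as.vector() on a factor also returns the labels as character.
typeof(as.vector(f))
```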
You can supply the obs covs in long format:
y <- matrix(rbinom(30, 1, 0.3), ncol=3)
obs <- data.frame(time=factor(rep(c(1:3), 10)))
umf <- unmarkedFrameOccu(y, obsCovs=obs)
summary(umf)
unmarkedFrame Object
10 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 9
Tabulation of y observations:
0 1
19 11
Observation-level covariates:
time
1:10
2:10
3:10
You're right, though, that probably every umf creation method needs to be examined for this issue.
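If the covariate already exists as a wide sites-by-occasions matrix, as in Mike's toy example, it can be reshaped to the long format rather than rebuilt by hand. The sketch below assumes the long-format data frame must be in site-major order (all occasions for site 1, then site 2, and so on); check the unmarkedFrameOccu help page before relying on that. The unmarked call is guarded so the block runs even where the package is not installed.

```r
set.seed(2020)
y <- matrix(rbinom(30, 1, 0.3), ncol = 3)
time <- matrix(as.character(1:3), nrow = 10, ncol = 3, byrow = TRUE)

# t() puts occasions in rows, so as.vector() reads site 1's occasions first,
# then site 2's, and so on (site-major order, assumed to be what is required).
obs <- data.frame(time = factor(as.vector(t(time))))
str(obs)

# Build the unmarkedFrame only if unmarked is available.
if (requireNamespace("unmarked", quietly = TRUE)) {
  umf <- unmarked::unmarkedFrameOccu(y = y, obsCovs = obs)
  print(summary(umf))
}
```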
Hi guys, I don't see any value in having character strings in unmarkedFrame objects. We can't use them for anything. Could we automatically convert them to factors with a warning?
I was hoping to avoid it but Mike's examples have convinced me. It does feel like there might be some unexpected consequences, e.g. related to prediction.
An alternative would be to throw an error instead of issuing a warning. This would make the user deal with it.
Mike's right though that if you want to supply obs covs as a list of matrices/data frames there is no way to supply a factor correctly. I'm not sure all users would figure out to use the long format instead, if they are used to always using the list approach.
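To make the two options discussed above concrete (convert with a warning, or stop with an error), here is a minimal sketch. The helper chars_to_factors and its action argument are hypothetical names invented for illustration; nothing like this is confirmed to exist in unmarked.

```r
# Hypothetical helper: convert character columns of a covariate data frame
# to factors with a warning, or refuse them with an error.
chars_to_factors <- function(df, action = c("warn", "error")) {
  action <- match.arg(action)
  is_chr <- vapply(df, is.character, logical(1))
  if (!any(is_chr)) return(df)

  msg <- paste0("Character covariates found: ",
                paste(names(df)[is_chr], collapse = ", "))
  if (action == "error") {
    stop(msg, "; please convert them to factors.", call. = FALSE)
  }
  warning(msg, "; converting them to factors.", call. = FALSE)
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

covs <- data.frame(time = c("1", "2", "3"), temp = c(10.5, 11.2, 9.8),
                   stringsAsFactors = FALSE)

# Warning path: the character column comes back as a factor.
converted <- suppressWarnings(chars_to_factors(covs))
str(converted)

# Error path: the user is forced to deal with the column.
err <- tryCatch(chars_to_factors(covs, action = "error"),
                error = function(e) conditionMessage(e))
print(err)
```

The warning path keeps existing scripts running, while the error path makes the factor levels an explicit user decision, which may matter for the prediction concerns raised above.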