fitAR() results & estimates for the same value in the time series
Opened this issue · 12 comments
Describe the bug
Hello!
I have run the fitAR() model over a raster stack and have now started exploring the results.
In that raster stack, there is a bunch of pixels that have the same value across all the layers. For those pixels, I get a tiny value of coefficient and low p-value (see example below). I know that the model "estimates parameters for the regression model with AR(1) random error terms", but I find this result confusing and false positive output. The estimate is rather on the random error and that is an artificial result.
Of course, I can round up the coefficient estimate and then get 0, but the p-value still exists. But I do not see a point in generating that result at all.
For the same raster stack, and those stable pixels, while applying the Mann-Kendal test, I get 0 for the coefficient and p-value 1.
Maybe in case a function fitAR() is applied on the vector consisting of the same values, the fitting should be skipped, and the output just gives the coeff = 0, and p-value = 1? Unless that would further affect the spatiotemporal part of the modeling?
Maybe adding a condition like skip.same = TRUE/FALSE would either yield for TRUE: coeff = 0, p-value =1; and for FALSE: regular model fit.
Thanks!
To Reproduce
x <- c(rep(46,20))
t <- 1:20
AR.time <- fitAR(x ~ t)
Warning messages:
1: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower, :
NA/Inf replaced by maximum positive value
2: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower, :
NA/Inf replaced by maximum positive value
summary(AR.time)
Call:
fitAR(formula = x ~ t)
Residuals:
Min 1Q Median 3Q Max
-3.553e-14 -2.132e-14 -7.105e-15 1.776e-15 1.421e-14
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.600e+01 7.335e-29 5.371e+15 < 2e-16 ***
t -2.490e-15 5.157e-31 3.467e+00 0.00275 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Mean squared error: 0
Log-likelihood: 577.7417
Desktop (please complete the following information):
- OS: Platform: x86_64-w64-mingw32/x64 (64-bit); Running under: Windows 10 x64 (build 19045)
- R: R version 4.2.2 (2022-10-31 ucrt)
- Package version: 1.0.0
Will you also change the p-value? I am using a threshold on p-value for data filtering and that artificial value is not needed as well.
What do you mean by automatically? Setting a zero for the coefficient and 1 for the p-value maybe with a comment in the output that values do not change should give a clear message.
Thanks a lot for the quick answer:)
Best,
Monika
As Tony mentioned, this is because it does not make sense to run a regression for a series of constants. The "true" value of the t-statistics, in this case, would be Inf
(46/0) and NaN
(0/0), according to the formula
> abs(round(AR.time$coefficients)) / round(AR.time$SE)
(Intercept) t
Inf NaN
We might consider putting a check for this sort of thing in the function, but I agree with Tony that we expect researchers to filter their own data.
The expectation is that all non-compatible data (missing values, constant over time, etc) are removed prior to analyzing.
Package version: 1.0.0
@Rapsodia86, If you are using 1.0.0, I'd recommend upgrading to the current stable version 1.0.4, which is on CRAN. We've fixed a number of bugs and improved the stability since 1.0.0.
Of course, it makes no sense to run that regression on constants. But even if I know about that case, I would rather prefer to have a trend function handling this than filtering rasters up front. Because the fitAR() does not handle NA, right?
On the other hand, I can wrap fitAR() in a small checking function when applying it on a raster stack; the question is how much longer it will run:
AR_ts_fun <- function(x) {
t<- 1:20
if (any(is.na(x))){
r_out <- c(NA,NA)}else if(
all(x == x[1]))
{r_out <- c(1,0)}else{
AR.time <- fitAR(x ~ t)
r_out <- c(AR.time$pval[2],AR.time$coefficients[2])}
return(r_out)
}
Right, an individual comment would only work if someone prints the model summary; for running in the loop without extracting the info is not useful.
@morrowcj I will upgrade the package, thanks!
Okay, thanks for the clarification.
In this specific scenario I am working on, I do not want to have any NA. It would make no sense to compare trends for pixels having different time series lengths.
It is common. Just in this instance, I do explore trends of a couple of variables constructed from the same dataset. So, I want to have consistent time-series and constant statistical power.
I'm going to reopen this, because I do think it is a good feature to consider.