morrowcj/remotePARTS

fitAR() results & estimates for the same value in the time series

Opened this issue · 12 comments

Describe the bug
Hello!

I have run the fitAR() model over a raster stack and have now started exploring the results.
In that raster stack, there is a bunch of pixels that have the same value across all the layers. For those pixels, I get a tiny value of coefficient and low p-value (see example below). I know that the model "estimates parameters for the regression model with AR(1) random error terms", but I find this result confusing and false positive output. The estimate is rather on the random error and that is an artificial result.
Of course, I can round up the coefficient estimate and then get 0, but the p-value still exists. But I do not see a point in generating that result at all.
For the same raster stack, and those stable pixels, while applying the Mann-Kendal test, I get 0 for the coefficient and p-value 1.

Maybe in case a function fitAR() is applied on the vector consisting of the same values, the fitting should be skipped, and the output just gives the coeff = 0, and p-value = 1? Unless that would further affect the spatiotemporal part of the modeling?
Maybe adding a condition like skip.same = TRUE/FALSE would either yield for TRUE: coeff = 0, p-value =1; and for FALSE: regular model fit.

Thanks!
To Reproduce

x <- c(rep(46,20))
t <- 1:20
AR.time <- fitAR(x ~ t)
Warning messages:
1: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower,  :
  NA/Inf replaced by maximum positive value
2: In optimize(function(par) fn(par, ...)/con$fnscale, lower = lower,  :
  NA/Inf replaced by maximum positive value
summary(AR.time)

Call:
fitAR(formula = x ~ t)
Residuals:
       Min         1Q     Median         3Q        Max 
-3.553e-14 -2.132e-14 -7.105e-15  1.776e-15  1.421e-14 
Coefficients:
              Estimate Std. Error   t value Pr(>|t|)    
(Intercept)  4.600e+01  7.335e-29 5.371e+15  < 2e-16 ***
t           -2.490e-15  5.157e-31 3.467e+00  0.00275 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Mean squared error: 0
Log-likelihood: 577.7417

Desktop (please complete the following information):

  • OS: Platform: x86_64-w64-mingw32/x64 (64-bit); Running under: Windows 10 x64 (build 19045)
  • R: R version 4.2.2 (2022-10-31 ucrt)
  • Package version: 1.0.0

Will you also change the p-value? I am using a threshold on p-value for data filtering and that artificial value is not needed as well.
What do you mean by automatically? Setting a zero for the coefficient and 1 for the p-value maybe with a comment in the output that values do not change should give a clear message.

Thanks a lot for the quick answer:)

Best,
Monika

As Tony mentioned, this is because it does not make sense to run a regression for a series of constants. The "true" value of the t-statistics, in this case, would be Inf (46/0) and NaN (0/0), according to the formula $|\beta| \div \sigma$.

> abs(round(AR.time$coefficients)) / round(AR.time$SE)
(Intercept)           t 
        Inf         NaN 

We might consider putting a check for this sort of thing in the function, but I agree with Tony that we expect researchers to filter their own data.


The expectation is that all non-compatible data (missing values, constant over time, etc) are removed prior to analyzing.

Package version: 1.0.0

@Rapsodia86, If you are using 1.0.0, I'd recommend upgrading to the current stable version 1.0.4, which is on CRAN. We've fixed a number of bugs and improved the stability since 1.0.0.

Of course, it makes no sense to run that regression on constants. But even if I know about that case, I would rather prefer to have a trend function handling this than filtering rasters up front. Because the fitAR() does not handle NA, right?
On the other hand, I can wrap fitAR() in a small checking function when applying it on a raster stack; the question is how much longer it will run:

 AR_ts_fun <- function(x) {
    t<- 1:20
    if (any(is.na(x))){
    r_out <- c(NA,NA)}else if(
      all(x == x[1]))
    {r_out <- c(1,0)}else{  
    AR.time <- fitAR(x ~ t)
    r_out <- c(AR.time$pval[2],AR.time$coefficients[2])}
    return(r_out)
  }

Right, an individual comment would only work if someone prints the model summary; for running in the loop without extracting the info is not useful.

@morrowcj I will upgrade the package, thanks!

@arives @morrowcj Thanks again for working on this!

Okay, thanks for the clarification.
In this specific scenario I am working on, I do not want to have any NA. It would make no sense to compare trends for pixels having different time series lengths.

It is common. Just in this instance, I do explore trends of a couple of variables constructed from the same dataset. So, I want to have consistent time-series and constant statistical power.

I'm going to reopen this, because I do think it is a good feature to consider.