HW1: ADF tests and auto.arima()

Question

HW1: ADF tests and auto.arima()

Closed this issue 4 years ago · 7 comments

Couple of questions on HW1.

First, should the adf.test() and ur.df() functions lead to the same conclusions as long as we specify the same assumptions/lags within each function? For example, I would expect that the following two lines of code should lead to the same conclusion (WRT rejecting or not rejecting the null). However, I am finding that that is not the case.
adf.test(dat, k = 0)
ur.df(dat, type = "trend", lags = 0)

For question 4b, when using the auto.arima() function with trace=TRUE, I am finding that none of the candidate models are within Delta AICc = 2 of the top model. Is this user error on my part, or am I missing something?

Answer 1 · 2021-01-20T20:42:27.000Z

Hi @mbuonanduci. I'll let @eeholmes handle the first part about the Dickey Fuller tests, but when I tried 4b, I did indeed find several models within 2 AIC units of one another. I'd suggest you check that you're using the correct value for Species, the correct range for Year, and the correct response variable.

Answer 2 · 2021-01-20T22:00:16.000Z

Thanks @mdscheuerell! I have to say, I'm a bit stumped with 4b. I am subsetting the landings data using Species == "Anchovy", response variable log.metric.tons, and start/end years of 1964 and 1987, respectively. The best fitting model has AICc = -5.727702, and while a couple of others come close, none are quite within Delta AICc = 2.

Answer 3 · 2021-01-20T22:35:03.000Z

The key here is that the Q is asking

What models are within ΔAIC of 2? What is different about these models?

without mentioning anything about the "best" model. So, are any of your models within 2 units of one another?

Answer 4 · 2021-01-21T00:20:18.000Z

@mbuonanduci Ah, that should have been anchovy 1964-2007 not 1964-1984 in that question. For the longer dataset, there are a number within 2 of the best model. However, as written, the answer would be 'none are quite within Delta AICc = 2'. If you would like, repeat with 1964-2007. Then you'll see 2 models within Delta AICc of the best (best plus 2nd best). Or write 'none within Delta AICc = 2'

Answer 5 · 2021-01-21T01:29:08.000Z

@mbuonanduci Regarding the unit root test. You set it up correctly and yes they give the same result.

adf.test(dat, k = 0)
ur.df(dat, type = "trend", lags = 0)

The output of adf.test() tells you

data:  dat
Dickey-Fuller = -2.854, Lag order = 0, p-value = 0.247
alternative hypothesis: stationary

Test statistic is -2.854 and p-value is large so null (non-stationary) is not rejected.

The ur.df() output is cryptic but you can see that the 1st test statistic is the same as that reported by adf.test()

############################################################### 
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
############################################################### 

The value of the test statistic is: -2.854 2.8125 4.0747

I find it easiest to get the ur.df() test levels the way I did in lecture:

test <- ur.df(dat, type = "trend", lags = 0)
attr(test, "teststat")
attr(test,"cval")

This gives

> attr(test, "teststat")
               tau3    phi2     phi3
statistic -2.854003 2.81248 4.074721
> attr(test,"cval")
      1pct  5pct 10pct
tau3 -4.38 -3.60 -3.24
phi2  8.21  5.68  4.67
phi3 10.61  7.24  5.91

tau3 is the test statistic you want. This is the one for $\gamma = 0$ in $y_t - y_{t-1} = \gamma y_{t-1} ...$

You can also get this info using

test <- ur.df(dat, type = "trend", lags = 0)
summary(test)

But the tau3 test statistic is not labeled in the output.

Value of test-statistic is: -2.854 2.8125 4.0747 

Critical values for test statistics: 
      1pct  5pct 10pct
tau3 -4.38 -3.60 -3.24
phi2  8.21  5.68  4.67
phi3 10.61  7.24  5.91

Answer 6 · 2021-01-21T01:47:58.000Z

This stackoverflow discussion of ur.df() nicely lays the output that you see from summary() and how to interpret it. https://stats.stackexchange.com/a/117212/291254

Answer 7 · 2021-01-21T03:45:02.000Z

Thank you @eeholmes! I was interpreting the ur.df() output incorrectly. Really appreciate the help!