  • Difference Stationary Time Series
  • Model Selection and unit root test procedures
    • Procedure A
    • Procedure B
    • Procedure C
    • Model Forecasts
  • Another example of analysing data for the presence of a unit root
    • Plotting time series against lagged versions of themselves
    • Checking the correlation between our time series and its lagged values
    • ACF and PACF plots
    • Repeating the model selection and unit root test procedures
    • Forecasting the series:

Difference Stationary Time Series

We will examine monthly index data of the New York Stock Exchange from 1952:1 to 1996:1:

suppressPackageStartupMessages({library(forecast)})
nyse <- read.table(url("http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/nyse.txt"), header = TRUE)
nyse <- ts(nyse, start = 1952, frequency = 12)
par(mfrow = c(1, 3))
plot(nyse); plot(log(nyse)); plot(diff(log(nyse)))

We shall model the variable log(nyse) in two different but equivalent ways:

  1. As a process with a linear trend whose deviations from the trend are described by an AR(1) process: $Y_t = \alpha + \delta t + u_t$, where $u_t = \phi u_{t-1} + w_t$.
lnyse <- log(nyse)
mod.a <- arima(lnyse, c(1,0,0), xreg = time(lnyse))
mod.a
## 
## Call:
## arima(x = lnyse, order = c(1, 0, 0), xreg = time(lnyse))
## 
## Coefficients:
##          ar1  intercept  time(lnyse)
##       0.9832  -144.2528       0.0763
## s.e.  0.0076    12.4969       0.0063
## 
## sigma^2 estimated as 0.001632:  log likelihood = 945.18,  aic = -1882.37
  2. As an autoregressive process with a linear trend: $Y_t = \tilde{\alpha} + \phi Y_{t-1} + \tilde{\delta} t + w_t = (\alpha(1-\phi) + \phi\delta) + \phi Y_{t-1} + \delta(1-\phi)\, t + w_t$
suppressPackageStartupMessages({library(dynlm)})
mod.l <- dynlm(lnyse ~ time(lnyse) + L(lnyse))
summary(mod.l)
## 
## Time series regression with "ts" data:
## Start = 1952(2), End = 1996(1)
## 
## Call:
## dynlm(formula = lnyse ~ time(lnyse) + L(lnyse))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.221927 -0.025066  0.002876  0.026943  0.147915 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.169524   1.124569  -1.929   0.0542 .  
## time(lnyse)  0.001152   0.000595   1.936   0.0534 .  
## L(lnyse)     0.984518   0.008150 120.792   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04055 on 525 degrees of freedom
## Multiple R-squared:  0.9981, Adjusted R-squared:  0.9981 
## F-statistic: 1.38e+05 on 2 and 525 DF,  p-value: < 2.2e-16

The coefficients of the two models differ because of the different parametrizations; however, the two models practically coincide (the differences of their residuals are practically zero):

summary(mod.a$residuals - mod.l$residuals)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -8.119e-04 -3.638e-04 -3.215e-05  1.198e-04  6.636e-04  1.021e-03
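Using the relations between the two parametrizations given above, $\tilde{\alpha} = \alpha(1-\phi) + \phi\delta$ and $\tilde{\delta} = \delta(1-\phi)$, we can also back out the trend-model coefficients from mod.l and compare them with those of mod.a. A minimal sketch (the coefficient names are taken from the summary above):

b     <- coef(mod.l)
phi   <- b["L(lnyse)"]
delta <- b["time(lnyse)"] / (1 - phi)                  # delta = delta_tilde / (1 - phi)
alpha <- (b["(Intercept)"] - phi * delta) / (1 - phi)  # alpha = (alpha_tilde - phi*delta) / (1 - phi)
c(alpha = unname(alpha), delta = unname(delta))        # should be close to mod.a's intercept and time(lnyse) coefficient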

The most important coefficient is that of the lagged term: in the first case it equals 0.9832 and in the second 0.984518 (do not forget that these are only estimates of $\phi$ from $Y_t = \phi Y_{t-1} + w_t$).

The estimate is very close to 1; however, we cannot test the null hypothesis $H_0: \phi = 1$ using the reported p-value of <2e-16 (that p-value tests the hypothesis $H_0: \phi = 0$).

We cannot test the hypothesis $H_0: \phi = 1$ in the standard way because, when $H_0$ is true, the statistic $(\hat{\phi} - 1)/\mathrm{s.e.}(\hat{\phi})$ follows not the Student but another distribution, namely the Dickey-Fuller distribution.
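To see this, here is a minimal simulation sketch (not part of the original analysis): we generate pure random walks, so that $H_0: \phi = 1$ holds, regress $\Delta Y_t$ on $Y_{t-1}$, and look at the lower quantiles of the usual t-ratio of the lag coefficient. They fall far below the corresponding Student-t quantiles and are close to the Dickey-Fuller (tau2) critical values given further below:

set.seed(123)
t.ratio <- replicate(1000, {
  y   <- ts(cumsum(rnorm(200)))   # a pure random walk: phi = 1, i.e. rho = 0
  fit <- dynlm(d(y) ~ L(y))       # Delta y_t = alpha + rho * y_{t-1} + w_t
  summary(fit)$coefficients["L(y)", "t value"]
})
quantile(t.ratio, c(0.01, 0.05, 0.10))  # should be close to the Dickey-Fuller values (about -3.4, -2.9, -2.6), far below the Student-t ones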

Note: to forecast the first model with forecast(), we need to create a variable holding the future values of the time regressor:

t.1 <- c(tail(time(lnyse), 1)) + 1 / frequency(lnyse)
time.f <- seq(from = t.1, by = 1 / frequency(lnyse), length.out = 20)
plot(forecast(mod.a, h = 20, xreg = time.f), include = 120)


Model Selection and unit root test procedures

Procedure A

We shall apply the OLS method and proceed as follows:

  1. Since the order of the AR process is unknown, we begin with a ‘sufficiently high’ order $p_{\max}$. It is easy to verify that the process $Y_t = \alpha + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \delta t + w_t$ can be rewritten as $\Delta Y_t = \alpha + \rho Y_{t-1} + \gamma_1 \Delta Y_{t-1} + \dots + \gamma_{p-1} \Delta Y_{t-p+1} + \delta t + w_t$. For example, the process $Y_t = \alpha + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \delta t + w_t$ can be expressed as $\Delta Y_t = \alpha + (\phi_1 + \phi_2 - 1) Y_{t-1} - \phi_2 \Delta Y_{t-1} + \delta t + w_t$ (a numerical check of this rewriting is sketched after this list).

Now the original null hypothesis $H_0$: “the process has a unit root” transforms into $H_0: \rho = 0$.

  2. Let us find the ‘right’ order of the model. To do this, use the model’s output table to test the hypothesis $\gamma_{p_{\max}-1} = 0$. If its p-value > 0.05, reduce the order and repeat the calculations. Once the ‘right’ order is found, we test the significance of $\delta$.

  3. The significance of $\rho$ is tested in a different way:

    1. If the final version of the model contains the linear term $\delta t$, the 5% Dickey-Fuller critical value is approximately -3.45. Thus, if the t-ratio of the $Y_{t-1}$ term is less than -3.45, the null $H_0$: “the process has a unit root” is rejected and we decide that the AR process is stationary.

    2. If the final version of the model does not contain $\delta t$, the 5% Dickey-Fuller critical value is approximately -2.89. Thus, if the t-ratio of the $Y_{t-1}$ term is less than -2.89, the null $H_0$: “the process has a unit root” is rejected and we decide that the AR process is stationary.
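Here is the numerical check of the AR(2) rewriting promised above, a minimal sketch using lnyse: both parametrizations are fitted on the same sample, so the coefficient of $Y_{t-1}$ in the differenced form should equal $\hat{\phi}_1 + \hat{\phi}_2 - 1$ and the coefficient of $\Delta Y_{t-1}$ should equal $-\hat{\phi}_2$:

m.lev <- dynlm(lnyse ~ L(lnyse, 1:2) + time(lnyse))              # Y_t = alpha + phi1*Y_{t-1} + phi2*Y_{t-2} + delta*t + w_t
m.adf <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse)) + time(lnyse))  # Delta Y_t = alpha + rho*Y_{t-1} + gamma1*Delta Y_{t-1} + delta*t + w_t
phi   <- coef(m.lev)[c("L(lnyse, 1:2)1", "L(lnyse, 1:2)2")]
c(rho = unname(sum(phi) - 1), gamma1 = unname(-phi[2]))          # rho = phi1 + phi2 - 1, gamma1 = -phi2
coef(m.adf)[c("L(lnyse)", "L(d(lnyse))")]                        # should coincide with the values above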



We shall apply the procedure to lnyse. Let us begin with $p_{\max} = 4$:

mod.4 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:4) + time(lnyse))
summary(mod.4)
## 
## Time series regression with "ts" data:
## Start = 1952(6), End = 1995(13)
## 
## Call:
## dynlm(formula = d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:4) + time(lnyse))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.220902 -0.024758  0.001637  0.026627  0.152618 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       -2.3687707  1.1485800  -2.062   0.0397 *
## L(lnyse)          -0.0172567  0.0083590  -2.064   0.0395 *
## L(d(lnyse), 1:4)1  0.0544182  0.0439376   1.239   0.2161  
## L(d(lnyse), 1:4)2 -0.0282788  0.0439795  -0.643   0.5205  
## L(d(lnyse), 1:4)3  0.0150921  0.0439341   0.344   0.7313  
## L(d(lnyse), 1:4)4  0.0364522  0.0439285   0.830   0.4070  
## time(lnyse)        0.0012584  0.0006077   2.071   0.0389 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04061 on 517 degrees of freedom
## Multiple R-squared:  0.01244,    Adjusted R-squared:  0.0009837 
## F-statistic: 1.086 on 6 and 517 DF,  p-value: 0.3696

We see that the coefficient of L(d(lnyse), 1:4)4 is insignificant (its p-value = 0.4070 > 0.05), so we reduce the maximum order to $p_{\max} = 3$:

mod.3 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:3) + time(lnyse))
summary(mod.3)
## 
## Time series regression with "ts" data:
## Start = 1952(5), End = 1996(1)
## 
## Call:
## dynlm(formula = d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:3) + time(lnyse))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.220029 -0.025348  0.002672  0.027110  0.152142 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       -2.2563268  1.1441167  -1.972   0.0491 *
## L(lnyse)          -0.0161543  0.0083185  -1.942   0.0527 .
## L(d(lnyse), 1:3)1  0.0513588  0.0439385   1.169   0.2430  
## L(d(lnyse), 1:3)2 -0.0273597  0.0439384  -0.623   0.5338  
## L(d(lnyse), 1:3)3  0.0151113  0.0439494   0.344   0.7311  
## time(lnyse)        0.0011980  0.0006054   1.979   0.0484 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04065 on 519 degrees of freedom
## Multiple R-squared:  0.01048,    Adjusted R-squared:  0.000945 
## F-statistic: 1.099 on 5 and 519 DF,  p-value: 0.3599

The coefficient of L(d(lnyse), 1:3)3 is insignificant (p-value = 0.7311 > 0.05), so we continue the procedure (we omit the results; check them yourself: the coefficients of L(d(lnyse), 1:2)2 and L(d(lnyse), 1) are also insignificant):

mod.2 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:2) + time(lnyse))
# summary(mod.2)
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1) + time(lnyse))
# summary(mod.1)

until we arrive at the following model:

mod.0 <- dynlm(d(lnyse) ~ L(lnyse) + time(lnyse))
summary(mod.0)
## 
## Time series regression with "ts" data:
## Start = 1952(2), End = 1996(1)
## 
## Call:
## dynlm(formula = d(lnyse) ~ L(lnyse) + time(lnyse))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.221927 -0.025066  0.002876  0.026943  0.147915 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -2.169524   1.124569  -1.929   0.0542 .
## L(lnyse)    -0.015482   0.008150  -1.900   0.0580 .
## time(lnyse)  0.001152   0.000595   1.936   0.0534 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04055 on 525 degrees of freedom
## Multiple R-squared:  0.0071, Adjusted R-squared:  0.003317 
## F-statistic: 1.877 on 2 and 525 DF,  p-value: 0.1541

Here we see that the linear trend term time(lnyse) is not significant, so our final model is:

mod.0 <- dynlm(d(lnyse) ~ L(lnyse))
summary(mod.0)
## 
## Time series regression with "ts" data:
## Start = 1952(2), End = 1996(1)
## 
## Call:
## dynlm(formula = d(lnyse) ~ L(lnyse))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.225147 -0.024324  0.002601  0.025993  0.157881 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.0077315  0.0121721   0.635    0.526
## L(lnyse)    -0.0001374  0.0019082  -0.072    0.943
## 
## Residual standard error: 0.04065 on 526 degrees of freedom
## Multiple R-squared:  9.851e-06,  Adjusted R-squared:  -0.001891 
## F-statistic: 0.005181 on 1 and 526 DF,  p-value: 0.9426

Our final model does not contain a linear trend term, so the Dickey-Fuller critical value is -2.89. The t-value from our model is -0.072 > -2.89, so we have no grounds to reject $H_0: \rho = 0$.

Thus, lnyse has a unit root and it can be modeled as:

mod.f <- dynlm(d(lnyse) ~ 1)
summary(mod.f)
## 
## Time series regression with "ts" data:
## Start = 1952(2), End = 1996(1)
## 
## Call:
## dynlm(formula = d(lnyse) ~ 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.225318 -0.024289  0.002496  0.026039  0.157962 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.006865   0.001768   3.884 0.000116 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04062 on 527 degrees of freedom

In other words, $\text{lnyse}_t = 0.006865 + \text{lnyse}_{t-1} + w_t$, i.e. this is a random walk with drift.
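Note that the intercept of this intercept-only regression is simply the sample mean of the differenced series:

mean(diff(lnyse))  # equals the drift estimate 0.006865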


Procedure B

The approximate 5% Dickey-Fuller critical values were presented above. In our case, -0.072 was clearly greater than -2.89, but, in general, once the order and the necessity of $\delta t$ are established, use the augmented Dickey-Fuller test implemented in the ur.df() function. Recall that this test is applied in the following cases:

  • $Y_t = \phi Y_{t-1} + w_t$ (the "none" option);
  • $Y_t = \alpha + \phi Y_{t-1} + w_t$ (the "drift" option);
  • $Y_t = \alpha + \phi Y_{t-1} + \delta t + w_t$ (the "trend" option).

In all three cases we test the null hypothesis $H_0: \phi = 1$ (equivalently, $H_0: \rho = 0$). The critical values depend on the type option and are labelled, respectively, tau1, tau2 and tau3. Note that the critical values remain the same even if the process is not AR(1) but AR(p).



suppressPackageStartupMessages({library(urca)})
lnyse.df <- ur.df(lnyse, lags = 0, type = "drift")
summary(lnyse.df)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.225147 -0.024324  0.002601  0.025993  0.157881 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.0077315  0.0121721   0.635    0.526
## z.lag.1     -0.0001374  0.0019082  -0.072    0.943
## 
## Residual standard error: 0.04065 on 526 degrees of freedom
## Multiple R-squared:  9.851e-06,  Adjusted R-squared:  -0.001891 
## F-statistic: 0.005181 on 1 and 526 DF,  p-value: 0.9426
## 
## 
## Value of test-statistic is: -0.072 7.5294 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78

Since the value of the test statistic, -0.072, is closer to zero than the 5% critical value (tau2 = -2.86 < -0.072 < 0), there are no grounds to reject the null $H_0$: the process has a unit root.
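The test statistics and critical values can also be extracted from the ur.df object programmatically (it is an S4 object, so its slots are accessed with @):

lnyse.df@teststat  # the tau2 and phi1 statistics
lnyse.df@cval      # the corresponding critical values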


Procedure C

Testing for a unit root with the sequential Procedure A and Procedure B takes a long time. The same ur.df function allows us to automate the procedure: once you choose the maximum number of lags (e.g. 4) and indicate whether to include a linear trend, ur.df estimates all the models with lags = 4, 3, 2, 1, 0 and chooses the one with the smallest BIC. Note that this model may differ from the previous one; however, it is hard to say which is more accurate.



lnyse.df <- ur.df(lnyse, type = "trend", lags = 4, selectlags = "BIC")
summary(lnyse.df)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.219863 -0.024691  0.003008  0.026555  0.153257 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  8.559e-02  3.915e-02   2.186   0.0292 *
## z.lag.1     -1.680e-02  8.209e-03  -2.046   0.0412 *
## tt           1.023e-04  4.983e-05   2.053   0.0406 *
## z.diff.lag   5.261e-02  4.380e-02   1.201   0.2302  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04054 on 520 degrees of freedom
## Multiple R-squared:  0.01001,    Adjusted R-squared:  0.004295 
## F-statistic: 1.752 on 3 and 520 DF,  p-value: 0.1554
## 
## 
## Value of test-statistic is: -2.0463 6.0166 2.1302 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34

The value of the test statistic is -2.0463, while the tau3 critical values are -3.96 (1%) and -3.41 (5%). Thus the bottom line is: since the test statistic is greater than these critical values, lnyse is a process with (a linear trend and) a unit root.

If, instead of type = "trend", we choose type = "drift", we once again conclude that the process has a unit root.
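For completeness, the corresponding call is given below (output omitted; the tau2 statistic is again closer to zero than its 5% critical value):

lnyse.df.drift <- ur.df(lnyse, type = "drift", lags = 4, selectlags = "BIC")
# summary(lnyse.df.drift)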


Model Forecasts

Thus, we have three possible models to choose from for forecasting. Without any deeper argumentation, we select the model obtained in Procedure A: lnyse is forecast with the formula $\text{lnyse}_t = 0.006865 + \text{lnyse}_{t-1} + w_t$. To do this in R, use the Arima() function:

suppressPackageStartupMessages({library(forecast)})
lnyse.ur <- Arima(lnyse, c(0, 1, 0), include.drift = TRUE)
lnyse.ur
## Series: lnyse 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##        drift
##       0.0069
## s.e.  0.0018
## 
## sigma^2 estimated as 0.00165:  log likelihood=942.79
## AIC=-1881.59   AICc=-1881.56   BIC=-1873.05
plot(forecast(lnyse.ur), include = 72)
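For a random walk with drift, the h-step-ahead point forecast is simply the last observed value plus h times the drift. A quick sanity check of this (a sketch) against the forecast() output:

drift  <- coef(lnyse.ur)["drift"]
manual <- c(tail(lnyse, 1)) + (1:12) * drift        # lnyse_T + h * drift, h = 1, ..., 12
max(abs(manual - forecast(lnyse.ur, h = 12)$mean))  # should be essentially zero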


Another example of analysing data for the presence of a unit root

We will examine quarterly long-run interest rates from 1954 Q1 to 1994 Q4:

suppressPackageStartupMessages({require(readxl)})
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "interestrates.xls"
tmp <- tempfile(fileext = ".xls")
# Download the file
download.file(url = paste0(txt1, txt2), destfile = tmp, mode = "wb")
# Read it as an Excel file
rate <- data.frame(read_excel(path = tmp))
rate <- ts(rate[, -1], start = 1954, frequency = 4)

Let us view the data:

plot(rate)

An initial look at the plots suggests that the data is not stationary, since it is increasing throughout the period. However, we cannot say whether this is because of a trend (which would make it a TS series) or because the series has a unit root and a drift (which would make it a DS series). [For an example comparing plots of DS and TS series see the lecture slides.]
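As a quick illustration (a minimal simulation sketch, not the example from the slides), compare a simulated TS series, a deterministic trend plus stationary AR(1) deviations, with a simulated DS series, a random walk with drift:

set.seed(42)
n    <- 200
y.ts <- 0.05 * (1:n) + arima.sim(list(ar = 0.5), n = n)  # TS: trend + stationary deviations
y.ds <- ts(cumsum(0.05 + rnorm(n)))                       # DS: random walk with drift
par(mfrow = c(1, 2))
plot(y.ts, main = "TS: trend + AR(1)")
plot(y.ds, main = "DS: random walk with drift")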

plot(diff(rate))

On the other hand, the differences seem to vary around a constant value. What is more, their variance does not change drastically over time, which suggests that our series might have a unit root.

Plotting time series against lagged versions of themselves

We will analyse only the Longrate variable; the analysis of the Shortrate variable is carried out identically.

If our data has a unit root, this would mean that the coefficient $\phi = 1$ in the equation $\text{Longrate}_t = \alpha + \phi\,\text{Longrate}_{t-1} + \epsilon_t$. A good first look is provided by the time series lag plots produced by lag.plot, which plots the time series against lagged versions of itself and helps visualize auto-dependence:

Longrate <- rate[, 1]
# Shortrate<- rate[, 2]
lag.plot(Longrate)

We can see a clear linear dependence between Longrate and its first lag. However, the lag plot of the differenced data shows no such relationship between $\Delta\text{Longrate}_t$ and its lag:

lag.plot(diff(Longrate))

We can also calculate the correlation between $\text{Longrate}_t$ and $\text{Longrate}_{t-1}$:

Checking the correlation between our time series and its lagged values

cor(Longrate[-1], Longrate[-length(Longrate)])
## [1] 0.9997162
cor(diff(Longrate[-1]), diff(Longrate[-length(Longrate)]))
## [1] -0.002354711

The correlation between our initial data and its lag is almost 1, while the correlation of the differenced data is -0.00235, which suggests that a unit root may be present in our data.

ACF and PACF plots

Finally, we can check the (sample) ACF plot to see if our series is stationary:

par(mfrow = c(2, 2))
Acf(Longrate)
Pacf(Longrate)
Acf(diff(Longrate))
Pacf(diff(Longrate))

While the series is clearly not stationary, it is not clear whether the slow decline of the ACF is due to a trend or to a unit root. However, the fact that the first differences appear to be white noise (WN), while the original data does not, suggests that our data might be a DS series, i.e. it might have a unit root.

Furthermore, the first bar of the PACF plot is almost 1, which, together with the slowly declining ACF, is an indication of a unit root in our time series.

Repeating the model selection and unit root test procedures

The results of this analysis suggest that Longrate most likely has a unit root. We repeat the procedure of the previous example with $p_{\max} = 4$ and finally end up with the model

$\Delta Y_t = \alpha + \rho Y_{t-1} + w_t$:

mod.0 <- dynlm(d(Longrate) ~ L(Longrate))
summary(mod.0)
## 
## Time series regression with "ts" data:
## Start = 1954(2), End = 1994(4)
## 
## Call:
## dynlm(formula = d(Longrate) ~ L(Longrate))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.027981 -0.004469  0.000053  0.004992  0.038346 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.038606   0.014392   2.682  0.00807 **
## L(Longrate) -0.003983   0.001871  -2.130  0.03473 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.009194 on 161 degrees of freedom
## Multiple R-squared:  0.0274, Adjusted R-squared:  0.02135 
## F-statistic: 4.535 on 1 and 161 DF,  p-value: 0.03473

It would be wrong to say that since $|t| = 2.13 > 2$ we reject the null $H_0: \rho = 0$. The right answer is obtained with the Dickey-Fuller test:

Longrate.df <- ur.df(Longrate, 0, type = "drift")
summary(Longrate.df)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.027981 -0.004469  0.000053  0.004992  0.038346 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.038606   0.014392   2.682  0.00807 **
## z.lag.1     -0.003983   0.001871  -2.130  0.03473 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.009194 on 161 degrees of freedom
## Multiple R-squared:  0.0274, Adjusted R-squared:  0.02135 
## F-statistic: 4.535 on 1 and 161 DF,  p-value: 0.03473
## 
## 
## Value of test-statistic is: -2.1295 63.8967 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81

Because the t-value -2.13 > -2.88, we have no grounds to reject the null; therefore, Longrate is a random walk with drift:

Long.mdl <- Arima(Longrate, c(0, 1, 0), include.drift = TRUE)
coefficients(Long.mdl)
##       drift 
## 0.007994899

$Y_t - Y_{t-1} = 0.00799 + w_t$

Forecasting the series:

We can also forecast this process using auto.arima:

Long.auto <- auto.arima(Longrate, d = 1)
plot(forecast(Long.auto, 12), include = 20)
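It may also be instructive to print Long.auto to see which ARIMA specification auto.arima selected and to compare its forecast with the one from the manually specified random walk with drift:

Long.auto                                    # the automatically selected ARIMA model
plot(forecast(Long.mdl, 12), include = 20)   # forecast from the manual random-walk-with-drift model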