# Set plot options in Jupyter Lab
options(repr.plot.width = 20)
options(repr.plot.height = 6)
Below we will simulate two cointegrated series.
$ \begin{aligned} Y_t &= \beta_2 X_t + u_t,\ u_t \sim I(0)\\ X_t &= X_{t-1} + v_t,\ v_t \sim I(0) \end{aligned} $
where $\beta_2$ is the cointegrating parameter.
In this case $Y_t \sim I(1)$ and $X_t \sim I(1)$. Notice that the first equation gives the long-run equilibrium relationship $Y_t - \beta_2 X_t = u_t \sim I(0)$, i.e. it gives the linear combination of $X_t$ and $Y_t$ that is stationary. The second equation is the common stochastic trend (i.e. unit root) relationship that we are already familiar with.
Further examples can be found at this answer on stackoverflow. A more advanced example can be found in this paper.
For this example, we will choose:
set.seed(123)
# Simulation settings: sample size and innovation scale parameters
N <- 500
sd_u <- 0.5
sd_v <- 0.5
# Draw the innovations for u and v (Sigma is diagonal, so they are independent)
eps <- MASS::mvrnorm(N, mu = c(0, 0), Sigma = diag(c(sd_u, sd_v)))
# u is a stationary AR(1) process (phi = 0.6); v is white noise
u <- arima.sim(model = list(ar = 0.6), innov = eps[, 1], n = N)
v <- eps[, 2]
# X is a random walk (the cumulative sum of v) and Y = X + u, i.e. beta_2 = 1
X <- cumsum(v)
Y <- X + u
Y <- ts(Y)
X <- ts(X)
We can examine the series visually:
plot.ts(Y)
lines(ts(X), lty = 2, col = "red")
legend("topleft", legend = c("Y", "X"), lty = 1:2, col = c("black", "red"))
We can examine their ACF and CCF plots:
forecast::Acf(cbind(X, Y))
We see that both series exhibit a slowly decaying ACF, as well as a slowly decaying CCF, which is consistent with the presence of unit roots. We can test for unit roots more formally with the ADF test:
tseries::adf.test(Y, alternative = "stationary")
tseries::adf.test(X, alternative = "stationary")
In both cases, we do not reject the null hypothesis of a unit root.
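As an additional, optional check (not part of the original steps) we can apply the same test to the differenced series; if $Y$ and $X$ are $I(1)$, we would expect to reject the unit root null for the differences:
# Optional sanity check: the first differences of I(1) series should be stationary
tseries::adf.test(diff(Y), alternative = "stationary")
tseries::adf.test(diff(X), alternative = "stationary")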
We now estimate the long-run equilibrium (cointegrating) regression and examine its residuals:
coint_reg <- lm(Y ~ X)
coef(summary(coint_reg))
coint_res <- ts(residuals(coint_reg))
forecast::tsdisplay(coint_res)
library(dynlm)
A nice feature of using R (or Python, or any other econometric software that allows some form of coding) is that you can automate some of the boring procedures (but only if you know that you can trust the automation process).
In this case we want to select the maximum lag for our residual unit root test, reducing the maximum lag as long as its coefficient is insignificant. Since this is a relatively trivial procedure, we can automate it:
for(max_lag in 10:0){
  print(paste0("Selecting max residual lag: ", max_lag))
  # Build the lag-augmentation part of the formula (empty when max_lag = 0)
  if(max_lag > 0){
    m_lag <- paste0("+ L(d(coint_res), 1:", max_lag, ")")
  }else{
    m_lag <- ""
  }
  # ADF-type regression on the cointegration residuals (no constant, no trend)
  my_formula <- paste0("d(coint_res) ~ -1 + L(coint_res)", m_lag)
  my_formula <- as.formula(my_formula)
  resid_mdl <- dynlm(my_formula)
  # Stop as soon as the last coefficient (the highest-order lag, or L(coint_res)
  # itself when max_lag = 0) is significant at the 5% level
  if(tail(coef(summary(resid_mdl))[, 4], 1) < 0.05) break
}
round(coef(summary(resid_mdl)), 5)
The estimated residual model is:
$ \Delta e_t = \rho e_{t-1} + w_t $
Since our long-run equilibrium equation has one exogenous variable, $X$, and no trend, the critical value is $-3.37$. The t-statistic of L(coint_res) is $-11.32324$.
Since $-11.32324 < -3.37$, we reject the null hypothesis $H_0: \rho = 0$ and conclude that the residuals do not have a unit root (this is to be expected, since we simulated $u_t$ as a stationary $\rm AR(1)$ process; this also explains why the lagged values, L(d(coint_res)), were insignificant).
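For convenience, the decision rule can also be expressed in code (a small illustrative check, assuming the resid_mdl object from the loop above and the critical value quoted in the text):
# Illustrative check: extract the t-statistic of L(coint_res) and compare it with -3.37
t_stat <- coef(summary(resid_mdl))["L(coint_res)", 3]
t_stat < -3.37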
This means that we reject the null hypothesis of no cointegration and conclude that $Y$ and $X$ are cointegrated.
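As a cross-check (an addition to these notes, not a required step), a similar residual-based cointegration test is available out of the box: the Phillips-Ouliaris test in tseries regresses the first column on the remaining ones and tests the resulting residuals for a unit root, with no cointegration as the null hypothesis:
# Phillips-Ouliaris cointegration test; H0: no cointegration
tseries::po.test(cbind(Y, X))
Having established cointegration, we can estimate an error correction model (ECM):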
mdl_1 <- dynlm(d(Y) ~ 1 + L(coint_res) + L(d(Y), 1:3) + L(d(X), 0:3))
summary(mdl_1)
Again, we see insignificant variables. Note that this can also be automated, and you should try to do so following the previous example; one possible sketch is shown below.
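A possible sketch of such an automation (an illustration only, under the assumption that we shrink a common maximum lag p of the d(Y) and d(X) terms until at least one of the highest-order lags is significant):
for(p in 3:1){
  print(paste0("Trying maximum ECM lag: ", p))
  ecm_formula <- as.formula(paste0("d(Y) ~ 1 + L(coint_res) + L(d(Y), 1:", p, ") + L(d(X), 0:", p, ")"))
  ecm_mdl <- dynlm(ecm_formula)
  p_vals  <- coef(summary(ecm_mdl))[, 4]
  # Coefficient order: intercept, L(coint_res), p lags of d(Y), then p + 1 lags of d(X);
  # the highest-order d(Y) lag sits at position 2 + p, the highest-order d(X) lag is last
  if(min(p_vals[c(2 + p, length(p_vals))]) < 0.05) break
}
round(coef(summary(ecm_mdl)), 5)
Note that this sketch only trims the lag order; individually insignificant lower-order terms would still have to be removed by hand, which is what we do next.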
We instead move to the finalized model:
mdl_1 <- dynlm(d(Y) ~ 1 + L(coint_res, 1) + L(d(X), 0))
summary(mdl_1)
Note that L(d(X), 0) is equivalent to d(X).
The estimated model is: $\Delta Y_t = -0.003525 - 0.407843 \widehat{e}_{t-1} + 1.056106 \Delta X_t + \epsilon_t$.
where the coefficient of $\Delta X_t$, 1.056106, is close to the estimated cointegrating parameter $\widetilde{\beta} = 1.00804449$, which we get from the cointegration regression:
coef(summary(coint_reg))
Remember that we have simulated the following model:
$ \begin{cases} Y_t &= \beta_2 X_t + u_t,\\ u_t &= \widetilde{\phi} u_{t-1} + \epsilon_t, \epsilon_t \sim I(0)\\ X_t &= X_{t-1} + v_t,\ v_t \sim I(0) \end{cases} $
As we have done before, we can express $u_t = Y_t - \beta_2 X_t$ to get $Y_t - \beta_2 X_t = \widetilde{\phi} Y_{t-1} - \widetilde{\phi} \beta_2 X_{t-1} + \epsilon_t$. Finally, our re-written model is:
$ Y_t = \widetilde{\phi} Y_{t-1} + \beta_2 X_t - \widetilde{\phi} \beta_2 X_{t-1} + \epsilon_t $
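To see how this maps into an error correction form, subtract $Y_{t-1}$ from both sides and add and subtract $\beta_2 X_{t-1}$ on the right-hand side:
$ \begin{aligned} \Delta Y_t &= (\widetilde{\phi} - 1) Y_{t-1} + \beta_2 X_t - \widetilde{\phi} \beta_2 X_{t-1} + \epsilon_t \\ &= \beta_2 \Delta X_t - (1 - \widetilde{\phi}) \left[ Y_{t-1} - \beta_2 X_{t-1} \right] + \epsilon_t \end{aligned} $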
Looking at the lecture slides, we see that we can re-write the model as the ECM:
$ \Delta Y_t = \beta_0 \Delta X_t - (1 - \widetilde{\phi}) \left[ Y_{t-1} - \widetilde{\beta} X_{t-1} \right] + \epsilon_t $
where $\beta_0 = \beta_2$, $\widetilde{\beta} = \beta_2$, and the speed of adjustment is $1 - \widetilde{\phi}$. In our simulation $\beta_2 = 1$ and $\widetilde{\phi} = 0.6$, so the true ECM coefficients are $\beta_0 = \widetilde{\beta} = 1$ and $-(1 - \widetilde{\phi}) = -0.4$.
If we compare these true values with the estimated ones from the model output, we see that they are very close (our model also includes an intercept component; however, it is insignificant).
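A quick numerical comparison (an illustrative addition; it assumes the coefficient ordering of mdl_1, i.e. intercept first, then L(coint_res, 1), then d(X)):
# Implied parameter estimates versus the true simulation values
phi_hat  <- 1 + coef(mdl_1)[2]   # coefficient on L(coint_res, 1) estimates -(1 - phi)
beta_hat <- coef(coint_reg)[2]   # cointegrating coefficient from the long-run regression
c(phi_hat = unname(phi_hat), phi_true = 0.6, beta_hat = unname(beta_hat), beta_true = 1)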
Note that in the above expression we got that $1 - \widetilde{\phi} \neq 0$, since $\widetilde{\phi}$ is the autoregressive parameter of $u_t$ (and NOT for $Y_t$).
However, this does not mean that $Y_t$ is stationary. As mentioned before, both $X_t$ and $Y_t$ are $I(1)$. To see why, rewrite $X_t = \frac{1}{\beta_2} (Y_t - u_t)$ and insert it into the random walk equation $X_t = X_{t-1} + v_t$ to get $\frac{1}{\beta_2} (Y_t - u_t) = \frac{1}{\beta_2} (Y_{t-1} - u_{t-1}) + v_t$, which we can write as:
$ Y_t = Y_{t-1} + \Delta u_t + \beta_2 v_t $
In other words, $Y_t \sim I(1)$.
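Since we simulated the data with $\beta_2 = 1$, this identity can also be verified numerically on the simulated series (a small sanity check, not part of the original derivation):
# With beta_2 = 1 we have Delta Y_t = Delta u_t + v_t exactly in our simulation
all.equal(as.numeric(diff(Y)), as.numeric(diff(u) + v[-1]))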
However, the above expression contains two different random components, $u_t$ and $v_t$, which we cannot evaluate separately.
This is why the previous expression only highlights the fact that $Y_t \sim I(1)$, while the ECM form and the long-run equilibrium regression allow us to estimate all the relevant parameters.