Below we provide a general outline for building multivariate time series models, based on the methods presented in the lectures, along with some additional functions for a broader perspective:
Variable selection & purpose
The initial step is to understand the task at hand: what kind of variables should be included and what maximum lag order would make sense, given the frequency and description of the data. What underlying economic or financial theory is assumed beforehand?
If needed, transform the data - e.g. take logarithms to stabilize the variance.
Testing for a unit root
ADF test - the ur.df() function from the urca package allows us to test for a unit root, with type = c("none", "drift", "trend") for the different model types: none - no constant and no trend in the model; drift - a constant in the model; trend - a constant + trend in the model. Specifying lags = 10, selectlags = "BIC" estimates different \(AR(p)\) models up to \(p = 10\) and selects the best one in terms of their \(BIC\) value. Note that the null hypothesis of the \(ADF\) test is that the data has a unit root. Other tests are also available and should be used as well:
pp.test() from the tseries package carries out the Phillips-Perron (PP) test for the null hypothesis that the data has a unit root. Note that the regression equation in this test incorporates a constant and a linear trend, so we cannot specify different model types like with the ur.df() function.
kpss.test() from the tseries package carries out the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that the data is stationary.
adf.test() from the tseries package uses a regression equation with a constant and a linear trend for the Augmented Dickey-Fuller (ADF) test of the null hypothesis that the data has a unit root. Again, we cannot specify different model types like in ur.df().
If we determine that the data is not \(I(0)\), then we need to check if it is \(I(1)\). To do this, carry out the unit root tests on the differences of the data. If the differences do not have a unit root (i.e. \(\Delta Y_t\) is \(I(0)\)), then the series \(Y_t\) is \(I(1)\). If the differences do have a unit root, then we need to difference the differences and check whether the resulting series is \(I(0)\). If the series is \(I(2)\), then the interpretation of the data changes - e.g. \(\Delta \Delta Y_t\) could be interpreted as the difference of the change in \(Y_t\). If \(\Delta \Delta Y_t\) is still not \(I(0)\), then you should re-examine your data for any outliers or errors, as higher orders of differencing are harder to interpret.
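For illustration, here is a minimal sketch of how these tests could be called in R (assuming y is the numeric series of interest; the simulated random walk below is only a placeholder):
suppressPackageStartupMessages({library(urca); library(tseries)})
y <- cumsum(rnorm(200)) # placeholder series: a simulated random walk
# ADF test with a constant ("drift"); lag order up to 10, selected by BIC
summary(ur.df(y, type = "drift", lags = 10, selectlags = "BIC"))
# ADF (constant + trend), Phillips-Perron and KPSS tests from the tseries package
adf.test(y)  # H0: the data has a unit root
pp.test(y)   # H0: the data has a unit root
kpss.test(y) # H0: the data is stationary
# If the levels appear to have a unit root, repeat the tests on the differences
summary(ur.df(diff(y), type = "drift", lags = 10, selectlags = "BIC"))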
VAR(p) model order selection
Usually, model order selection is based on information criteria - most often \(AIC\) or \(BIC\), sometimes other criteria. This can be done using VARselect() from the vars package. Note that \(BIC\) penalizes model complexity more heavily than \(AIC\).
Testing for cointegration
The Engle-Granger procedure
- if two variables \(Y_t\) and \(X_t\) are non-stationary and cointegrated, then a linear combination of them must be stationary. In other words, the test is done in two steps:
Step 1: estimate the long-run (equilibrium) equation: \[Y_t = \alpha + \gamma \cdot t + \beta X_t + u_t\]
Step 2: test whether \(\widehat{u}_t\) is stationary. This is determined via an ADF (ur.df()) test on the model residuals (the residuals have a zero mean, so we exclude the constant term): \[\Delta \widehat{u}_t = \rho \widehat{u}_{t-1} + \sum_{i = 1}^p c_i \Delta \widehat{u}_{t-i} + w_t\] using critical values adjusted for the number of variables. The null hypothesis of the ADF test is \(H_0: \rho = 0\), i.e. the residuals have a unit root. If we reject the null hypothesis, then the residuals are stationary and we conclude that \(Y_t\) and \(X_t\) are cointegrated. Note that in this step the only important question is whether the residuals are stationary or not, i.e. traditional diagnostic tests are unimportant.
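A rough sketch of the two steps (assuming Y and X are the two \(I(1)\) series stored as vectors; note that ur.df() reports the usual Dickey-Fuller critical values, whereas Engle-Granger cointegration critical values should be used for this residual-based test):
# Step 1: long-run (equilibrium) regression (add a time trend if the long-run equation contains one)
eg_step1 <- lm(Y ~ X)
u_hat <- residuals(eg_step1)
# Step 2: ADF-type test on the residuals, without a constant (the residuals have a zero mean)
summary(urca::ur.df(u_hat, type = "none", lags = 10, selectlags = "BIC"))
# Compare the test statistic with Engle-Granger critical values, not the standard DF ones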
Phillips-Ouliaris cointegration test
- tests the null hypothesis \(H_0: Y_t \text{ and } X_t \text{ are not cointegrated}\). This test can be performed via ca.po() from the urca package, or po.test() from the tseries package. Phillips and Ouliaris showed that the ADF and PP unit root tests applied to the estimated cointegrating residual do not have the usual DF distributions under the null \(H_0: \text{ no cointegration}\). Note that the test type type = "Pz", compared to the test type = "Pu", has the advantage that it is invariant to the normalization of the cointegration vector, i.e. it does not matter which variable is on the left hand side of the equation.
Johansen test
- a test for cointegration that allows for more than one cointegrating relationship, unlike the Engle-Granger method. If the sample size is too small, the results will not be reliable and one should use Autoregressive Distributed Lag (ADL) models instead. The ca.jo() function from the urca package can be used to carry out this test, which is based on the \(VAR\) and \(VECM\) specifications of the system: with spec = "longrun", ca.jo() estimates the VECM in the first (long-run) specification form, while with spec = "transitory" the VECM in the second (transitory) specification form is estimated. The Johansen test has two variations, both of which are likelihood-ratio tests. For both test statistics, the null hypothesis is \[H_0:\text{ no cointegration}\] The tests differ in terms of the alternative hypothesis:
the trace test - tests the null of at most \(r\) cointegrating relations against the alternative of more than \(r\) cointegrating relations;
the maximum eigenvalue test - tests the null of \(r\) cointegrating relations against the alternative of \(r + 1\) cointegrating relations.
Estimating the model
If the series are not cointegrated, we can estimate the model via the VAR() function from the vars package on the differences of the series, \(\Delta Y_t\) (if a unit root is present). If the series are cointegrated, we need to account for the long-run relationship by estimating a \(VECM\) using either VECM() from the tsDyn package, or cajorls() from the urca package, and specifying the number of cointegrating relations, which we found in the previous step. Depending on the function, we may also need to specify the lag order of the \(VECM\) representation.
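For illustration, a hypothetical sketch of both routes (assuming the data matrix Y is cointegrated with rank r = 1 and that one lagged difference is sufficient; the lag and rank values here are placeholders):
suppressPackageStartupMessages({library(tsDyn); library(urca)})
# Route 1: VECM() from tsDyn, with the rank and lag order chosen in the previous steps
vecm_tsdyn <- VECM(Y, lag = 1, r = 1, include = "const", estim = "ML")
summary(vecm_tsdyn)
# Route 2: Johansen estimation via ca.jo(), then the restricted VECM for rank r via cajorls()
jo_res <- ca.jo(Y, K = 2, type = "trace", spec = "transitory") # K = lag + 1
vecm_urca <- cajorls(jo_res, r = 1)
vecm_urca$beta         # normalized cointegrating vector(s)
summary(vecm_urca$rlm) # the estimated error-correction equations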
Model diagnostics tests
Now that we have the model estimated, we need to verify that it is well specified. This is usually done by examining the residuals of the model. The most important tests for time series data are tests for autocorrelation (or serial correlation) of the residuals, also known as Portmanteau tests. The two most well-known versions of this test are the Ljung-Box and the Box-Pierce tests, which are implemented in the Box.test() function from the stats package. For multivariate time series, alongside autocorrelation, another problem is cross-correlation of the residuals, i.e. when \(cor(\epsilon_{1,t}, \epsilon_{2, t + s}) \neq 0,\quad s > 0\). For this reason, we may use the serial.test() function from the vars package, which computes a multivariate test for serially correlated errors. A multivariate Ljung-Box test is implemented in the mq() function from the MTS package.
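As an illustration (assuming var_mdl is an estimated model object returned by VAR(); the object name and lag choices are only placeholders):
suppressPackageStartupMessages({library(vars)})
# Multivariate Portmanteau test for serially correlated errors
serial.test(var_mdl, lags.pt = 12, type = "PT.asymptotic")
# Univariate Ljung-Box test on a single residual series
Box.test(resid(var_mdl)[, 1], lag = 10, type = "Ljung-Box")
# Multivariate Ljung-Box test from the MTS package
MTS::mq(resid(var_mdl), lag = 10)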
Results & conclusions
After we verify that the model is adequate, we can either predict() future values, or examine the impulse-response functions via irf() from the vars package in order to check how the variables respond to a particular shock.
It is also strongly recommended to partition the dataset into training and test subsets and estimate the model on the training data only. Then we can compare the forecasted values with the test data to measure the forecasting strength of our model by calculating a forecast error measure, like the \(MAPE\) or \(RMSE\). The size of the test dataset is usually up to 20% of the total sample. For time series data, the test data can be the last 20% of the observations. We can also repeat this process by estimating the model on the first \(T - j\) observations and evaluating the forecasts \(h\) periods ahead, for different values of \(j = T - k, T - k - 1, ..., h\), where \(T - k\) is around 20% of the sample. We can then evaluate how the forecast error changes as we increase the estimation sample size.
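A rough sketch of such an evaluation on a hold-out sample (assuming Y is the full data matrix and that a \(VAR(2)\) with a constant was selected; all names and choices here are illustrative):
suppressPackageStartupMessages({library(vars)})
n_total <- nrow(Y)
n_test <- round(0.2 * n_total) # hold out the last ~20% of observations
Y_train <- Y[1:(n_total - n_test), ]
Y_test <- Y[(n_total - n_test + 1):n_total, ]
mdl_train <- VAR(Y_train, p = 2, type = "const") # estimate on the training data only
fc <- predict(mdl_train, n.ahead = n_test)$fcst
# Forecast errors for the first variable
err <- Y_test[, 1] - fc[[1]][, "fcst"]
c(RMSE = sqrt(mean(err^2)), MAPE = 100 * mean(abs(err / Y_test[, 1])))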
Based on this answer at stackoverflow.com
We will generate the following VAR(2) model:
\[ \begin{pmatrix} Y_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.5 & 0.4 \\ -0.2 & 0.6 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} + \begin{pmatrix} -0.2 & -0.1 \\ 0.3 & 0.1 \end{pmatrix} \begin{pmatrix} Y_{t-2} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} ,\quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} 8 & 6 \\ 6 & 9 \end{bmatrix} \right) \]
set.seed(123)
nsample = 500
alpha = c(3, 2)
Theta_1 = matrix(c(0.5, 0.4, -0.2, 0.6), ncol = 2, byrow = TRUE)
Theta_2 = matrix(c(-0.2, -0.1, 0.3, 0.1), ncol = 2, byrow = TRUE)
Sigma = matrix(c(8, 6, 6, 9), ncol = 2, byrow = TRUE)
e = MASS::mvrnorm(n = nsample, mu = c(0, 0), Sigma = Sigma)
Y <- matrix(data = 0, nrow = nsample, ncol = 2)
colnames(Y) <- c("Y", "X")
Y[1, ] = alpha + e[1, ]
Y[2, ] = alpha + Theta_1 %*% Y[1, ] + e[2, ]
for(j in 3:nsample){
Y[j, ] = alpha + Theta_1 %*% Y[j-1, ] + Theta_2 %*% Y[j-2, ] + e[j, ]
}
plot.ts(Y, plot.type = "multiple")
We could test for a unit root any number of ways: either sequentially by creating dynlm equations, using ur.df, or adf.test:
suppressPackageStartupMessages({library(urca)})
adf_test_Y = ur.df(Y[, 1], type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_Y)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.928 -2.091 -0.126 1.894 9.443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.14957 0.43437 9.553 < 2e-16 ***
## z.lag.1 -0.48234 0.04807 -10.035 < 2e-16 ***
## z.diff.lag1 0.23322 0.04467 5.221 2.64e-07 ***
## z.diff.lag2 -0.16735 0.04450 -3.761 0.00019 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.939 on 490 degrees of freedom
## Multiple R-squared: 0.3082, Adjusted R-squared: 0.304
## F-statistic: 72.78 on 3 and 490 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -10.0349 50.3495
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.44 -2.87 -2.57
## phi1 6.47 4.61 3.79
adf_test_X = ur.df(Y[, 2], type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_X)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3332 -1.8041 -0.1827 1.9078 9.5116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.96144 0.45762 6.471 2.36e-10 ***
## z.lag.1 -0.30167 0.04459 -6.765 3.81e-11 ***
## z.diff.lag1 -0.30417 0.05038 -6.038 3.09e-09 ***
## z.diff.lag2 -0.16015 0.04448 -3.600 0.00035 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.951 on 490 degrees of freedom
## Multiple R-squared: 0.2824, Adjusted R-squared: 0.278
## F-statistic: 64.28 on 3 and 490 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -6.765 22.8827
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.44 -2.87 -2.57
## phi1 6.47 4.61 3.79
Because the t value of z.lag.1 is -10.035 < -2.87, we reject the null hypothesis of a unit root in \(Y_t\). Since -6.765 < -2.87, we reject the null hypothesis of a unit root in \(X_t\).
So, both \(Y_t\) and \(X_t\) are \(I(0)\).
We can automate the order selection via VARselect using information criteria. Note that we can specify type = c("const", "trend", "both", "none") if we want to include a constant or a trend (or both). Because we do not see any patterns of a trend, but the series do fluctuate around a non-zero mean, we choose to include a constant:
suppressPackageStartupMessages({library(vars)})
VARselect(Y, lag.max = 5, type = "const")
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 2 2 2
##
## $criteria
## 1 2 3 4 5
## AIC(n) 4.112833 3.572596 3.585883 3.596214 3.606383
## HQ(n) 4.132840 3.605941 3.632565 3.656235 3.679741
## SC(n) 4.163797 3.657537 3.704799 3.749107 3.793252
## FPE(n) 61.119624 35.608981 36.085331 36.460226 36.833118
We see that a \(VAR(2)\) model is recommended.
Regarding Johansen's procedure - the matrix \(\Pi\) has full rank when the process is stationary. The Johansen procedure requires that the time series are checked for unit roots first, since it was not designed to be used for unit root checks. We may still try to carry out the test.
Note that we can examine an automated procedure which chooses the “best” VEC model (that is, its rank and lag) according to the information criteria (this approach is not based on the Johansen test):
suppressPackageStartupMessages({library(tsDyn)})
#Suppress function warnings:
suppressWarnings({
rank.select(Y, lag.max = 10, include = "const")
})
## Best AIC: rank= 2 lag= 1
## Best BIC: rank= 2 lag= 1
## Best HQ : rank= 2 lag= 1
This suggests that the \(\Pi_{2 \times 2}\) matrix would have full rank, indicating that both of our time series are stationary. Turning to the Johansen procedure - since we have lag = 2 in the VAR via VARselect() and the suggested lag = 1 in the VECM via rank.select(), we choose lag = 1 for the VECM order:
suppressWarnings({
temp_mdl <- VECM(Y, include = "const", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 342.81505 < 0.001 < 0.001 277.60295 < 0.001
## 2 1 65.21211 < 0.001 < 0.001 65.21211 < 0.001
Note that once we specify the \(VECM\) model itself, we cannot directly test for a full rank, as the VECM model itself assumes that the data has a unit root (and thus that the \(\Pi\) matrix does not have full rank). Thankfully, the output provides two tests:
trace - tests the null hypothesis of \(rank = r\) against the alternative \(r < rank \leq K\) (where K is the number of rows in the \(\Pi\) matrix, i.e. the full rank), which includes the possibility that the system is stationary;
eigenvalue - tests the null hypothesis of \(rank = r\) against the alternative \(rank = r + 1\).
We see that for \(r = 1\) the trace_pval < 0.05, so we reject the null hypothesis that \(r = 1\) (and that \(r = 0\)), which means that the system is stationary (because the only remaining option is \(r = 2\) - the full rank). Furthermore, eigen_pval < 0.05, so we reject the null hypothesis that \(r = 1\) against the alternative that \(r = 1 + 1 = 2\).
Similarly, with ca.jo (note that K from ca.jo corresponds to lag + 1 in rank.test). Because we specified a \(VECM(1)\) for our rank.test, we set K = 2:
temp_tst = ca.jo(Y, K = 2, type = "trace")
paste0(temp_tst@type, ", ", temp_tst@model)
## [1] "trace statistic, with linear trend"
print(cbind(capture.output(summary(temp_tst))[11:16]))
## [,1]
## [1,] "Values of teststatistic and critical values of test:"
## [2,] ""
## [3,] " test 10pct 5pct 1pct"
## [4,] "r <= 1 | 65.21 6.50 8.18 11.65"
## [5,] "r = 0 | 342.82 15.66 17.95 23.52"
## [6,] ""
temp_tst = ca.jo(Y, K = 2, type = "eigen")
paste0(temp_tst@type, ", ", temp_tst@model)
## [1] "maximal eigenvalue statistic (lambda max), with linear trend"
print(cbind(capture.output(summary(temp_tst))[11:16]))
## [,1]
## [1,] "Values of teststatistic and critical values of test:"
## [2,] ""
## [3,] " test 10pct 5pct 1pct"
## [4,] "r <= 1 | 65.21 6.50 8.18 11.65"
## [5,] "r = 0 | 277.60 12.91 14.90 19.19"
## [6,] ""
We see that the results are the same - we reject \(H_0: r = 0\) and \(H_0: r \leq 1\) in both the trace and eigenvalue tests, so our cointegration matrix has full rank, i.e. the system is stationary.
Now that we have established that both of our time series are \(I(0)\), we can estimate a VAR model. Based on VARselect, we have chosen a \(VAR(2)\) model with a constant:
VAR_mdl = VAR(Y, p = 2, type = "const")
summary(VAR_mdl)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Y, X
## Deterministic variables: const
## Sample size: 498
## Log Likelihood: -2292.506
## Roots of the characteristic polynomial:
## 0.7214 0.4899 0.4899 0.07252
## Call:
## VAR(y = Y, p = 2, type = "const")
##
##
## Estimation results for equation Y:
## ==================================
## Y = Y.l1 + X.l1 + Y.l2 + X.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 0.51225 0.05000 10.246 < 2e-16 ***
## X.l1 0.34573 0.04846 7.135 3.48e-12 ***
## Y.l2 -0.22310 0.04790 -4.658 4.12e-06 ***
## X.l2 -0.10460 0.05046 -2.073 0.0387 *
## const 3.76166 0.42008 8.955 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.832 on 493 degrees of freedom
## Multiple R-Squared: 0.418, Adjusted R-squared: 0.4133
## F-statistic: 88.51 on 4 and 493 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation X:
## ==================================
## X = Y.l1 + X.l1 + Y.l2 + X.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 -0.22639 0.05072 -4.463 1.00e-05 ***
## X.l1 0.54255 0.04916 11.037 < 2e-16 ***
## Y.l2 0.29039 0.04859 5.976 4.39e-09 ***
## X.l2 0.07986 0.05119 1.560 0.119
## const 3.17114 0.42616 7.441 4.47e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.873 on 493 degrees of freedom
## Multiple R-Squared: 0.3778, Adjusted R-squared: 0.3727
## F-statistic: 74.83 on 4 and 493 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## Y X
## Y 8.023 5.602
## X 5.602 8.257
##
## Correlation matrix of residuals:
## Y X
## Y 1.0000 0.6883
## X 0.6883 1.0000
Now we can write the model:
\[ \begin{cases} Y_t &= 3.76166 + 0.51225 Y_{t-1} + 0.34573 X_{t-1} - 0.22310 Y_{t-2} - 0.10460 X_{t-2}\\ X_t &= 3.17114 - 0.22639 Y_{t-1} + 0.54255 X_{t-1} + 0.29039 Y_{t-2} + 0.07986 X_{t-2} \end{cases} \]
We could also re-write the system in a matrix form for easier comparison with the actual model (try it!). We see that the second equation is not as accurately estimated as the first one (note the insignificant \(X_{t-2}\) coefficient).
We can estimate a restricted VAR, either by manually imposing zero restrictions, or automatically, based on coefficient significance:
VAR_mdl_restricted = restrict(VAR_mdl, method = "ser", thresh = 2.0, resmat = NULL)
summary(VAR_mdl_restricted)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Y, X
## Deterministic variables: const
## Sample size: 498
## Log Likelihood: -2294.831
## Roots of the characteristic polynomial:
## 0.6746 0.5249 0.5249 0.1855
## Call:
## VAR(y = Y, p = 2, type = "const")
##
##
## Estimation results for equation Y:
## ==================================
## Y = Y.l1 + X.l1 + Y.l2 + X.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 0.51225 0.05000 10.246 < 2e-16 ***
## X.l1 0.34573 0.04846 7.135 3.48e-12 ***
## Y.l2 -0.22310 0.04790 -4.658 4.12e-06 ***
## X.l2 -0.10460 0.05046 -2.073 0.0387 *
## const 3.76166 0.42008 8.955 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.832 on 493 degrees of freedom
## Multiple R-Squared: 0.418, Adjusted R-squared: 0.4133
## F-statistic: 88.51 on 4 and 493 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation X:
## ==================================
## X = Y.l1 + X.l1 + Y.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 -0.22379 0.05077 -4.408 1.28e-05 ***
## X.l1 0.57220 0.04540 12.603 < 2e-16 ***
## Y.l2 0.32956 0.04166 7.910 1.70e-14 ***
## const 3.30262 0.41835 7.894 1.90e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.878 on 494 degrees of freedom
## Multiple R-Squared: 0.9248, Adjusted R-squared: 0.9242
## F-statistic: 1520 on 4 and 494 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## Y X
## Y 8.023 5.602
## X 5.602 8.297
##
## Correlation matrix of residuals:
## Y X
## Y 1.0000 0.6866
## X 0.6866 1.0000
We now see our restricted VAR model:
\[ \begin{cases} Y_t &= 3.76166 + 0.51225 Y_{t-1} + 0.34573 X_{t-1} - 0.22310 Y_{t-2} - 0.10460 X_{t-2}\\ X_t &= 3.30262 - 0.22379 Y_{t-1} + 0.57220 X_{t-1} + 0.32956 Y_{t-2} \end{cases} \]
We also note the estimated Covariance matrix of residuals:
\[ \widehat{\Sigma}_\epsilon = \begin{pmatrix} 8.023 & 5.602 \\ 5.602 & 8.297 \end{pmatrix} \]
which is quite close to the true \(\Sigma_\epsilon\).
We can also look at the fitted values:
plot(VAR_mdl_restricted, names = "Y")
plot(VAR_mdl_restricted, names = "X")
This shows us the actual and fitted values, the residuals (which should fluctuate around a zero mean), and the residual ACF and PACF. The residuals appear to be white noise (the 12-th lag appears to be borderline, but it is small and at a very high lag order, so we assume that the residuals are WN).
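To complement the visual inspection, we could also carry out a formal multivariate Portmanteau test on the residuals (a quick sketch; its output is not reproduced here):
serial.test(VAR_mdl_restricted, lags.pt = 12, type = "PT.asymptotic")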
We can quickly plot the IRF (for completeness, we will work with the latest model - the restricted VAR):
plot(irf(VAR_mdl_restricted, impulse = "Y", boot = FALSE))
plot(irf(VAR_mdl_restricted, impulse = "X", boot = FALSE))
Finally, we can forecast the model 12 periods ahead. We will include the last ~50 historical observations in order to see the forecasts clearly:
VAR_forecast <- predict(VAR_mdl_restricted, n.ahead = 12, ci = 0.95) # store the forecasts separately so the model object is not overwritten
plot(VAR_forecast, xlim = c(450, 520))
Next, we consider a \(VAR(2)\) process which contains unit roots but no cointegration:
\[ \begin{pmatrix} Y_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.4 & 0.5 \\ -0.3 & 0.8 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} + \begin{pmatrix} 0.6 & -0.5 \\ 0.3 & 0.2 \end{pmatrix} \begin{pmatrix} Y_{t-2} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} ,\quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} 8 & 6 \\ 6 & 9 \end{bmatrix} \right) \]
To verify that this VAR process has a unit root, we can use Python:
import numpy as np
from sympy import *
#
Theta_1 = np.array([[0.4, 0.5], [-0.3, 0.8]])
Theta_2 = np.array([[0.6, -0.5], [0.3, 0.2]])
#
z = Symbol('z')
D = Matrix([[1, 0], [0, 1]]) - Matrix(Theta_1) * z - Matrix(Theta_2) * (z**2)
#
temp_roots = roots(D.det(), z)
temp_roots = [*temp_roots]
#
for j in temp_roots:
print("z = ", N(j, 4), "=> |z| = ", N(Abs(j), 4))
## z = 1.000 => |z| = 1.000
## z = 1.000 => |z| = 1.000
## z = -1.481 - 1.228*I => |z| = 1.924
## z = -1.481 + 1.228*I => |z| = 1.924
We see that this VAR specification has a couple of unit roots (as we would expect, since we will later see that both \(Y\) and \(X\) have unit roots). Furthermore, if we rewrite the model in differences, then:
\[ \Pi = \Theta_1 + \Theta_2 - I = \begin{pmatrix} 0.4 + 0.6 - 1 & 0.5 - 0.5\\ -0.3 +0.3 & 0.8 +0.2 - 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \] i.e. we have no cointegration relations.
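We can double-check this in R as well (a small sketch which simply recomputes \(\Pi\) from the coefficient matrices of the model above):
Theta_1 <- matrix(c(0.4, 0.5, -0.3, 0.8), ncol = 2, byrow = TRUE)
Theta_2 <- matrix(c(0.6, -0.5, 0.3, 0.2), ncol = 2, byrow = TRUE)
Pi <- Theta_1 + Theta_2 - diag(2) # Pi = Theta_1 + Theta_2 - I
Pi
qr(Pi)$rank # rank 0, i.e. no cointegration relations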
set.seed(123)
nsample = 100
alpha = c(3, 2)
Theta_1 = matrix(c(0.4, 0.5, -0.3, 0.8), ncol = 2, byrow = TRUE)
Theta_2 = matrix(c(0.6, -0.5, 0.3, 0.2), ncol = 2, byrow = TRUE)
Sigma = matrix(c(8, 6, 6, 9), ncol = 2, byrow = TRUE)
e = MASS::mvrnorm(n = nsample, mu = c(0, 0), Sigma = Sigma)
Y <- matrix(data = 0, nrow = nsample, ncol = 2)
colnames(Y) <- c("Y", "X")
Y[1, ] = alpha + e[1, ]
Y[2, ] = alpha + Theta_1 %*% Y[1, ] + e[2, ]
for(j in 3:nsample){
Y[j, ] = alpha + Theta_1 %*% Y[j-1, ] + Theta_2 %*% Y[j-2, ] + e[j, ]
}
plot.ts(Y, plot.type = "multiple")
We repeat the unit root test. Note that from the plots it is difficult to say whether there is a trend in the data, or whether it only appears because of a unit root. So, we present the results of including type = "trend" in the ADF test equation:
suppressPackageStartupMessages({library(urca)})
adf_test_Y = ur.df(Y[, 1], type = "trend", lags = 5, selectlags = "BIC")
summary(adf_test_Y)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5726 -1.9854 -0.1676 1.4915 7.8285
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.99419 0.70613 4.240 5.4e-05 ***
## z.lag.1 -0.16074 0.07512 -2.140 0.0351 *
## tt 0.38736 0.17599 2.201 0.0303 *
## z.diff.lag -0.24840 0.10450 -2.377 0.0196 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.918 on 90 degrees of freedom
## Multiple R-squared: 0.165, Adjusted R-squared: 0.1372
## F-statistic: 5.93 on 3 and 90 DF, p-value: 0.0009742
##
##
## Value of test-statistic is: -2.1397 25.6513 2.8283
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
adf_test_X = ur.df(Y[, 2], type = "trend", lags = 5, selectlags = "BIC")
summary(adf_test_X)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0668 -1.9322 -0.1072 1.7905 8.3379
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.19265 0.75749 0.254 0.799827
## z.lag.1 -0.12387 0.06785 -1.826 0.071268 .
## tt 0.16200 0.07850 2.064 0.041958 *
## z.diff.lag1 -0.38782 0.10964 -3.537 0.000645 ***
## z.diff.lag2 -0.24106 0.10422 -2.313 0.023021 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.702 on 89 degrees of freedom
## Multiple R-squared: 0.2485, Adjusted R-squared: 0.2147
## F-statistic: 7.356 on 4 and 89 DF, p-value: 3.639e-05
##
##
## Value of test-statistic is: -1.8256 13.6211 3.497
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
In both cases the t value of z.lag.1 is greater than the 5% critical value, so we do not reject the null hypothesis of a unit root. The same conclusion holds when selecting type = "drift".
To be sure that the data is \(I(1)\), i.e. that the differences are stationary, we carry out the ADF on the differenced data as well:
plot.ts(diff(Y), plot.type = "multiple")
par(mfrow = c(2, 2))
forecast::Acf(diff(Y)[, 1], main = "diff(Y)")
forecast::Pacf(diff(Y)[, 1], main = "diff(Y)")
forecast::Acf(diff(Y)[, 2], main = "diff(X)")
forecast::Pacf(diff(Y)[, 2], main = "diff(X)")
Note that the average of the differenced data:
colMeans(diff(Y))
## Y X
## 2.474767 1.180927
is not zero, so we will use type = "drift":
adf_test_Y = ur.df(diff(Y[, 1]), type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_Y)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.2217 -2.0774 0.1334 1.6366 7.1693
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7757 0.5163 7.313 1.04e-10 ***
## z.lag.1 -1.5701 0.1695 -9.263 9.78e-15 ***
## z.diff.lag 0.1669 0.1034 1.613 0.11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.948 on 90 degrees of freedom
## Multiple R-squared: 0.6829, Adjusted R-squared: 0.6759
## F-statistic: 96.91 on 2 and 90 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -9.2626 42.9299
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1 6.70 4.71 3.86
adf_test_X = ur.df(diff(Y[, 2]), type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_X)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1243 -1.7623 0.1669 1.8911 8.6072
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9798 0.3514 5.634 1.99e-07 ***
## z.lag.1 -1.7205 0.1685 -10.210 < 2e-16 ***
## z.diff.lag 0.2703 0.1016 2.661 0.00923 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.791 on 90 degrees of freedom
## Multiple R-squared: 0.6981, Adjusted R-squared: 0.6914
## F-statistic: 104.1 on 2 and 90 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -10.2096 52.1362
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1 6.70 4.71 3.86
Because z.lag.1 has a t value < -2.89, we reject the null hypothesis of a unit root in both cases. So, the differences do not have a unit root.
We have confirmed that \(Y_t\) and \(X_t\) are \(I(1)\).
If \(rank(\Pi) = 0\) - then there are no cointegration relations in the system.
We begin with the approach which is not based on the Johansen test. Note that, depending on the options we choose, we may get different results.
If we include a constant:
suppressWarnings({
rank.select(Y, lag.max = 10, include = "const")
})
## Best AIC: rank= 1 lag= 1
## Best BIC: rank= 0 lag= 1
## Best HQ : rank= 0 lag= 1
We may get a VECM(1) without any cointegration relations (via BIC).
If we include a trend:
suppressWarnings({
rank.select(Y, lag.max = 10, include = "trend")
})
## Best AIC: rank= 2 lag= 1
## Best BIC: rank= 2 lag= 1
## Best HQ : rank= 2 lag= 1
We may get a full rank matrix - i.e. a stationary VAR.
If we include both a constant and a trend:
suppressWarnings({
rank.select(Y, lag.max = 10, include = "both")
})
## Best AIC: rank= 2 lag= 1
## Best BIC: rank= 0 lag= 1
## Best HQ : rank= 0 lag= 1
Then we again get a \(VECM(1)\) without any cointegration relations (via BIC).
In the second case, we would have made the incorrect conclusion that our series are stationary, when in fact we know (and have tested via ADF) that both \(X\) and \(Y\) have unit roots! For this reason, you should always perform a number of different tests to make sure that their results align with each other. In our case: the data are \(I(1)\), so \(\Pi\) cannot have a full rank!
Let’s use the Johansen procedure:
suppressWarnings({
temp_mdl <- VECM(Y, include = "const", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 9.7548558 0.3056 0.3149 9.4104045 0.2592
## 2 1 0.3444513 0.5573 0.5631 0.3444513 0.5573
In this case, we see that both the trace and the eigen tests do not reject the null hypothesis \(H_0: r = 0\). We stop after we do not reject the null for the smallest value of \(r\), i.e. we do not need to look at \(H_0: r = 1\) anymore. We can conclude that there are no cointegration relations in the equation.
suppressWarnings({
temp_mdl <- VECM(Y, include = "trend", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 12.383172 0.04803 0.05035 7.764731 0.19427
## 2 1 4.618441 0.03588 0.03735 4.618441 0.03758
In this case, the eigen test does not reject the null hypothesis that \(rank = 0\), while the trace test rejects it only marginally at the 5% level (and both tests reject the null hypothesis that \(rank = 1\)). Overall, these results still indicate that there are no cointegration relations in the system.
suppressWarnings({
temp_mdl <- VECM(Y, include = "both", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 15.431194 0.12413 0.12483 9.842641 0.41933
## 2 1 5.588553 0.01808 0.01964 5.588553 0.01808
We see that we do not reject the null hypothesis that \(rank = 0\) (and reject the null hypothesis that \(rank = 1\)). These results indicate that there are no cointegration relations.
We can also use the ca.jo function:
temp_tst = ca.jo(Y, K = 2, type = "trace", ecdet = "const")
paste0(temp_tst@type, ", ", temp_tst@model)
## [1] "trace statistic, without linear trend and constant in cointegration"
print(cbind(capture.output(summary(temp_tst))[11:16]))
## [,1]
## [1,] "Values of teststatistic and critical values of test:"
## [2,] ""
## [3,] " test 10pct 5pct 1pct"
## [4,] "r <= 1 | 6.30 7.52 9.24 12.97"
## [5,] "r = 0 | 80.15 17.85 19.96 24.60"
## [6,] ""
We reject \(H_0: r = 0\) and do not reject \(H_0: r \leq 1\), i.e. we do not reject \(H_0: r = 1\). This would indicate that we have one cointegration relation.
Unfortunately, we do get some conflicting test results. However, most of them suggest a zero rank for \(\Pi\), so we will go with those results (note: a larger sample size could give more conclusive results).
Since we have a bivariate \(VAR\) model, we can carry out the EG procedure, or the Phillips-Ouliaris cointegration test.
cy.po <- ca.po(Y, demean = "trend", type="Pz")
summary(cy.po)
Because the test-statistic = 17.2675 < 81.3812, we do not reject the null hypothesis that \(Y_t\) and \(X_t\) are not cointegrated.
Because the variables are \(I(1)\) and there is no cointegration, a \(VAR\) model for the differences is appropriate.
Recall that VARselect suggested a \(VAR(2)\) model for the levels, \(\vec{Y}_t\). Furthermore, for the differences:
dY <- diff(Y)
VARselect(dY, lag.max = 5)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 3 1 1 3
##
## $criteria
## 1 2 3 4 5
## AIC(n) 3.472910 3.491372 3.454494 3.506594 3.582898
## HQ(n) 3.538483 3.600660 3.607497 3.703312 3.823331
## SC(n) 3.635248 3.761935 3.833283 3.993608 4.178138
## FPE(n) 32.231807 32.837559 31.659764 33.373786 36.055261
We do note that \(AIC\) suggests a \(VAR(3)\); nevertheless, we will create a \(VAR(1)\) model for \(\Delta \vec{Y}_t\) (based on \(BIC\) and on the VARselect results for the levels of Y).
d_VAR <- VAR(dY, p = 1)
summary(d_VAR)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Y, X
## Deterministic variables: const
## Sample size: 98
## Log Likelihood: -441.282
## Roots of the characteristic polynomial:
## 0.5252 0.5252
## Call:
## VAR(y = dY, p = 1)
##
##
## Estimation results for equation Y:
## ==================================
## Y = Y.l1 + X.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 -0.6628 0.1122 -5.909 5.35e-08 ***
## X.l1 0.5356 0.1155 4.637 1.13e-05 ***
## const 3.4936 0.3502 9.975 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.704 on 95 degrees of freedom
## Multiple R-Squared: 0.2765, Adjusted R-squared: 0.2613
## F-statistic: 18.15 on 2 and 95 DF, p-value: 2.107e-07
##
##
## Estimation results for equation X:
## ==================================
## X = Y.l1 + X.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 -0.3794 0.1139 -3.330 0.00124 **
## X.l1 -0.1095 0.1173 -0.933 0.35298
## const 2.2571 0.3558 6.345 7.54e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.746 on 95 degrees of freedom
## Multiple R-Squared: 0.2169, Adjusted R-squared: 0.2004
## F-statistic: 13.16 on 2 and 95 DF, p-value: 9.03e-06
##
##
##
## Covariance matrix of residuals:
## Y X
## Y 7.31 5.040
## X 5.04 7.542
##
## Correlation matrix of residuals:
## Y X
## Y 1.0000 0.6788
## X 0.6788 1.0000
The estimated model is:
\[ \begin{cases} \Delta Y_t &= 3.4936 - 0.6628 \Delta Y_{t-1} + 0.5356 \Delta X_{t-1}\\ \Delta X_t &= 2.2571 - 0.3794 \Delta Y_{t-1} - 0.1095 \Delta X_{t-1} \end{cases} \]
Compared to the true model for differences:
\[ \begin{cases} \Delta Y_t &= 3 - 0.6 \Delta Y_{t-1} + 0.5 \Delta X_{t-1} + \epsilon_{1, t}\\ \Delta X_t &= 2 - 0.3 \Delta Y_{t-1} - 0.2 \Delta X_{t-1} + \epsilon_{2, t} \end{cases} \] The coefficients are similar.
Note that this is the same as a \(VECM\) model with \(\Pi = \vec{0}\), i.e. without the cointegration matrix.
As before, we can create a reduced VAR by removing the insignificant variables:
dVAR_rest = restrict(d_VAR, method = "ser", thresh = 2.0, resmat = NULL)
summary(dVAR_rest)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Y, X
## Deterministic variables: const
## Sample size: 98
## Log Likelihood: -442.109
## Roots of the characteristic polynomial:
## 0.489 0.489
## Call:
## VAR(y = dY, p = 1)
##
##
## Estimation results for equation Y:
## ==================================
## Y = Y.l1 + X.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 -0.6628 0.1122 -5.909 5.35e-08 ***
## X.l1 0.5356 0.1155 4.637 1.13e-05 ***
## const 3.4936 0.3502 9.975 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.704 on 95 degrees of freedom
## Multiple R-Squared: 0.2765, Adjusted R-squared: 0.2613
## F-statistic: 18.15 on 2 and 95 DF, p-value: 2.107e-07
##
##
## Estimation results for equation X:
## ==================================
## X = Y.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Y.l1 -0.44639 0.08843 -5.048 2.12e-06 ***
## const 2.29121 0.35364 6.479 3.95e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 2.744 on 96 degrees of freedom
## Multiple R-Squared: 0.3128, Adjusted R-squared: 0.2984
## F-statistic: 21.84 on 2 and 96 DF, p-value: 1.519e-08
##
##
##
## Covariance matrix of residuals:
## Y X
## Y 7.31 5.040
## X 5.04 7.611
##
## Correlation matrix of residuals:
## Y X
## Y 1.0000 0.6757
## X 0.6757 1.0000
print(BIC(d_VAR))
## [1] 910.0747
print(BIC(dVAR_rest))
## [1] 907.1424
Note that the reduced \(VAR\) model has a lower \(BIC\) value.
The residuals of the unrestricted \(VAR\):
suppressPackageStartupMessages({library("forecast")})
plot.ts(resid(d_VAR))
Acf(resid(d_VAR))
and the restricted \(VAR\)
suppressPackageStartupMessages({library("forecast")})
plot.ts(resid(dVAR_rest))
Acf(resid(dVAR_rest))
appear to be white noise, indicating that the model specification is acceptable.
Note that we have modeled the differences of our data. Unlike Arima() or auto.arima(), we cannot tell VAR() to difference the data for us when creating the model - instead, we have already passed the differenced data to VAR(). We estimated a \(VAR(1)\) on \(\Delta Y_t\). This means that \(\widehat{Y}_{3} = Y_{2} + \Delta \widehat{Y}_{3}\); \(\widehat{Y}_{4} = Y_{2} + \Delta \widehat{Y}_{3} + \Delta \widehat{Y}_{4}\), etc.:
fitted_vals = fitted(dVAR_rest)
fitted_vals[, 1] <- Y[2, 1] + cumsum(fitted_vals[, 1])
fitted_vals[, 2] <- Y[2, 2] + cumsum(fitted_vals[, 2])
fitted_vals <- ts(fitted_vals, start = 3)
par(mfrow = c(2, 1))
plot.ts(Y[, 1], main = "Y")
lines(fitted_vals[, 1], col = "red")
plot.ts(Y[, 2], main = "X")
lines(fitted_vals[, 2], col = "red")
The same methodology is applied to the forecasts: \[Y_{T+h} = Y_T + \Delta Y_{T+1} + ... + \Delta Y_{T+h}\]
h = 20
forc_vals <- predict(dVAR_rest, n.ahead = h, ci = 0.95)$fcst
forc_vals <- data.frame(Y = forc_vals$Y[, 1], X = forc_vals$X[, 1])
forc_vals[, 1] <- Y[nrow(Y), 1] + cumsum(forc_vals$Y)
forc_vals[, 2] <- Y[nrow(Y), 2] + cumsum(forc_vals$X)
forc_vals <- ts(forc_vals, start = nrow(Y) + 1)
plot(ts(Y),
plot.type = "single", col = c(2, 4),
xlim = c(0, nrow(Y) + h),
ylim = c(min(Y), max(forc_vals)))
lines(forc_vals[, 1], col = "red", lty = 2)
lines(forc_vals[, 2], col = "blue", lty = 2)
legend(0, 300, c("Y historical", "X historical", "Y forecast", "X forecast"),
lty = c(1, 1, 2, 2), col = c(2, 4, 2, 4))
The final example concerns the cointegrated VAR with \(I(1)\) variables.
\[ \begin{pmatrix} Y_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.0 & 0.5 \\ 0.0 & 0.5 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} + \begin{pmatrix} 0.0 & 0.5 \\ 0.0 & 0.5 \end{pmatrix} \begin{pmatrix} Y_{t-2} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} ,\quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} 8 & 6 \\ 6 & 9 \end{bmatrix} \right) \]
The process has a unit root:
import numpy as np
from sympy import *
#
Theta_1 = np.array([[0.0, 0.5], [0.0, 0.5]])
Theta_2 = np.array([[0.0, 0.5], [0.0, 0.5]])
#
z = Symbol('z')
D = Matrix([[1, 0], [0, 1]]) - Matrix(Theta_1) * z - Matrix(Theta_2) * (z**2)
#
temp_roots = roots(D.det(), z)
temp_roots = [*temp_roots]
#
for j in temp_roots:
print("z = ", N(j, 4), "=> |z| = ", N(Abs(j), 4))
## z = -2.000 => |z| = 2.000
## z = 1.000 => |z| = 1.000
And the cointegration matrix:
\[ \Pi = \Theta_1 + \Theta_2 - I = \begin{pmatrix} 0.0 + 0.0 - 1 & 0.5 + 0.5\\ 0.0 + 0.0 & 0.5 + 0.5 - 1 \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix} \Rightarrow rank(\Pi) = 1 \]
shows that the system has one cointegration relation.
P.S. From the second form of the \(VECM\) representation we have \(\Gamma_1 = -\Theta_2\).
set.seed(123)
nsample = 100
alpha = c(3, 2)
Theta_1 = matrix(c(0, 0.5, 0, 0.5), ncol = 2, byrow = TRUE)
Theta_2 = matrix(c(0, 0.5, 0, 0.5), ncol = 2, byrow = TRUE)
Sigma = matrix(c(8, 6, 6, 9), ncol = 2, byrow = TRUE)
e = MASS::mvrnorm(n = nsample, mu = c(0, 0), Sigma = Sigma)
Y <- matrix(data = 0, nrow = nsample, ncol = 2)
colnames(Y) <- c("Y", "X")
Y[1, ] = alpha + e[1, ]
Y[2, ] = alpha + Theta_1 %*% Y[1, ] + e[2, ]
for(j in 3:nsample){
Y[j, ] = alpha + Theta_1 %*% Y[j-1, ] + Theta_2 %*% Y[j-2, ] + e[j, ]
}
plot.ts(Y, plot.type = "multiple")
We repeat the unit root test as before:
suppressPackageStartupMessages({library(urca)})
adf_test_Y = ur.df(Y[, 1], type = "trend", lags = 5, selectlags = "BIC")
summary(adf_test_Y)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1814 -2.0352 -0.1632 1.9533 7.9257
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.61037 0.74849 0.815 0.416958
## z.lag.1 -0.17328 0.07576 -2.287 0.024528 *
## tt 0.25686 0.10532 2.439 0.016698 *
## z.diff.lag -0.35091 0.10068 -3.486 0.000761 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.164 on 90 degrees of freedom
## Multiple R-squared: 0.2486, Adjusted R-squared: 0.2236
## F-statistic: 9.928 on 3 and 90 DF, p-value: 1.016e-05
##
##
## Value of test-statistic is: -2.2872 13.4654 3.672
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
adf_test_X = ur.df(Y[, 2], type = "trend", lags = 5, selectlags = "BIC")
summary(adf_test_X)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8809 -1.8735 0.1033 1.6983 7.0464
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.46314 0.67402 0.687 0.49376
## z.lag.1 -0.15812 0.06736 -2.347 0.02110 *
## tt 0.23476 0.09423 2.491 0.01456 *
## z.diff.lag -0.33237 0.09962 -3.336 0.00124 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.632 on 90 degrees of freedom
## Multiple R-squared: 0.2325, Adjusted R-squared: 0.2069
## F-statistic: 9.086 on 3 and 90 DF, p-value: 2.571e-05
##
##
## Value of test-statistic is: -2.3474 16.4931 3.843
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
In both cases the t value of z.lag.1 is greater than the 5% critical value of -3.45, so we do not reject the null hypothesis \(H_0: \rho = 0\). So, both \(X_t\) and \(Y_t\) have a unit root. Next, we examine the differences:
colMeans(diff(Y))
## Y X
## 1.447436 1.435706
The mean of the differences is not zero, so we will use type = "drift":
adf_test_Y = ur.df(diff(Y[, 1]), type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_Y)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9505 -2.0828 0.2524 2.3136 7.0299
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3554 0.4230 5.568 2.64e-07 ***
## z.lag.1 -1.6680 0.1773 -9.407 4.88e-15 ***
## z.diff.lag 0.1622 0.1045 1.552 0.124
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.246 on 90 degrees of freedom
## Multiple R-squared: 0.7236, Adjusted R-squared: 0.7175
## F-statistic: 117.8 on 2 and 90 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -9.4073 44.2624
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1 6.70 4.71 3.86
adf_test_X = ur.df(diff(Y[, 2]), type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_X)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7456 -1.7111 0.2006 1.8310 7.5214
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4076 0.3735 6.447 5.54e-09 ***
## z.lag.1 -1.7191 0.1744 -9.858 5.63e-16 ***
## z.diff.lag 0.2196 0.1030 2.133 0.0357 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.675 on 90 degrees of freedom
## Multiple R-squared: 0.7146, Adjusted R-squared: 0.7082
## F-statistic: 112.6 on 2 and 90 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -9.8582 48.6133
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1 6.70 4.71 3.86
Because the t value of z.lag.1 is less than -2.89 in both variable models, we reject the null hypothesis of a unit root. We conclude that both \(Y_t\) and \(X_t\) are \(I(1)\).
We begin with the rank.select function:
suppressWarnings({
print(rank.select(Y, lag.max = 10, include = "const"))
})
## Best AIC: rank= 1 lag= 1
## Best BIC: rank= 1 lag= 1
## Best HQ : rank= 1 lag= 1
suppressWarnings({
print(rank.select(Y, lag.max = 10, include = "trend"))
})
## Best AIC: rank= 1 lag= 1
## Best BIC: rank= 1 lag= 1
## Best HQ : rank= 1 lag= 1
suppressWarnings({
print(rank.select(Y, lag.max = 10, include = "both"))
})
## Best AIC: rank= 2 lag= 1
## Best BIC: rank= 2 lag= 1
## Best HQ : rank= 2 lag= 1
We see that selecting to include either a constant or a trend results in a suggested \(VECM(1)\) with one cointegration relation (again, note that this is not based on the Johansen procedure). If we choose include = "both", then we would get a \(VECM(1)\) with a full rank, which would indicate a stationary series. Since we did not reject the unit root hypothesis for either variable, this result is misleading. Next, we carry out the Johansen procedure:
suppressWarnings({
temp_mdl <- VECM(Y, include = "const", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 62.9118156 <0.001 <0.001 62.5076349 <0.001
## 2 1 0.4041808 0.5249 0.531 0.4041808 0.5249
In this case, we see that both the trace and the eigen tests reject the null hypothesis \(H_0: r = 0\). Because the p-value for \(r = 1\) is greater than 0.05 in both tests, we do not reject the null hypothesis \(H_0: r = 1\).
suppressWarnings({
temp_mdl <- VECM(Y, include = "trend", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 44.270550 < 0.001 < 0.001 38.572636 < 0.001
## 2 1 5.697913 0.01883 0.01994 5.697913 0.02011
If we specify a VECM with a trend component, then we reject the null hypothesis of no cointegration (\(r = 0\)), and we also reject the null hypothesis \(H_0: r = 1\). Because our series are \(I(1)\), we know that \(\Pi\) cannot have a full rank (i.e., if \(\Pi\) had a full rank, then \(Y_t\) and \(X_t\) would not be \(I(1)\) but stationary). Because we specified include = "trend", the resulting model would indicate that we have a stationary series with a trend component, but we know that our series are \(I(1)\).
Note that if we wish to include a constant or a trend in the long-run (i.e. cointegration) relationship, we can do this by specifying LRinclude = ... in the VECM() model (see the lecture slides for the different cases and their specifications in VECM()).
suppressWarnings({
temp_mdl <- VECM(Y, include = "both", lag = 1, estim = "ML")
summary(rank.test(temp_mdl))
})
## r trace trace_pval trace_pval_T eigen eigen_pval
## 1 0 70.812513 < 0.001 < 0.001 66.807338 < 0.001
## 2 1 4.005175 0.04536 0.04822 4.005175 0.04537
In this case, we see that both the trace and the eigen tests reject the null hypothesis \(H_0: r = 0\). For \(r = 1\), the p-value is only marginally below 0.05, so we choose not to reject the null hypothesis \(H_0: r = 1\).
So, our tests strongly indicate that the series has one cointegration relation.
We can also test via ca.jo:
temp_tst = ca.jo(Y, K = 2, type = "eigen", ecdet = "none", spec = "transitory")
paste0(temp_tst@type, ", ", temp_tst@model)
## [1] "maximal eigenvalue statistic (lambda max), with linear trend"
print(cbind(capture.output(summary(temp_tst))[11:16]))
## [,1]
## [1,] "Values of teststatistic and critical values of test:"
## [2,] ""
## [3,] " test 10pct 5pct 1pct"
## [4,] "r <= 1 | 0.40 6.50 8.18 11.65"
## [5,] "r = 0 | 62.51 12.91 14.90 19.19"
## [6,] ""
Note that ecdet is used to specify additional deterministic terms in the cointegration relation itself. Since we have not included any deterministic components in the cointegrating relation equation (see the underlying model specification), we selected ecdet = "none". Looking at the results: because 62.51 > 14.90, we reject \(H_0: r = 0\); because 0.40 < 8.18, we do not reject \(H_0: r \leq 1\). So, we have one cointegrating relation in the series.
Note that setting ecdet to another value may give different results. It is important to remember that our series are \(I(1)\), therefore \(\Pi\) cannot have a full rank. To determine whether the cointegration (i.e. long-run) relation has a constant or a trend, examine the data plots: if the \(Y_t\) and \(X_t\) series always differ by a similar amount at each \(t\), then it may be worth including a constant in the cointegration equation itself. If the difference between the series increases as time increases, then a trend may also be present in the cointegration relation equation:
par(mfrow = c(2, 1))
plot(ts(Y[, 1]))
lines(ts(Y[, 2]), col = "red")
legend(0, 120, c("Y", "X"), col = c(1, 2), lty = c(1, 1))
plot.ts(Y[,1] - Y[, 2], main = "Y - X")
As there is no clear indication of a constant or a trend between the series as time increases, we used ecdet = "none".
We conclude that there is strong evidence to suggest a single cointegration relation in our \(I(1)\) series.
We will now estimate a \(VECM\) model. Note that both series have non-zero means, and we already saw that their differences:
colMeans(Y)
## Y X
## 67.89147 66.67245
colMeans(diff(Y))
## Y X
## 1.447436 1.435706
are also non-zero. We now select the \(VAR(p)\) model order, including a constant (type = "const"):
VARselect(Y, lag.max = 10, type = "const")
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 2 2 2
##
## $criteria
## 1 2 3 4 5 6
## AIC(n) 3.711035 3.494265 3.507156 3.538267 3.610157 3.634016
## HQ(n) 3.778240 3.606273 3.663967 3.739881 3.856574 3.925237
## SC(n) 3.877689 3.772022 3.896015 4.038229 4.221221 4.356184
## FPE(n) 40.898145 32.933626 33.374284 34.453412 37.062773 38.018993
## 7 8 9 10
## AIC(n) 3.677280 3.686689 3.721766 3.782606
## HQ(n) 4.013304 4.067516 4.147396 4.253039
## SC(n) 4.510550 4.631062 4.777242 4.949184
## FPE(n) 39.787718 40.280698 41.873700 44.706549
tsDyn package
All information criteria suggest a \(VAR(2)\) model for the levels \(Y_t\). So, for the differences, \(\Delta Y_t\), we will create a \(VECM(1)\) model with a drift and 1 cointegrating relation (based on the Johansen procedure test results):
Y_VECM <- VECM(Y, lag = 1, r = 1, include = "const", LRinclude = "none")
summary(Y_VECM)
## #############
## ###Model VECM
## #############
## Full sample size: 100 End sample size: 98
## Number of variables: 2 Number of estimated slope parameters 8
## AIC 349.3118 BIC 372.5765 SSR 1417.039
## Cointegrating vector (estimated by 2OLS):
## Y X
## r1 1 -1.011868
##
##
## ECT Intercept Y -1
## Equation Y -1.0968(0.1784)*** 2.6362(0.3271)*** 0.0839(0.1251)
## Equation X -0.0388(0.1778) 1.9989(0.3260)*** 0.1092(0.1247)
## X -1
## Equation Y -0.5895(0.1477)***
## Equation X -0.4833(0.1473)**
So, the evaluated model is therefore: \[ \begin{cases} \Delta Y_t &= 2.6362 - 1.0968 \text{ECT}_{t-1} + 0.0839 \Delta Y_{t-1} -0.5895 \Delta X_{t-1} \\ \Delta X_t &= 1.9989 - 0.0388 \text{ECT}_{t-1} + 0.1092 \Delta Y_{t-1} -0.4833 \Delta X_{t-1} \end{cases} \]
And the cointegrating relations are \(1 \cdot Y_t - 1.011868 \cdot X_t\). So, \(\text{ECT}_{t-1} = Y_{t-1} - 1.011868 \cdot X_{t-1}\). Note that the true cointegration relation is \(\Pi \mathbf{Y}_{t-1} =\begin{pmatrix} -1 \cdot Y_{t-1} + 1 \cdot X_{t-1} \\ 0\end{pmatrix}\).
If we normalize so that the coefficient of \(Y_{t-1}\) is \(1\), then the cointegration relation is \(1\cdot Y_{t-1} - 1 \cdot X_{t-1}\), which is close to the estimated value.
Note that the representation \(\Pi = \alpha \beta^\top\) is not unique, so, as a matter of convention, we typically normalize the coefficients of \(\beta\) (i.e. the cointegrating matrix) such that some of the elements of \(\beta\) are equal to one, or \(\mathbf{I}\), depending on the number of cointegrating relations.
Alternatively, we can also extract the \(\Pi\) matrix:
coefPI(Y_VECM)
## Y X
## Equation Y -1.09676324 1.10977999
## Equation X -0.03882687 0.03928768
which is made up of the adjustment coefficients, \(\alpha\):
coefA(Y_VECM)
## ECT
## Equation Y -1.09676324
## Equation X -0.03882687
and the cointegrating coefficients \(\beta\):
coefB(Y_VECM)
## r1
## Y 1.000000
## X -1.011868
If we combine the two results, we can see how the coefficients of \(\text{ECT}_{t-1}\) came to be: \[ \begin{aligned} \widehat{\alpha} \widehat{\beta}^\top \mathbf{Y}_{t-1} &= \begin{bmatrix} -1.09676324 \\ -0.03882687 \end{bmatrix} \begin{bmatrix} 1.00 & -1.011868 \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \end{bmatrix} \\ &= \begin{bmatrix} -1.09676324 \\ -0.03882687 \end{bmatrix} \begin{bmatrix} 1.00 Y_{t-1} - 1.011868 X_{t-1} \end{bmatrix} \\ &= \begin{bmatrix} -1.09676324 \cdot (1.00 Y_{t-1} - 1.011868 X_{t-1}) \\ -0.03882687 \cdot (1.00 Y_{t-1} - 1.011868 X_{t-1}) \end{bmatrix} \\ &= \begin{bmatrix} -1.09676324 \cdot \text{ECT}_{t-1} \\ -0.03882687 \cdot \text{ECT}_{t-1} \end{bmatrix} \end{aligned} \]
In other words, the cointegration relation \(1.00 Y_{t-1} - 1.011868 X_{t-1}\) is adjusted via the parameters in \(\alpha\). Because the cointegration relation is normalized, we can think of the \(\text{ECT}\) term as the residual from the equation \(Y_{t} = 1.011868 \cdot X_{t} + \text{ECT}_t\).
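To make this concrete, we could reconstruct the error-correction term from the estimated cointegrating vector and verify that it behaves like a stationary series (a quick sketch; output not shown):
beta_hat <- coefB(Y_VECM)         # estimated cointegrating vector, normalized on Y
ECT <- as.numeric(Y %*% beta_hat) # ECT_t = 1 * Y_t - 1.011868 * X_t
plot.ts(ECT, main = "Estimated error correction term")
summary(ur.df(ECT, type = "drift", lags = 5, selectlags = "BIC")) # should not indicate a unit root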
Furthermore, instead of using \(\text{ECT}\), we can rewrite the \(VECM\) with the \(\Pi \mathbf{Y}_{t-1}\) component:
\[ \begin{aligned} \widehat{\alpha} \widehat{\beta}^\top \mathbf{Y}_{t-1} &= \begin{bmatrix} -1.09676324 \\ -0.03882687 \end{bmatrix} \begin{bmatrix} 1.00 & -1.011868 \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \end{bmatrix} \\ &= \begin{bmatrix} -1.09676324 & 1.10978 \\ -0.03882687 & 0.03928767 \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \end{bmatrix}\\ &= \widehat{\Pi} \mathbf{Y}_{t-1}\\ &= \begin{bmatrix} -1.09676324 \cdot Y_{t-1} + 1.10978 \cdot X_{t-1}\\ -0.03882687 \cdot Y_{t-1} + 0.03928767 \cdot X_{t-1} \end{bmatrix} \end{aligned} \]
Further note that the estimated error correction term (ECT) coefficients are negative - this is a requirement! A positive ECT coefficient implies that the process is not converging in the long run, so there would be some instability - either model specification problems, or errors in the data. It could also indicate the presence of structural changes in the data, which would require testing for and specifying structural breaks in the VECM model.
Note that the coefficient for \(\text{ECT}_{t-1}\) in \(\Delta X_t\) equation is insignificant as well as the coefficients of \(\Delta Y_{t-1}\) in both equations, which fall in line with the underlying (actual) model (see the true \(VAR\) equation and the true \(\Pi\) matrix), which we used to generate the data. The significant coefficients of \(\Delta X_{t-1}\) from both equations are close to the true values in \(\Gamma_1 = -\Theta_2\) matrix.
VECM estimation via the urca package
We can also estimate the \(VECM\) model via the ca.jo() function. We will estimate the two different representations:
VECM_Y_v2 <- ca.jo(Y, K = 2, spec = "longrun")
summary(cajools(VECM_Y_v2))
## Response Y.d :
##
## Call:
## lm(formula = Y.d ~ constant + Y.dl1 + X.dl1 + Y.l2 + X.l2 - 1,
## data = data.mat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5706 -1.6695 -0.2716 1.5272 7.4551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## constant 3.7749 0.6366 5.930 5.12e-08 ***
## Y.dl1 -1.1025 0.1356 -8.130 1.81e-12 ***
## X.dl1 0.6117 0.1431 4.273 4.65e-05 ***
## Y.l2 -1.2878 0.1981 -6.501 3.93e-09 ***
## X.l2 1.2867 0.1969 6.535 3.37e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.703 on 93 degrees of freedom
## Multiple R-squared: 0.5282, Adjusted R-squared: 0.5028
## F-statistic: 20.82 on 5 and 93 DF, p-value: 6.651e-14
##
##
## Response X.d :
##
## Call:
## lm(formula = X.d ~ constant + Y.dl1 + X.dl1 + Y.l2 + X.l2 - 1,
## data = data.mat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3835 -1.8211 0.1655 1.5859 7.1288
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## constant 1.67411 0.64790 2.584 0.01133 *
## Y.dl1 0.09597 0.13801 0.695 0.48856
## X.dl1 -0.47005 0.14569 -3.226 0.00173 **
## Y.l2 0.01565 0.20162 0.078 0.93828
## X.l2 -0.01118 0.20041 -0.056 0.95564
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.751 on 93 degrees of freedom
## Multiple R-squared: 0.3314, Adjusted R-squared: 0.2955
## F-statistic: 9.221 on 5 and 93 DF, p-value: 3.73e-07
VECM_Y_v2 <- ca.jo(Y, K = 2, spec = "transitory")
summary(cajools(VECM_Y_v2))
## Response Y.d :
##
## Call:
## lm(formula = Y.d ~ constant + Y.dl1 + X.dl1 + Y.l1 + X.l1 - 1,
## data = data.mat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5706 -1.6695 -0.2716 1.5272 7.4551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## constant 3.7749 0.6366 5.930 5.12e-08 ***
## Y.dl1 0.1853 0.1323 1.400 0.165
## X.dl1 -0.6750 0.1510 -4.472 2.19e-05 ***
## Y.l1 -1.2878 0.1981 -6.501 3.93e-09 ***
## X.l1 1.2867 0.1969 6.535 3.37e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.703 on 93 degrees of freedom
## Multiple R-squared: 0.5282, Adjusted R-squared: 0.5028
## F-statistic: 20.82 on 5 and 93 DF, p-value: 6.651e-14
##
##
## Response X.d :
##
## Call:
## lm(formula = X.d ~ constant + Y.dl1 + X.dl1 + Y.l1 + X.l1 - 1,
## data = data.mat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3835 -1.8211 0.1655 1.5859 7.1288
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## constant 1.67411 0.64790 2.584 0.0113 *
## Y.dl1 0.08031 0.13468 0.596 0.5524
## X.dl1 -0.45887 0.15363 -2.987 0.0036 **
## Y.l1 0.01565 0.20162 0.078 0.9383
## X.l1 -0.01118 0.20041 -0.056 0.9556
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.751 on 93 degrees of freedom
## Multiple R-squared: 0.3314, Adjusted R-squared: 0.2955
## F-statistic: 9.221 on 5 and 93 DF, p-value: 3.73e-07
Because we are focusing on the second representation, we will keep the equations from spec = "transitory".
We can see that this representation is a bit easier to compare to the true model. The estimated system is:
\[ \begin{cases} \Delta Y_t &= 3.7749 - 1.2878 Y_{t-1} + 1.2867 X_{t-1} + 0.1853 \Delta Y_{t-1} - 0.6750 \Delta X_{t-1} \\ \Delta X_t &= 1.67411 + 0.01565 Y_{t-1} - 0.01118 X_{t-1} + 0.08031 \Delta Y_{t-1} - 0.45887 \Delta X_{t-1} \end{cases} \]
We see that the coefficients of \(Y_{t-1}\) and \(X_{t-1}\) in the \(\Delta X_{t}\) equation are insignificant, while the ones in the \(\Delta Y_{t}\) equation are significant and very similar in absolute value, which is in line with our true model. The insignificant coefficients are also in line with our true model.
The normalized cointegration relations (\(\beta^\top\)), as well as the representation with the \(\text{ECT}_{t-1}\) variables, can be calculated with cajorls():
cajorls(VECM_Y_v2, r = 1)
## $rlm
##
## Call:
## lm(formula = substitute(form1), data = data.mat)
##
## Coefficients:
## Y.d X.d
## ect1 -1.296161 0.003323
## constant 3.979508 1.975015
## Y.dl1 0.191262 0.089075
## X.dl1 -0.675996 -0.460283
##
##
## $beta
## ect1
## Y.l1 1.0000000
## X.l1 -0.9968454
The long-run equilibrium equation is given by the output under $beta. The variables in this output are lagged; for the interpretation as a long-run equation we simply shift it forward by one period. So, we see that \(\text{ECT}_{t-1} = 1\cdot Y_{t-1} - 0.9968454 \cdot X_{t-1}\), which is close to the results we obtained via the VECM() function. The estimated model is:
\[ \begin{cases} \Delta Y_t &= 3.979508 - 1.296161 \text{ECT}_{t-1} + 0.191262 \Delta Y_{t-1} - 0.675996 \Delta X_{t-1} \\ \Delta X_t &= 1.975015 + 0.003323 \text{ECT}_{t-1} + 0.089075 \Delta Y_{t-1} - 0.460283 \Delta X_{t-1} \\ \end{cases} \]
To see which coefficients are significant, we can use:
summary(cajorls(VECM_Y_v2, r = 1)$rlm)
## Response Y.d :
##
## Call:
## lm(formula = Y.d ~ ect1 + constant + Y.dl1 + X.dl1 - 1, data = data.mat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4681 -1.6895 -0.2287 1.6501 7.4377
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ect1 -1.2962 0.1963 -6.604 2.37e-09 ***
## constant 3.9795 0.4198 9.479 2.36e-15 ***
## Y.dl1 0.1913 0.1310 1.460 0.148
## X.dl1 -0.6760 0.1503 -4.498 1.96e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.691 on 94 degrees of freedom
## Multiple R-squared: 0.5272, Adjusted R-squared: 0.5071
## F-statistic: 26.21 on 4 and 94 DF, p-value: 1.319e-14
##
##
## Response X.d :
##
## Call:
## lm(formula = X.d ~ ect1 + constant + Y.dl1 + X.dl1 - 1, data = data.mat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2327 -1.7272 -0.0016 1.6090 7.4487
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ect1 0.003323 0.199978 0.017 0.98678
## constant 1.975015 0.427765 4.617 1.23e-05 ***
## Y.dl1 0.089075 0.133500 0.667 0.50626
## X.dl1 -0.460283 0.153112 -3.006 0.00339 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.742 on 94 degrees of freedom
## Multiple R-squared: 0.3287, Adjusted R-squared: 0.3001
## F-statistic: 11.51 on 4 and 94 DF, p-value: 1.207e-07
We see that the second equation has an insignificant error correction term. Since its estimated coefficient is positive, this insignificance is in fact desirable: a significantly positive adjustment coefficient would imply divergence from the long-run equilibrium. It is also consistent with the true model, in which the \(\Delta X_t\) equation does not adjust to the cointegration relation.
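Since the adjustment coefficient of the second equation appears to be zero, this can also be tested formally via a likelihood-ratio test of restrictions on \(\alpha\) (i.e. weak exogeneity of \(X\)). A sketch, assuming the alrtest() function from the urca package, which tests the restriction \(\alpha = A \psi\):
# Restrict alpha so that only the first equation adjusts to the cointegration relation
A_restr <- matrix(c(1, 0), nrow = 2, ncol = 1)
summary(alrtest(VECM_Y_v2, A = A_restr, r = 1))
A large p-value would indicate that the restriction \(\alpha_{2} = 0\) (weak exogeneity of \(X\)) cannot be rejected.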
We can also transform the \(VECM\) into a \(VAR\) representation by specifying the number of cointegrating relations:
vec2var(VECM_Y_v2, r = 1)
##
## Coefficient matrix of lagged endogenous variables:
##
## A1:
## Y.l1 X.l1
## Y -0.10489995 0.6160762
## X 0.09239844 0.5364047
##
##
## A2:
## Y.l2 X.l2
## Y -0.19126151 0.6759964
## X -0.08907538 0.4602827
##
##
## Coefficient matrix of deterministic regressor(s).
##
## constant
## Y 3.979508
## X 1.975015
The coefficients of \(X_{t-1}\) and \(X_{t-2}\) and the constants appear to be close to the true values.
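A practical benefit of the vec2var() representation is that the standard vars methods (such as predict() and irf()) also accept vec2var objects, so forecasts and impulse responses can be computed for the cointegrated system. A minimal sketch (the 12-period horizon is an arbitrary choice):
Y_VAR_rep <- vec2var(VECM_Y_v2, r = 1)
# 12-step-ahead forecasts with 95% prediction intervals
predict(Y_VAR_rep, n.ahead = 12, ci = 0.95)
# Orthogonalized impulse responses to a shock in X
irf(Y_VAR_rep, impulse = "X", n.ahead = 12)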
If we look at the residuals:
plot.ts(cajorls(VECM_Y_v2, r = 1)$rlm$residuals)
Acf(cajorls(VECM_Y_v2, r = 1)$rlm$residuals)
We see that the residuals appear to be white noise - they do not exhibit any significant autocorrelation, nor any cross-correlation between the two residual series.
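The visual inspection can be supplemented with a formal multivariate test, for example the Portmanteau test via serial.test() from the vars package applied to the vec2var() representation (a sketch; the number of lags is an arbitrary choice):
serial.test(vec2var(VECM_Y_v2, r = 1), lags.pt = 10, type = "PT.asymptotic")
The null hypothesis is that the residuals are not serially correlated.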
Once you have the cointegrating vectors produced by the ca.jo() function in urca and have determined the lag order, you can run the estimation by OLS, constructing the right-hand-side variables manually. For example, let's extract all the data that was used to estimate the models:
# Extract the model frame used by cajorls(); the `z@Z0` component holds the
# left-hand-side differences (dY_t, dX_t), i.e. the Z0 slot of the ca.jo object
temp_data <- cajorls(VECM_Y_v2, r = 1)$rlm$model
temp_data <- data.frame(temp_data$`z@Z0`, temp_data[, -1])
colnames(temp_data) <- c("dY", "dX", "ECT1", "const", "dY.L1", "dX.L1")
and estimate the first equation via OLS:
dY_ols <- lm(dY ~ -1 + const + ECT1 + dY.L1 + dX.L1, data = temp_data)
round(summary(dY_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## const 3.97951 0.41984 9.47852 0.00000
## ECT1 -1.29616 0.19628 -6.60379 0.00000
## dY.L1 0.19126 0.13103 1.45970 0.14771
## dX.L1 -0.67600 0.15028 -4.49832 0.00002
and do the same with the second one:
dX_ols <- lm(dX ~ -1 + const + ECT1 + dY.L1 + dX.L1, data = temp_data)
round(summary(dX_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## const 1.97501 0.42776 4.61706 0.00001
## ECT1 0.00332 0.19998 0.01662 0.98678
## dY.L1 0.08908 0.13350 0.66723 0.50626
## dX.L1 -0.46028 0.15311 -3.00618 0.00339
We see that the results are identical to the \(VECM\) ones in cajorls()
!
Alternatively, we do not need to extract the model data; instead, we can construct a new variable for each cointegration relation ourselves:
beta_vec_est <- cajorls(VECM_Y_v2, r = 1)$beta
# The cointegration relation: ECT_t = beta' Y_t = Y_t - 0.9968454 * X_t
coint_r_1 <- Y[, 1] * beta_vec_est[1] + Y[, 2] * beta_vec_est[2]
new_data <- ts(cbind(Y, coint_r_1))
and estimate the models as before, but this time using dynlm:
suppressPackageStartupMessages({library(dynlm)})
dY_ols <- dynlm(d(Y) ~ 1 + L(coint_r_1, 1) + L(d(Y), 1) + L(d(X), 1), data = new_data)
round(summary(dY_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.97951 0.41984 9.47852 0.00000
## L(coint_r_1, 1) -1.29616 0.19628 -6.60379 0.00000
## L(d(Y), 1) 0.19126 0.13103 1.45970 0.14771
## L(d(X), 1) -0.67600 0.15028 -4.49832 0.00002
dX_ols <- dynlm(d(X) ~ 1 + L(coint_r_1, 1) + L(d(Y), 1) + L(d(X), 1), data = new_data)
round(summary(dX_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.97501 0.42776 4.61706 0.00001
## L(coint_r_1, 1) 0.00332 0.19998 0.01662 0.98678
## L(d(Y), 1) 0.08908 0.13350 0.66723 0.50626
## L(d(X), 1) -0.46028 0.15311 -3.00618 0.00339
The results are identical. Now, if we want to remove the insignificant variables (we start by removing the variable with the largest p-value from each equation and then check whether any insignificant variables remain; if there are, we repeat the process), we are left with the following equations:
dY_ols <- dynlm(d(Y) ~ 1 + L(coint_r_1, 1) + L(d(X), 1), data = new_data)
round(summary(dY_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.72570 0.38442 9.69186 0
## L(coint_r_1, 1) -1.08730 0.13515 -8.04526 0
## L(d(X), 1) -0.50945 0.09838 -5.17823 0
dX_ols <- dynlm(d(X) ~ 1 + L(d(X), 1), data = new_data)
round(summary(dX_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.02974 0.30856 6.57812 0e+00
## L(d(X), 1) -0.40538 0.09425 -4.30100 4e-05
which we can write in matrix notation (we also use the estimated normalized cointegration relation to move from \(\text{ECT}_{t-1}\) to \(\beta^\top \mathbf{Y}_{t-1}\)):
beta_vec_est
## ect1
## Y.l1 1.0000000
## X.l1 -0.9968454
# The PI matrix: alpha %*% t(beta), with the insignificant adjustment
# coefficient of the second equation set to zero
PI_est = cbind(c(dY_ols$coefficients["L(coint_r_1, 1)"], 0)) %*% t(beta_vec_est)
PI_est
## Y.l1 X.l1
## L(coint_r_1, 1) -1.087297 1.083867
## 0.000000 0.000000
\[ \begin{aligned} \begin{bmatrix} \Delta Y_t \\ \Delta X_t \end{bmatrix} &= \begin{bmatrix} 3.72570 \\ 2.02974 \end{bmatrix} + \begin{bmatrix} -1.08730\\ 0 \end{bmatrix} \text{ECT}_{t-1} + \begin{bmatrix} 0 & -0.50945\\ 0 & -0.40538 \end{bmatrix} \begin{bmatrix} \Delta Y_{t-1}\\ \Delta X_{t-1} \end{bmatrix} \\\\ &= \begin{bmatrix} 3.72570 \\ 2.02974 \end{bmatrix} + \begin{bmatrix} -1.08730\\ 0 \end{bmatrix} \begin{bmatrix} 1 & -0.9968454 \end{bmatrix} \begin{bmatrix} Y_{t-1}\\ X_{t-1} \end{bmatrix} + \begin{bmatrix} 0 & -0.50945\\ 0 & -0.40538 \end{bmatrix} \begin{bmatrix} \Delta Y_{t-1}\\ \Delta X_{t-1} \end{bmatrix} \\ \\ &= \begin{bmatrix} 3.72570 \\ 2.02974 \end{bmatrix} + \begin{bmatrix} -1.08730 & 1.08387\\ 0 & 0 \end{bmatrix} \begin{bmatrix} Y_{t-1}\\ X_{t-1} \end{bmatrix} + \begin{bmatrix} 0 & -0.50945\\ 0 & -0.40538 \end{bmatrix} \begin{bmatrix} \Delta Y_{t-1}\\ \Delta X_{t-1} \end{bmatrix} \end{aligned} \]
For completeness' sake, let's assume that we also want to transform the model back to a \(VAR\) representation. Then, following the definition of the \(VECM\), we get:
# Theta_2 = -Gamma_1 (coefficients of the lagged differences; insignificant terms set to zero)
Theta_2_est = (-1) * cbind(c(0, 0),
                           c(dY_ols$coefficients["L(d(X), 1)"],
                             dX_ols$coefficients["L(d(X), 1)"]))
# Theta_1 = PI + I - Theta_2
Theta_1_est = PI_est + diag(c(1, 1)) - Theta_2_est
print(Theta_1_est)
## Y.l1 X.l1
## L(coint_r_1, 1) -0.08729743 0.5744168
## 0.00000000 0.5946174
print(Theta_2_est)
## [,1] [,2]
## L(d(X), 1) 0 0.5094507
## L(d(X), 1) 0 0.4053826
\[ \begin{aligned} \begin{bmatrix} Y_t \\ X_t \end{bmatrix} &= \begin{bmatrix} 3.72570 \\ 2.02974 \end{bmatrix} + \begin{bmatrix} -0.08729743 & 0.5744168\\ 0 & 0.5946174 \end{bmatrix} \begin{bmatrix} Y_{t-1}\\ X_{t-1} \end{bmatrix} + \begin{bmatrix} 0 & 0.5094507\\ 0 & 0.4053826 \end{bmatrix} \begin{bmatrix} Y_{t-2}\\ X_{t-2} \end{bmatrix} \end{aligned} \]
We see that the estimated coefficients are similar to the true model parameter values.
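As an additional sanity check on the recovered \(VAR\) representation (a sketch), we can compute the eigenvalues of its companion matrix: because \(\widehat{\Pi}\) has rank one, exactly one root should equal (approximately) unity, reflecting the common stochastic trend, while the remaining roots should lie inside the unit circle:
# Companion matrix of the estimated VAR(2): [Theta_1, Theta_2; I, 0]
companion <- rbind(cbind(Theta_1_est, Theta_2_est),
                   cbind(diag(2), matrix(0, 2, 2)))
round(Mod(eigen(companion)$values), 4)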
To verify that the estimation steps are correct, it is always recommended to increase the sample size of the simulation and examine whether the model estimates approach the true parameter values. We will only provide one more summarized example, with a sample size of \(T = 10000\). We will not repeat each step, but only provide the main results of the \(VAR\) model. The previously presented tests and estimation methods should also be applied to this larger sample to examine any differences, or similarities, with the smaller-sample case:
set.seed(123)
nsample = 10000
alpha = c(3, 2)
Theta_1 = matrix(c(0, 0.5, 0, 0.5), ncol = 2, byrow = TRUE)
Theta_2 = matrix(c(0, 0.5, 0, 0.5), ncol = 2, byrow = TRUE)
Sigma = matrix(c(8, 6, 6, 9), ncol = 2, byrow = TRUE)
e = MASS::mvrnorm(n = nsample, mu = c(0, 0), Sigma = Sigma)
Y <- matrix(data = 0, nrow = nsample, ncol = 2)
colnames(Y) <- c("Y", "X")
Y[1, ] = alpha + e[1, ]
Y[2, ] = alpha + Theta_1 %*% Y[1, ] + e[2, ]
for(j in 3:nsample){
Y[j, ] = alpha + Theta_1 %*% Y[j-1, ] + Theta_2 %*% Y[j-2, ] + e[j, ]
}
VECM_Y_large_smpl <- ca.jo(Y, K = 2, spec = "transitory")
Estimates via cajorls():
cajorls(VECM_Y_large_smpl, r = 1)
## $rlm
##
## Call:
## lm(formula = substitute(form1), data = data.mat)
##
## Coefficients:
## Y.d X.d
## ect1 -0.9916998 0.0003638
## constant 2.9658757 1.9978561
## Y.dl1 -0.0048090 -0.0002057
## X.dl1 -0.5079258 -0.5108232
##
##
## $beta
## ect1
## Y.l1 1.000000
## X.l1 -1.000007
#summary(cajorls(VECM_Y_large_smpl, r = 1)$rlm)
Initial models re-estimated via dynlm:
beta_vec_est <- cajorls(VECM_Y_large_smpl, r = 1)$beta
coint_r_1 <- Y[, 1] * beta_vec_est[1] + Y[, 2] * beta_vec_est[2]
new_data <- ts(cbind(Y, coint_r_1))
dY_ols <- dynlm(d(Y) ~ 1 + L(coint_r_1, 1) + L(d(Y), 1) + L(d(X), 1), data = new_data)
round(summary(dY_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.96588 0.03610 82.16757 0.00000
## L(coint_r_1, 1) -0.99170 0.01796 -55.20926 0.00000
## L(d(Y), 1) -0.00481 0.01290 -0.37293 0.70921
## L(d(X), 1) -0.50793 0.01402 -36.22036 0.00000
dX_ols <- dynlm(d(X) ~ 1 + L(coint_r_1, 1) + L(d(Y), 1) + L(d(X), 1), data = new_data)
round(summary(dX_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.99786 0.03845 51.96330 0.00000
## L(coint_r_1, 1) 0.00036 0.01913 0.01902 0.98483
## L(d(Y), 1) -0.00021 0.01374 -0.01498 0.98805
## L(d(X), 1) -0.51082 0.01494 -34.19859 0.00000
After removing the insignificant variables sequentially:
dY_ols <- dynlm(d(Y) ~ 1 + L(coint_r_1, 1) + L(d(X), 1), data = new_data)
round(summary(dY_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.96915 0.03501 84.81625 0
## L(coint_r_1, 1) -0.99605 0.01366 -72.93637 0
## L(d(X), 1) -0.51201 0.00876 -58.47284 0
dX_ols <- dynlm(d(X) ~ 1 + L(d(X), 1), data = new_data)
round(summary(dX_ols)$coefficients, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.99823 0.03212 62.22025 0
## L(d(X), 1) -0.51104 0.00860 -59.43911 0
PI_est = cbind(c(dY_ols$coefficients["L(coint_r_1, 1)"], 0)) %*% t(beta_vec_est)
PI_est
## Y.l1 X.l1
## L(coint_r_1, 1) -0.9960511 0.9960582
## 0.0000000 0.0000000
Gamma_est = cbind(c(0, 0),
c(dY_ols$coefficients["L(d(X), 1)"],
dX_ols$coefficients["L(d(X), 1)"]))
Gamma_est
## [,1] [,2]
## L(d(X), 1) 0 -0.5120105
## L(d(X), 1) 0 -0.5110421
cbind(c(dY_ols$coefficients["(Intercept)"], dX_ols$coefficients["(Intercept)"]))
## [,1]
## (Intercept) 2.969154
## (Intercept) 1.998228
Theta_2_est = (-1) * Gamma_est
Theta_1_est = PI_est + diag(c(1, 1)) - Theta_2_est
print(Theta_1_est)
## Y.l1 X.l1
## L(coint_r_1, 1) 0.003948924 0.4840477
## 0.000000000 0.4889579
print(Theta_2_est)
## [,1] [,2]
## L(d(X), 1) 0 0.5120105
## L(d(X), 1) 0 0.5110421
We see that \(\widehat{\Theta}_1\) and \(\widehat{\Theta}_2\) are much closer to the true values with the larger sample size. For example, the \((1,1)\) element of \(\widehat{\Theta}_1\) is \((\widehat{\Theta}_1)_{1,1} = 0.003948924\), which is much closer to zero (the true value) than the estimate from the smaller sample, where we got \((\widehat{\Theta}_1)_{1,1} = -0.08729743\).
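Finally, as a quick numerical comparison (a sketch using the true parameter matrices defined in the large-sample simulation above), the true \(\Pi = \Theta_1 + \Theta_2 - \mathbf{I}\) can be compared with its large-sample estimate:
# True Pi implied by the data-generating process
PI_true <- Theta_1 + Theta_2 - diag(2)
PI_true
round(PI_est, 4)
Both should be close to \(\begin{pmatrix} -1 & 1 \\ 0 & 0\end{pmatrix}\), the true \(\Pi\) matrix noted at the start of this section.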