General outline for Multivariate Time Series VAR model building

Below we provide a general outline for building multivariate time series models, based on the methods presented in the lectures, along with some additional functions for a broader perspective:

  1. Variable selection & purpose
    The initial step is to understand the task at hand: which variables should be included, and what maximum lag order would make sense given the frequency and description of the data? What underlying economic or financial theory is assumed beforehand?

  2. Data analysis and transformations
    After selecting the data, the second step would be to examine the data plots and summary statistics and answer these questions:

    • Are there any outliers in the data?
    • Is there any missing data?
    • Do you need to transform the data, e.g. take logarithms?
    • Do you need to create any new variables - e.g. GDP per capita, number of children per household etc.?

  3. Unit root tests
    After determining the final set of variables, \(Y_t\), we need to test whether they have a unit root (\(I(1)\), \(I(2)\), …) or are stationary (\(I(0)\)). To do this, use the ADF test: the ur.df() function from the urca package allows us to do just this, with type = c("none", "drift", "trend") for the different model types:

    • none - no constant and no trend in the model,

    • drift - constant in the model,

    • trend - constant + trend in the model.

    The model selection can also be automated by including lags = 10, selectlags = "BIC" to estimate different \(AR(p)\) models up to \(p = 10\) and select the best one in terms of its \(BIC\) value. Note that the null hypothesis of the \(ADF\) test is that the data has a unit root. Other tests are also available and should be used as well:

    • We can also use pp.test() from the tseries package to carry out the Phillips-Perron (PP) test for the null hypothesis that the data has a unit root. Note that the regression equation in this test incorporates a constant and a linear trend, so we cannot specify different model types like with the ur.df() function.

    • We can use the kpss.test() from the tseries package to carry out the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that the data is stationary.

    • There is also the adf.test() from the tseries package which uses the regression equation with a constant and a linear trend for the Augmented Dickey-Fuller (ADF) test for the null hypothesis that the data has a unit root. Again, we cannot specify different model types like in ur.df().

    If we determine that the data is not \(I(0)\), then we need to check if it is \(I(1)\). To do this, carry out the unit root tests on the differences of the data. If the differences do not have a unit root (i.e. \(\Delta Y_t\) is \(I(0)\)), then the series \(Y_t\) is \(I(1)\). If the differences do have a unit root, then we need to difference the data once more and check whether the series is \(I(2)\). If the series is \(I(2)\), then the interpretation of the data will change - e.g. \(\Delta \Delta Y_t\) could be interpreted as the difference of the change in \(Y_t\). If \(\Delta \Delta Y_t\) is still not \(I(0)\), then you should re-examine your data for any outliers or errors, as higher orders of differencing are harder to interpret.
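
    A minimal sketch of this testing sequence (here y is a placeholder name for one of the series in \(Y_t\)):

    library(urca)
    library(tseries)

    # levels: ADF (H0: unit root), PP (H0: unit root) and KPSS (H0: stationarity)
    summary(ur.df(y, type = "drift", lags = 10, selectlags = "BIC"))
    pp.test(y)
    kpss.test(y)

    # if the levels appear to have a unit root, repeat the tests on the differences:
    summary(ur.df(diff(y), type = "drift", lags = 10, selectlags = "BIC"))
    # if diff(y) is I(0), then y is I(1); otherwise test diff(diff(y)) to check for I(2)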

  4. VAR(p) model order selection
    Model order selection is usually based on information criteria, most often \(AIC\) or \(BIC\) (sometimes other criteria are used). This can be done using VARselect() from the vars package. Note that \(BIC\) penalizes model complexity more heavily than \(AIC\).
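
    A minimal sketch (here Y_data is a placeholder name for the final data matrix):

    library(vars)
    # lag order preferred by each criterion: AIC, HQ, SC (i.e. BIC) and FPE
    VARselect(Y_data, lag.max = 10, type = "const")$selection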

  5. Cointegrating relationship test for unit root series
    If the time series have a unit root, we should check whether there are any cointegrating relationships between the series. There are three types of tests which can be performed (a combined R sketch follows this list):

    • The Engle-Granger procedure - it is based on the fact that if two variables \(Y_t\) and \(X_t\) are non-stationary and cointegrated, then a linear combination of them must be stationary. The test is done in two steps:

      1. Step 1: estimate the long-run (equilibrium) equation: \[Y_t = \alpha + \gamma \cdot t + \beta X_t + u_t\] In order to test for cointegration, we need to test if \(\widehat{u}_t\) is stationary. This is determined via ADF (ur.df()) tests on the model residuals (the residuals have a zero-mean, so we exclude the constant term): \[\Delta \widehat{u}_t = \rho \widehat{u}_{t-1} + \sum_{i = 1}^p c_i \Delta \widehat{u}_{t-i} + w_t\] using critical values adjusted for the number of variables. The null hypothesis of ADF test is \(H_0: \rho = 0\), i.e. the residuals have a unit root. If we reject the null hypothesis, then the residuals are stationary and we conclude that \(Y_t\) and \(X_t\) are cointegrated. Note that in this step the only important question is the stationarity or otherwise of the residuals, i.e. traditional diagnostic tests are unimportant.

      2. Step 2: in the second step, the lagged residuals are included in the regression to create an \(ECM\): \[\Delta Y_t = \phi + \delta t + \lambda \hat{u}_{t-1} + \gamma_1 \Delta Y_{t-1} + ... + \gamma_p \Delta Y_{t-p} + \omega_0 \Delta X_t + ... + \omega_q \Delta X_{t-q} + \epsilon_t\] The coefficient \(\lambda\) must be negative. (The same applies to the equation for \(\Delta X_t\).)

    • The Phillips-Ouliaris cointegration test - tests the null hypothesis \(H_0: Y_t \text{ and } X_t \text{ are not cointegrated}\). This test can be performed via ca.po() from the urca package, or po.test() from the tseries package. Phillips and Ouliaris showed that the ADF and PP unit root tests applied to the estimated cointegrating residual do not have the usual DF distributions under the null \(H_0: \text{ no cointegration}\). Note that the test type type = "Pz", compared to the test type = "Pu", has the advantage that it is invariant to the normalization of the cointegration vector, i.e. it does not matter which variable is on the left hand side of the equation.

    • The Johansen test - a test for cointegration that allows for more than one cointegrating relationship, unlike the Engle-Granger method. If the sample size is too small, the results will not be reliable and one should use an Autoregressive Distributed Lag (ADL) model instead. The ca.jo() function from the package urca can be used to carry out this test, which is based on the following \(VAR\) and \(VECM\) specifications:
      Given a general VAR model: \[\mathbf{Y}_t = \mathbf{\mu} + \Phi \mathbf{D}_t + \Theta_1 \mathbf{Y}_{t-1} + ... + \Theta_p \mathbf{Y}_{t-p} + \mathbf{\epsilon}_t,\quad t = 1,...,T\] where \(\mu\) is the constant and \(\mathbf{D}_t\) is the trend matrix. There are two possible specifications of a VECM.

      1. The first form: \[\Delta \mathbf{Y}_t = \mathbf{\mu} + \Phi \mathbf{D}_t + \Gamma_1 \Delta \mathbf{Y}_{t-1} + ... + \Gamma_{p-1} \Delta \mathbf{Y}_{t-p+1} + \Pi \mathbf{Y}_{t-p} + \mathbf{\epsilon}_t\] where: \[ \begin{aligned} \Gamma_i &= \sum_{j = 1}^i \Theta_j - \mathbf{I},\quad i = 1,...,p-1 \\ \Pi &= \sum_{j = 1}^p \Theta_j - \mathbf{I} \end{aligned} \] The \(\Gamma_i\) matrices contain the cumulative long-run impacts. If the above \(VECM\) specification is required, then spec = "longrun" must be specified when calling the ca.jo() function.
      2. The second form: \[\Delta \mathbf{Y}_t = \mathbf{\mu} + \Phi \mathbf{D}_t + \Gamma_1 \Delta \mathbf{Y}_{t-1} + ... + \Gamma_{p-1} \Delta \mathbf{Y}_{t-p+1} + \Pi \mathbf{Y}_{t-1} + \mathbf{\epsilon}_t\] where: \[ \begin{aligned} \Gamma_i &= -\sum_{j = i+1}^p \Theta_j,\quad i = 1,...,p-1 \\ \Pi &= \sum_{j = 1}^p \Theta_j - \mathbf{I} \end{aligned} \] The \(\Pi\) matrix is the same as in the first specification. However, the \(\Gamma_i\) matrices now measure the transitory effects. By setting spec = "transitory" when calling ca.jo(), the VECM of the second specification form will be estimated.

      Note that the results for \(\Pi\) will be the same, regardless of the chosen specification. The explanatory power will be the same as well.

      The Johansen test has two variations, both of which are likelihood-ratio tests. For both test statistics, the null hypothesis is \[H_0:\text{ no cointegration}\] The tests differ in terms of the alternative hypothesis:

      1. Maximum Eigenvalue Test
        The maximum eigenvalue test examines whether the \((r+1)\)-th largest eigenvalue of \(\Pi\) is zero, i.e. it tests \(H_0: rank(\Pi) = r\) against \(H_1: rank(\Pi) = r + 1\). The outline of the test is as follows:
        • The first test is done for \[H_0: rank(\Pi) = 0\] with the alternative \[H_1: rank(\Pi) = 1\]. If the rank of the matrix is zero, the largest eigenvalue is zero and there is no cointegration - the test is done;
        • Further tests (if we rejected \(H_0: rank(\Pi) = 0\)) check \[H_0: rank(\Pi) = r\] against \[H_1: rank(\Pi) = r+1\] for \(r = 1, 2, ...\). If at any point we do not reject \(H_0\), the test is completed and the rank of \(\Pi\) (i.e. the number of cointegration relations) is equal to that \(r\).
      2. Trace Test
        The trace test is a test of whether the rank of the matrix \(\Pi\) is \(r\) - the null is \[H_0: rank(\Pi) = r\] and the alternative is \[H_1: r < rank(\Pi) \leq K\] where \(K\) is the maximum possible number of cointegrating vectors (i.e. the number of rows in \(\Pi\)). If we reject \(H_0\), the next test uses \[H_0: rank(\Pi) = r + 1\] with the alternative \[H_1: r + 1 < rank(\Pi) \leq K\] and so on, starting from \(r = 0\) and continuing until we fail to reject \(H_0\). Note that, contrary to the name, this test is not based on the trace of \(\Pi\).
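
    A minimal combined sketch of the three tests (here Y and X are placeholder names for two \(I(1)\) series; the deterministic terms and lag orders should be adapted to the data at hand):

    library(urca)

    # (1) Engle-Granger: long-run regression, then an ADF-type test on its residuals
    eg_lr = lm(Y ~ X)
    u_hat = residuals(eg_lr)
    summary(ur.df(u_hat, type = "none", lags = 5, selectlags = "BIC"))
    # compare against the Engle-Granger critical values, not the standard DF ones;
    # if u_hat is stationary, estimate the ECM on the differences:
    ecm_data = data.frame(dY = diff(Y), dX = diff(X), u_lag = head(u_hat, -1))
    summary(lm(dY ~ u_lag + dX, data = ecm_data)) # the u_lag coefficient should be negative

    # (2) Phillips-Ouliaris: H0 of no cointegration; type = "Pz" is invariant to normalization
    summary(ca.po(cbind(Y, X), demean = "constant", type = "Pz"))

    # (3) Johansen: trace (or "eigen") test, transitory VECM specification
    summary(ca.jo(cbind(Y, X), type = "trace", ecdet = "const", K = 2, spec = "transitory"))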
  6. Estimating the model
    If the series are not cointegrated, we can estimate the model via the VAR() function from the package vars for the differences of the series, \(\Delta Y_t\) (if a unit root is present). If the series are cointegrated, we need to account for the long-run relationship by estimating a \(VECM\) using either VECM() from the package tsDyn, or cajorls() from the package urca, and specifying the number of cointegrating relations found in the previous step. Depending on the function, we may also need to specify the lag order of the \(VECM\) representation.
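
    A minimal sketch of both cases (here Y_data is a placeholder name for the data matrix; the lag order and the cointegration rank r = 1 are taken as given from the previous steps):

    library(vars)
    library(tsDyn)
    library(urca)

    # no cointegration: a VAR on the differenced data
    var_diff = VAR(diff(Y_data), p = 2, type = "const")

    # cointegration with rank r = 1: a VECM via tsDyn ...
    vecm_ml = VECM(Y_data, lag = 1, r = 1, include = "const", estim = "ML")
    # ... or via the restricted OLS estimator applied to a ca.jo object
    vecm_jo = ca.jo(Y_data, type = "trace", ecdet = "const", K = 2, spec = "transitory")
    vecm_ols = cajorls(vecm_jo, r = 1)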

  7. Model diagnostics tests
    Now that we have the model estimated, we need to verify that it is well specified. This is usually done by examining the residuals of the model. The most important tests for time series data are tests for autocorrelation (or serial correlation) of the residuals, also known as portmanteau tests. The two most well-known versions of this test are the Ljung-Box and the Box-Pierce tests, which are implemented in the Box.test() function from the stats package. For multivariate time series, alongside autocorrelation, another problem is the cross-correlation of the residuals, i.e., when \(cor(\epsilon_{1,t}, \epsilon_{2, t + s}) \neq 0,\quad s > 0\). For this reason, we may use the serial.test() function from the vars package, which computes a multivariate test for serially correlated errors. A multivariate Ljung-Box test is implemented in the mq() function from the MTS package.
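
    A minimal sketch, assuming var_fit is a placeholder name for a model estimated with VAR():

    library(vars)

    # multivariate Portmanteau test for serially (and cross-) correlated residuals
    serial.test(var_fit, lags.pt = 12, type = "PT.asymptotic")

    # univariate Ljung-Box test on a single residual series
    Box.test(residuals(var_fit)[, 1], lag = 12, type = "Ljung-Box")

    # multivariate Ljung-Box test
    MTS::mq(residuals(var_fit), lag = 12)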

  8. Results & conclusions
    After we verify that the model is adequate, we can either predict() future values, or examine the impulse-response functions via irf() from the vars package in order to check how the variables respond to a particular shock.

    It is also strongly recommended to partition the dataset into training and test subsets and estimate the model on the training data. We can then compare the forecasted values with the test data to measure the forecasting strength of our model by calculating a forecast error measure, like the \(MAPE\) or \(RMSE\). The size of the test dataset is usually up to 20% of the total sample; for time series data, the test data can be the last 20% of the observations. We can also repeat this process by estimating the model on the first \(T - j\) observations and evaluating the forecasts \(h\) periods ahead, for different values of \(j = T - k, T - k - 1, ..., h\), where \(T - k\) is around 20% of the sample. We can then evaluate how our forecast error changes as we increase the sample size.
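
    A minimal sketch of a single training/test split (here Y_data is a placeholder name for the full data matrix and a \(VAR(2)\) with a constant is assumed; the horizon h must not exceed the size of the test set):

    library(vars)

    h = 12
    n_test = floor(0.2 * nrow(Y_data)) # hold out ~20% of the sample
    Y_train = Y_data[1:(nrow(Y_data) - n_test), ]
    Y_test = Y_data[(nrow(Y_data) - n_test + 1):nrow(Y_data), ]

    fit = VAR(Y_train, p = 2, type = "const")
    fcst = predict(fit, n.ahead = h)

    # RMSE of the h-step-ahead forecasts of the first variable
    sqrt(mean((Y_test[1:h, 1] - fcst$fcst[[1]][, "fcst"])^2))

    # impulse-response functions of the fitted model
    plot(irf(fit, n.ahead = 12, boot = TRUE))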

Based on this answer at stackoverflow.com




VAR(2) - stationary example

We will generate the following VAR(2) model:

\[ \begin{pmatrix} Y_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.5 & 0.4 \\ -0.2 & 0.6 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} + \begin{pmatrix} -0.2 & -0.1 \\ 0.3 & 0.1 \end{pmatrix} \begin{pmatrix} Y_{t-2} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} ,\quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} 8 & 6 \\ 6 & 9 \end{bmatrix} \right) \]

set.seed(123)

nsample = 500

alpha = c(3, 2)
Theta_1 = matrix(c(0.5, 0.4, -0.2, 0.6), ncol = 2, byrow = TRUE)
Theta_2 = matrix(c(-0.2, -0.1, 0.3, 0.1), ncol = 2, byrow = TRUE)

Sigma = matrix(c(8, 6, 6, 9), ncol = 2, byrow = TRUE)

e = MASS::mvrnorm(n = nsample, mu = c(0, 0), Sigma = Sigma)

Y <- matrix(data = 0, nrow = nsample, ncol = 2)
colnames(Y) <- c("Y", "X")

Y[1, ] = alpha + e[1, ]
Y[2, ] = alpha + Theta_1 %*% Y[1, ] + e[2, ]
for(j in 3:nsample){
  Y[j, ] = alpha + Theta_1 %*% Y[j-1, ] + Theta_2 %*% Y[j-2, ] + e[j, ]
}


plot.ts(Y, plot.type = "multiple")

Testing for unit root

We could test for a unit root in a number of ways: sequentially by creating dynlm equations, or by using ur.df() or adf.test():

suppressPackageStartupMessages({
  suppressWarnings({
    library(urca)
  })
})

adf_test_Y = ur.df(Y[, 1], type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_Y)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.928 -2.091 -0.126  1.894  9.443 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.14957    0.43437   9.553  < 2e-16 ***
## z.lag.1     -0.48234    0.04807 -10.035  < 2e-16 ***
## z.diff.lag1  0.23322    0.04467   5.221 2.64e-07 ***
## z.diff.lag2 -0.16735    0.04450  -3.761  0.00019 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.939 on 490 degrees of freedom
## Multiple R-squared:  0.3082, Adjusted R-squared:  0.304 
## F-statistic: 72.78 on 3 and 490 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -10.0349 50.3495 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.44 -2.87 -2.57
## phi1  6.47  4.61  3.79
adf_test_X = ur.df(Y[, 2], type = "drift", lags = 5, selectlags = "BIC")
summary(adf_test_X)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3332 -1.8041 -0.1827  1.9078  9.5116 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.96144    0.45762   6.471 2.36e-10 ***
## z.lag.1     -0.30167    0.04459  -6.765 3.81e-11 ***
## z.diff.lag1 -0.30417    0.05038  -6.038 3.09e-09 ***
## z.diff.lag2 -0.16015    0.04448  -3.600  0.00035 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.951 on 490 degrees of freedom
## Multiple R-squared:  0.2824, Adjusted R-squared:  0.278 
## F-statistic: 64.28 on 3 and 490 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -6.765 22.8827 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.44 -2.87 -2.57
## phi1  6.47  4.61  3.79

Because the t value of z.lag.1 is -10.035 < -2.87 - we reject the null hypothesis of a unit root in \(Y_t\). Since -6.765 < -2.87 - we reject the null hypothesis of a unit root in \(X_t\).

So, both \(Y_t\) and \(X_t\) are \(I(0)\).

Selecting the VAR model order:

We can automate the order selection via VARselect using information criteria. Note that we can specify type = c("const", "trend", "both", "none") if we want to include a constant or a trend (or both). Because we do not see any patterns of a trend, but we do see fluctuation around a non-zero mean, we choose to include a constant:

suppressPackageStartupMessages({
  suppressWarnings({
    library(vars)
  })
})

VARselect(Y, lag.max = 5, type = "const")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                1         2         3         4         5
## AIC(n)  4.112833  3.572596  3.585883  3.596214  3.606383
## HQ(n)   4.132840  3.605941  3.632565  3.656235  3.679741
## SC(n)   4.163797  3.657537  3.704799  3.749107  3.793252
## FPE(n) 61.119624 35.608981 36.085331 36.460226 36.833118

We see that a \(VAR(2)\) model is recommended.

Johansen's procedure for stationary processes

Regarding Johansen's procedure: the matrix \(\Pi\) has full rank when the process is stationary. The Johansen procedure requires that the time series be checked for unit roots first, since it was not designed to be used as a unit root test. We may still try to carry out the test.

Note that we can also use an automated procedure which chooses the “best” VEC model (that is, its rank and lag) according to the information criteria (this approach is not based on the Johansen test):

suppressPackageStartupMessages({
  suppressWarnings({
    library(tsDyn)
  })
})
#Suppress function warnings:
suppressWarnings({
  rank.select(Y, lag.max = 10, include = "const")
})
## Best AIC: rank= 2  lag= 1 
## Best BIC: rank= 2  lag= 1 
## Best HQ : rank= 2  lag= 1

This suggests that the \(\Pi_{2 \times 2}\) matrix would be of full rank, indicating that both of our time series are stationary. Turning to the Johansen procedure: since VARselect() suggested lag = 2 for the VAR and rank.select() suggested lag = 1 for the VECM (recall that a \(VAR(p)\) corresponds to a \(VECM\) with \(p - 1\) lagged differences), we choose lag = 1 for the VECM order:

suppressWarnings({
  temp_mdl <- VECM(Y, include = "const", lag = 1, estim = "ML")
  summary(rank.test(temp_mdl))
})
##   r     trace trace_pval trace_pval_T     eigen eigen_pval
## 1 0 342.81505    < 0.001      < 0.001 277.60295    < 0.001
## 2 1  65.21211    < 0.001      < 0.001  65.21211    < 0.001

Note that once we specify the \(VECM\) model, we cannot directly test for a full rank, as the VECM itself assumes that the data has a unit root (and thus that the \(\Pi\) matrix does not have full rank). Thankfully, the output provides two tests:

  • trace - tests the null hypothesis of \(rank = r\) against the alternative: \(r < rank \leq K\) (where K is the number of rows in the \(\Pi\) matrix, i.e. the full rank), i.e. against the alternative that the system is stationary;
  • eigenvalue - tests the null hypothesis of \(rank = r\) against the alternative: \(rank = r + 1\).

We see that for \(r = 1\) the trace_pval < 0.05, so we reject the null hypothesis that \(r = 1\) (and likewise that \(r = 0\)), which means that the system is stationary (because the only remaining option is \(r = 2\), the full rank). Furthermore, eigen_pval < 0.05, so we reject the null hypothesis that \(r = 1\) against the alternative that \(r = 1 + 1 = 2\).

Similarly, with ca.jo (note that K from ca.jo corresponds to lag + 1 in rank.test). Because we specified a \(VECM(1)\) for our rank.test, we set K = 2:

temp_tst = ca.jo(Y, K = 2, type = "trace")
paste0(temp_tst@type, ", ", temp_tst@model)
## [1] "trace statistic, with linear trend"
print(cbind(capture.output(summary(temp_tst))[11:16]))
##      [,1]                                                  
## [1,] "Values of teststatistic and critical values of test:"
## [2,] ""                                                    
## [3,] "           test 10pct  5pct  1pct"                   
## [4,] "r <= 1 |  65.21  6.50  8.18 11.65"                   
## [5,] "r = 0  | 342.82 15.66 17.95 23.52"                   
## [6,] ""
temp_tst = ca.jo(Y, K = 2, type = "eigen")
paste0(temp_tst@type, ", ", temp_tst@model)
## [1] "maximal eigenvalue statistic (lambda max), with linear trend"
print(cbind(capture.output(summary(temp_tst))[11:16]))
##      [,1]                                                  
## [1,] "Values of teststatistic and critical values of test:"
## [2,] ""                                                    
## [3,] "           test 10pct  5pct  1pct"                   
## [4,] "r <= 1 |  65.21  6.50  8.18 11.65"                   
## [5,] "r = 0  | 277.60 12.91 14.90 19.19"                   
## [6,] ""

We see that the results are the same - we reject \(H_0: r = 0\) and \(H_0: r \leq 1\) in both trace and eigenvalue tests, so our cointegration matrix has the full rank, i.e. the system is stationary.

VAR estimation and output

Now that we have established that both of our time series are \(I(0)\), we can estimate a VAR model. Based on VARselect, we have chosen a \(VAR(2)\) model with a constant:

VAR_mdl = VAR(Y, p = 2, type = "const")
summary(VAR_mdl)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Y, X 
## Deterministic variables: const 
## Sample size: 498 
## Log Likelihood: -2292.506 
## Roots of the characteristic polynomial:
## 0.7214 0.4899 0.4899 0.07252
## Call:
## VAR(y = Y, p = 2, type = "const")
## 
## 
## Estimation results for equation Y: 
## ================================== 
## Y = Y.l1 + X.l1 + Y.l2 + X.l2 + const 
## 
##       Estimate Std. Error t value Pr(>|t|)    
## Y.l1   0.51225    0.05000  10.246  < 2e-16 ***
## X.l1   0.34573    0.04846   7.135 3.48e-12 ***
## Y.l2  -0.22310    0.04790  -4.658 4.12e-06 ***
## X.l2  -0.10460    0.05046  -2.073   0.0387 *  
## const  3.76166    0.42008   8.955  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 2.832 on 493 degrees of freedom
## Multiple R-Squared: 0.418,   Adjusted R-squared: 0.4133 
## F-statistic: 88.51 on 4 and 493 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation X: 
## ================================== 
## X = Y.l1 + X.l1 + Y.l2 + X.l2 + const 
## 
##       Estimate Std. Error t value Pr(>|t|)    
## Y.l1  -0.22639    0.05072  -4.463 1.00e-05 ***
## X.l1   0.54255    0.04916  11.037  < 2e-16 ***
## Y.l2   0.29039    0.04859   5.976 4.39e-09 ***
## X.l2   0.07986    0.05119   1.560    0.119    
## const  3.17114    0.42616   7.441 4.47e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 2.873 on 493 degrees of freedom
## Multiple R-Squared: 0.3778,  Adjusted R-squared: 0.3727 
## F-statistic: 74.83 on 4 and 493 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##       Y     X
## Y 8.023 5.602
## X 5.602 8.257
## 
## Correlation matrix of residuals:
##        Y      X
## Y 1.0000 0.6883
## X 0.6883 1.0000

Now we can write the model:

\[ \begin{cases} Y_t &= 3.76166 + 0.51225 Y_{t-1} + 0.34573 X_{t-1} - 0.22310 Y_{t-2} - 0.10460 X_{t-2}\\ X_t &= 3.17114 - 0.22639 Y_{t-1} + 0.54255 X_{t-1} + 0.29039 Y_{t-2} + 0.07986 X_{t-2} \end{cases} \]

We could also re-write the system in a matrix form for easier comparison with the actual model (try it!). We see that the second equation is not as accurately estimated as the first one (note the insignificant \(X_{t-2}\) coefficient).
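
For instance, the estimated coefficient matrices can be extracted with Acoef() and Bcoef() from the vars package and compared with the true \(\Theta_1\), \(\Theta_2\) and \(\alpha\) used in the simulation:

Acoef(VAR_mdl) # list with the estimated Theta_1 and Theta_2 matrices
Bcoef(VAR_mdl) # all coefficients, including the constant, in a single matrix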

We can estimate a restricted VAR, either by manually imposing zero restrictions, or automatically, based on coefficient significance:

VAR_mdl_restricted = restrict(VAR_mdl, method = "ser", thresh = 2.0, resmat = NULL)
summary(VAR_mdl_restricted)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Y, X 
## Deterministic variables: const 
## Sample size: 498 
## Log Likelihood: -2294.831 
## Roots of the characteristic polynomial:
## 0.6746 0.5249 0.5249 0.1855
## Call:
## VAR(y = Y, p = 2, type = "const")
## 
## 
## Estimation results for equation Y: 
## ================================== 
## Y = Y.l1 + X.l1 + Y.l2 + X.l2 + const 
## 
##       Estimate Std. Error t value Pr(>|t|)    
## Y.l1   0.51225    0.05000  10.246  < 2e-16 ***
## X.l1   0.34573    0.04846   7.135 3.48e-12 ***
## Y.l2  -0.22310    0.04790  -4.658 4.12e-06 ***
## X.l2  -0.10460    0.05046  -2.073   0.0387 *  
## const  3.76166    0.42008   8.955  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 2.832 on 493 degrees of freedom
## Multiple R-Squared: 0.418,   Adjusted R-squared: 0.4133 
## F-statistic: 88.51 on 4 and 493 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation X: 
## ================================== 
## X = Y.l1 + X.l1 + Y.l2 + const 
## 
##       Estimate Std. Error t value Pr(>|t|)    
## Y.l1  -0.22379    0.05077  -4.408 1.28e-05 ***
## X.l1   0.57220    0.04540  12.603  < 2e-16 ***
## Y.l2   0.32956    0.04166   7.910 1.70e-14 ***
## const  3.30262    0.41835   7.894 1.90e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 2.878 on 494 degrees of freedom
## Multiple R-Squared: 0.9248,  Adjusted R-squared: 0.9242 
## F-statistic:  1520 on 4 and 494 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##       Y     X
## Y 8.023 5.602
## X 5.602 8.297
## 
## Correlation matrix of residuals:
##        Y      X
## Y 1.0000 0.6866
## X 0.6866 1.0000

We now see our restricted VAR model:

\[ \begin{cases} Y_t &= 3.76166 + 0.51225 Y_{t-1} + 0.34573 X_{t-1} - 0.22310 Y_{t-2} - 0.10460 X_{t-2}\\ X_t &= 3.30262 - 0.22379 Y_{t-1} + 0.57220 X_{t-1} + 0.32956 Y_{t-2} \end{cases} \]

We also note the estimated Covariance matrix of residuals:

\[ \widehat{\Sigma}_\epsilon = \begin{pmatrix} 8.023 & 5.602 \\ 5.602 & 8.297 \end{pmatrix} \]

which is quite close to the true \(\Sigma_\epsilon\).

VAR fit plots

We can also look at the fitted values:

plot(VAR_mdl_restricted, names = "Y")

plot(VAR_mdl_restricted, names = "X")