We say that $X$ Granger causes $Y$ if past values of $X$ have explanatory power for current values of $Y$.
To test this, we will generate $X$ and $Y$ in the following way:
$$
\begin{cases}
X_t = 0.2 X_{t-1} + w_{t},\ w_t \sim \mathcal{N}(0,1) \\
Y_t = 0.3 Y_{t-1} - 0.5 Y_{t-2} + 0.2 Y_{t-3} - 0.2 X_{t-1} - 0.3 X_{t-2} - 0.1 X_{t-3} + \epsilon_t,\ \epsilon_t \sim \mathcal{N}(0,1)
\end{cases}
$$
#Import the required modules for data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#Import the required modules for DataFrame creation:
import pandas as pd
#Import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
#Import the required modules for optimization:
import scipy as scipy
#Import the formula interface for model specification:
import statsmodels.formula.api as smf
We will simulate a sample of size 500:
np.random.seed(1)
N = 500
X = np.zeros(N)
Y = np.zeros(N)
w = np.random.normal(loc = 0.0, scale = 1.0, size = N)
e = np.random.normal(loc = 0.0, scale = 1.0, size = N)
Generate $X$:
X[0] = w[0]
for j in range(1, N):
    X[j] = 0.2 * X[j - 1] + w[j]
Generate $Y$:
Y[0] = e[0]
Y[1] = 0.3 * Y[0] - 0.2 * X[0] + e[1]
Y[2] = 0.3 * Y[1] - 0.5 * Y[0] - 0.2 * X[1] - 0.3 * X[0] + e[2]
for j in range(3, N):
    Y[j] = 0.3 * Y[j - 1] - 0.5 * Y[j - 2] + 0.2 * Y[j - 3] - 0.2 * X[j - 1] - 0.3 * X[j - 2] - 0.1 * X[j - 3] + e[j]
DT = pd.DataFrame([X, Y], index = ["X", "Y"]).T
DT.plot(subplots=True, layout=(2, 1))
plt.show()
We will begin by estimating an unrestricted model:
$$Y_t = \alpha + \delta t + \phi_1 Y_{t-1} + ... + \phi_p Y_{t-p} + \beta_1 X_{t-1} + ... + \beta_q X_{t-q} + \epsilon_t$$

We have already learned of a sequential procedure to choose $p$ and $q$ in such a model. In the Granger causality case, however, the exact lag orders are not the question of first importance, so we will assume $p = q$.
#Lag operator:
def L(x, n = 1):
    if n == 0:
        return x
    if isinstance(x, pd.Series):
        return x.shift(n)
    else:
        x = pd.Series(x)
        return x.shift(n)
#Difference operator:
def d(x):
    if isinstance(x, pd.Series):
        return x.diff()
    else:
        x = pd.Series(x)
        return x.diff()
#Trend/time operator:
def time(x):
    return list(range(1, len(x) + 1))
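As a quick check of how these helpers behave on the simulated data (added here for illustration only):
#L() shifts a series, d() differences it, time() returns a simple 1, 2, 3, ... trend:
print(L(DT["Y"], 1).head())   #the first value becomes NaN after shifting
print(d(DT["X"]).head())      #first differences; the first value is NaN
print(time(DT["Y"])[:5])      #[1, 2, 3, 4, 5]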
We will also define a function to extract the coefficient table:
#Return the model coefficients of a `smf.ols(...).fit()` object:
def get_coef(mdl_fit):
    result = np.round(pd.DataFrame([mdl_fit.params, mdl_fit.tvalues, mdl_fit.pvalues], index = ["coef", "t-value", "p-value"]).T, 5)
    return result
We will begin by setting $p=q=4$:
mod_UR = smf.ols(formula = 'Y ~ 1 + time(Y) + L(Y, 1) + L(Y, 2) + L(Y, 3) + L(Y, 4) + L(X, 1) + L(X, 2) + L(X, 3) + L(X, 4)', data = DT).fit()
get_coef(mod_UR)
We then remove L(Y, 4) and re-check for insignificant coefficients. Next, we remove time(Y), because it is also insignificant.
While we know that there is no intercept in our data generating process (i.e. the DGP), it is worth also checking the means of the series:
print("Series means are:\n", DT.mean(axis = 0))
print("Series variances are:\n", DT.var(axis = 0))
So, it appears that both series have zero means. We can verify this with t-tests, where the null hypotheses are $H_0: \mathbb{E}(Y_t) = 0$ and $H_0: \mathbb{E}(X_t) = 0$:
print(scipy.stats.ttest_1samp(DT["X"], popmean = 0))
print(scipy.stats.ttest_1samp(DT["Y"], popmean = 0))
We do not reject the null hypothesis in either case, since the p-value > 0.05 for both series. Because of this, and because the Intercept coefficient is also insignificant after removing the previous two terms, we remove the Intercept as well:
mod_UR = smf.ols(formula = 'Y ~ -1 + L(Y, 1) + L(Y, 2) + L(Y, 3) + L(X, 1) + L(X, 2) + L(X, 3) + L(X, 4)', data = DT).fit()
get_coef(mod_UR)
Because we want to test whether $X_{t-j},\ j > 0$, improve the model of $Y_t$, we need to estimate a restricted model, where the restriction is that $\beta_1 = ... = \beta_q = 0$:
$$Y_t = \alpha + \delta t + \phi_1 Y_{t-1} + ... + \phi_p Y_{t-p} + \epsilon_t$$

i.e. the coefficients of the $X$ lags are restricted to zero.
mod_R = smf.ols(formula = 'Y ~ -1 + L(Y, 1) + L(Y, 2) + L(Y, 3)', data = DT).fit()
get_coef(mod_R)
Note that we are not attempting to remove or add any coefficients to the Restricted model, as we are already imposing a restriction on $\beta_j$ coefficients, and we want to compare the $UR$ and $R$ models to see if those specific restrictions are valid.
We will compare the two models by their SSR (sum of squared residuals, also known as the residual sum of squares (RSS) or the sum of squared errors of prediction (SSE)). We can do this by carrying out an ANOVA test.
sm_stat.api.anova_lm(mod_R, mod_UR)
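For reference, the F statistic reported by anova_lm can also be computed by hand from the two fitted models (a minimal sketch using the ssr and df_resid attributes of the OLS results):
import scipy.stats
#Number of restrictions = difference in residual degrees of freedom:
q_restr = mod_R.df_resid - mod_UR.df_resid
F_stat = ((mod_R.ssr - mod_UR.ssr) / q_restr) / (mod_UR.ssr / mod_UR.df_resid)
p_val = scipy.stats.f.sf(F_stat, q_restr, mod_UR.df_resid)
print("F = ", round(F_stat, 4), ", p-value = ", round(p_val, 4))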
Since the p-value < 0.05, we reject the null hypothesis $H_0: SSR_R = SSR_{UR}$. This means that the restricted and unrestricted models are not equivalent. To check which model is better, we can look at the $SSR$ and the $AIC$ or $BIC$ values:
pd.DataFrame([mod_R.ssr, mod_UR.ssr], index = ["Restricted", "Unrestricted"], columns = ["SSR"]).T
pd.DataFrame([mod_R.aic, mod_UR.aic], index = ["Restricted", "Unrestricted"], columns = ["AIC"]).T
pd.DataFrame([mod_R.bic, mod_UR.bic], index = ["Restricted", "Unrestricted"], columns = ["BIC"]).T
We see that in all cases the $UR$ model is better: it has a lower SSR (i.e. the model errors are smaller) and lower $AIC$ and $BIC$ values compared to the restricted model.
So, we can conclude that $X$ Granger causes $Y$.
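As a cross-check of our manual procedure, statsmodels also provides an automated test, grangercausalitytests, which runs the same type of F test for every lag order up to maxlag; note that it tests whether the series in the second column Granger causes the series in the first column:
from statsmodels.tsa.stattools import grangercausalitytests
#Test whether X Granger causes Y (first column = dependent variable, second column = candidate cause):
_ = grangercausalitytests(DT[["Y", "X"]], maxlag = 4)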
Now we are interested in whether the reverse also holds. Remembering that we set $p = q$, we are left with the following $UR$ model for $X$:
X_UR = smf.ols(formula = 'X ~ -1 + L(Y, 1) + L(X, 1)', data = DT).fit()
get_coef(X_UR)
The restricted model:
X_R = smf.ols(formula = 'X ~ -1 + L(X, 1)', data = DT).fit()
get_coef(X_R)
Now, we carry out the ANOVA test:
sm_stat.api.anova_lm(X_R, X_UR)
Since the p-value > 0.05, we do not reject the null hypothesis $H_0: SSR_R = SSR_{UR}$.
This means that the errors of the two models are similar in magnitude. In other words, including additional lags of the $Y$ series does not improve our model for $X$, as its $SSR$ is not significantly different from the $SSR$ of the model without the lags of $Y$.
Note that the residuals themselves are not identical, i.e. $e_{t, UR} \neq e_{t, R}$; the result says that their sums of squared residuals are not statistically significantly different.
pd.DataFrame([X_R.ssr, X_UR.ssr], index = ["Restricted", "Unrestricted"], columns = ["SSR"]).T
pd.DataFrame([X_R.aic, X_UR.aic], index = ["Restricted", "Unrestricted"], columns = ["AIC"]).T
pd.DataFrame([X_R.bic, X_UR.bic], index = ["Restricted", "Unrestricted"], columns = ["BIC"]).T
The $SSR$, $AIC$ and $BIC$ values are quite similar in both the $UR$ and $R$ cases.
We can conclude that $Y$ does not Granger cause $X$.
For the sake of this example, assume that our VAR model is of the form:
$$ \begin{pmatrix}Y_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix} + \begin{pmatrix} 0.5 & 0.4 \\ -0.2 & 0.6 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} + \begin{pmatrix} -0.2 & -0.1 \\ 0.3 & 0.1 \end{pmatrix} \begin{pmatrix} Y_{t-2} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} ,\quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} 8 & 6 \\ 6 & 9 \end{bmatrix} \right) $$

i.e., we are analysing a $VAR(2)$ model with correlated residuals.
nsample = 500
alpha = np.array([3, 0])
theta_1 = np.array([[0.5, 0.4], [-0.2, 0.6]])
theta_2 = np.array([[-0.2, -0.1], [0.3, 0.1]])
mu = [0, 0]
sigma = np.array([[8, 6],[6, 9]])
Note that the covariance matrix is always both symmetric and positive semi-definite. Equivalently, its eigenvalues are non-negative. We can verify this:
print("The eigenvalues: ", np.linalg.eigvals(sigma))
print("Are they non-negative? :", np.all(np.linalg.eigvals(sigma) >= 0))
We begin by generating our shocks, $\vec{\epsilon}_t$:
np.random.seed(1)
e = np.random.multivariate_normal(mu, sigma, size = nsample)
#Transform to a DataFrame for easier plotting and statistic calculation:
e_test = pd.DataFrame(e, columns = ["E1", "E2"])
e_test.plot(subplots=True, layout = (2, 1))
plt.show()
# The sample covariance:
e_test.cov()
# The sample mean:
e_test.mean()
Now, we can generate the data:
Y = np.zeros(shape = (nsample, 2))
#The first 5 rows:
Y[:5, :]
When $t = 1$, we have:
Y[0] = alpha + e[0]
When $t = 2$:
Y[1] = alpha + np.dot(theta_1, Y[0]) + e[1]
When $t > 2$:
for j in range(2, nsample):
    Y[j] = alpha + np.dot(theta_1, Y[j - 1]) + np.dot(theta_2, Y[j - 2]) + e[j]
The resulting series:
Y[:5, :]
Most time series models in Python require the pandas DataFrame object to have a valid date/time index, so we need to assign a date and frequency to the data. Let's say that our data is daily, starting from 2014:
Y = pd.DataFrame(Y, columns = ["Y", "X"])
Y.index = pd.date_range(start = "2014-01-01", periods = len(Y.index), freq = "D").to_period()
Y.index = Y.index.to_timestamp()
Y.head()
Y.plot(subplots=True, layout = (2, 1))
plt.show()
We also want to see whether the series is trend stationary ($TS$) or difference stationary ($DS$). To do this, we can check whether all roots $z_j$ of $$\det (I - \Theta_1 z - \Theta_2 z^2 ) = 0$$ satisfy $|z_j | > 1$, i.e. if all the roots are outside the complex unit circle, then the process is stationary.
Fortunately, we can very easily do this in Python using symbolic expressions:
from sympy import *
z = Symbol('z')
D = Matrix([[1, 0], [0, 1]]) - Matrix(theta_1) * z - Matrix(theta_2) * (z**2)
D
Next, we calculate the determinant:
D.det()
We can also expand the product of the factors by converting the result into a polynomial object:
temp_poly = Poly(D.det(), z)
temp_poly
temp_roots = roots(D.det(), z)
temp_roots
The real solutions are: $$ \begin{aligned} z_1 &= 1.23662943597335\\ z_2 &= 18.7362676595998 \end{aligned} $$ The complex solutions are: $$ \begin{aligned} z_3 &= 0.513551452213425 + 2.01301370614087 \cdot i\\ z_4 &= 0.513551452213425 - 2.01301370614087 \cdot i \end{aligned} $$
The additional values returned alongside each root are its multiplicity.
To calculate the absolute values, we need to extract the roots:
temp_roots = [*temp_roots]
temp_roots
We then use the Abs() function from sympy. We also use the N(expr, 4) function to evaluate the sympy results (which are sympy objects rather than ordinary numbers) to 4 significant digits:
for j in temp_roots:
    print("z = ", N(j, 4), "=> |z| = ", N(Abs(j), 4))
So, all the roots are outside the unit circle, which means that this $VAR$ model is stationary.
Note that in general we do not know the underlying DGP, so we would need to perform the $ADF$ or $KPSS$ tests on each variable to verify whether they have a unit root or not.
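For illustration, here is a minimal sketch of how such tests could be applied to our simulated series, using adfuller and kpss from statsmodels.tsa.stattools:
from statsmodels.tsa.stattools import adfuller, kpss
for col in Y.columns:
    adf_pval = adfuller(Y[col])[1]                 #H0: the series has a unit root
    kpss_pval = kpss(Y[col], regression = "c")[1]  #H0: the series is level-stationary
    print(col, ": ADF p-value = ", round(adf_pval, 4), ", KPSS p-value = ", round(kpss_pval, 4))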
A VAR model can be estimated using statsmodels. It is worth reading the documentation, as it not only provides examples of using the available functions, but sometimes gives additional guidelines. For example:
The VAR class assumes that the passed time series are stationary.
Non-stationary or trending data can often be transformed to be stationary by first-differencing or some other method.
For direct analysis of non-stationary time series, a standard stable VAR(p) model is not appropriate.
Y_VAR = smt.api.VAR(Y)
Choosing the lag order can be a difficult problem. Fortunately, we can use the automated procedure, which is based on calculating the information criteria for different lag orders:
Note: in case of errors like `2-th leading minor of the array is not positive definite`, make sure that the lag order is not too large (or the data sample too small). This can also happen if some roots $|z_j|$ are very close to 1 (but still greater than 1).
Y_VAR.select_order(10)
We see that the recommended model is $VAR(2)$ according to $AIC$, $BIC$, $HQIC$ and $FPE$ information criteria.
We will estimate a $VAR(2)$ model:
results = Y_VAR.fit(2)
results.summary()
We can also examine the error covariance matrix:
results.resid.cov()
results.resid.mean()
The estimated model is: $$ \begin{cases} \hat{Y}_t &= 2.694126 + 0.589969 \hat{Y}_{t-1} + 0.370707 \hat{X}_{t-1} -0.255685 \hat{Y}_{t-2} -0.075897 \hat{X}_{t-2} \\ \hat{X}_t &= -0.359460 -0.074297 \hat{Y}_{t-1} + 0.565941 \hat{X}_{t-1} + 0.242305 \hat{Y}_{t-2} + 0.099620 \hat{X}_{t-2} \end{cases} $$
The model residual covariance matrix is:
$$ \hat{\Sigma}_\epsilon = \begin{bmatrix} 7.208196 & 5.598939 \\ 5.598939 & 8.906885 \end{bmatrix} $$

Compared to the true model:
$$\begin{pmatrix} Y_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix} + \begin{pmatrix} 0.5 & 0.4 \\ -0.2 & 0.6 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} + \begin{pmatrix} -0.2 & -0.1 \\ 0.3 & 0.1 \end{pmatrix} \begin{pmatrix} Y_{t-2} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} ,\quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} 8 & 6 \\ 6 & 9 \end{bmatrix} \right) $$

We see that, for the most part, the estimated parameters are close to the true parameter values. The same can be said for the error covariance matrix.
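We can also place the estimated intercept and lag coefficient matrices directly next to the true values (a small sketch, assuming the intercept and coefs attributes of the fitted VARResults object):
#Estimated intercept vector and lag coefficient matrices of the fitted VAR(2):
print("Intercept estimate:\n", results.intercept, "\nTrue value:\n", alpha)
print("Theta_1 estimate:\n", results.coefs[0], "\nTrue value:\n", theta_1)
print("Theta_2 estimate:\n", results.coefs[1], "\nTrue value:\n", theta_2)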
From the estimated VAR model, we can plot the impulse response functions of the endogenous variables.
Impulse responses are of interest in econometric studies: they are the estimated responses to a unit impulse in one of the variables. They are computed in practice using the $MA(\infty)$ representation of the $VAR(p)$ process: $$ \vec{Y}_t = \vec{\mu} + \sum_{i=0}^\infty \Phi_i \vec{\epsilon}_{t-i}, \quad \vec{\epsilon}_t \sim \mathcal{N}(\vec{0}, \hat{\Sigma}_\epsilon) $$ Orthogonalization is done using the Cholesky decomposition of the estimated error covariance matrix $\hat{\Sigma}_\epsilon$ and hence interpretations may change depending on variable ordering.
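The first few $\Phi_i$ matrices can be inspected directly; a small sketch, assuming the ma_rep() and orth_ma_rep() methods of the fitted VARResults object:
Phi = results.ma_rep(maxn = 5)           #non-orthogonalized MA coefficient matrices; Phi[0] is the identity
Phi_orth = results.orth_ma_rep(maxn = 5) #orthogonalized (Cholesky) counterparts
print("Phi_1:\n", Phi[1])                #responses one period after a shock
print("Orthogonalized Phi_1:\n", Phi_orth[1])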
irf = results.irf(15)
We can compare the responses to unit shocks in the (non-orthogonalized) residuals:
irf.plot(orth = False)
with the responses to one-standard-deviation shocks in the orthogonalized residuals:
irf.plot(orth = True)
Note that the main difference is the magnitude of the impulse responses, in particular for an impulse in $Y$ and the corresponding responses of $Y$ and $X$.
The cumulative effects $\Psi_n = \sum_{i=0}^n \Phi_i$ (i.e. where each period is the sum of the current and previous period response values) can be plotted with the long run effects:
irf.plot_cum_effects(orth = False)
results.plot_forecast(10)
plt.xlim(xmin = 425) # zoom in on the end of the sample so the forecast is visible
We can also extract the forecast and confidence intervals manually:
forecasts = results.forecast_interval(Y.values, 10)
forecasts
#mean forecasts:
forc = forecasts[0]
#lower confidence bound:
l_bound = forecasts[1]
#upper confidence bound:
u_bound = forecasts[2]
#Combine into a single one:
tmp_forc = np.concatenate((forecasts), axis=1)
#Create an empty data frame:
Y_forc = np.empty((len(Y.index), len(Y.columns)*3))
Y_forc[:] = np.nan
Y_forc = pd.DataFrame(np.append(Y_forc, tmp_forc, axis = 0), columns = ["Y_forc", "X_forc", "Y_forc_lower", "X_forc_lower", "Y_forc_upp", "X_forc_upp"])
Y_forc.index = pd.date_range(start = "2014-01-01", periods = len(Y_forc.index), freq = "D").to_period()
Y_forc.index = Y_forc.index.to_timestamp()
Y_forc.tail(11)
#Additionally, include the last historical data for nicer looking plots:
#We take the last row of our historical DataFrame
# and assign those values to the same row (using the date index) of our forecast DataFrame:
Y_forc.loc[Y.tail(1).index, ["Y_forc", "X_forc"]] = Y.loc[Y.tail(1).index].values
Y_forc.tail(11)
#Subplot layout = 2 rows and 1 column - we only need one index
f, axarr = plt.subplots(2, 1) # if we had layout = 2 rows and 2 columns - we would need to use [i, j] indexes to access different subplots
#Size of the figure output: - also works with plt.subplots(2, 1, figsize = (10, 5))
f.set_figheight(5)
f.set_figwidth(10)
#Plot the first series:
axarr[0].plot(Y["Y"], "g-", label = "Y")
axarr[0].plot(Y_forc[["Y_forc"]], "b--", label = "Forecast")
axarr[0].plot(Y_forc[["Y_forc_lower"]], "r:", label = "Forc 0.95 STD err")
axarr[0].plot(Y_forc[["Y_forc_upp"]], "r:")
#Change the range to show the historic values from 2015 March:
axarr[0].set_xlim(xmin = pd.to_datetime("2015-03-01"))
#Add a legend:
axarr[0].legend()
#Plot the second series:
axarr[1].plot(Y["X"], "c-", label = "X")
axarr[1].plot(Y_forc[["X_forc"]], "b--", label = "Forecast")
axarr[1].plot(Y_forc[["X_forc_lower"]], "r:", label = "Forc 0.95 STD err")
axarr[1].plot(Y_forc[["X_forc_upp"]], "r:")
#Change the range to show the historic values from 2015 March:
axarr[1].set_xlim(xmin = pd.to_datetime("2015-03-01"))
#Add a legend:
axarr[1].legend()
plt.tight_layout()
results.resid.plot(subplots = True)
plt.show()
fig = plt.figure(figsize = (10, 5))
results.resid["Y"].plot(ax = fig.add_subplot(211))
plt.title("VAR model residuals of Y")
sm.graphics.tsa.plot_acf(results.resid["Y"], lags = 20, zero = False, ax = fig.add_subplot(223))
sm.graphics.tsa.plot_pacf(results.resid["Y"], lags = 20, zero = False, ax = fig.add_subplot(224))
plt.tight_layout()
plt.show()
fig = plt.figure(figsize = (10, 5))
results.resid["X"].plot(ax = fig.add_subplot(211))
plt.title("VAR model residuals of X")
sm.graphics.tsa.plot_acf(results.resid["X"], lags = 20, zero = False, ax = fig.add_subplot(223))
sm.graphics.tsa.plot_pacf(results.resid["X"], lags = 20, zero = False, ax = fig.add_subplot(224))
plt.tight_layout()
plt.show()
From the $ACF$ and $PACF$ plots we can say that the residuals are white noise. We can conclude that the specified model is acceptable for our data sample.
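We can supplement the visual inspection with a formal check; a small sketch using the Ljung-Box test from statsmodels (the exact return format differs between statsmodels versions):
from statsmodels.stats.diagnostic import acorr_ljungbox
#H0: no autocorrelation in the residuals up to lag 10:
print(acorr_ljungbox(results.resid["Y"], lags = [10]))
print(acorr_ljungbox(results.resid["X"], lags = [10]))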
In general, Error Correction Models (ECMs) are useful for estimating both the short-term and the long-term effects of either univariate or multivariate time series. The error-correction term relates to the fact that the last period's deviation from the long-run equilibrium (the error) influences the short-run dynamics. ECMs therefore directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables; a sketch of a single-equation ECM is given below.
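Here is the sketch mentioned above, purely for illustration (the simulated series and the names ecm_dt and ecm_fit are our own, not part of this example's data): a single-equation ECM estimated with the d() and L() helpers defined earlier.
#Simulate a small cointegrated pair: x is a random walk, y follows it in the long run:
np.random.seed(123)
x_rw = np.cumsum(np.random.normal(size = 300))
y_ec = 0.5 * x_rw + np.random.normal(size = 300)
ecm_dt = pd.DataFrame({"y": y_ec, "x": x_rw})
#Unrestricted single-equation ECM: the short-run effect d(x) plus the lagged levels,
#whose coefficients give the speed of adjustment (on L(y, 1)) and the long-run relationship:
ecm_fit = smf.ols(formula = 'd(y) ~ 1 + d(x) + L(y, 1) + L(x, 1)', data = ecm_dt).fit()
get_coef(ecm_fit)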
In the multivariate case, a VECM is just a representation of a cointegrated VAR. This means that we can specify a cointegrated VAR in the form of a VECM and vice versa. The VECM representation also imposes restrictions on the coefficients, based on the number of cointegrating relationships. These restrictions lead to more efficient coefficient estimates of the VECM, compared to the VAR representation (which does not impose them).
We will examine the following VAR(1) model:
$$ \begin{pmatrix} Y_{1,t} \\ Y_{2, t} \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.5 & -1 \\ -0.25 & 0.5 \end{pmatrix} \begin{pmatrix} Y_{1, t-1} \\ Y_{2, t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix}, \quad \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix} , \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) $$

nsample = 500
alpha = np.array([1, 2])
theta_1 = np.array([[0.5, -1], [-0.25, 0.5]])
mu = [0, 0]
sigma = np.array([[1, 0],[0, 1]])
We begin by checking the stability (which implies stationarity) of the model:
z = Symbol('z')
D = Matrix([[1, 0], [0, 1]]) - Matrix(theta_1) * z
temp_roots = roots(D.det(), z)
temp_roots = [*temp_roots]
for j in temp_roots:
    print("z = ", N(j, 4), "=> |z| = ", N(Abs(j), 4))
We see that this VAR model has a unit root and thus is not stationary.
Next, we need to estimate the number of cointegrating relations in the process. If there are none, we difference the data and estimate a VAR model on the (stationary) differences. Otherwise, we need to include an error correction term and work with a VECM.
We subtract $(Y_{1,t-1},\ Y_{2,t-1})^\top$ from both sides of the VAR model to get:
$$ \begin{pmatrix} \Delta Y_{1,t} \\ \Delta Y_{2, t} \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} -0.5 & -1 \\ -0.25 & -0.5 \end{pmatrix} \begin{pmatrix} Y_{1, t-1} \\ Y_{2, t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} $$

where
$$\Pi = \Theta_1 - \mathbf{I} = \begin{pmatrix} 0.5 & -1 \\ -0.25 & 0.5 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -0.5 & -1 \\ -0.25 & -0.5 \end{pmatrix}$$

PI = Matrix(theta_1) - Matrix([[1, 0], [0, 1]])
PI
The rank of $\Pi$ matrix indicates the number of cointegrating relations:
PI.rank()
The VAR model has one cointegrating relation. Then, $\Pi$ can be written as $\Pi_{2 \times 2} = \alpha_{2 \times 1} \beta^\top _{1 \times 2}$.
In general, if we have a $d$-dimensional VAR (so that $\Pi$ is $d \times d$) with $r$ linearly independent cointegrating relations (so that $rank(\Pi) = r$), then $\Pi_{d\times d} = \alpha_{d \times r} \beta^\top _{r \times d}$. In this case the columns of $\beta$ (i.e. the rows of $\beta^\top$) are the cointegrating vectors. The matrix $\beta^\top$ is therefore called the cointegrating matrix, and the matrix $\alpha$ is sometimes referred to as the loading matrix.
Note that since cointegration relationships are not unique, $\beta$ is not unique. As a matter of convention, we typically normalize the coefficients of $\beta$ such that one of the elements of $\beta$ is one (when $r = 1$, otherwise we need to normalize in such a way that $\beta = ( \mathbf{I}_{r \times r}, \beta_{(d-r) \times r} )^\top$).
So, a solution for $\Pi = \alpha \beta^\top$ is:
$$ \begin{pmatrix} -0.5 & -1 \\ -0.25 & -0.5 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 & \beta_2 \end{pmatrix} $$

Because of the normalization restriction on $\beta$, we have: $\alpha_1 = -0.5,\ \alpha_2 = -0.25 \Rightarrow \beta_2 = 2$. So we get $\alpha = (-0.5, -0.25)^\top$ and $\beta^\top = (1, 2)$.
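We can verify this decomposition numerically (a small check; the names Pi_np, alpha_vec and beta_vec are ours):
Pi_np = theta_1 - np.eye(2)                          #Pi = Theta_1 - I
alpha_vec = Pi_np[:, 0]                              #under the normalization beta_1 = 1, alpha is the first column of Pi
beta_vec = np.array([1, Pi_np[0, 1] / Pi_np[0, 0]])  #beta_2 = Pi[0, 1] / Pi[0, 0]
print("alpha = ", alpha_vec, ", beta^T = ", beta_vec)
print("alpha * beta^T equals Pi: ", np.allclose(np.outer(alpha_vec, beta_vec), Pi_np))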
We can substitute these values in the VECM to get:
$$ \begin{pmatrix} \Delta Y_{1,t} \\ \Delta Y_{2, t} \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} -0.5 \\ -0.25 \end{pmatrix} \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} Y_{1, t-1} \\ Y_{2, t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} $$

and multiplying out $\beta$ gives us:
$$ \begin{pmatrix} \Delta Y_{1,t} \\ \Delta Y_{2, t} \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} -0.5 \\ -0.25 \end{pmatrix} \begin{pmatrix} Y_{1, t-1} + 2 Y_{2, t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix} $$

Since the differences give us a $VAR(0)$ model, both $\Delta \vec{Y}_t$ and $\vec{\epsilon}_t$ are stationary, so the linear combination $Y_{1, t-1} + 2 Y_{2, t-1}$, which appears in both the $\Delta Y_{1,t}$ and $\Delta Y_{2, t}$ equations, is also stationary.
The component $Y_{1, t-1} + 2 Y_{2, t-1}$ is our cointegration relation.
Currently, $VECM$ estimation is not available in the statsmodels package. However, a $VECM$ implementation is being developed on GitHub and will hopefully be included in a future version of the package.