## 4.8 Autocorrelated (Serially Correlated) Errors

Consider the case where assumption (MR.4) does not hold, but assumption (MR.3) (and the other remaining assumptions (MR.1), (MR.2), (MR.5) and, optionally, (MR.6)) are still valid. Then we can write the following model as: $\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathbb{E} \left( \boldsymbol{\varepsilon} | \mathbf{X}\right) = \boldsymbol{0},\quad \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \mathbb{E} \left( \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \right)= \mathbf{\Sigma} \neq \sigma_\epsilon^2 \mathbf{I}$ The case when the error variance-covariance matrix is no longer diagonal, but with equal diagonal elements, expressed as: $\mathbf{\Sigma} = \begin{bmatrix} \sigma^2 & \sigma_{1,2} & \sigma_{1,3} & ... & \sigma_{1,N}\\ \sigma_{2,1} & \sigma^2 & \sigma_{2,3} & ... & \sigma_{2,N}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \sigma_{N,1} & \sigma_{N,2} & \sigma_{N3,} & ... & \sigma^2 \end{bmatrix}, \quad \sigma_{i,j} = \sigma_{j,i} = \mathbb{C}{\rm ov}(\epsilon_i, \epsilon_j) \neq 0, \quad i,j = 1,...,N$ is referred to as autocorrelation (or serial correlation). Just like before with heteroskedasticity:

• the OLS estimators remain unbiased and consistent;
• OLS estimators are no longer efficient;
• the variance estimator of the OLS estimates is biased and inconsistent;
• $$t$$-statistics of the OLS estimates are invalid.

It should be stressed that serial correlation is usually present in time-series data. For cross-sectional data, the errors may be correlated in terms of social, or geographical distance. For example, the distance between cities, towns, neighborhoods, etc.

As this problem is the same as the one stated in 4.6, we have the following options:

The strategy is as follows:

1. Assume that $$\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$$ is the true model.
2. Create a model for the OLS residuals. For example, the residual auxiliary regression in the Breusch-Godfrey test includes the explanatory variables of the original model, as well as the lagged residuals: $\widehat{\epsilon}_i = \alpha_0 + \alpha_1 X_{1, i} + ... + \alpha_k X_{k,i} + \rho_1 \widehat{\epsilon}_{i - 1} + \rho_2 \widehat{\epsilon}_{i - 2} + ... + \rho_p \widehat{\epsilon}_{i - p} + u_t$
3. Test the null hypothesis that the residuals are serially correlated using the OLS residual auxiliary regression:
$H_0: \rho_1 = \rho_2 = ... = \rho_p = 0$
4. If we fail to reject the null hypothesis - we can use the usual OLS estimators.
5. If we reject the null hypothesis, there are two ways we can go:
• Use the OLS estimators, but correct their variance estimators (i.e. make them consistent);
• Instead of OLS, use FGLS (and its variations).
• Attempt to specify a different model, which would, hopefully, be able to account for serial correlation
(autocorrelation may be the cause of a misspecified model, as demonstrated in 3.8.3.3 )
Example 4.29 We will simulate the following model:

$\begin{cases} Y_{i} &= \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \epsilon_i \\ \epsilon_i &= \rho \epsilon_{i-1} + u_i,\ |\rho| < 1,\ u_i \sim \mathcal{N}(0, \sigma^2),\quad \epsilon_0 = 0 \end{cases}$

#
#
#
#
#
set.seed(123)
#
N <- 200
beta_vec <- c(10, 5, -3)
rho <- 0.8
#
x1 <- seq(from = 0, to = 5, length.out = N)
x2 <- sample(seq(from = 3, to = 17, length.out = 80), size = N, replace = TRUE)
# serially correlated residuals:
ee <- rnorm(mean = 0, sd = 3, n = N)
for(i in 2:N){
ee[i] <- rho * ee[i-1] + ee[i]
}
#
x_mat <- cbind(1, x1, x2)
#
y <- x_mat %*% beta_vec + ee
#
data_mat <- data.frame(y, x1, x2)
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
#
np.random.seed(123)
#
N = 200
beta_vec = np.array([10, 5, -3])
rho = 0.8
#
x1 = np.linspace(start = 0, stop = 5, num = N)
x2 = np.random.choice(np.linspace(start = 3, stop = 17, num = 80), size = N, replace = True)
# serially correlated residuals:
ee = np.random.normal(loc = 0, scale = 3, size = N)
for i in range(1, N):
ee[i] = rho * ee[i-1] + ee[i]
#
#
#
y = np.dot(x_mat, beta_vec) + ee
#
data_mat = pd.DataFrame(np.column_stack([y, x1, x2]), columns = ["y", "x1", "x2"])

### 4.8.1 Testing For Serial Correlation

We can examine the presence of autocorrelation from the residuals plots, as well as conducting a number of formal tests.

We will begin by estimating our model via OLS, as we usually would.

mdl_1 <- lm(y ~ x1 + x2, data = data_mat)
print(round(coef(summary(mdl_1)), 5))
##             Estimate Std. Error   t value Pr(>|t|)
## (Intercept) 10.61601    0.97983  10.83457        0
## x1           4.89018    0.20457  23.90497        0
## x2          -2.93686    0.07545 -38.92241        0
mdl_1 = smf.ols(formula = "y ~ x1 + x2", data = data_mat)
print(np.round(mdl_1.fit().summary2().tables, 5))
##               Coef.  Std.Err.         t  P>|t|   [0.025    0.975]
## Intercept  10.20024   1.15943   8.79767    0.0  7.91376  12.48671
## x1          4.81317   0.25507  18.87032    0.0  4.31016   5.31618
## x2         -2.92246   0.08668 -33.71534    0.0 -3.09340  -2.75152

#### 4.8.1.1 Residual Correlogram

We begin by calculating the residuals from the OLS model: $\widehat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{\rm OLS}$ we will use these residuals as estimates for the true (unobserved) error term $$\boldsymbol{\varepsilon}$$ and examine their autocorrelations. Note that from the OLS structure it holds true that $$\mathbb{E} \left( \widehat{\boldsymbol{\varepsilon}} | \mathbf{X}\right) = \boldsymbol{0}$$. So, assume that we want to calculate the sample correlation between $$\widehat{\epsilon}_i$$ and $$\widehat{\epsilon}_{i - k}$$ - we want to calculate the autocorrelation at lag k: $\widehat{\rho}(k) = \widehat{\mathbb{C}{\rm orr}}(\widehat{\epsilon}_i, \widehat{\epsilon}_{i-k}) = \dfrac{\widehat{\mathbb{C}{\rm ov}}(\widehat{\epsilon}_i, \widehat{\epsilon}_{i-k})}{\sqrt{\widehat{\mathbb{V}{\rm ar}}(\widehat{\epsilon}_i)}\sqrt{\widehat{\mathbb{V}{\rm ar}}(\widehat{\epsilon}_{i-k})}} = \dfrac{\sum_{i = k + 1}^N \widehat{\epsilon}_i \widehat{\epsilon}_{i-k}}{\sum_{i = 1}^N \widehat{\epsilon}_i^2}$ Furthermore, assume that we are interested in testing whether the sample (auto)correlation $$\widehat{\rho}(k)$$ is significantly different from zero: \begin{aligned} H_0: \widehat{\rho}(k) = 0\\ H_1: \widehat{\rho}(k) \neq 0 \end{aligned} Under the null hypothesis it holds true that $$\widehat{\rho}(k) \stackrel{a}{\sim} \mathcal{N}(0, 1/N)$$, where $$\stackrel{a}{\sim}$$ indicates asymptotic distribution - the distribution as the sample size $$N \rightarrow \infty$$. In this case, it means that for large samples the distribution is approximately normal. Consequently, a suitable statistic can be constructed as: $Z = \dfrac{\widehat{\rho}(k) - 0}{\sqrt{1/N}} = \widehat{\rho}(k) \cdot \sqrt{N} \stackrel{a}{\sim} \mathcal{N}(0, 1)$ So, at a $$5\%$$ significance level, the critical value is $$Z_c \approx 1.96$$. In this case, w would reject the null hypothesis when: $\widehat{\rho}(k) \cdot \sqrt{N} \geq 1.96, \text{ or } \widehat{\rho}(k) \cdot \sqrt{N} \leq 1.96$ or alternatively, if we want to have the notation similar to “$$\text{estimate} \pm \text{critical value} \cdot \text{se}(\text{estimate})$$”, we can write it as: $\widehat{\rho}(k) \geq 1.96 / \sqrt{N}, \text{ or } \widehat{\rho}(k) \leq 1.96 / \sqrt{N}$ As a rule of thumb, we can sometimes take $$2$$ instead of $$1.96$$. We can also calculate this via software:

#
#
z_c <- qnorm(p = 1 - 0.05 / 2, mean = 0, sd = 1)
print(z_c)
##  1.959964
import scipy.stats as stats
#
z_c = stats.norm.ppf(q = 1 - 0.05 / 2, loc = 0, scale = 1)
print(z_c)
## 1.959963984540054

We will begin by manually calculating the autocorrelations and their confidence bounds for $$k = 1,...,20$$:

rho <- NULL
e <- mdl_1residuals for(k in 1:20){ r <- sum(e[-c(1:k)] * e[1:(N-k)]) / sum(e^2) rho <- c(rho, r) } rho_lower <- - z_c / sqrt(N) rho_upper <- + z_c / sqrt(N) rho = np.empty(0, int) e = mdl_1.fit().resid for k in range(0, 20): r = np.sum(np.array(e[(k + 1):]) * np.array(e[:-(k + 1)])) / np.sum(e**2) rho = np.append(rho, r) # rho_lower = - z_c / np.sqrt(N) rho_upper = + z_c / np.sqrt(N) and we can plot them: # # # plot(1:k, rho, ylim = c(min(rho_lower), 1), pch = 21, bg = "red") segments(x0 = 1:k, y0 = rho, x1 = 1:k, y1 = 0, lwd = 2) abline(h = 0) # Draw the confidence bounds: abline(h = rho_upper, lty = 2, col = "blue") # abline(h = rho_lower, lty = 2, col = "blue") import matplotlib.pyplot as plt # _ = plt.figure(num = 0, figsize = (10, 8)) _ = plt.plot(list(range(1, 21)), rho, linestyle = "None", marker = "o") _ = [plt.vlines(x = _x, ymin = 0, ymax = _y) for _x, _y in zip(list(range(1, 21)), rho)] _ = plt.axhline(y = 0, linestyle = "-", color = "black") # Draw the confidence bounds: _ = plt.axhline(y = rho_lower, linestyle = "--") _ = plt.axhline(y = rho_upper, linestyle = "--") plt.show() Alternatively, we can use the built-in functions: # # # # # forecast::Acf(e) # from statsmodels.tsa.stattools import acf from statsmodels.graphics.tsaplots import plot_acf # fig = plt.figure(num = 1, figsize = (10, 8)) _ = plot_acf(e, lags = 20, zero = False, ax = fig.add_subplot(111)) plt.show() Note the confidence intervals in Python are calculated differently via Bertlett’s formula, which is under the alternative hypothesis, that serial correlation exists up to lag $$k-1$$. We can see from the plots that there are some sample autocorrelations $$\widehat{\rho}(k)$$, which are statistically significantly different from zero (i.e. their values are above the blue horizontal line). As with most visual diagnostics tools - it may not always be a clear-cut answer from the plots alone. So, we can apply a number of different statistical tests to check whether there are statistically significant sample correlations. #### 4.8.1.2 Autocorrelation Tests The tests are identical to the ones described in 3.8.3.3, but re-visited for the multiple regression model case. These tests will be re-examined when dealing with time-series data models. ##### 4.8.1.2.1 Durbin-Watson Test The DW test is used to test the hypothesis that the residuals are serially correlated at lag 1, i.e. that in the following model: $\epsilon_i = \rho \epsilon_{i-1} + v_i$ the hypothesis being tested is: $\begin{cases} H_0: \rho = 0\qquad \text{(no serial correlation)}\\ H_1: \rho \neq 0\qquad \text{(serial correlation at lag 1)} \end{cases}$ Since we do not observe the true error terms, we use the OLS residuals $$\widehat{\epsilon}_i$$ and calculate the Durbin-Watson statistic as: $DW = \dfrac{\sum_{i = 2}^N (\widehat{\epsilon}_i - \widehat{\epsilon}_{i-1})^2}{\sum_{i = 1}^N \widehat{\epsilon}_i^2}$ The DW test statistics critical values may not be available in some econometric software. Furthermore, its distribution no longer holds, when the equation of $$Y_i$$ contains a lagged dependent variable, $$Y_{i-1}$$. As a quick rule of thumb, if the DW statistic is near $$2$$, then we do not reject the null hypothesis of no serial correlation. If we assume that $$\sum_{i = 2}^N \widehat{\epsilon}_i^2 \approx \sum_{i = 2}^N \widehat{\epsilon}_{i-1}^2$$, then we can re-write the DW statistic as: $DW = \dfrac{\sum_{i = 2}^N \widehat{\epsilon}_i^2 - 2 \sum_{i = 2}^N \widehat{\epsilon}_i\widehat{\epsilon}_{i-1} + \sum_{i = 2}^N \widehat{\epsilon}_{i-1}^2}{\sum_{i = 1}^N \widehat{\epsilon}_i^2} \approx \dfrac{2\left[ \sum_{i = 2}^N \widehat{\epsilon}_i^2 - \sum_{i = 2}^N \widehat{\epsilon}_i\widehat{\epsilon}_{i-1} \right]}{\sum_{i = 1}^N \widehat{\epsilon}_i^2} = 2 \left[ 1 - \widehat{\rho}(1)\right] = \begin{cases} 4,\ \text{ if } \widehat{\rho}(1) = -1\\ 2,\ \text{ if } \widehat{\rho}(1) = 0\\ 0,\ \text{ if } \widehat{\rho}(1) = 1 \end{cases}$ which helps in understanding why we expect the DW statistic to be close to $$2$$ under the null hypothesis. # # print(lmtest::dwtest(mdl_1, alternative = "two.sided")) ## ## Durbin-Watson test ## ## data: mdl_1 ## DW = 0.56637, p-value < 2.2e-16 ## alternative hypothesis: true autocorrelation is not 0 import statsmodels.stats.stattools as sm_tools # print(sm_tools.durbin_watson(mdl_1.fit().resid)) ## 0.5086521126452894 because $$p-value < 0.05$$, we reject the null hypothesis at the $$5\%$$ significance level and conclude that the residuals are serially correlated at lag order 1. ##### 4.8.1.2.2 Breusch-Godfrey (BG or LM) Test The BG test can be applied to a model, with a lagged response variable on the right-hand side, for example: $$Y_i = \beta_0 + \beta_1 X_{1,i} + ... + \beta_k X_{k,i} + \beta_{k+1} Y_{i - 1} + \epsilon_i$$. In general, we estimate the model parameters via OLS and calculate the residuals: $\widehat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}$ where $$\mathbf{X}$$ can contain $$X_{1,i},...,X_{k,i}, Y_{i - 1}$$ for $$i = 1,...,N$$. Then, we estimate an auxiliary regression on $$\widehat{\boldsymbol{\varepsilon}}$$ as: $\widehat{\epsilon}_i = \alpha_0 + \alpha_1 X_{1,i} + ... + \alpha_k X_{k,i} + \rho_i \widehat{\epsilon}_{i-1} + ... + \rho_p \widehat{\epsilon}_{i-p} + u_i$ (Note: if we included $$Y_{i - 1}$$ in the initial regression, then we need to also include them in the auxiliary regression). We want to test the null hypothesis that the lagged residual coefficients are insignificant: $\begin{cases} H_0: \rho_1 = ... = \rho_p = 0\\ H_1: \rho_j \neq 0, \text{ for some } j \end{cases}$ We can either carry out an $$F$$-test, or a Chi-squared test: $LM = (N-p)R_{\widehat{\epsilon}}^2 \sim \chi^2_p$ where $$R_{\widehat{\epsilon}}^2$$ is the $$R$$-squared from the auxiliary regression on $$\widehat{\boldsymbol{\varepsilon}}$$. The BG test is a more general test compared to DW (for some discussions about the test, see (here) and (here), and an applied example using STATA) Looking at the correlogram plots Let’s test the null hypothesis that at least on of the first three lags of the residuals is not zero, i.e. $$p = 3$$: # # # # bg_t <- lmtest::bgtest(mdl_1, order = 3) print(bg_t) ## ## Breusch-Godfrey test for serial correlation of order up to 3 ## ## data: mdl_1 ## LM test = 100.72, df = 3, p-value < 2.2e-16 import statsmodels.stats.diagnostic as sm_diagnostic from statsmodels.compat import lzip # name = ['LM-stat', 'LM: p-value', 'F-value', 'F: p-value'] bg_t = sm_diagnostic.acorr_breusch_godfrey(mdl_1.fit(), nlags = 2) print(pd.DataFrame(lzip(name, bg_t))) ## 0 1 ## 0 LM-stat 1.122980e+02 ## 1 LM: p-value 4.119139e-25 ## 2 F-value 1.248438e+02 ## 3 F: p-value 1.238847e-35 Since the $$p-value < 0.05$$, we reject the null hypothesis and conclude that there is serial correlation in the residuals. ### 4.8.2 Heteroskedasticity-and-Autocorrelation-Consistent Standard Errors (HAC) So far, we have assumed that the diagonal elements of $$\mathbf{\Sigma}$$ are constant - i.e. that the residuals are serially correlated but homoskedastic. We can further generalize this for the case of heteroskedastic serially correlated standard errors: $\mathbf{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{1,2} & \sigma_{1,3} & ... & \sigma_{1,N}\\ \sigma_{2,1} & \sigma_2^2 & \sigma_{2,3} & ... & \sigma_{2,N}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \sigma_{N,1} & \sigma_{N,2} & \sigma_{N3,} & ... & \sigma_N^2 \end{bmatrix}, \quad \sigma_{i,j} = \sigma_{j,i} = \mathbb{C}{\rm ov}(\epsilon_i, \epsilon_j) \neq 0, \quad i,j = 1,...,N$ Similarly to 4.7, we can use OLS estimator and correct the standard errors. In this case the corrected standard errors are known as HAC (Heteroskedasticity-and-Autocorrelation-Consistent) standard errors, or Newey–West standard errors. The White covariance matrix assumes serially uncorrelated residuals. On the other hand, the Newey-West proposed a more general covariance estimator, which is robust to both heteroskedasticity and autocorrelation, of the residuals of unknown covariance form. The HAC coefficient covariance estimator handles autocorrelation with lags up to $$p$$ - it is assumed that lags larger than $$p$$ are insignificant and thus can be ignored. It is defined as: $HAC_{\widehat{\boldsymbol{\beta}}_{OLS}} = \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} N \widehat{\mathbf{\Sigma}} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}$ where: $\widehat{\mathbf{\Sigma}} = \widehat{\boldsymbol{\Gamma}}(0) + \sum_{j = 1}^p \left[ 1 - \dfrac{j}{p + 1}\right] \left[ \widehat{\boldsymbol{\Gamma}}(j) + \widehat{\boldsymbol{\Gamma}}(-j)\right]$ $\widehat{\boldsymbol{\Gamma}}(j) = \dfrac{1}{N} \left( \sum_{i = 1}^N\widehat{\epsilon}_{i}\widehat{\epsilon}_{i-j} \mathbf{X}_i\mathbf{X}_{i-j}^\top\right),\quad \mathbf{X}_i = \left[1,\ X_{1,i},\ X_{2,i},\ ...,\ X_{k,i} \right]^\top$ In the absence of serial correlation we would have that: $\widehat{\mathbf{\Sigma}} = \widehat{\boldsymbol{\Gamma}}(0)$ which is equivalent to the White Estimators (HCE). Note: HAC not only corrects for autocorrelation, but also for heteroskedasticity. Do not be alarmed if you see slightly different HAC standard errors in different statistical programs - there are a number of different variations of $$\widehat{\mathbf{\Sigma}}$$. Using the built-in functions we have that: # # #V_HAC <- sandwich::vcovHAC(mdl_1) V_HAC <- sandwich::NeweyWest(mdl_1, lag = 1) print(V_HAC) ## (Intercept) x1 x2 ## (Intercept) 1.43560709 -0.303808295 -0.043572462 ## x1 -0.30380830 0.183608167 -0.002018108 ## x2 -0.04357246 -0.002018108 0.004421496 Following the documentation, NeweyWest() is a convenience interface to vcovHAC() using Bartlett kernel weights. In comparison vcovHAC() allows choosing weights as either weightsAndrews, or weightsLumley, or a custom function to calculate the weights. import statsmodels.stats as sm_stats # V_HAC = sm_stats.sandwich_covariance.cov_hac_simple(mdl_1.fit(), nlags = 1) print(V_HAC) ## [[ 1.73550283 -0.30329739 -0.07751371] ## [-0.30329739 0.11432819 0.00191364] ## [-0.07751371 0.00191364 0.00758616]] then the coefficients, their standard errors and $$p$$-values can be summarized as: # # print(lmtest::coeftest(mdl_1, sandwich::vcovHAC(mdl_1))) print(lmtest::coeftest(mdl_1, sandwich::NeweyWest(mdl_1, lag = 1))[, ]) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 10.616005 1.19816822 8.860196 4.719858e-16 ## x1 4.890185 0.42849524 11.412460 1.741811e-23 ## x2 -2.936860 0.06649433 -44.167068 3.799730e-104 Note that we used [, ] so that we could treat the output as a data.frame if we needed to round the results. mdl_1_HAC = mdl_1.fit().get_robustcov_results(cov_type = 'HAC', maxlags = 1) print(np.round(mdl_1_HAC.summary2().tables, 5)) ## Coef. Std.Err. t P>|t| [0.025 0.975] ## Intercept 10.20024 1.30747 7.80152 0.0 7.62181 12.77866 ## x1 4.81317 0.33558 14.34289 0.0 4.15139 5.47496 ## x2 -2.92246 0.08644 -33.80798 0.0 -3.09293 -2.75199 compared with the biased residuals in the OLS output: print(coef(summary(mdl_1))) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 10.616005 0.97982714 10.83457 9.311699e-22 ## x1 4.890185 0.20456771 23.90497 3.887306e-60 ## x2 -2.936860 0.07545421 -38.92241 1.933624e-94 print(np.round(mdl_1.fit().summary2().tables, 5)) ## Coef. Std.Err. t P>|t| [0.025 0.975] ## Intercept 10.20024 1.15943 8.79767 0.0 7.91376 12.48671 ## x1 4.81317 0.25507 18.87032 0.0 4.31016 5.31618 ## x2 -2.92246 0.08668 -33.71534 0.0 -3.09340 -2.75152 we see that the HAC standard errors are somewhat larger, compared to the ones we get from OLS. ### 4.8.3 Feasible GLS - Cochrane-Orcutt Procedure (CORC) Alternatively to HAC, we can make some assumptions regarding the nature of autocorrelation and employ a more efficient GLS estimator. For the case, when the residuals are serially correlated at lag 1, but not heteroskedastic: $\epsilon_i = \rho \epsilon_{i-1} + u_i,\ |\rho| < 1,\ u_i \sim \mathcal{N}(0, \sigma^2), \quad i \in \{0, \pm 1, \pm 2,... \}$ we note that: \begin{aligned} \mathbb{V}{\rm ar}(\epsilon_i) &= \mathbb{V}{\rm ar}(\rho\epsilon_{i-1} + u_i)\\ &= \mathbb{V}{\rm ar}(\rho(\rho\epsilon_{i-2} + u_{i-1}) + u_i) \\ &= \mathbb{V}{\rm ar}(\rho^2(\rho\epsilon_{i-3} + u_{i-2}) + u_i + \rho u_{i-1})\\ &= ... \\ &= \mathbb{V}{\rm ar} \left(\sum_{j = 0}^\infty \rho^j u_{i-j}\right)\\ &=\sum_{j = 0}^\infty \rho^{2j} \cdot \mathbb{V}{\rm ar} \left(u_{i-j}\right)\\ &= \dfrac{\sigma^2}{1 - \rho^2} \end{aligned} and: \begin{aligned} \mathbb{C}{\rm ov}(\epsilon_i,\ \epsilon_{i - k}) &= \mathbb{C}{\rm ov}(\rho(\rho\epsilon_{i-2} + u_{i-1}) + u_i,\ \epsilon_{i - k}) \\ &=\mathbb{C}{\rm ov}(\rho^2(\rho\epsilon_{i-3} + u_{i-2}) + \rho u_{i-1},\ \epsilon_{i - k}) \\ &= ... \\ &=\mathbb{C}{\rm ov}(\rho^k\epsilon_{i-k},\ \epsilon_{i - k}) \\ &= \rho^k \mathbb{C}{\rm ov}(\epsilon_{i-k},\ \epsilon_{i - k}) \\ &= \rho^k \sigma^2, \quad \text{ since } \mathbb{C}{\rm ov}(u_i, \epsilon_j ) = 0, i \neq j \end{aligned} Consequently, we can re-write the covariance matrix as: $\mathbf{\Sigma} = \dfrac{\sigma^2}{1 - \rho^2} \begin{bmatrix} 1 & \rho & \rho^2 & ... & \rho^{N-1}\\ \rho & 1 & \rho & ... & \rho^{N-2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho^{N-1} & \rho^{N-2} & \rho^{N-3} & ... & 1 \end{bmatrix}$ In this case, the knowledge of parameter $$\rho$$ allows us to empirically apply the GLS - as FGLS. Cochrane–Orcutt (CORC) estimator: 1. Estimate $$\boldsymbol{\beta}$$ via OLS from: $\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and calculate the residuals $$\widehat{\boldsymbol{\varepsilon}}^{(1)} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}$$, where $${(1)}$$ denotes the first iteration. 2. Estimate the following residual regression via OLS: $\widehat{\epsilon}^{(1)}_i = \rho \widehat{\epsilon}^{(1)}_{i-1} + \widehat{u}_i$ and obtain $$\widehat{\rho}^{(1)}$$. 3. Calculate the following transformed variables: $Y_i^* = Y_i - \widehat{\rho}^{(1)} Y_{i-1}^*, \qquad \boldsymbol{X}_i^* = \boldsymbol{X}_i - \widehat{\rho}^{(1)} \boldsymbol{X}_{i-1}^*,\text{ where }\mathbf{X}_i = \left[1,\ X_{1,i},\ X_{2,i},\ ...,\ X_{k,i} \right]^\top$ 4. Apply OLS to the following model: $(Y_i - \widehat{\rho}^{(1)} Y_{i-1}) = \beta_0 (1 - \widehat{\rho}^{(1)}) + \beta_1 (X_{1,i} - \widehat{\rho}^{(1)}) + ... + \beta_k (X_{k,i} - \widehat{\rho}^{(1)}) + u_i$ or, more compactly: $\mathbf{Y}^* = \mathbf{X}^* \boldsymbol{\beta} + \boldsymbol{u}$ to get the OLS estimates $$\widetilde{\boldsymbol\beta}$$. Note: if we decide to use a column of ones for the constant $$\beta_0$$, then we will actually have estimated $$\widetilde{\beta}_0^* = \widetilde{\beta}_0 (1 - \widehat{\rho}^{(1)})$$ and will need to transform the intercept term as $$\widetilde{\beta}_0 = \widehat{\beta}_0^* / (1 - \widehat{\rho}^{(1)})$$. On the other hand, if we transform the intercept column to have $$1 - \widehat{\rho}^{(1)}$$ in the design matrix instead, then we will not need to transform the intercept coefficient (this is similar to how we had to carry out WLS for the heteroskedastic error case in the previous section). 5. Having estimated the model, calculate the residuals on the non-transformed data: $$\widehat{\boldsymbol{\varepsilon}}^{(2)} = \mathbf{Y} - \mathbf{X} \widetilde{\boldsymbol{\beta}}$$ and go to step (2). Repeat this procedure until $$\widehat{\rho}$$ converges: if the change in $$\widehat{\rho}^{(K)}$$, compared to the previous iteration $$\widehat{\rho}^{(K-1)}$$ is no more than $$\Delta$$ (for example $$\Delta = 0.001$$) - stop the procedure. The final value $$\widehat{\rho}^{(K)}$$ is then used to get the FGLS (CORC) estimates of $$\widehat{\boldsymbol\beta}$$. Again, depending on your specification, you may need to transform the intercept coefficient: $$\widehat{\beta}_0 = \widehat{\beta}_0^* / (1 - \widehat{\rho}^{(K)})$$. These FGLS estimators are not unbiased but they are consistent and asymptotically efficient. The procedure can be carried out with the built-in functions as follows: mdl_1_CORC <- orcutt::cochrane.orcutt(mdl_1) # print(coef(summary(mdl_1_CORC))) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 10.402800 1.50824063 6.897 7.124332e-11 ## x1 5.082365 0.48877934 10.398 1.899433e-20 ## x2 -2.978490 0.04202203 -70.879 1.185668e-141 mdl_1_CORC = sm.GLSAR(mdl_1.endog, mdl_1.exog) mdl_1_CORC_fit = mdl_1_CORC.iterative_fit(maxiter = 100) print(mdl_1_CORC_fit.summary2().tables) ## Coef. Std.Err. t P>|t| [0.025 0.975] ## const 11.072803 2.083601 5.314262 2.897596e-07 6.963647 15.181958 ## x1 4.693541 0.686176 6.840138 9.839513e-11 3.340305 6.046778 ## x2 -2.974871 0.045583 -65.262702 6.559064e-135 -3.064767 -2.884975 the estimated $$\rho$$ is: print(mdl_1_CORCrho)
##  0.7088817
print(mdl_1_CORC.rho)
## [0.75084929]

which is close to the true value of $$\rho$$ used in the data generation.

Some more examples can be found here.

Alternative procedures to CORC are Hildreth-Lu Procedure and Prais–Winsten Procedure.

### 4.8.4 Choosing Between HAC and FGLS

It has become more popular to estimate models by OLS and correct the standard errors for fairly arbitrary forms of serial correlation and heteroskedasticity.

Regarding HAC:

• Computation of robust standard errors works generally well in large datasets;
• With increase in computational power, not only has it become possible to (quickly) estimate models on large datasets, but also to calculate their robust covariance (HAC) estimators;
• While FGLS offers a theoretical efficiency, it involves making additional assumptions on the error covariance matrix, which may not be easy to test/verify, which may threaten the consistency of the estimator.

Regarding FGLS:

• if the explanatory variables are not strictly exogenous (i.e. if we include $$Y_{i-1}$$ on the right-hand-side of the equation) - the FGLS is not only inefficient, but it is also inconsistent;
• in most applications of FGLS, the errors are assumed to follow a first order autoregressive process. It may be better to evaluate OLS estimates and use a robust correction on their standard errors for more general forms of serial correlation;
• in addition to imposing an assumption of the residual covariance structure in regard to autocorrelation, GLS also requires an exogeneity assumption (MR.3) to hold, unlike HAC.

Generally, serial correlation is usually encountered in time-series data, which has its own set of models that specifically deal address serial correlation of either the residuals $$\epsilon$$, the endogenous variable $$Y$$, or the exogeneous variables $$X_j$$, or even all at once.

It is worth noting that autocorrelated residuals are more frequently the result of a misspecified regression equation, rather than a genuine autocorrelation.

A final thing to note:

In many cases, the presence of autocorrelation, especially in cross-sectional data, is not an indication that the model has autocorrelated errors, but rather that it:

• is misspecified;
• is suffering from omitted variable bias;
• has an incorrect functional form for either $$Y$$, or $$X$$.

### 4.8.5 Monte Carlo Simulation: OLS vs FGLS

To illustrate the effects of heteroskedasticity on the standard errors of the estimates, and efficiency between OLS and FGLS, we will carry out a Monte Carlo simulation. We will simulate the following model: $\begin{cases} Y_{i}^{(m)} &= \beta_0 + \beta_1 X_{1,i}^{(m)} + \beta_2 X_{2,i}^{(m)} + \epsilon_i^{(m)} \\ \epsilon_i^{(m)} &= \rho \epsilon_{i-1}^{(m)} + u_i^{(m)} ,\ |\rho| < 1,\ u_i^{(m)} \sim \mathcal{N}(0, \sigma^2),\quad \epsilon_0^{(m)} = 0,\quad i = 1,...,N,\quad m = 1,..., MC \end{cases}$ We will simulate a total of $$MC = 1000$$ samples from this model with specific coefficients and estimate the parameters via OLS, WLS, as well as correct the standard errors of OLS via HCE. We will do so with the following code:

set.seed(123)
# Number of simulations:
MC <- 1000
# Fixed parameters
N <- 100
beta_vec <- c(1, 0.2, -0.3)
rho <- 0.8
# matrix of parameter estimates for each sample:
beta_est_ols  <- NULL
beta_pval_ols <- NULL
beta_pval_hac <- NULL
#
beta_est_fgls  <- NULL
beta_pval_fgls <- NULL
#
beta_est_gls  <- NULL
beta_pval_gls <- NULL
#
for(i in 1:MC){
# simulate the data:
x1 <- seq(from = 0, to = 5, length.out = N)
x2 <- sample(seq(from = 3, to = 17, length.out = 80), size = N, replace = TRUE)
ee <- rnorm(mean = 0, sd = 3, n = N)
for(i in 2:N){
ee[i] <- rho * ee[i-1] + ee[i]
}
y  <- cbind(1, x1, x2) %*% beta_vec + ee
data_mat <- data.frame(y, x1, x2)
# estimate via OLS
mdl_0 <- lm(y ~ x1 + x2, data = data_mat)
# correct OLS se's:
# mdl_hac <- lmtest::coeftest(mdl_0, vcov. = sandwich::vcovHAC(mdl_0))
mdl_hac <- lmtest::coeftest(mdl_0, vcov. = sandwich::NeweyWest(mdl_1, lag = 1))
# estimate via CORC:
suppressWarnings({
mdl_fgls <- orcutt::cochrane.orcutt(mdl_0)
})
# Save the estimates from each sample:
beta_est_ols  <- rbind(beta_est_ols, coef(summary(mdl_0))[, 1])
beta_est_fgls  <- rbind(beta_est_fgls, coef(summary(mdl_fgls))[, 1])
# Save the coefficient p-values from each sample:
beta_pval_ols <- rbind(beta_pval_ols, coef(summary(mdl_0))[, 4])
beta_pval_hac <- rbind(beta_pval_hac, mdl_hac[, 4])
beta_pval_fgls <- rbind(beta_pval_fgls, coef(summary(mdl_fgls))[, 4])
}
np.random.seed(123)
# Number of simulations:
MC = 1000
# Fixed parameters
N = 100
beta_vec = np.array([1, 0.2, -0.3])
rho = 0.8
# matrix of parameter estimates for each sample:
beta_est_ols  = np.empty((0, len(beta_vec)), int)
beta_pval_ols = np.empty((0, len(beta_vec)), int)
beta_pval_hac = np.empty((0, len(beta_vec)), int)
#
beta_est_fgls  = np.empty((0, len(beta_vec)), int)
beta_pval_fgls = np.empty((0, len(beta_vec)), int)
#
beta_est_gls  = np.empty((0, len(beta_vec)), int)
beta_pval_gls = np.empty((0, len(beta_vec)), int)
#
for i in range(0, MC):
# simulate the data:
x1 = np.linspace(start = 0, stop = 5, num = N)
x2 = np.random.choice(np.linspace(start = 3, stop = 17, num = 80), size = N, replace = True)
ee = np.random.normal(loc = 0, scale = 3, size = N)
for i in range(1, N):
ee[i] = rho * ee[i-1] + ee[i]
#
y  = np.dot(sm.add_constant(np.column_stack((x1, x2))), beta_vec) + ee
data_mat = pd.DataFrame(np.column_stack([y, x1, x2]), columns = ["y", "x1", "x2"])
# estimate via OLS
mdl_1 = smf.ols(formula = "y ~ x1 + x2", data = data_mat)
# correct OLS se's:
mdl_hac = mdl_1.fit().get_robustcov_results(cov_type = 'HAC', maxlags = 1)
# estimate via CORC:
mdl_fgls = sm.GLSAR(mdl_1.endog, mdl_1.exog).iterative_fit(maxiter = 100)
# Save the estimates from each sample:
beta_est_ols = np.vstack([beta_est_ols, mdl_1.fit().params])
beta_est_fgls = np.vstack([beta_est_fgls, mdl_fgls.params])
# Save the coefficient p-values from each sample:
beta_pval_ols = np.vstack([beta_pval_ols, mdl_1.fit().pvalues])
beta_pval_hac = np.vstack([beta_pval_hac, mdl_hac.pvalues])
beta_pval_fgls = np.vstack([beta_pval_fgls, mdl_fgls.pvalues])
#

Regarding the rejection rate of the null hypothesis that a parameter is insignificant, we have that:

alpha = 0.05
a1 <- colSums(beta_pval_ols < alpha) / MC
a2 <- colSums(beta_pval_hac < alpha) / MC
a3 <- colSums(beta_pval_fgls < alpha) / MC
#
a <- t(data.frame(a1, a2, a3))
colnames(a) <- names(a1)
rownames(a) <- c("OLS: H0 rejection rate", "HAC: H0 rejection rate", "FGLS: H0 rejection rate")
print(a)
##                         (Intercept)    x1    x2
## OLS: H0 rejection rate        0.373 0.505 0.757
## HAC: H0 rejection rate        0.457 0.365 0.919
## FGLS: H0 rejection rate       0.135 0.122 0.999
alpha = 0.05
a1 = (beta_pval_ols < alpha).sum(axis = 0) / MC
a2 = (beta_pval_hac < alpha).sum(axis = 0) / MC
a3 = (beta_pval_fgls < alpha).sum(axis = 0) / MC
#
a = pd.DataFrame(np.column_stack([a1, a2, a3]),
index = mdl_1.exog_names,
columns = ["OLS: H0 rejection rate", "HAC: H0 rejection rate", "FGLS: H0 rejection rate"]).T
print(a)
##                          Intercept     x1     x2
## OLS: H0 rejection rate       0.369  0.514  0.734
## HAC: H0 rejection rate       0.344  0.419  0.763
## FGLS: H0 rejection rate      0.139  0.117  0.999

So, the FGLS seems to be better in some parameter cases, while HAC may be similar to OLS. All in all, if we only have an autocorrelation of order 1 problem - the corrections may not have a huge impact on our conclusions in some cases.

We can look at the coefficient estimate histograms:

# plot the estimate histograms:
layout(matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE))
for(j in ncol(beta_est_ols):1){
p1 <- hist(beta_est_ols[, j], plot = FALSE, breaks = 25)
p2 <- hist(beta_est_fgls[, j], plot = FALSE, breaks = 25)
#
plot(p1, col = rgb(0,0,1,1/4), main = colnames(beta_est_ols)[j], ylim = c(0, max(p1$counts, p2$counts)))
plot(p2, col = rgb(1,0,0,1/4), add = T)
abline(v = beta_vec[j], lwd = 3, lty = 2, col = "red")
#
legend("topleft", legend = c("True Value", "OLS", "FGLS"),
lty = c(3, NA, NA), lwd = c(3, 0, 0), pch = c(NA, 22, 22),
col = c("red", "black", "black"), pt.cex = 2,
pt.bg = c(NA, rgb(0,0,1,1/4), rgb(1,0,0,1/4))
)
} #
fig = plt.figure(num = 2, figsize = (10, 8))
for j in range(len(a.columns), 0, -1):
if j != 1:
ax = fig.add_subplot(2, 2, 4 - j)
else:
plt.show() 