4.8 Autocorrelated (Serially Correlated) Errors

Consider the case where assumption (MR.4) does not hold, but assumption (MR.3) (as well as the remaining assumptions (MR.1), (MR.2), (MR.5) and, optionally, (MR.6)) still holds. Then we can write the model as: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathbb{E} \left( \boldsymbol{\varepsilon} | \mathbf{X}\right) = \boldsymbol{0},\quad \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \mathbb{E} \left( \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \right)= \mathbf{\Sigma} \neq \sigma_\epsilon^2 \mathbf{I} \] The case where the error variance-covariance matrix is no longer diagonal, but still has equal diagonal elements: \[ \mathbf{\Sigma} = \begin{bmatrix} \sigma^2 & \sigma_{1,2} & \sigma_{1,3} & ... & \sigma_{1,N}\\ \sigma_{2,1} & \sigma^2 & \sigma_{2,3} & ... & \sigma_{2,N}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \sigma_{N,1} & \sigma_{N,2} & \sigma_{N,3} & ... & \sigma^2 \end{bmatrix}, \quad \sigma_{i,j} = \sigma_{j,i} = \mathbb{C}{\rm ov}(\epsilon_i, \epsilon_j) \neq 0, \quad i,j = 1,...,N \] is referred to as autocorrelation (or serial correlation). Just like before with heteroskedasticity:

  • the OLS estimators remain unbiased and consistent;
  • OLS estimators are no longer efficient;
  • the variance estimator of the OLS estimates is biased and inconsistent;
  • \(t\)-statistics of the OLS estimates are invalid.

It should be stressed that serial correlation is usually present in time-series data. For cross-sectional data, the errors may be correlated in terms of social or geographical distance - for example, the distance between cities, towns, neighborhoods, etc.

As this problem is analogous to the one examined in 4.6, we have similar options for dealing with it. The strategy is as follows:

  1. Assume that \(\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}\) is the true model.
  2. Create a model for the OLS residuals. For example, the residual auxiliary regression in the Breusch-Godfrey test includes the explanatory variables of the original model, as well as the lagged residuals: \[ \widehat{\epsilon}_i = \alpha_0 + \alpha_1 X_{1, i} + ... + \alpha_k X_{k,i} + \rho_1 \widehat{\epsilon}_{i - 1} + \rho_2 \widehat{\epsilon}_{i - 2} + ... + \rho_p \widehat{\epsilon}_{i - p} + u_i \]
  3. Test the null hypothesis that the residuals are not serially correlated using the OLS residual auxiliary regression:
    \[ H_0: \rho_1 = \rho_2 = ... = \rho_p = 0 \]
  4. If we fail to reject the null hypothesis - we can use the usual OLS estimators.
  5. If we reject the null hypothesis, there are several ways we can go:
    • Use the OLS estimators, but correct their variance estimators (i.e. make them consistent);
    • Instead of OLS, use FGLS (and its variations);
    • Attempt to specify a different model, which would, hopefully, be able to account for the serial correlation (autocorrelation may be a symptom of a misspecified model, as demonstrated in 3.8.3.3).
Example 4.29 We will simulate the following model:

\[ \begin{cases} Y_{i} &= \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \epsilon_i \\ \epsilon_i &= \rho \epsilon_{i-1} + u_i,\ |\rho| < 1,\ u_i \sim \mathcal{N}(0, \sigma^2),\quad \epsilon_0 = 0 \end{cases} \]
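
A minimal simulation sketch in Python (the parameter values, sample size and variable ranges below are illustrative assumptions and do not necessarily match the ones used to produce the output further below):

import numpy as np
import pandas as pd

np.random.seed(123)
N = 200
# assumed 'true' parameters, for illustration only:
beta_0, beta_1, beta_2, rho = 10, 5, -3, 0.8
x1 = np.random.uniform(low = 0, high = 5, size = N)
x2 = np.random.uniform(low = 3, high = 17, size = N)
# AR(1) errors, generated recursively from iid normal shocks:
ee = np.random.normal(loc = 0, scale = 3, size = N)
for i in range(1, N):
    ee[i] = rho * ee[i-1] + ee[i]
y = beta_0 + beta_1 * x1 + beta_2 * x2 + ee
dt = pd.DataFrame({"y": y, "x1": x1, "x2": x2})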

4.8.1 Testing For Serial Correlation

We can examine the presence of autocorrelation from the residual plots, as well as by conducting a number of formal tests.

We will begin by estimating our model via OLS, as we usually would.
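
For example, with statsmodels (a sketch, assuming the simulated data is stored in the DataFrame dt from the sketch above):

import statsmodels.formula.api as smf

mdl_1 = smf.ols(formula = "y ~ x1 + x2", data = dt)
print(mdl_1.fit().summary().tables[1])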

##             Estimate Std. Error   t value Pr(>|t|)
## (Intercept) 10.61601    0.97983  10.83457        0
## x1           4.89018    0.20457  23.90497        0
## x2          -2.93686    0.07545 -38.92241        0
##               Coef.  Std.Err.         t  P>|t|   [0.025    0.975]
## Intercept  10.20024   1.15943   8.79767    0.0  7.91376  12.48671
## x1          4.81317   0.25507  18.87032    0.0  4.31016   5.31618
## x2         -2.92246   0.08668 -33.71534    0.0 -3.09340  -2.75152

4.8.1.1 Residual Correlogram

We begin by calculating the residuals from the OLS model: \[ \widehat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{\rm OLS} \] We will use these residuals as estimates of the true (unobserved) error term \(\boldsymbol{\varepsilon}\) and examine their autocorrelations. Note that from the OLS structure it holds that \(\mathbb{E} \left( \widehat{\boldsymbol{\varepsilon}} | \mathbf{X}\right) = \boldsymbol{0}\). So, assume that we want to calculate the sample correlation between \(\widehat{\epsilon}_i\) and \(\widehat{\epsilon}_{i - k}\) - that is, the autocorrelation at lag \(k\): \[ \widehat{\rho}(k) = \widehat{\mathbb{C}{\rm orr}}(\widehat{\epsilon}_i, \widehat{\epsilon}_{i-k}) = \dfrac{\widehat{\mathbb{C}{\rm ov}}(\widehat{\epsilon}_i, \widehat{\epsilon}_{i-k})}{\sqrt{\widehat{\mathbb{V}{\rm ar}}(\widehat{\epsilon}_i)}\sqrt{\widehat{\mathbb{V}{\rm ar}}(\widehat{\epsilon}_{i-k})}} = \dfrac{\sum_{i = k + 1}^N \widehat{\epsilon}_i \widehat{\epsilon}_{i-k}}{\sum_{i = 1}^N \widehat{\epsilon}_i^2} \] Furthermore, assume that we are interested in testing whether the true autocorrelation at lag \(k\), \(\rho(k)\), is significantly different from zero: \[ \begin{aligned} H_0: \rho(k) = 0\\ H_1: \rho(k) \neq 0 \end{aligned} \] Under the null hypothesis it holds that \(\widehat{\rho}(k) \stackrel{a}{\sim} \mathcal{N}(0, 1/N)\), where \(\stackrel{a}{\sim}\) indicates the asymptotic distribution - the distribution as the sample size \(N \rightarrow \infty\). In this case, it means that for large samples the distribution is approximately normal. Consequently, a suitable statistic can be constructed as: \[ Z = \dfrac{\widehat{\rho}(k) - 0}{\sqrt{1/N}} = \widehat{\rho}(k) \cdot \sqrt{N} \stackrel{a}{\sim} \mathcal{N}(0, 1) \] So, at a \(5\%\) significance level, the critical value is \(Z_c \approx 1.96\). In this case, we would reject the null hypothesis when: \[ \widehat{\rho}(k) \cdot \sqrt{N} \geq 1.96, \text{ or } \widehat{\rho}(k) \cdot \sqrt{N} \leq -1.96 \] or alternatively, if we want to have the notation similar to “\(\text{estimate} \pm \text{critical value} \cdot \text{se}(\text{estimate})\)”, we can write it as: \[ \widehat{\rho}(k) \geq 1.96 / \sqrt{N}, \text{ or } \widehat{\rho}(k) \leq -1.96 / \sqrt{N} \] As a rule of thumb, we can sometimes take \(2\) instead of \(1.96\). We can also calculate this via software:

We will begin by manually calculating the autocorrelations and their confidence bounds for \(k = 1,...,20\):
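
A sketch of the manual calculation (using the residuals from the OLS model estimated above):

import numpy as np

resid = np.asarray(mdl_1.fit().resid)
N = len(resid)
lags = range(1, 21)
# rho_hat(k) = sum_{i = k+1}^{N} e_i * e_{i-k} / sum_{i = 1}^{N} e_i^2
rho_hat = [np.sum(resid[k:] * resid[:-k]) / np.sum(resid**2) for k in lags]
# approximate 95% confidence bounds under the null of no autocorrelation:
conf_bound = 1.96 / np.sqrt(N)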

and we can plot them:

Alternatively, we can use the built-in functions:
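
For example, statsmodels provides plot_acf() (a sketch; zero = False drops the trivial lag-0 autocorrelation from the plot):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

fig = plot_acf(resid, lags = 20, zero = False)
plt.show()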

Note that the confidence intervals in Python are calculated differently - via Bartlett’s formula, which allows for non-zero autocorrelations up to lag \(k-1\) when assessing the significance of the autocorrelation at lag \(k\).

We can see from the plots that there are some sample autocorrelations \(\widehat{\rho}(k)\), which are statistically significantly different from zero (i.e. their values fall outside the blue confidence bounds).

As with most visual diagnostics tools - it may not always be a clear-cut answer from the plots alone. So, we can apply a number of different statistical tests to check whether there are statistically significant sample correlations.

4.8.1.2 Autocorrelation Tests

The tests are identical to the ones described in 3.8.3.3, but re-visited for the multiple regression model case. These tests will be re-examined when dealing with time-series data models.

4.8.1.2.1 Durbin-Watson Test

The DW test is used to test the hypothesis that the residuals are serially correlated at lag 1, i.e. that in the following model: \[ \epsilon_i = \rho \epsilon_{i-1} + v_i \] the hypothesis being tested is: \[ \begin{cases} H_0: \rho = 0\qquad \text{(no serial correlation)}\\ H_1: \rho \neq 0\qquad \text{(serial correlation at lag 1)} \end{cases} \] Since we do not observe the true error terms, we use the OLS residuals \(\widehat{\epsilon}_i\) and calculate the Durbin-Watson statistic as: \[ DW = \dfrac{\sum_{i = 2}^N (\widehat{\epsilon}_i - \widehat{\epsilon}_{i-1})^2}{\sum_{i = 1}^N \widehat{\epsilon}_i^2} \]

The critical values of the DW test statistic may not be available in some econometric software. Furthermore, its distribution is no longer valid when the equation for \(Y_i\) contains a lagged dependent variable, \(Y_{i-1}\).

As a quick rule of thumb, if the DW statistic is near \(2\), then we do not reject the null hypothesis of no serial correlation.

If we assume that \(\sum_{i = 2}^N \widehat{\epsilon}_i^2 \approx \sum_{i = 2}^N \widehat{\epsilon}_{i-1}^2\), then we can re-write the DW statistic as: \[ DW = \dfrac{\sum_{i = 2}^N \widehat{\epsilon}_i^2 - 2 \sum_{i = 2}^N \widehat{\epsilon}_i\widehat{\epsilon}_{i-1} + \sum_{i = 2}^N \widehat{\epsilon}_{i-1}^2}{\sum_{i = 1}^N \widehat{\epsilon}_i^2} \approx \dfrac{2\left[ \sum_{i = 2}^N \widehat{\epsilon}_i^2 - \sum_{i = 2}^N \widehat{\epsilon}_i\widehat{\epsilon}_{i-1} \right]}{\sum_{i = 1}^N \widehat{\epsilon}_i^2} = 2 \left[ 1 - \widehat{\rho}(1)\right] = \begin{cases} 4,\ \text{ if } \widehat{\rho}(1) = -1\\ 2,\ \text{ if } \widehat{\rho}(1) = 0\\ 0,\ \text{ if } \widehat{\rho}(1) = 1 \end{cases} \] which helps in understanding why we expect the DW statistic to be close to \(2\) under the null hypothesis.
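
The statistic itself is easy to compute from the OLS residuals, e.g. with statsmodels (a sketch; durbin_watson() returns only the statistic, so the p-value has to come from a dedicated test function, like the one whose output is shown below):

from statsmodels.stats.stattools import durbin_watson

print(durbin_watson(resid))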

## 
##  Durbin-Watson test
## 
## data:  mdl_1
## DW = 0.56637, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0

Because the \(p\)-value \(< 0.05\), we reject the null hypothesis at the \(5\%\) significance level and conclude that the residuals are serially correlated at lag order 1.

4.8.1.2.2 Breusch-Godfrey (BG or LM) Test

The BG test can also be applied to a model with a lagged response variable on the right-hand side, for example: \(Y_i = \beta_0 + \beta_1 X_{1,i} + ... + \beta_k X_{k,i} + \beta_{k+1} Y_{i - 1} + \epsilon_i\).

In general, we estimate the model parameters via OLS and calculate the residuals: \[ \widehat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}} \] where \(\mathbf{X}\) can contain \(X_{1,i},...,X_{k,i}, Y_{i - 1}\) for \(i = 1,...,N\).

Then, we estimate an auxiliary regression on \(\widehat{\boldsymbol{\varepsilon}}\) as: \[ \widehat{\epsilon}_i = \alpha_0 + \alpha_1 X_{1,i} + ... + \alpha_k X_{k,i} + \rho_1 \widehat{\epsilon}_{i-1} + ... + \rho_p \widehat{\epsilon}_{i-p} + u_i \] (Note: if we included \(Y_{i - 1}\) in the initial regression, then we need to also include it in the auxiliary regression).

We want to test the null hypothesis that the lagged residual coefficients are insignificant: \[ \begin{cases} H_0: \rho_1 = ... = \rho_p = 0\\ H_1: \rho_j \neq 0, \text{ for some } j \end{cases} \] We can either carry out an \(F\)-test, or a Chi-squared test: \[ LM = (N-p)R_{\widehat{\epsilon}}^2 \sim \chi^2_p \] where \(R_{\widehat{\epsilon}}^2\) is the \(R\)-squared from the auxiliary regression on \(\widehat{\boldsymbol{\varepsilon}}\).

The BG test is a more general test compared to the DW test.


Looking at the correlogram plots, let us test for serial correlation up to order \(p = 3\), i.e. test the null hypothesis that the first three autocorrelations of the residuals are zero (against the alternative that at least one of them is not):
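
In Python, this can be done with statsmodels’ acorr_breusch_godfrey() (a sketch, assuming mdl_1 is the OLS model estimated earlier):

from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(mdl_1.fit(), nlags = 3)
print("LM statistic =", lm_stat, ", p-value =", lm_pval)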

## 
##  Breusch-Godfrey test for serial correlation of order up to 3
## 
## data:  mdl_1
## LM test = 100.72, df = 3, p-value < 2.2e-16

Since the \(p\)-value \(< 0.05\), we reject the null hypothesis and conclude that there is serial correlation in the residuals.

4.8.2 Heteroskedasticity-and-Autocorrelation-Consistent Standard Errors (HAC)

So far, we have assumed that the diagonal elements of \(\mathbf{\Sigma}\) are constant - i.e. that the residuals are serially correlated but homoskedastic. We can further generalize this to the case of heteroskedastic and serially correlated errors: \[ \mathbf{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{1,2} & \sigma_{1,3} & ... & \sigma_{1,N}\\ \sigma_{2,1} & \sigma_2^2 & \sigma_{2,3} & ... & \sigma_{2,N}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \sigma_{N,1} & \sigma_{N,2} & \sigma_{N,3} & ... & \sigma_N^2 \end{bmatrix}, \quad \sigma_{i,j} = \sigma_{j,i} = \mathbb{C}{\rm ov}(\epsilon_i, \epsilon_j) \neq 0, \quad i,j = 1,...,N \] Similarly to 4.7, we can use the OLS estimator and correct its standard errors. In this case the corrected standard errors are known as HAC (Heteroskedasticity-and-Autocorrelation-Consistent) standard errors, or Newey–West standard errors.

The White covariance matrix assumes serially uncorrelated residuals. Newey and West, on the other hand, proposed a more general covariance estimator, which is robust to both heteroskedasticity and autocorrelation of the residuals of an unknown form. The HAC coefficient covariance estimator handles autocorrelation with lags up to \(p\) - it is assumed that autocorrelations at lags larger than \(p\) are negligible and can thus be ignored. It is defined as: \[ HAC_{\widehat{\boldsymbol{\beta}}_{OLS}} = \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} N \widehat{\mathbf{\Sigma}} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \] where: \[ \widehat{\mathbf{\Sigma}} = \widehat{\boldsymbol{\Gamma}}(0) + \sum_{j = 1}^p \left[ 1 - \dfrac{j}{p + 1}\right] \left[ \widehat{\boldsymbol{\Gamma}}(j) + \widehat{\boldsymbol{\Gamma}}(-j)\right] \] \[ \widehat{\boldsymbol{\Gamma}}(j) = \dfrac{1}{N} \left( \sum_{i = 1}^N\widehat{\epsilon}_{i}\widehat{\epsilon}_{i-j} \mathbf{X}_i\mathbf{X}_{i-j}^\top\right),\quad \mathbf{X}_i = \left[1,\ X_{1,i},\ X_{2,i},\ ...,\ X_{k,i} \right]^\top \] In the absence of serial correlation we would have: \[ \widehat{\mathbf{\Sigma}} = \widehat{\boldsymbol{\Gamma}}(0) \] which is equivalent to the White estimator (HCE).

Note: HAC not only corrects for autocorrelation, but also for heteroskedasticity.

Do not be alarmed if you see slightly different HAC standard errors in different statistical programs - there are a number of different variations of \(\widehat{\mathbf{\Sigma}}\).
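
To make the formula concrete, here is a minimal sketch of the Newey–West estimator computed by hand (assuming X is the design matrix, including the intercept column, and resid holds the OLS residuals; built-in implementations may differ slightly, e.g. due to small-sample corrections or different kernel and bandwidth choices):

import numpy as np

def hac_cov(X, resid, p):
    # HAC = (X'X)^{-1} * N * Sigma_hat * (X'X)^{-1}
    N, k = X.shape
    def gamma(j):
        # Gamma_hat(j) = (1/N) * sum_{i = j+1}^{N} e_i * e_{i-j} * x_i * x_{i-j}'
        G = np.zeros((k, k))
        for i in range(j, N):
            G += resid[i] * resid[i - j] * np.outer(X[i], X[i - j])
        return G / N
    S = gamma(0)
    for j in range(1, p + 1):
        w = 1 - j / (p + 1)       # Bartlett kernel weight
        G = gamma(j)
        S += w * (G + G.T)        # Gamma_hat(j) + Gamma_hat(-j)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ (N * S) @ XtX_inv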

Using the built-in functions we have that:

##             (Intercept)           x1           x2
## (Intercept)  1.43560709 -0.303808295 -0.043572462
## x1          -0.30380830  0.183608167 -0.002018108
## x2          -0.04357246 -0.002018108  0.004421496

Following the documentation, NeweyWest() is a convenience interface to vcovHAC(), using Bartlett kernel weights. In comparison, vcovHAC() allows choosing the weights as either weightsAndrews, weightsLumley, or a custom function to calculate the weights.

## [[ 1.73550283 -0.30329739 -0.07751371]
##  [-0.30329739  0.11432819  0.00191364]
##  [-0.07751371  0.00191364  0.00758616]]

then the coefficients, their standard errors and \(p\)-values can be summarized as:

##              Estimate Std. Error    t value      Pr(>|t|)
## (Intercept) 10.616005 1.19816822   8.860196  4.719858e-16
## x1           4.890185 0.42849524  11.412460  1.741811e-23
## x2          -2.936860 0.06649433 -44.167068 3.799730e-104

Note that we used [, ] so that we could treat the output as a data.frame if we needed to round the results.

##               Coef.  Std.Err.         t  P>|t|   [0.025    0.975]
## Intercept  10.20024   1.30747   7.80152    0.0  7.62181  12.77866
## x1          4.81317   0.33558  14.34289    0.0  4.15139   5.47496
## x2         -2.92246   0.08644 -33.80798    0.0 -3.09293  -2.75199

compared with the biased standard errors in the OLS output:

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 10.616005 0.97982714  10.83457 9.311699e-22
## x1           4.890185 0.20456771  23.90497 3.887306e-60
## x2          -2.936860 0.07545421 -38.92241 1.933624e-94
##               Coef.  Std.Err.         t  P>|t|   [0.025    0.975]
## Intercept  10.20024   1.15943   8.79767    0.0  7.91376  12.48671
## x1          4.81317   0.25507  18.87032    0.0  4.31016   5.31618
## x2         -2.92246   0.08668 -33.71534    0.0 -3.09340  -2.75152

we see that the HAC standard errors are somewhat larger than the ones we get from OLS.

4.8.3 Feasible GLS - Cochrane-Orcutt Procedure (CORC)

Alternatively to HAC, we can make some assumptions regarding the nature of autocorrelation and employ a more efficient GLS estimator.

For the case when the residuals are serially correlated at lag 1, but not heteroskedastic: \[ \epsilon_i = \rho \epsilon_{i-1} + u_i,\ |\rho| < 1,\ u_i \sim \mathcal{N}(0, \sigma^2), \quad i \in \{0, \pm 1, \pm 2,... \} \] we note that: \[ \begin{aligned} \mathbb{V}{\rm ar}(\epsilon_i) &= \mathbb{V}{\rm ar}(\rho\epsilon_{i-1} + u_i)\\ &= \mathbb{V}{\rm ar}(\rho(\rho\epsilon_{i-2} + u_{i-1}) + u_i) \\ &= \mathbb{V}{\rm ar}(\rho^2(\rho\epsilon_{i-3} + u_{i-2}) + u_i + \rho u_{i-1})\\ &= ... \\ &= \mathbb{V}{\rm ar} \left(\sum_{j = 0}^\infty \rho^j u_{i-j}\right)\\ &=\sum_{j = 0}^\infty \rho^{2j} \cdot \mathbb{V}{\rm ar} \left(u_{i-j}\right)\\ &= \dfrac{\sigma^2}{1 - \rho^2} \end{aligned} \] and: \[ \begin{aligned} \mathbb{C}{\rm ov}(\epsilon_i,\ \epsilon_{i - k}) &= \mathbb{C}{\rm ov}(\rho\epsilon_{i-1} + u_i,\ \epsilon_{i - k}) \\ &= \mathbb{C}{\rm ov}(\rho(\rho\epsilon_{i-2} + u_{i-1}) + u_i,\ \epsilon_{i - k}) \\ &= ... \\ &=\mathbb{C}{\rm ov}(\rho^k\epsilon_{i-k},\ \epsilon_{i - k}) \\ &= \rho^k \mathbb{V}{\rm ar}(\epsilon_{i-k}) \\ &= \rho^k \dfrac{\sigma^2}{1 - \rho^2}, \quad \text{ since } \mathbb{C}{\rm ov}(u_i, \epsilon_j ) = 0 \text{ for } j < i \end{aligned} \] Consequently, we can re-write the covariance matrix as: \[ \mathbf{\Sigma} = \dfrac{\sigma^2}{1 - \rho^2} \begin{bmatrix} 1 & \rho & \rho^2 & ... & \rho^{N-1}\\ \rho & 1 & \rho & ... & \rho^{N-2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho^{N-1} & \rho^{N-2} & \rho^{N-3} & ... & 1 \end{bmatrix} \]

In this case, knowledge (or an estimate) of the parameter \(\rho\) allows us to apply GLS empirically - i.e. as FGLS.
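
For illustration, the implied error covariance matrix can be constructed directly (a small sketch; sigma2 denotes the variance of \(u_i\)):

import numpy as np

def sigma_ar1(rho, sigma2, N):
    # Sigma[i, j] = sigma^2 / (1 - rho^2) * rho^{|i - j|}
    idx = np.arange(N)
    return sigma2 / (1 - rho ** 2) * rho ** np.abs(idx[:, None] - idx[None, :])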

Cochrane–Orcutt (CORC) estimator:

  1. Estimate \(\boldsymbol{\beta}\) via OLS from: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \] and calculate the residuals \(\widehat{\boldsymbol{\varepsilon}}^{(1)} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}\), where \({(1)}\) denotes the first iteration.
  2. Estimate the following residual regression via OLS: \[ \widehat{\epsilon}^{(1)}_i = \rho \widehat{\epsilon}^{(1)}_{i-1} + \widehat{u}_i \] and obtain \(\widehat{\rho}^{(1)}\).
  3. Calculate the following transformed variables: \[ Y_i^* = Y_i - \widehat{\rho}^{(1)} Y_{i-1}, \qquad \boldsymbol{X}_i^* = \boldsymbol{X}_i - \widehat{\rho}^{(1)} \boldsymbol{X}_{i-1},\text{ where }\mathbf{X}_i = \left[1,\ X_{1,i},\ X_{2,i},\ ...,\ X_{k,i} \right]^\top \]
  4. Apply OLS to the following model: \[ (Y_i - \widehat{\rho}^{(1)} Y_{i-1}) = \beta_0 (1 - \widehat{\rho}^{(1)}) + \beta_1 (X_{1,i} - \widehat{\rho}^{(1)} X_{1,i-1}) + ... + \beta_k (X_{k,i} - \widehat{\rho}^{(1)} X_{k,i-1}) + u_i \] or, more compactly: \[ \mathbf{Y}^* = \mathbf{X}^* \boldsymbol{\beta} + \boldsymbol{u} \] to get the OLS estimates \(\widetilde{\boldsymbol\beta}\). Note: if we use a column of ones for the constant \(\beta_0\), then we will actually have estimated \(\widetilde{\beta}_0^* = \widetilde{\beta}_0 (1 - \widehat{\rho}^{(1)})\) and will need to transform the intercept term as \(\widetilde{\beta}_0 = \widetilde{\beta}_0^* / (1 - \widehat{\rho}^{(1)})\). On the other hand, if we replace the intercept column with \(1 - \widehat{\rho}^{(1)}\) in the design matrix, then we will not need to transform the intercept coefficient (this is similar to how we had to carry out WLS for the heteroskedastic error case in the previous section).
  5. Having estimated the model, calculate the residuals on the non-transformed data: \(\widehat{\boldsymbol{\varepsilon}}^{(2)} = \mathbf{Y} - \mathbf{X} \widetilde{\boldsymbol{\beta}}\) and go to step (2).

Repeat this procedure until \(\widehat{\rho}\) converges: if the change in \(\widehat{\rho}^{(K)}\), compared to the previous iteration \(\widehat{\rho}^{(K-1)}\) is no more than \(\Delta\) (for example \(\Delta = 0.001\)) - stop the procedure.

The final value \(\widehat{\rho}^{(K)}\) is then used to get the FGLS (CORC) estimates of \(\boldsymbol{\beta}\). Again, depending on your specification, you may need to transform the intercept coefficient: \(\widehat{\beta}_0 = \widehat{\beta}_0^* / (1 - \widehat{\rho}^{(K)})\).
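
A minimal sketch of this iteration (assuming y is a numpy vector and X is the design matrix whose first column is the intercept; the sketch replaces the intercept column by \(1 - \widehat{\rho}\), so no back-transformation of the intercept is needed):

import numpy as np

def corc(y, X, tol = 1e-3, max_iter = 50):
    # Step 1: initial OLS estimate
    beta = np.linalg.lstsq(X, y, rcond = None)[0]
    rho_old = np.inf
    for _ in range(max_iter):
        # Step 2: estimate rho by regressing e_i on e_{i-1} (no intercept)
        e = y - X @ beta
        rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
        # Step 3: quasi-difference the data (the first observation is dropped)
        y_star = y[1:] - rho * y[:-1]
        X_star = X[1:, :] - rho * X[:-1, :]
        X_star[:, 0] = 1 - rho
        # Step 4: OLS on the transformed model
        beta = np.linalg.lstsq(X_star, y_star, rcond = None)[0]
        # Stop once rho converges
        if abs(rho - rho_old) < tol:
            break
        rho_old = rho
    return beta, rho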

These FGLS estimators are not unbiased but they are consistent and asymptotically efficient.

The procedure can be carried out with the built-in functions as follows:

##              Estimate Std. Error t value      Pr(>|t|)
## (Intercept) 10.402800 1.50824063   6.897  7.124332e-11
## x1           5.082365 0.48877934  10.398  1.899433e-20
## x2          -2.978490 0.04202203 -70.879 1.185668e-141
##            Coef.  Std.Err.          t          P>|t|    [0.025     0.975]
## const  11.072803  2.083601   5.314262   2.897596e-07  6.963647  15.181958
## x1      4.693541  0.686176   6.840138   9.839513e-11  3.340305   6.046778
## x2     -2.974871  0.045583 -65.262702  6.559064e-135 -3.064767  -2.884975

the estimated \(\rho\) is:

## [1] 0.7088817
## [0.75084929]

which is close to the true value of \(\rho\) used in the data generation.


Alternative procedures to CORC are Hildreth-Lu Procedure and Prais–Winsten Procedure.

4.8.4 Choosing Between HAC and FGLS

It has become more popular to estimate models by OLS and correct the standard errors for fairly arbitrary forms of serial correlation and heteroskedasticity.

Regarding HAC:

  • Computation of robust standard errors works generally well in large datasets;
  • With increase in computational power, not only has it become possible to (quickly) estimate models on large datasets, but also to calculate their robust covariance (HAC) estimators;
  • While FGLS offers theoretical efficiency gains, it involves making additional assumptions about the error covariance matrix, which may not be easy to test or verify, and which, if incorrect, may threaten the consistency of the estimator.

Regarding FGLS:

  • if the explanatory variables are not strictly exogenous (e.g. if we include \(Y_{i-1}\) on the right-hand-side of the equation) - then FGLS is not only inefficient, but also inconsistent;
  • in most applications of FGLS, the errors are assumed to follow a first order autoregressive process. It may be better to evaluate OLS estimates and use a robust correction on their standard errors for more general forms of serial correlation;
  • in addition to imposing an assumption on the residual covariance structure with regard to autocorrelation, FGLS also requires strict exogeneity of the regressors for consistency, which is a stronger requirement than the contemporaneous exogeneity (MR.3) needed for OLS with HAC standard errors.

Generally, serial correlation is usually encountered in time-series data, which has its own set of models that specifically address serial correlation of either the residuals \(\epsilon\), the endogenous variable \(Y\), the exogenous variables \(X_j\), or even all of them at once.

It is worth noting that autocorrelated residuals are more frequently the result of a misspecified regression equation, rather than a genuine autocorrelation.

A final thing to note:

In many cases, the presence of autocorrelation, especially in cross-sectional data, is not an indication that the model has autocorrelated errors, but rather that it:

  • is misspecified;
  • is suffering from omitted variable bias;
  • has an incorrect functional form for either \(Y\), or \(X\).

4.8.5 Monte Carlo Simulation: OLS vs FGLS

To illustrate the effects of autocorrelation on the standard errors of the estimates, and the efficiency difference between OLS and FGLS, we will carry out a Monte Carlo simulation. We will simulate the following model: \[ \begin{cases} Y_{i}^{(m)} &= \beta_0 + \beta_1 X_{1,i}^{(m)} + \beta_2 X_{2,i}^{(m)} + \epsilon_i^{(m)} \\ \epsilon_i^{(m)} &= \rho \epsilon_{i-1}^{(m)} + u_i^{(m)} ,\ |\rho| < 1,\ u_i^{(m)} \sim \mathcal{N}(0, \sigma^2),\quad \epsilon_0^{(m)} = 0,\quad i = 1,...,N,\quad m = 1,..., MC \end{cases} \] We will simulate a total of \(MC = 1000\) samples from this model with specific coefficients and estimate the parameters via OLS and FGLS, as well as correct the standard errors of OLS via HAC. We will do so with the following code:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

np.random.seed(123)
# Number of simulations:
MC = 1000
# Fixed parameters
N = 100
beta_vec = np.array([1, 0.2, -0.3])
rho = 0.8
# matrices of parameter estimates and p-values for each sample:
beta_est_ols  = np.empty((0, len(beta_vec)))
beta_pval_ols = np.empty((0, len(beta_vec)))
beta_pval_hac = np.empty((0, len(beta_vec)))
#
beta_est_fgls  = np.empty((0, len(beta_vec)))
beta_pval_fgls = np.empty((0, len(beta_vec)))
#
for m in range(0, MC):
    # simulate the data:
    x1 = np.linspace(start = 0, stop = 5, num = N)
    x2 = np.random.choice(np.linspace(start = 3, stop = 17, num = 80), size = N, replace = True)
    # AR(1) errors: e_i = rho * e_{i-1} + u_i
    ee = np.random.normal(loc = 0, scale = 3, size = N)
    for i in range(1, N):
        ee[i] = rho * ee[i-1] + ee[i]
    #
    y  = np.dot(sm.add_constant(np.column_stack((x1, x2))), beta_vec) + ee
    data_mat = pd.DataFrame(np.column_stack([y, x1, x2]), columns = ["y", "x1", "x2"])
    # estimate via OLS:
    mdl_1   = smf.ols(formula = "y ~ x1 + x2", data = data_mat)
    res_ols = mdl_1.fit()
    # correct the OLS standard errors (HAC):
    mdl_hac = res_ols.get_robustcov_results(cov_type = 'HAC', maxlags = 1)
    # estimate via FGLS (CORC-type iterative procedure):
    mdl_fgls = sm.GLSAR(mdl_1.endog, mdl_1.exog).iterative_fit(maxiter = 100)
    # Save the estimates from each sample:
    beta_est_ols  = np.vstack([beta_est_ols, res_ols.params])
    beta_est_fgls = np.vstack([beta_est_fgls, mdl_fgls.params])
    # Save the coefficient p-values from each sample:
    beta_pval_ols  = np.vstack([beta_pval_ols, res_ols.pvalues])
    beta_pval_hac  = np.vstack([beta_pval_hac, mdl_hac.pvalues])
    beta_pval_fgls = np.vstack([beta_pval_fgls, mdl_fgls.pvalues])

Regarding the empirical rejection rate of the null hypothesis that a parameter is equal to zero (i.e. insignificant), we have that:

So, the FGLS seems to be better in some parameter cases, while HAC may be similar to OLS. All in all, if we only have an autocorrelation of order 1 problem - the corrections may not have a huge impact on our conclusions in some cases.

We can look at the coefficient estimate histograms:

For higher order and more complex correlation structure of the residuals, this may not always be the case, hence, if we detect serial correlation, we should account for it in some way.