4.6 Generalized Least Squares

The primary sources for this chapter are (Davidson et al. 2004), as well as Source 1 and Source 2.

Let our multiple regression be defined as: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

So far, one of our assumptions about the error term was defined by (MR.3) - (MR.4), namely that: \[ \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \mathbb{E} \left( \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \right) = \sigma^2_\epsilon \mathbf{I} \] However, sometimes the variance-covariance matrix of the errors \(\mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) \neq \sigma^2_\epsilon \mathbf{I}\). In this chapter, we will examine how this change affects parameter estimation, as well as possible solutions for such cases.

4.6.1 A General Multiple Linear Regression

Consider the following model: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathbb{E} \left( \boldsymbol{\varepsilon} | \mathbf{X}\right) = \boldsymbol{0},\quad \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \mathbb{E} \left( \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \right)= \boldsymbol{\Sigma} = \sigma_\epsilon^2 \mathbf{\Omega} \] where \(\boldsymbol{\Omega}\) is a symmetric, positive definite \(N \times N\) matrix. This model allows the errors to be heteroskedastic, autocorrelated, or both, and is often referred to as a special case of the generalized linear (regression) model (GLM).

Consequently, we may refer to this type of model as a general multiple linear regression (GMLR), to distinguish it as only a special case of a GLM. We will examine GLMs in more detail in chapter 5.
  • If \(\mathbf{\Omega} = \mathbf{I}\), then the GMLR is just the simple multiple linear regression model that we discussed at the beginning of this chapter;
  • If \(\mathbf{\Omega}\) is diagonal with non-constant diagonal elements, then the error terms are uncorrelated but they are heteroskedastic;
  • If \(\mathbf{\Omega}\) is not diagonal then \(\mathbb{C}{\rm ov} (\epsilon_i, \epsilon_j) = \Omega_{i,j} \neq 0\) for some \(i \neq j\). In econometrics, non-diagonal covariance matrices are most commonly encountered in time-series data, with higher correlations for observations closer in time (usually when \(i\) and \(j\) differ by 1 or 2).
    • If \(\mathbf{\Omega}\) is not diagonal and the diagonal elements are constant, then the errors are autocorrelated and homoskedastic;
    • If \(\mathbf{\Omega}\) is not diagonal and the diagonal elements are not constant, then the errors are autocorrelated and heteroskedastic;
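The cases above can be illustrated with a small numerical sketch (not from the source; the dimension, the diagonal entries and the AR(1)-type coefficient rho are all arbitrary choices used purely for illustration):

```python
# A sketch of two Omega structures: heteroskedastic-uncorrelated (diagonal,
# non-constant entries) and autocorrelated-homoskedastic (AR(1)-type decay).
import numpy as np

N = 5

# Heteroskedastic, uncorrelated: diagonal Omega with non-constant entries
omega_diag = np.diag([1.0, 2.0, 0.5, 3.0, 1.5])

# Autocorrelated, homoskedastic: constant diagonal, correlation decaying
# with the distance |i - j| between observations
rho = 0.6
idx = np.arange(N)
omega_ar1 = rho ** np.abs(idx[:, None] - idx[None, :])

assert np.allclose(np.diag(omega_ar1), 1.0)   # constant diagonal
assert omega_ar1[0, 1] > omega_ar1[0, 2]      # closer in time => higher correlation
```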

For the remainder of this chapter, we will assume that \(\mathbf{\Omega} \neq \mathbf{I}\), since we have already covered this case in previous chapters.

4.6.2 Consequences when OLS is used on GMLR

Since the OLS estimator can be expressed as: \[ \widehat{\boldsymbol{\beta}}_{OLS} = \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{Y} \] then:

  • The expected value of the OLS estimator: \[ \mathbb{E} \left( \widehat{\boldsymbol{\beta}}_{OLS} \right) = \boldsymbol{\beta} + \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbb{E} \left( \boldsymbol{\varepsilon} \right) = \boldsymbol{\beta} \] i.e. the OLS estimators of \(\boldsymbol{\beta}\) are unbiased;
  • The variance-covariance matrix of the OLS estimator: \[ \mathbb{V}{\rm ar}\left( \widehat{\boldsymbol{\beta}}_{OLS} \right) = \mathbb{E} \left[ \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \mathbf{X} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}\right] = \sigma_\epsilon^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{\Omega} \mathbf{X} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \neq \sigma^2_\epsilon \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \]
  • Since \(\mathbb{V}{\rm ar}\left( \widehat{\boldsymbol{\beta}}_{OLS} \right) \neq \sigma^2_\epsilon \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}\), statistical inference based on the following assumptions: \[ \begin{cases} \widehat{\mathbb{V}{\rm ar}}\left( \widehat{\boldsymbol{\beta}}_{OLS} \right) = \widehat{\sigma}^2_{OLS}\left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \\\\ \widehat{\sigma}^2_{OLS} = \dfrac{\widehat{\boldsymbol{\varepsilon}}_{OLS}^\top \widehat{\boldsymbol{\varepsilon}}_{OLS}}{N-(k+1)} = \dfrac{1}{N-(k+1)} \left(\mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{OLS} \right)^\top \left( \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{OLS} \right) \end{cases} \] is invalid since, in general, \(\widehat{\sigma}^2_{OLS}\) is a biased and inconsistent estimator of \(\sigma^2\) for the GMLR. Consequently, \(\widehat{\mathbb{V}{\rm ar}}\left( \widehat{\boldsymbol{\beta}}_{OLS} \right)\) is also biased and inconsistent.
  • The usual OLS \(t\)-statistics do not have Student's \(t\) distribution, even in large samples;
  • The \(F\)-statistic no longer has the \(F\) (Fisher) distribution, and the \(LM\) statistic (see the heteroskedasticity tests described later) no longer has an asymptotic \(\chi^2\) distribution.
  • \(\widehat{\boldsymbol{\beta}}_{OLS} \rightarrow \boldsymbol{\beta}\), if the largest eigenvalue of \(\mathbf{\Omega}\) is bounded for all \(N\), and the largest eigenvalue of \(\left( \mathbf{X}^\top \mathbf{X}\right)^{-1}\) tends to zero as \(N \rightarrow \infty\).

In other words, the OLS estimates are unbiased and consistent, but their variance estimators are biased and inconsistent, which leads to incorrect results for statistical tests.
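The gap between the naive variance formula and the true sandwich form can be checked numerically. The following sketch is not from the source; the sample size, regressors and the diagonal \(\mathbf{\Omega}\) are arbitrary illustrative choices:

```python
# Compare the (invalid) naive OLS variance formula with the true
# sandwich-form variance of the OLS estimator under a diagonal Omega.
import numpy as np

rng = np.random.default_rng(42)
N, sigma2 = 50, 2.0
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Omega = np.diag(np.linspace(0.5, 5.0, N))   # heteroskedastic errors

XtX_inv = np.linalg.inv(X.T @ X)
var_naive = sigma2 * XtX_inv                             # invalid under GMLR
var_true = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv  # sandwich form

# The two disagree whenever Omega != I:
assert not np.allclose(var_naive, var_true)
```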

4.6.3 Generalized Least Squares (GLS)

The general idea behind GLS is that in order to obtain an efficient estimator of \(\widehat{\boldsymbol{\beta}}\), we need to transform the model so that the transformed model satisfies the Gauss-Markov theorem (which is defined by our (MR.1)-(MR.5) assumptions). Then, estimating the transformed model by OLS yields efficient estimates. The transformation is expressed in terms of a (usually) triangular matrix \(\mathbf{\Psi}\), such that: \[ \mathbf{\Omega}^{-1} = \mathbf{\Psi} \mathbf{\Psi}^\top \] Then, premultiplying our initial model by \(\mathbf{\Psi}^\top\) yields: \[\begin{equation} \mathbf{\Psi}^\top \mathbf{Y} = \mathbf{\Psi}^\top\mathbf{X} \boldsymbol{\beta} + \mathbf{\Psi}^\top\boldsymbol{\varepsilon} \tag{4.1} \end{equation}\] Because \(\mathbf{\Omega}\) is non-singular, so is \(\mathbf{\Psi}\); therefore, we can always multiply the previous expression by \(\left(\mathbf{\Psi}^\top\right)^{-1}\) to arrive back at the initial model. In other words, we have simply scaled both sides of the initial equation.
Then, the following properties hold for the transformed errors: \[ \begin{aligned} \mathbb{E} \left( \mathbf{\Psi}^\top\boldsymbol{\varepsilon} \right) &= \mathbf{\Psi}^\top \mathbb{E} \left( \boldsymbol{\varepsilon} \right) = \boldsymbol{0}\\\\ \mathbb{V}{\rm ar} \left( \mathbf{\Psi}^\top\boldsymbol{\varepsilon} \right) &= \mathbf{\Psi}^\top \mathbb{V}{\rm ar} \left( \boldsymbol{\varepsilon} \right) \mathbf{\Psi} \\ &= \sigma^2 \mathbf{\Psi}^\top \mathbf{\Omega} \mathbf{\Psi} \\ &= \sigma^2 \mathbf{\Psi}^\top \left(\mathbf{\Omega}^{-1} \right)^{-1} \mathbf{\Psi} \\ &= \sigma^2 \mathbf{\Psi}^\top \left(\mathbf{\Psi} \mathbf{\Psi}^\top \right)^{-1} \mathbf{\Psi} \\ &= \sigma^2 \mathbf{\Psi}^\top \left(\mathbf{\Psi}^\top \right)^{-1} \mathbf{\Psi} ^{-1} \mathbf{\Psi} \\ &= \sigma^2 \mathbf{I} \end{aligned} \] which means that assumptions (MR.3) - (MR.4) now hold true for the model defined in (4.1).

The Parameter GLS Estimator

Consequently, the OLS estimator of \(\boldsymbol{\beta}\) from regression in (4.1) is: \[ \begin{aligned} \widehat{\boldsymbol{\beta}} &= \left( \mathbf{X}^\top \mathbf{\Psi} \mathbf{\Psi}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{\Psi} \mathbf{\Psi}^\top \mathbf{Y} = \left( \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{Y} \end{aligned} \] This estimator is called the generalized least squares (GLS) estimator of \(\boldsymbol{\beta}\).

So, the GLS estimator can be formulated as follows:

Let \(\mathbf{\Omega}^{-1} = \mathbf{\Psi} \mathbf{\Psi}^\top\) and let \(\mathbf{Y}^* = \mathbf{X}^* \boldsymbol{\beta} + \boldsymbol{\varepsilon}^*\), where \(\mathbf{Y}^* = \mathbf{\Psi}^\top \mathbf{Y}\), \(\mathbf{X}^* = \mathbf{\Psi}^\top\mathbf{X}\) and \(\boldsymbol{\varepsilon}^* =\mathbf{\Psi}^\top\boldsymbol{\varepsilon}\). Then the GLS estimator of \(\boldsymbol{\beta}\) is: \[ \widehat{\boldsymbol{\beta}}_{GLS} = \left( \mathbf{X}^{*\top} \mathbf{X}^* \right)^{-1} \mathbf{X}^{*\top} \mathbf{Y}^* = \left( \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{Y} \]
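A minimal numerical sketch (not from the source) verifying that OLS on the transformed data and the closed form with \(\mathbf{\Omega}^{-1}\) coincide; a known diagonal \(\mathbf{\Omega}\) is assumed for simplicity, and the final lines compute the \(\mathbf{\Omega}^{-1}\)-weighted error-variance estimator derived further below:

```python
# GLS two ways: OLS on (Psi^T Y, Psi^T X) versus the closed-form expression.
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 1
X = np.column_stack([np.ones(N), rng.normal(size=N)])
omega2 = np.exp(rng.normal(size=N))          # known relative error variances
Omega_inv = np.diag(1.0 / omega2)
Psi = np.diag(1.0 / np.sqrt(omega2))         # Omega^{-1} = Psi Psi^T
sigma2 = 1.5
Y = X @ np.array([1.0, 2.0]) + np.sqrt(sigma2 * omega2) * rng.normal(size=N)

# OLS on the transformed model Y* = X* beta + eps*
Xs, Ys = Psi.T @ X, Psi.T @ Y
beta_transformed = np.linalg.solve(Xs.T @ Xs, Xs.T @ Ys)

# Direct closed form
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)
assert np.allclose(beta_transformed, beta_gls)

# Omega^{-1}-weighted error-variance estimator (estimates sigma^2 = 1.5)
resid = Y - X @ beta_gls
sigma2_hat = (resid @ Omega_inv @ resid) / (N - (k + 1))
assert sigma2_hat > 0
```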

Alternatively, it can also be obtained as a solution to the minimization of the GLS criterion function: \[ \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right)^\top \mathbf{\Omega}^{-1} \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right) \rightarrow \min_{\boldsymbol{\beta}} \] This criterion function can be thought of as a generalization of the \(RSS\) function, which is minimized in the OLS case. The effect of such weighting is clear when \(\mathbf{\Omega}\) is diagonal - each observation is simply given a weight proportional to the inverse of the variance of its error term.

The Error Variance Estimator

Next, we need to estimate the error variance: \[ \begin{aligned} \widehat{\sigma}^2 &= \dfrac{1}{N-(k+1)} \left( \mathbf{Y}^* - \mathbf{X}^* \widehat{\boldsymbol{\beta}}_{GLS}\right)^\top \left( \mathbf{Y}^* - \mathbf{X}^* \widehat{\boldsymbol{\beta}}_{GLS}\right) \\ &= \dfrac{1}{N-(k+1)} \left( \mathbf{\Psi}^\top \mathbf{Y} - \mathbf{\Psi}^\top\mathbf{X} \widehat{\boldsymbol{\beta}}_{GLS}\right)^\top \left( \mathbf{\Psi}^\top \mathbf{Y} - \mathbf{\Psi}^\top\mathbf{X}\widehat{\boldsymbol{\beta}}_{GLS}\right) \\ &= \dfrac{1}{N-(k+1)} \left( \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{GLS}\right)^\top \mathbf{\Psi} \mathbf{\Psi}^\top \left( \mathbf{Y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_{GLS}\right) \end{aligned} \] which leads to the following unbiased and consistent estimator of \(\sigma^2\):

\[ \widehat{\sigma}^2 = \dfrac{1}{N-(k+1)} \left( \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{GLS}\right)^\top \mathbf{\Omega}^{-1} \left( \mathbf{Y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_{GLS}\right) \]

Properties of the GLS Estimator

Because of our conveniently written GMLR model form \(\mathbf{Y}^* = \mathbf{X}^* \boldsymbol{\beta} + \boldsymbol{\varepsilon}^*\), the following GLS properties can be derived following the same principles as in the OLS case:

  • The GLS is an unbiased estimator: \[ \mathbb{E} \left( \widehat{\boldsymbol{\beta}}_{GLS} \right) = \boldsymbol{\beta} + \left( \mathbf{X}^{*\top} \mathbf{X}^* \right)^{-1} \mathbf{X}^{*\top} \mathbb{E} \left(\boldsymbol{\varepsilon}^* \right) = \boldsymbol{\beta} \]
  • The GLS variance-covariance matrix is: \[ \begin{aligned} \mathbb{V}{\rm ar} \left( \widehat{\boldsymbol{\beta}}_{GLS} \right) = \sigma^2 \left( \mathbf{X}^{*\top} \mathbf{X}^* \right)^{-1} = \sigma^2 \left( \mathbf{X}^\top \mathbf{\Psi} \mathbf{\Psi}^\top\mathbf{X} \right)^{-1} = \sigma^2 \left( \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1} \end{aligned} \]
  • If the errors are normally distributed, then: \[ \widehat{\boldsymbol{\beta}}_{GLS} | \mathbf{X} \sim \mathcal{N} \left(\boldsymbol{\beta},\ \sigma^2 \left( \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1}\right) \]
  • \(\widehat{\sigma}^2\) is unbiased and consistent.

The GLS estimator is BLUE for the GMLR.

Consequently, for the GMLR, the OLS estimator is also inefficient.

Sometimes, we may decompose the whole variance-covariance matrix: \[ \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \sigma_\epsilon^2 \mathbf{\Omega} = \mathbf{\Sigma}, \quad \mathbf{\Sigma}^{-1} = \mathbf{\Phi} \mathbf{\Phi}^\top \] Then the variance-covariance matrix of the transformed errors is: \(\mathbb{V}{\rm ar} \left( \mathbf{\Phi}^\top\boldsymbol{\varepsilon} \right) = \mathbf{\Phi}^\top \mathbb{V}{\rm ar} \left( \boldsymbol{\varepsilon} \right) \mathbf{\Phi}= \mathbf{I}\)

In some econometric software, the variance-covariance matrix may be decomposed in this way. The difference is that in this case the GLS error variance is known to be \(1\), while before, we needed to estimate \(\sigma^2\).
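A quick sketch (not from the source) of this alternative decomposition, using a Cholesky factorisation of \(\mathbf{\Sigma}^{-1}\) so that \(\mathbf{\Sigma}^{-1} = \mathbf{\Phi}\mathbf{\Phi}^\top\); the matrices below are arbitrary illustrative choices:

```python
# Decompose Sigma^{-1} (instead of Omega^{-1}); then Phi^T eps has
# identity variance-covariance, i.e. unit error variance.
import numpy as np

sigma2 = 2.0
Omega = np.diag([1.0, 0.5, 2.0, 1.5])
Sigma = sigma2 * Omega

# numpy's cholesky returns lower-triangular Phi with Sigma^{-1} = Phi Phi^T
Phi = np.linalg.cholesky(np.linalg.inv(Sigma))

# Var(Phi^T eps) = Phi^T Sigma Phi = I
assert np.allclose(Phi.T @ Sigma @ Phi, np.eye(4))
```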

Either way, the GLS estimates will be the same, regardless of the error variance-covariance specification used.

Multiple Linear Restriction Test

A set of \(M\) linear restrictions can be tested via the following hypotheses: \[ \begin{cases} H_0&: \mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}\\ H_1&: \mathbf{L} \boldsymbol{\beta} \neq \boldsymbol{r} \end{cases} \] Then, the \(F\)-statistic is: \[ F = \dfrac{ \left( \mathbf{L} \widehat{\boldsymbol{\beta}}_{GLS} - \boldsymbol{r} \right)^\top \left[ \mathbf{L} \left( \mathbf{X}^\top \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1} \mathbf{L}^\top \right]^{-1} \left( \mathbf{L} \widehat{\boldsymbol{\beta}}_{GLS} - \boldsymbol{r} \right)}{M\widehat{\sigma}^2} \sim F_{(M, N - (k+1))} \]
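A numerical sketch of this \(F\)-statistic (not from the source; the data, the restriction matrix \(\mathbf{L}\) and the vector \(\boldsymbol{r}\) are illustrative):

```python
# F-statistic for H0: L beta = r under GLS, with M = 2 restrictions.
import numpy as np

rng = np.random.default_rng(2)
N, k = 120, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, k))])
omega2 = np.exp(0.5 * rng.normal(size=N))
Omega_inv = np.diag(1.0 / omega2)
Y = X @ np.array([1.0, 0.0, 0.0]) + np.sqrt(omega2) * rng.normal(size=N)

beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)
resid = Y - X @ beta_gls
sigma2_hat = (resid @ Omega_inv @ resid) / (N - (k + 1))

# H0: beta_1 = beta_2 = 0
L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)
M = L.shape[0]
V = np.linalg.inv(X.T @ Omega_inv @ X)
diff = L @ beta_gls - r
F = diff @ np.linalg.solve(L @ V @ L.T, diff) / (M * sigma2_hat)
# Under H0, F follows the F distribution with (M, N - (k+1)) d.o.f.
assert F >= 0
```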

4.6.4 Weighted Least Squares

It is particularly easy to obtain GLS estimates when the error terms are heteroskedastic but uncorrelated - this implies that the matrix \(\mathbf{\Omega}\) is diagonal, with non-constant diagonal elements: \[ \mathbf{\Omega} = \begin{bmatrix} \omega_1^2 & 0 & 0 &... & 0\\ 0 & \omega_2^2 & 0 & ... &0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & ... & \omega_N^2 \end{bmatrix} \iff \mathbf{\Omega}^{-1} = \begin{bmatrix} \omega_1^{-2} & 0 & 0 &... & 0\\ 0 & \omega_2^{-2} & 0 & ... &0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & ... & \omega_N^{-2} \end{bmatrix} \] So, if we select the matrix \(\mathbf{\Psi}\) as: \[ \mathbf{\Psi} = \begin{bmatrix} \omega_1^{-1} & 0 & 0 &... & 0\\ 0 & \omega_2^{-1} & 0 & ... &0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & ... & \omega_N^{-1} \end{bmatrix} \] then our transformed regression \(\mathbf{\Psi}^\top \mathbf{Y} = \mathbf{\Psi}^\top\mathbf{X} \boldsymbol{\beta} + \mathbf{\Psi}^\top\boldsymbol{\varepsilon}\) can be written as: \[ Y_i / \omega_i = \beta_0 \cdot(1/\omega_i) + \beta_1 \cdot(X_{1,i} / \omega_i) + ... + \beta_k \cdot(X_{k,i} / \omega_i) + \epsilon_i/\omega_i \] In other words, all of the variables, including the constant term, are multiplied by the same weights \(\omega_i^{-1}\). Consequently, \(\beta_0\) is no longer multiplied by a constant, so when estimating the model with these transformed variables via OLS, create a new variable for \(1/\omega_i\) and exclude the constant from the model.
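The transformation above can be sketched numerically (not from the source; the assumption that the error standard deviation is proportional to the regressor is an illustrative choice):

```python
# WLS as OLS on weighted data: divide every column, including the constant,
# by omega_i and fit without adding a separate intercept.
import numpy as np

rng = np.random.default_rng(3)
N = 80
x = rng.uniform(1, 5, size=N)
omega = x                                   # assume sd(eps_i) proportional to x_i
Y = 2.0 + 3.0 * x + omega * rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
Xw = X / omega[:, None]                     # columns 1/omega_i and x_i/omega_i
Yw = Y / omega
beta_wls = np.linalg.lstsq(Xw, Yw, rcond=None)[0]   # no extra constant added

# Equivalent closed form with the diagonal Omega^{-1}:
W = np.diag(1.0 / omega**2)
beta_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
assert np.allclose(beta_wls, beta_check)
```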

There are various ways of determining the weights used in weighted least squares estimation. In the simplest case, either from economic theory or from some preliminary testing, we may assume that \(\mathbb{E}(\epsilon_i^2)\) is proportional to \(Z_i^2\), where \(Z_i\) is some observed variable. For example, \(Z_i\) may be population or income. Alternatively, \(\mathbb{E}(\epsilon_i^2)\) may sometimes be proportional to the sample size used to obtain an average (or total) value for observation \(i\), which is recorded in the dataset.

  • If observation \(Y_i\) is an average of \(N_i\) equally variable (uncorrelated) observations, then \(\mathbb{V}{\rm ar} \left( Y_i \right) = \sigma^2 / N_i\) and \(\omega_i = 1/\sqrt{N_i}\);
  • If observation \(Y_i\) is an aggregated total of \(N_i\) (uncorrelated) observations, then \(\mathbb{V}{\rm ar} \left( Y_i \right) = N_i \sigma^2\) and \(\omega_i = \sqrt{N_i}\);
  • If the standard deviation of \(Y_i\) is proportional to some predictor \(Z_i\), then \(\mathbb{V}{\rm ar} \left( Y_i \right) = Z_i^2\sigma^2\) and \(\omega_i = Z_i\);

Various summary statistics, like \(R^2\), \(ESS\) and \(RSS\), can be reported in terms of either \(Y_i\) or \(Y_i / \omega_i\); however, \(R^2\) is only valid for the transformed variable \(Y_i / \omega_i\), since the coefficients are estimated on the transformed dependent variable.

4.6.5 Feasible Generalized Least Squares

In practice the true form of \(\mathbf{\Omega}\) is unknown.

Even in the simplest case of \(\mathbf{\Omega} = \text{diag} \left( \omega_1^2, ..., \omega_N^2 \right)\), we would need to estimate a total of \(k+1 + N\) parameters (\(\beta_0,...,\beta_k\) and \(\omega_1^2, ..., \omega_N^2\)), which is impossible since we only have \(N\) observations (and a rule of thumb says that we need a minimum of 5 - 20 observations per parameter in order to get satisfactory estimates of \(\boldsymbol{\beta}\)).

Therefore, we need to assume some simplifying conditions for \(\mathbf{\Omega}\). For example, if \(\mathbf{\Omega}\) is diagonal, suppose that \(\omega_i^2 = \alpha_0 + \alpha_1 X_{m,i}\) for some \(m = 1,...,k\) - now we only need to estimate two additional parameters (as opposed to \(N\)).

The case where we use an estimated matrix \(\widehat{\mathbf{\Omega}}\) is known as feasible (or estimable) generalized least squares (FGLS). The resulting estimators have good properties in large samples.

Consequently, using the estimated matrix of \(\mathbf{\Omega}\) results in the following FGLS estimator:

The FGLS estimator is defined as: \[ \widehat{\boldsymbol{\beta}}_{FGLS} = \left( \mathbf{X}^\top \widehat{\mathbf{\Omega}}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^\top \widehat{\mathbf{\Omega}}^{-1} \mathbf{Y} \] where \(\widehat{\mathbf{\Omega}} = \widehat{\mathbf{\Omega}}(\boldsymbol{\theta})\) is a parametric estimation (with parameter vector \(\boldsymbol{\theta}\)) of the true unknown matrix \(\mathbf{\Omega}\).

For the weighted least squares case we would need to:

  1. Estimate a model via OLS and calculate the residuals \(\widehat{\epsilon}_{i,(OLS)}\). If the regression is correctly specified, an estimate of \(\omega_i^2\) is the OLS squared residual \(\widehat{\epsilon}_{i,(OLS)}^2\).
  2. Generally, the residuals are much too variable to be used directly in estimating the weights, so instead we could:
    • apply some other transformation to the residuals or their squares - for example, \(\log(\widehat{\epsilon}^2_i)\) - then regress the transformed values on some selected predictors and use the fitted values to construct the weights;
    • regress the squared residuals on the fitted values of the model, then use the square root of the resulting fitted values as \(\omega_i\);
    • regress the absolute residuals on the fitted values of the model, then use the resulting fitted values as \(\omega_i\).
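The two-step procedure above can be sketched as follows (not from the source; the skedastic function \(\omega_i^2 = \exp(\alpha_0 + \alpha_1 X_i)\) and all data are illustrative assumptions):

```python
# Two-step FGLS: (1) OLS residuals; (2) smooth log squared residuals by
# regressing them on the predictors, then reweight with the fitted variances.
import numpy as np

rng = np.random.default_rng(4)
N = 300
x = rng.uniform(1, 5, size=N)
X = np.column_stack([np.ones(N), x])
# True (unknown in practice) skedastic function: Var(eps_i) = exp(0.5 + 0.8 x_i)
eps = np.exp(0.5 * (0.5 + 0.8 * x)) * rng.normal(size=N)
Y = X @ np.array([1.0, 2.0]) + eps

# Step 1: OLS residuals
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ beta_ols

# Step 2: regress log(e^2) on the predictors; exponentiate the fit
gamma = np.linalg.lstsq(X, np.log(e**2), rcond=None)[0]
omega2_hat = np.exp(X @ gamma)               # estimated variances, all positive

# FGLS with the estimated (diagonal) Omega
Omega_inv_hat = np.diag(1.0 / omega2_hat)
beta_fgls = np.linalg.solve(X.T @ Omega_inv_hat @ X, X.T @ Omega_inv_hat @ Y)
assert np.all(np.isfinite(beta_fgls))
```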

Plotting the residuals (or their squared values, or their absolute values, or the square root of their absolute values) against the independent variable \(X\) can help in determining the regression form.

The resulting fitted values from this regression are estimates of \(\omega_i\).

Generally, the structure can be imposed in two popular ways: by assuming error heteroskedasticity, or by assuming error serial correlation.

If the assumption about the error covariance structure is incorrect, heteroskedasticity still remains and FGLS is no longer BLUE. In this case:

  • FGLS is still unbiased, just like OLS;
  • the FGLS estimator of the error variance is biased, just like OLS (but the magnitude of the bias is not the same);

Furthermore, regarding the use of FGLS instead of GLS, (Hayashi 2011) stated that:

Since we do not know the true \(\mathbf{\Omega}\), but instead estimate \(\widehat{\mathbf{\Omega}}\) from the sample, then it becomes a random variable (just like when we estimate other parameters, like \(\widehat{\boldsymbol{\beta}}\)). This fact affects the distribution of the GLS estimator. Very little is known about the finite-sample properties of the FGLS estimator.

4.6.6 A Note on Coefficient Interpretation

Since we are using the following model: \[ \begin{aligned} \mathbf{Y}^* &= \mathbf{X}^* \boldsymbol{\beta} + \boldsymbol{\varepsilon}^*,\\ &\text{ where } \mathbf{Y}^* = \mathbf{\Psi}^\top \mathbf{Y}, \quad \mathbf{X}^* = \mathbf{\Psi}^\top\mathbf{X}, \quad \boldsymbol{\varepsilon}^* = \mathbf{\Psi}^\top\boldsymbol{\varepsilon} \end{aligned} \] to estimate \(\widehat{\boldsymbol{\beta}}\) with GLS (or FGLS), the predicted values are: \[ \begin{aligned} \widehat{\mathbf{Y}^*} &= \mathbf{X}^* \widehat{\boldsymbol{\beta}}_{GLS} \\ &= \mathbf{\Psi}^\top \mathbf{X} \widehat{\boldsymbol{\beta}}_{GLS}\quad \left(i.e.\ = \mathbf{\Psi}^\top \widehat{\mathbf{Y}} \right) \end{aligned} \] In other words, multiplying both sides by \(\left(\mathbf{\Psi}^\top\right)^{-1}\) yields: \[ \begin{aligned} \left(\mathbf{\Psi}^\top\right)^{-1}\widehat{\mathbf{Y}^*} &= \mathbf{X} \widehat{\boldsymbol{\beta}}_{GLS} = \widehat{\mathbf{Y}} \end{aligned} \] If we were to use \(\mathbf{X}^*\) to calculate the fitted/predicted values, then we would need to multiply the fitted values \(\widehat{\mathbf{Y}^*}\) by \(\left(\mathbf{\Psi}^\top\right)^{-1}\) to get the fitted values for the original data \(\widehat{\mathbf{Y}}\). Alternatively, this expression also means that we can use the original design matrix \(\mathbf{X}\) with the GLS estimates of \(\boldsymbol{\beta}\) to get the fitted/predicted values of \(\mathbf{Y}\). In other words:
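This back-transformation can be verified numerically (a sketch, not from the source; a known diagonal \(\mathbf{\Omega}\) is assumed so that \(\mathbf{\Psi}\) is diagonal):

```python
# Check that fitted values for the original data equal X @ beta_gls,
# i.e. transforming back from Y*-space by hand is unnecessary.
import numpy as np

rng = np.random.default_rng(5)
N = 60
X = np.column_stack([np.ones(N), rng.normal(size=N)])
omega2 = np.linspace(1.0, 3.0, N)
Psi = np.diag(1.0 / np.sqrt(omega2))         # Omega^{-1} = Psi Psi^T
Y = X @ np.array([1.0, -2.0]) + np.sqrt(omega2) * rng.normal(size=N)

Xs, Ys = Psi.T @ X, Psi.T @ Y
beta_gls = np.linalg.solve(Xs.T @ Xs, Xs.T @ Ys)

Y_star_hat = Xs @ beta_gls                   # fitted values in transformed space
Y_hat = X @ beta_gls                         # fitted values for the original data
assert np.allclose(np.linalg.inv(Psi.T) @ Y_star_hat, Y_hat)
```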

The GLS (as well as WLS and FGLS) estimates of \(\boldsymbol{\beta}\) retain their coefficient interpretation of how a unit increase in one explanatory variable \(X_j\) affects the dependent variable \(Y\), ceteris paribus.


Davidson, R., and J. G. MacKinnon. 2004. Econometric Theory and Methods. Oxford University Press. https://books.google.lt/books?id=vqkap8AwAy0C.

Hayashi, F. 2011. Econometrics. Princeton University Press. https://books.google.lt/books?id=QyIW8WUIyzcC.