## 4.4 Restricted Least Squares

This section is mainly based on Source 1 and Source 2.

Suppose that we have the following multiple regression with $$k$$ independent variables $$X_j$$, $$j = 1,...,k$$, in matrix notation, with $$M < k +1$$ different linear restrictions on the parameters: \begin{aligned} \mathbf{Y} &= \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}\\ \mathbf{L} \boldsymbol{\beta} &= \boldsymbol{r} \end{aligned} where: $\mathbf{L} = \begin{bmatrix} c_{10} & c_{11} & ... & c_{1k} \\ c_{20} & c_{21} & ... & c_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ c_{M0} & c_{M1} & ... & c_{Mk} \\ \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} r_1\\ r_2\\ \vdots \\ r_M \end{bmatrix}$

### 4.4.1 Estimator Derivation

Lagrange multipliers are widely used to solve various constrained optimization problems in economics.

In general, in order to find the stationary points of a function $$f(\mathbf{X})$$, subject to an equality constraint $$g(\mathbf{X}) = \boldsymbol{0}$$, from the Lagrangian function: $\mathcal{L}(\mathbf{X}, \boldsymbol{\lambda}) = f(\mathbf{X}) + \boldsymbol{\lambda}g(\mathbf{X})$ Note that the solution corresponding to the above constrained optimization is always a saddle point of the Lagrangian function.

As before, we want to minimize the sum of squares, but this time, they are subject to the condition that $$\mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}$$. This leads to the Lagrangian function: \begin{aligned} L(\boldsymbol{\beta} , \boldsymbol{\lambda} ) &= \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right)^\top \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right) + 2\boldsymbol{\lambda}^\top (\mathbf{L} \boldsymbol{\beta} - \boldsymbol{r})\\ &= \mathbf{Y} ^\top \mathbf{Y} - 2\mathbf{Y}^\top \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{X} \boldsymbol{\beta} + 2 \boldsymbol{\lambda}^\top \mathbf{L} \boldsymbol{\beta} - 2\boldsymbol{\lambda}^\top \boldsymbol{r} \end{aligned} Then equating the partial derivatives to zero: $\begin{cases} \dfrac{\partial L(\widehat{\boldsymbol{\beta}} , \widehat{\boldsymbol{\lambda}} )}{\partial \widehat{\boldsymbol{\beta}}} &= -2 \mathbf{X}^\top \mathbf{Y} + 2 \mathbf{X}^\top \mathbf{X} \widehat{\boldsymbol{\beta}} + 2 \boldsymbol{\lambda}^\top \mathbf{L} = 0 \\\\ \dfrac{\partial L(\widehat{\boldsymbol{\beta}} , \boldsymbol{\lambda} )}{\partial \boldsymbol{\lambda}} &= 2 \mathbf{L} \boldsymbol{\beta} - 2\boldsymbol{r} = 0 \end{cases}$ After some rearranging, we have: $\begin{cases} \mathbf{X}^\top \mathbf{X} \widehat{\boldsymbol{\beta}} + \mathbf{L}^\top \boldsymbol{\lambda} = \mathbf{X}^\top \mathbf{Y} \\\\ \mathbf{L} \boldsymbol{\beta} = \boldsymbol{r} \end{cases}$ which we rewrite in the following block-matrix form: $\begin{bmatrix} \mathbf{X}^\top \mathbf{X} & \mathbf{L}^\top \\ \mathbf{L} & \boldsymbol{0} \end{bmatrix} \begin{bmatrix} \widehat{\boldsymbol{\beta}} \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{X}^\top \mathbf{Y} \\ \boldsymbol{r} \end{bmatrix}$ which we can further rewrite to get the solution: \begin{aligned} \begin{bmatrix} \widehat{\boldsymbol{\beta}} \\ \boldsymbol{\lambda} \end{bmatrix} &= \begin{bmatrix} \mathbf{X}^\top \mathbf{X} & \mathbf{L}^\top \\ \mathbf{L} & \boldsymbol{0} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{X}^\top \mathbf{Y} \\ \boldsymbol{r} \end{bmatrix} \end{aligned} Alternatively, from $$\dfrac{\partial L(\widehat{\boldsymbol{\beta}} , \widehat{\boldsymbol{\lambda}} )}{\partial \widehat{\boldsymbol{\beta}}} = 0$$, we have that: \begin{aligned} \widehat{\boldsymbol{\beta}}^{(RLS)} &= \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{Y} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \boldsymbol{\lambda} \\ &=\widehat{\boldsymbol{\beta}}^{(OLS)} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \boldsymbol{\lambda} \end{aligned} Multiplying both sides by $$\mathbf{L}$$ yields: \begin{aligned} \mathbf{L} \widehat{\boldsymbol{\beta}}^{(RLS)} &= \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \boldsymbol{\lambda} \end{aligned} Since $$\mathbf{L} \widehat{\boldsymbol{\beta}}^{(RLS)} = \boldsymbol{r}$$, we can express $$\boldsymbol{\lambda}$$ as: \begin{aligned} \boldsymbol{\lambda} &= \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right) \end{aligned} Then, we can substitute it into the initial $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ expression to get:

$\widehat{\boldsymbol{\beta}}^{(RLS)} =\widehat{\boldsymbol{\beta}}^{(OLS)} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right)$

what this means is that the RLS estimator can be defined as: $\widehat{\boldsymbol{\beta}}^{(RLS)} = \widehat{\boldsymbol{\beta}}^{(OLS)} + \text{"Restriction Adjustment"}$ where the Restriction Adjustment is the divergence between $$\mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)}$$ and $$\mathbb{E} \left(\mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} \right) = \boldsymbol{r}$$.

We note the following terms can be expressed as: \begin{aligned} \mathbb{V}{\rm ar}\left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} \right) &= \mathbf{L} \mathbb{V}{\rm ar} \left( \widehat{\boldsymbol{\beta}}^{(OLS)} \right)\mathbf{L}^\top = \sigma^2 \mathbf{L} \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top\\ \mathbb{C}{\rm ov}\left( \widehat{\boldsymbol{\beta}}^{(OLS)} ,\ \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} \right) &= \sigma^2 \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \end{aligned} which means that the RLS estimator is: $\widehat{\boldsymbol{\beta}}^{(RLS)} =\widehat{\boldsymbol{\beta}}^{(OLS)} - \mathbb{C}{\rm ov}\left( \widehat{\boldsymbol{\beta}}^{(OLS)} ,\ \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} \right) \left( \mathbb{V}{\rm ar}\left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} \right) \right)^{-1} \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right)$

### 4.4.2 Properties of the RLS Estimator

Similarly to OLS, the RLS estimator has a number of properties, which can be derived and verified.

#### 4.4.2.1 Exact Restrictions with RLS Estimator

To verify that $$\mathbf{L} \widehat{\boldsymbol{\beta}}^{(RLS)} = \mathbf{r}$$, multiply the expression of $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ by $$\mathbf{L}$$ to get: \begin{aligned} \mathbf{L} \widehat{\boldsymbol{\beta}}^{(RLS)} &= \mathbf{L}\left[ \widehat{\boldsymbol{\beta}}^{(OLS)} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right)\right]\\ &=\mathbf{L} \widehat{\boldsymbol{\beta}}^{(OLS)} - \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right) \\ &= \boldsymbol{r} \end{aligned}

#### 4.4.2.2 Unbiasedness of RLS Estimator

We begin by examining the estimation error of estimating $$\boldsymbol{\beta}$$ via RLS in a similar way as we did for OLS: \begin{aligned} \widehat{\boldsymbol{\beta}}^{(RLS)} &= \widehat{\boldsymbol{\beta}}^{(OLS)} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \mathbf{L}\boldsymbol{\beta} \right) \\ &= \widehat{\boldsymbol{\beta}}^{(OLS)} + \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L} \left( \boldsymbol{\beta} - \widehat{\boldsymbol{\beta}}^{(OLS)}\right) \\ \end{aligned} if $$\mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}$$.

Then the expected value of the RLS estimator is: \begin{aligned} \mathbb{E} \left( \widehat{\boldsymbol{\beta}}^{(RLS)} \right) &=\mathbb{E} \left( \widehat{\boldsymbol{\beta}}^{(OLS)} \right) + \left[\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L} \right]\mathbb{E}\left( \boldsymbol{\beta} - \widehat{\boldsymbol{\beta}}^{(OLS)}\right) = \boldsymbol{\beta} \end{aligned} if $$\mathbb{E} \left( \widehat{\boldsymbol{\beta}}^{(OLS)} \right) = \boldsymbol{\beta}$$.

In other words:

If the OLS estimator is unbiased, then the RLS estimator is also unbiased as long as the constraints $$\mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}$$ are true.

Otherwise, the RLS estimator is biased, i.e. $$\mathbb{E} \left( \widehat{\boldsymbol{\beta}}^{(RLS)} \right) \neq \boldsymbol{\beta}$$.

#### 4.4.2.3 Efficiency of RLS Estimator

Noting that the difference $$\widehat{\boldsymbol{\beta}}^{(RLS)} - \boldsymbol{\beta}$$ can be expressed as: \begin{aligned} \widehat{\boldsymbol{\beta}}^{(RLS)} - \boldsymbol{\beta} &= \widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{\beta} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \left( \mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)} - \mathbf{L}\boldsymbol{\beta} \right) \\ &= \left[ \mathbf{I} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L}\right] \left( \widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{\beta}\right) \\ &= \mathbf{D} \left( \widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{\beta}\right) \end{aligned} where $$\mathbf{D} = \mathbf{I} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L}$$.

This allows us to calculate the variance of the RLS estimator: \begin{aligned} \mathbb{V}{\rm ar} \left(\widehat{\boldsymbol{\beta}}^{(RLS)} \right) &= \mathbb{E} \left[(\widehat{\boldsymbol{\beta}}^{(RLS)} - \boldsymbol{\beta})(\widehat{\boldsymbol{\beta}}^{(RLS)} - \boldsymbol{\beta}))^\top \right] \\ &= \mathbf{D} \mathbb{E} \left[(\widehat{\boldsymbol{\beta}}^{(RLS)} - \boldsymbol{\beta})(\widehat{\boldsymbol{\beta}}^{(RLS)} - \boldsymbol{\beta})^\top \right] \mathbf{D}^\top \\ &= \sigma^2 \mathbf{D} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{D}^\top \\ &= \sigma^2 \mathbf{D} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \\ &= \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} - \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \end{aligned} Since $$\mathbb{V}{\rm ar} \left(\widehat{\boldsymbol{\beta}}^{(OLS)} \right) = \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}$$, we can calculate the difference in the variances of the RLS and OLS estimators: \begin{aligned} \mathbb{V}{\rm ar} \left(\widehat{\boldsymbol{\beta}}^{(OLS)} \right) - \mathbb{V}{\rm ar} \left(\widehat{\boldsymbol{\beta}}^{(RLS)} \right) &= \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} = \sigma^2 \mathbf{C} \end{aligned} It can be shown that the matrix $$\mathbf{C}$$ is a positive semi-definite matrix. Consequently, the diagonal elements of $$\text{diag}\left( \mathbb{V}{\rm ar} \left(\widehat{\boldsymbol{\beta}}^{(RLS)} \right) \right) \leq \text{diag}\left( \mathbb{V}{\rm ar} \left(\widehat{\boldsymbol{\beta}}^{(OLS)} \right) \right)$$. This means that:

If the a priori information on the parameters is correct (i.e. if the restrictions $$\mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}$$ are true), the RLS estimator is more efficient than the OLS estimator.

It can be shown that even if $$\mathbf{L} \boldsymbol{\beta} \neq \boldsymbol{r}$$, then the RLS estimator remains at least as efficient as the OLS estimator.

#### 4.4.2.4 Consistency of RLS Estimator

From the OLS estimator we have that $$\widehat{\boldsymbol{\beta}}^{(OLS)} \rightarrow \boldsymbol{\beta}$$. This means that $$\left( \widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{\beta} \right) \rightarrow \boldsymbol{0}$$. Applying this to the RLS estimator yields: \begin{aligned} \widehat{\boldsymbol{\beta}}^{(RLS)} &= \widehat{\boldsymbol{\beta}}^{(OLS)} + \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \left( \boldsymbol{r} - \mathbf{L} \widehat{\boldsymbol{\beta}}^{(OLS)}\right) \rightarrow \boldsymbol{\beta} \end{aligned} if $$\mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}$$ (which we can then plug in our RLS estimator expression). So, the RLS estimator is a consistent estimator of $$\boldsymbol{\beta}$$, provided the restrictions are correctly specified.

### 4.4.3 Estimating the Error Variance

According to the RLS estimator, the residual vector can be defined as: $\widehat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}_{RLS}$ The error variance is derived analogously to how it would be via OLS, except we have $$M$$ restrictions, so now:

The RLS (with $$M$$ restrictions) estimator of the error variance $$\sigma^2$$: $\widehat{\sigma}^2_{RLS} = \dfrac{\widehat{\boldsymbol{\varepsilon}}^\top \widehat{\boldsymbol{\varepsilon}}}{N - (k+1 - M)}$ is an unbiased estimator of $$\widehat{\sigma}^2$$.

Some final notes about the RLS estimator:

If the restrictions are correct, $$\mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}$$, then:

• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is unbiased;
• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is more efficient than $$\widehat{\boldsymbol{\beta}}^{(OLS)}$$;
• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is BLUE;
• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is consistent;
• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is asymptotically normally distributed: $\widehat{\boldsymbol{\beta}}^{(RLS)} \sim \mathcal{N} \left(\boldsymbol{\beta},\ \sigma^2 \mathbf{D} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \right)$ where $$\mathbf{D} = \mathbf{I} - \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \left( \mathbf{L}\left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{L}^\top \right)^{-1} \mathbf{L}$$.

If the restrictions are incorrect, $$\mathbf{L} \boldsymbol{\beta} \neq \boldsymbol{r}$$, then:

• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is biased (and thus, no longer BLUE);
• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is remains more efficient than $$\widehat{\boldsymbol{\beta}}^{(OLS)}$$;
• $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is inconsistent;

So, the variance of $$\widehat{\boldsymbol{\beta}}^{(RLS)}$$ is smaller than $$\widehat{\boldsymbol{\beta}}^{(OLS)}$$, whether the constraints are true or not.

### 4.4.4 Testing for Linear Restrictions

In practice, we begin by estimating an unrestricted model via OLS. We then test the following hypothesis of $$M$$ linear restrictions: $\begin{cases} H_0&: \mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}\\ H_1&: \mathbf{L} \boldsymbol{\beta} \neq \boldsymbol{r} \end{cases}$

We can test it with the $$F$$-test, presented in subsection 4.2.4.4.2: $F = \dfrac{1}{M} \left( \mathbf{L} \widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right)^\top \left[ \mathbf{L} \widehat{\mathbb{V}{\rm ar}} \left(\widehat{\boldsymbol{\beta}}^{(OLS)} \right) \mathbf{L}^\top \right]^{-1} \left( \mathbf{L} \widehat{\boldsymbol{\beta}}^{(OLS)} - \boldsymbol{r} \right) \sim F_{(M, N - (k+1))}$

Then:

• If the null hypothesis cannot be rejected - re-estimate the model via RLS;
• If the null hypothesis is rejected - keep the OLS estimates.

As we have seen - RLS is biased, unless the imposed constraints are exactly true.

Consequently, being knowledgeable about economics allows one to:

• Identify which variables may effect the dependent variable;
• Specify the model in a correct functional form;
• Specify any additional restrictions on the coefficients.

When these restrictions, as well as the model form itself, primarily come from your knowledge of economic theory, rather than from the data sample, they may be regarded as nonsample information. As can be seen from the first sections of this chapter, there are many different combinations of variables and their transformations, which would lead to a near-infinite number of possible models. This is where economic theory is very useful.

As we have noted, evidence of whether the imposed restrictions are true or not can be obtained by carrying out the specified $$F$$-test. Then again, if the test rejects the hypothesis about our constraints, but we are absolutely sure that they are valid from economic theory, then our RLS estimation may be closer to the true model (it will have lower variance), but the hypothesis may be rejected due to other reasons (for example, an incorrect model form, or omitted important variable).

If we are wrong, then the RLS estimates will be biased (though their variance will still be lower than OLS). In general, biased estimates are undesirable.

Example 4.21 We will simulate the following model:

\begin{aligned} Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \beta_4 X_{4,i} + \beta_5 X_{5,i} + \epsilon_i \end{aligned} where:

• $$\beta_1 = \beta_3$$;
• $$(-2) \cdot \beta_2 = \beta_4$$;
• $$\beta_5 = 0$$ (i.e. insignificant variable);
#
#
#
#
set.seed(123)
#
N <- 1000
beta_vec <- c(-5, 2, 3, 2, -6, 0)
#
x1 <- rnorm(mean = 5, sd = 2, n = N)
x2 <- sample(1:50, size = N, replace = TRUE)
x3 <- seq(from = 0, to = 10, length.out = N)
x4 <- rnorm(mean = 10, sd = 3, n = N)
x5 <- rnorm(mean = -2, sd = 3, n = N)
#
x_mat <- cbind(x1, x2, x3, x4, x5)
#
e <- rnorm(mean = 0, sd = 3, n = N)
#
y <- cbind(1, x_mat) %*% beta_vec + e
#
data_smpl <- data.frame(y, x_mat)
#
import numpy as np
import pandas as pd
import statsmodels.api as sm
#
np.random.seed(123)
#
N = 1000
beta_vec = [-5, 2, 3, 2, -6, 0]
#
x1 = np.random.normal(loc = 5, scale = 2, size = N)
x2 = np.random.choice(list(range(1, 51)), size = N, replace = True)
x3 = np.linspace(start = 0, stop = 10, num = N)
x4 = np.random.normal(loc = 10, scale = 3, size = N)
x5 = np.random.normal(loc = -2, scale = 3, size = N)
#
x_mat = np.column_stack([x1, x2, x3, x4, x5])
#
e  = np.random.normal(loc = 0, scale = 3, size = N)
#
y = np.dot(sm.add_constant(x_mat), beta_vec) + e
#
data_smpl = pd.DataFrame(np.column_stack([y, x_mat]),
columns = ["y", "x1", "x2", "x3", "x4", "x5"])

Next up, we will estimate the relevant model:

In Python, we can estimate a model via a formula specification without the additional module from import statsmodels.formula.api as smf.

y_mdl_fit <- lm("y ~ x1 + x2 + x3 + x4 + x5", data = data_smpl)
#
print(round(coef(summary(y_mdl_fit)), 5))
##             Estimate Std. Error    t value Pr(>|t|)
## (Intercept) -5.49252    0.47593  -11.54062  0.00000
## x1           2.01919    0.04760   42.42357  0.00000
## x2           3.01065    0.00658  457.86812  0.00000
## x3           2.02373    0.03265   61.98141  0.00000
## x4          -6.00084    0.03180 -188.71161  0.00000
## x5           0.01270    0.03121    0.40704  0.68407
y_mdl = sm.OLS.from_formula("y ~ x1 + x2 + x3 + x4 + x5", data = data_smpl)
y_mdl_fit = y_mdl.fit()
print(y_mdl_fit.summary().tables)
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## Intercept     -4.3625      0.481     -9.066      0.000      -5.307      -3.418
## x1             2.0135      0.048     42.292      0.000       1.920       2.107
## x2             2.9994      0.007    450.490      0.000       2.986       3.012
## x3             2.0157      0.033     61.061      0.000       1.951       2.080
## x4            -6.0537      0.032   -186.394      0.000      -6.117      -5.990
## x5             0.0129      0.032      0.403      0.687      -0.050       0.076
## ==============================================================================

We can see that the coefficients (and their significance) is similar to what we used to generate the data sample.

Next up, we will carry out an $$F$$-test to test the following hypothesis: $\begin{cases} H_0 &: \beta_1 = \beta_3 \text{ and } (-2)\cdot \beta_2 = \beta_4\text{ and } \beta_5 = 0 \\ H_1 &: \text{at least one equality doesn't hold} \end{cases}$ Note that we can re-write the hypothesis alternatively as: $\begin{cases} H_0 &: \beta_1 - \beta_3 = 0 \text{ and } 2\cdot \beta_2 + \beta_4 = 0\text{ and } \beta_5 = 0 \\ H_1 &: \text{at least one equality doesn't hold} \end{cases}$

car::linearHypothesis(y_mdl_fit, c("x1 - x3 = 0", "2*x2 + x4 = 0", "x5 = 0"))
## Linear hypothesis test
##
## Hypothesis:
## x1 - x3 = 0
## 2 x2  + x4 = 0
## x5 = 0
##
## Model 1: restricted model
## Model 2: y ~ x1 + x2 + x3 + x4 + x5
##
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    997 8845.0
## 2    994 8840.1  3    4.9046 0.1838 0.9074
print(y_mdl_fit.f_test("x1 - x3 = 0, 2*x2 + x4 = 0, x5 = 0"))
## <F test: F=array([[0.85748152]]), p=0.462700142241137, df_denom=994, df_num=3>

which is equivalent to using a matrix restriction specification:

L <- matrix(c(0, 1, 0, -1, 0, 0,
0, 0, 2, 0, 1, 0,
0, 0, 0, 0, 0, 1), nrow = 3, byrow = TRUE)
rr <- c(0, 0, 0)
car::linearHypothesis(y_mdl_fit, hypothesis.matrix = L, rhs = rr)
## Linear hypothesis test
##
## Hypothesis:
## x1 - x3 = 0
## 2 x2  + x4 = 0
## x5 = 0
##
## Model 1: restricted model
## Model 2: y ~ x1 + x2 + x3 + x4 + x5
##
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    997 8845.0
## 2    994 8840.1  3    4.9046 0.1838 0.9074
L = np.array(([0,1,0,-1,0,0],
[0,0,2,0,1,0],
[0,0,0,0,0,1]))
rr = [0, 0, 0]
print(y_mdl_fit.f_test((L, rr)))
## <F test: F=array([[0.85748152]]), p=0.462700142241137, df_denom=994, df_num=3>

The $$p$$-value > 0.05, so we do not reject the null hypothesis.

Since we do not reject the null hypothesis - RLS would be more efficient than OLS and still be BLUE and Consistent. Let’s move on to estimating the regression via RLS:

We can extract the formula used in the lm() function via the formula() function. In order to estimate the model via RLS, we need to use the rls function from the lrmest package:

L <- matrix(c(0, 1, 0, -1, 0, 0,
0, 0, 2, 0, 1, 0,
0, 0, 0, 0, 0, 1), nrow = 3, byrow = TRUE)
rr <- c(0, 0, 0)
#
mdl_rls = lrmest::rls(formula = formula(y_mdl_fit), R = L, r = rr, data = data_smpl, delt = rep(0, length(rr)))
## Warning in model.matrix.default(md, cal, contrasts): non-list contrasts argument ignored
print(mdl_rls[])
##             Estimate Standard_error t_statistic pvalue
## (Intercept)  -5.3174         0.2885           0      1
## x1            2.0226         0.0269           0      1
## x2            3.0091         0.0061           0      1
## x3            2.0226         0.0269          NA     NA
## x4           -6.0181         0.0122          NA     NA
## x5            0.0000         0.0000          NA     NA

Note: In case a warning Warning in model.matrix.default(md, cal, contrasts): non-list contrasts \n argument ignored is present - if you run ?lrmest::rls and run the example in the help documentation - the warning message is also present there. This warning can most likely be safely ignored, as discussed here. Future package updates may address this warning.

We first need to have the original model estimated not as OLS, but as a GLM (Generalized Linear Model). We can use the formula, that was already specified in the previous OLS regression directly from the variable itself:

mdl_glm = sm.GLM.from_formula(formula = y_mdl.formula, data = data_smpl)
mdl_glm_fit = mdl_glm.fit()
print(mdl_glm_fit.summary().tables)
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## Intercept     -4.3625      0.481     -9.066      0.000      -5.306      -3.419
## x1             2.0135      0.048     42.292      0.000       1.920       2.107
## x2             2.9994      0.007    450.490      0.000       2.986       3.012
## x3             2.0157      0.033     61.061      0.000       1.951       2.080
## x4            -6.0537      0.032   -186.394      0.000      -6.117      -5.990
## x5             0.0129      0.032      0.403      0.687      -0.050       0.076
## ==============================================================================

We can see that the output is very similar to OLS. In fact the GLM in this default specification estimates the parameters by maximizing the likelihood function, instead of using the exact OLS formulas.

Next, we can specify the restrictions:

mdl_rls = mdl_glm.fit_constrained("x1 - x3 = 0, 2*x2 + x4 = 0, x5 = 0")
print(mdl_rls.summary().tables)
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## Intercept     -4.9523      0.291    -17.047      0.000      -5.522      -4.383
## x1             2.0138      0.027     73.299      0.000       1.960       2.068
## x2             3.0035      0.006    489.058      0.000       2.991       3.016
## x3             2.0138      0.027     73.299      0.000       1.960       2.068
## x4            -6.0070      0.012   -489.058      0.000      -6.031      -5.983
## x5                  0          0        nan        nan           0           0
## ==============================================================================

Note: since we have not excluded the insignificant variable, but instead set its value to zero in the restriction - we might get some warnings.

We can also use the same restriction matrix as for the test:

L = np.array(([0,1,0,-1,0,0],
[0,0,2,0,1,0],
[0,0,0,0,0,1]))
rr = [0, 0, 0]
#
mdl_rls = mdl_glm.fit_constrained((L, rr))
print(mdl_rls.summary().tables)
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## Intercept     -4.9523      0.291    -17.047      0.000      -5.522      -4.383
## x1             2.0138      0.027     73.299      0.000       1.960       2.068
## x2             3.0035      0.006    489.058      0.000       2.991       3.016
## x3             2.0138      0.027     73.299      0.000       1.960       2.068
## x4            -6.0070      0.012   -489.058      0.000      -6.031      -5.983
## x5                  0          0        nan        nan           0           0
## ==============================================================================

As always, we can estimate the coefficients manually. This is especially useful when the output contains warnings, errors, NA, or nan values and we are not sure, whether we trust the output. Looking at the RLS formula, we will re-specify all of the required coefficients in one place, so as to make it easier to understand:

L <- matrix(c(0, 1, 0, -1, 0, 0,
0, 0, 2, 0, 1, 0,
0, 0, 0, 0, 0, 1), nrow = 3, byrow = TRUE)
rr <- c(0, 0, 0)
# Design matrix
X_d <- cbind(1, x_mat)
L = np.array(([0,1,0,-1,0,0],
[0,0,2,0,1,0],
[0,0,0,0,0,1]))
rr = [0, 0, 0]
# Design matrix
X_d = sm.add_constant(x_mat)

Next, we need to estimate the parameters via OLS:

xtx_inv  <- solve(t(X_d) %*% X_d)
beta_ols <- xtx_inv %*% t(X_d) %*% y
#
print(beta_ols)
##           [,1]
##    -5.49251716
## x1  2.01919172
## x2  3.01064730
## x3  2.02373361
## x4 -6.00084216
## x5  0.01270398
xtx_inv  = np.linalg.inv(np.dot(np.transpose(X_d), X_d))
beta_ols = np.dot(xtx_inv, np.dot(np.transpose(X_d), y))
#
print(pd.DataFrame(beta_ols, index = ["", "x1", "x2", "x3", "x4", "x5"]))
##            0
##    -4.362491
## x1  2.013499
## x2  2.999398
## x3  2.015684
## x4 -6.053676
## x5  0.012935

Then, since the RLS estimator can be expressed as $$\widehat{\boldsymbol{\beta}}^{(RLS)} = \widehat{\boldsymbol{\beta}}^{(OLS)} + \text{"Restriction Adjustment"}$$, we will calculate the adjustment component separately:

RA_1 <- xtx_inv %*% t(L)
RA_2 <- solve(L %*% xtx_inv %*% t(L))
RA_3 <- L %*% beta_ols - rr
RA <- RA_1 %*% RA_2 %*% RA_3
RA_1 = np.dot(xtx_inv, np.transpose(L))
RA_2 = np.linalg.inv(np.linalg.multi_dot([L, xtx_inv, np.transpose(L)]))
RA_3 = np.dot(L, beta_ols) - np.array(rr)
RA   = RA_1.dot(RA_2).dot(RA_3)

Having calculated the adjustment component, we can finally calculate the RLS estimator:

beta_rls <- beta_ols - RA
print(beta_rls)
##         [,1]
##    -5.317401
## x1  2.022635
## x2  3.009071
## x3  2.022635
## x4 -6.018142
## x5  0.000000
beta_rls = beta_ols - RA
print(pd.DataFrame(beta_rls, index = ["", "x1", "x2", "x3", "x4", "x5"]))
##            0
##    -4.952267
## x1  2.013822
## x2  3.003503
## x3  2.013822
## x4 -6.007007
## x5  0.000000

Since we know the asymptotic distribution, we can calculate the variance of the estimates:

y_fit <- X_d %*% beta_rls
#
resid <- y - y_fit
#
sigma2_rls <- sum(resid^2) / (nrow(data_smpl) - (length(beta_rls) - length(rr)))
print(paste0("Estimated RLS residual standard error: ", round(sqrt(sigma2_rls), 5)))
##  "Estimated RLS residual standard error: 2.97853"
y_fit = np.dot(X_d, beta_rls)
resid = np.array(y) - y_fit
#
sigma2_rls = np.sum(resid**2) / (len(data_smpl.index) - (len(beta_rls) - len(rr)))
print("Estimated RLS residual standard error: " + str(np.round(sigma2_rls, 5)))
## Estimated RLS residual standard error: 9.06007

Next, we will calculate the matrix $$\mathbf{D}$$, which we will need to estimate the RLS standard errors. Fortunately, we can use some of the partial expressions that we have calculated for the adjustment component in RLS estimator calculation:

D_mat <- diag(length(beta_rls)) - RA_1 %*% RA_2 %*% L
rls_vcov <- sigma2_rls * D_mat %*% xtx_inv
#
beta_rls_se <- sqrt(diag(rls_vcov))
D_mat = np.identity(len(beta_rls)) - RA_1.dot(RA_2).dot(L)
rls_vcov = sigma2_rls * D_mat.dot(xtx_inv)
#
beta_rls_se = np.sqrt(np.diag(rls_vcov))
print(data.frame(est = beta_rls,
se = beta_rls_se))
##          est           se
##    -5.317401 2.881625e-01
## x1  2.022635 2.687947e-02
## x2  3.009071 6.113634e-03
## x3  2.022635 2.687947e-02
## x4 -6.018142 1.222727e-02
## x5  0.000000 1.851409e-11
print(pd.DataFrame(np.column_stack([beta_rls, beta_rls_se]), columns = ["est", "se"],
index = ["", "x1", "x2", "x3", "x4", "x5"]))
##          est        se
##    -4.952267  0.290500
## x1  2.013822  0.027474
## x2  3.003503  0.006141
## x3  2.013822  0.027474
## x4 -6.007007  0.012283
## x5  0.000000       NaN

Note: we may have received a warning (which resulted in a NA, or nan value), this is because of the zero-value restriction on the last coefficient. Because of that restriction the variance of x5 coefficient is very close to zero (as it should be) but it is can also be negative:

print(tail(diag(rls_vcov), 1))
##           x5
## 3.427715e-22
print(np.diag(rls_vcov)[-1])
## -1.094085679017355e-22

Again, for cases where we have an insignificant parameter, instead of imposing a zero-value separate restriction in the RLS, it would be better to remove it first, and then re-estimate the model via RLS for any other restrictions (like parameter equality, difference, sum, ratio and other linear restrictions).

We would like to stress that, knowing how the estimates are calculated is very useful, since it is possible to identify the source of any errors, or warnings, that we may get using parameter estimator functions from various packages, which may vary in functionality, speed and quality.