4.2 OLS Estimation, Confidence Intervals and Hypothesis Testing

Most of the properties and formulas presented in this section are identical to those of the simple univariate regression case in section 3.3.

Following 4.1.6, the multiple regression model in this section is specified in the following matrix form: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \] We will further assume that the multiple regression assumptions (MR.1) - (MR.6) hold true.

Example 4.7 We will use the following model to illustrate the methodology presented in this section:

\[ \begin{aligned} \log(Y_i) = \beta_0 &+ \beta_1 X_{1i} + \beta_2 \log(X_{2i}) + \beta_3 \text{MARRIED}_{i} + \beta_4 \text{AGE}\_\text{GROUP}_{1i} + \beta_5 \text{AGE}\_\text{GROUP}_{2i} \\ &+ \beta_6 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \beta_7 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{1i}) + \epsilon_i \end{aligned} \] where \(\text{MARRIED}_i = 1\) if the \(i\)-th person is married, and \(0\) otherwise; \(\text{AGE}\_\text{GROUP}_{ji}\) are indicators for different age groups: \(j = 1\) for ages \(20-30\) and \(j = 2\) for ages \(31-65\); the base group, \(\text{AGE}\_\text{GROUP}_{OTHER}\), consists of the people whose ages fall in the remaining brackets, not covered by \(j = 1,2\).

In this example the specified model has several distinctions:

  • The dependent variable \(Y\) is log-transformed;
  • Some independent variables are log-transformed;
  • Inclusion of indicator variables;
  • Cross-products (i.e. interaction terms) of some independent variables;
  • Not all indicator variable cross-products are included - for example \((\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{2i})\) is omitted, which means that being married and aged \(31-65\) has no additional effect on \(\log(Y_i)\) beyond the separate effects of being married and of being in that age group.

We begin by specifying the parameter vector and sample size:
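Below is a minimal R sketch of this step. The sample size \(N = 1000\) matches the degrees of freedom in the regression output shown later in this section; apart from \(\beta_1 = 0.16\), \(\beta_2 = -3\), \(\beta_3 = 0.05\) and \(\beta_6 = 0.05\), which are consistent with values referenced later in the text, the parameter values and all object names are illustrative assumptions:

```r
set.seed(123)
# True parameter vector (beta_0, beta_1, ..., beta_7);
# beta_1 = 0.16, beta_2 = -3, beta_3 = 0.05 and beta_6 = 0.05 are referenced later in the text,
# the remaining values are assumptions for illustration
beta_true <- c(4, 0.16, -3, 0.05, 0.03, -0.15, 0.05, -0.04)
# Sample size
N <- 1000
```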

We then generate the variables in the following way:
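For example (a sketch; the distributions below are assumptions chosen only to roughly match the magnitudes of the printed data):

```r
# Continuous explanatory variables
x1 <- rnorm(N, mean = 9, sd = 2)
x2 <- runif(N, min = 2, max = 5)
# Marital status indicator variable
married <- rbinom(N, size = 1, prob = 0.5)
```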

The different age groups can be generated randomly as well. We can further create separate indicator variables for two of the three groups. Doing it this way automatically classifies the remaining group of other ages as the base group and we will avoid the dummy variable trap:
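One possible sketch of this step, assuming the three groups are sampled with equal probability:

```r
# Randomly assign each observation to one of the three age groups
age_group <- sample(c("aged_20_30", "aged_31_65", "other"), size = N, replace = TRUE)
# Indicator variables for two of the three groups;
# the remaining group ("other") is then treated as the base group
age_gr1 <- as.numeric(age_group == "aged_20_30")
age_gr2 <- as.numeric(age_group == "aged_31_65")
```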

Finally, we can create our dependent variable and combine all the data into a single dataset:
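A sketch of this step, continuing with the objects defined above (the error standard deviation of 0.05 is an assumption consistent with the residual standard error reported in the built-in estimation output at the end of this section):

```r
# Generate the errors and the dependent variable (the model is specified for log(y))
e <- rnorm(N, mean = 0, sd = 0.05)
log_y <- beta_true[1] + beta_true[2] * x1 + beta_true[3] * log(x2) +
  beta_true[4] * married + beta_true[5] * age_gr1 + beta_true[6] * age_gr2 +
  beta_true[7] * age_gr2 * x1 + beta_true[8] * married * age_gr1 + e
# Combine everything into a single dataset
data_mat <- data.frame(y = exp(log_y), x1 = x1, x2 = x2, married = married,
                       age_gr1 = age_gr1, age_gr2 = age_gr2,
                       age_group = as.factor(age_group))
head(data_mat)
```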

##           y        x1       x2 married age_gr1 age_gr2  age_group
## 1 29.313052 10.206100 2.450563       1       0       1 aged_31_65
## 2 14.656237 10.164646 2.976220       0       0       1 aged_31_65
## 3 28.228720 10.816951 2.255319       1       0       0      other
## 4  7.741518 11.713026 4.286608       1       0       1 aged_31_65
## 5  9.333522  6.220738 2.514393       1       1       0 aged_20_30
## 6  2.165643  5.813722 4.237797       1       0       1 aged_31_65

We may also want to re-level the categorical (factor) variable so that the age group - other - would be the base level (this is equivalent to being the first level):
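In R this can be done with relevel() (a sketch; in Python the categories of a pandas Categorical would be reordered instead):

```r
print(head(data_mat$age_group))
# Make "other" the base (first) level of the factor
data_mat$age_group <- relevel(data_mat$age_group, ref = "other")
print(head(data_mat$age_group))
```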

## [1] aged_31_65 aged_31_65 other      aged_31_65 aged_20_30 aged_31_65
## Levels: aged_20_30 aged_31_65 other
## 0    aged_31_65
## 1         other
## 2    aged_20_30
## 3         other
## 4    aged_20_30
## Name: age_group, dtype: category
## Categories (3, object): [aged_20_30, aged_31_65, other]
## [1] aged_31_65 aged_31_65 other      aged_31_65 aged_20_30 aged_31_65
## Levels: other aged_20_30 aged_31_65
## 0    aged_31_65
## 1         other
## 2    aged_20_30
## 3         other
## 4    aged_20_30
## Name: age_group, dtype: category
## Categories (3, object): [other, aged_20_30, aged_31_65]

Note that in R character variables are automatically treated as categorical (factor) variables (and not as text strings), while in Python a categorical data type needs to be created explicitly.

4.2.1 OLS Estimation of Regression Parameters

As before, we want to minimize the sum of squared residuals: \[ \begin{aligned} RSS(\boldsymbol{\beta}) &= \boldsymbol{\varepsilon}^\top \boldsymbol{\varepsilon} \\ &= \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right)^\top \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right) \\ &= \mathbf{Y} ^\top \mathbf{Y} - \boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{Y} - \mathbf{Y}^\top \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{X} \boldsymbol{\beta} \rightarrow \min_{\beta_0, \beta_1, ..., \beta_k} \end{aligned} \] Then, equating the partial derivative to zero: \[ \dfrac{\partial RSS(\widehat{\boldsymbol{\beta}})}{\partial \widehat{\boldsymbol{\beta}}} = -2 \mathbf{X}^\top \mathbf{Y} + 2 \mathbf{X}^\top \mathbf{X} \widehat{\boldsymbol{\beta}} = 0 \] gives us the OLS estimator:

\[ \widehat{\boldsymbol{\beta}} = \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{Y} \]

The Gauss-Markov Theorem for the multiple regression states that: if conditions (MR.1)-(MR.5) hold true, the OLS estimators \(\widehat{\boldsymbol{\beta}}\) are the best linear unbiased and consistent (BLUE&C) estimators of the true parameters of the multiple regression model.

Note that the proofs are analogous to the proofs in the simple univariate regression case since we are using the same matrix notation as before. This means that the variance-covariance matrix of the OLS estimator vector is: \[ \begin{aligned} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) = \begin{bmatrix} \mathbb{V}{\rm ar} (\widehat{\beta}_0) & \mathbb{C}{\rm ov} (\widehat{\beta}_0, \widehat{\beta}_1) & ... & \mathbb{C}{\rm ov} (\widehat{\beta}_0, \widehat{\beta}_k)\\ \mathbb{C}{\rm ov} (\widehat{\beta}_1, \widehat{\beta}_0) & \mathbb{V}{\rm ar} (\widehat{\beta}_1) & ... & \mathbb{C}{\rm ov} (\widehat{\beta}_1, \widehat{\beta}_k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbb{C}{\rm ov} (\widehat{\beta}_k, \widehat{\beta}_0) & \mathbb{C}{\rm ov} (\widehat{\beta}_k, \widehat{\beta}_1) & ... & \mathbb{V}{\rm ar} (\widehat{\beta}_k) \end{bmatrix} &= \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \end{aligned} \] The difference from the univariate case is in the estimation of the unknown error variance parameter \(\sigma^2\).

Example 4.8 For our example data, we can estimate the coefficients in the following way:
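A sketch of the manual calculation in R, assuming the simulated data_mat from before (the design-matrix column names follow the coefficient tables printed later in this section; beta_est is the name used for the estimates further below):

```r
# Design matrix: intercept, explanatory variables and the interaction terms
x_mat <- cbind(1, data_mat$x1, log(data_mat$x2), data_mat$married,
               data_mat$age_gr1, data_mat$age_gr2,
               data_mat$age_gr2 * data_mat$x1,
               data_mat$married * data_mat$age_gr1)
colnames(x_mat) <- c("intercept", "x1", "log_x2", "married",
                     "age_gr1", "age_gr2", "age_gr2_x1", "married_age_gr1")
y_log <- log(data_mat$y)
# OLS estimator: (X'X)^{-1} X'Y
beta_est <- solve(t(x_mat) %*% x_mat) %*% t(x_mat) %*% y_log
print(beta_est)
```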

4.2.2 Estimation of The Error Variance Parameter

Define the residual vector as the difference between the actual and the fitted values: \[ \widehat{\boldsymbol{\varepsilon}} = \left[ \widehat{\epsilon}_1, ..., \widehat{\epsilon}_N\right]^\top = \mathbf{Y} - \widehat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}} \]

An estimator of \(\sigma^2\), that uses the information from \(\widehat{\epsilon}_i^2\) is: \[ \widehat{\sigma}^2 = \dfrac{1}{N-(k+1)} \sum_{i = 1}^N \widehat{\epsilon}_i^2 = \dfrac{\widehat{\boldsymbol{\varepsilon}}^\top \widehat{\boldsymbol{\varepsilon}}}{N-(k+1)} \] where \(k+1\) is the number of parameters in \(\widehat{\boldsymbol{\beta}} = \left[\widehat{\beta}_0, \widehat{\beta}_1,...,\widehat{\beta}_k \right]^\top\).

Having estimated the unknown parameters via OLS and the variance parameters allows us to calculate various confidence intervals, as discussed in sections 3.5 and 3.7 of the simple univariate regression case.

It should be noted that we require \(N \gt k + 1\). That is, as long as we have more data points than unknown parameters, we will be able to carry out OLS estimation and calculate any relevant test statistics or confidence intervals.

  • If \(N = k + 1\), the model fits the data exactly: the residuals are all zero and \(\widehat{\sigma}^2\) (and hence the standard errors) cannot be computed, since \(N - (k+1) = 0\).
  • If \(N\) is close to \(k + 1\), our standard errors will be very large. This is because standard errors reflect the degree of uncertainty of our estimates, and there is little information per parameter to get an accurate estimate (remember that \(\widehat{Y}\) is the mean response). In this case we would need to either collect more data or reduce the number of parameters in our model.

Another way to look at this is by examining section 4.1.6 - the case of \(N \lt k + 1\) means that we have more unknown parameters, \(k+1\), than we have equations, \(N\) - hence our system of equations has infinitely many solutions (or, in a rare case - no solution).

Ideally, as a rule of thumb, we would need at least 20 - 30 observations for every parameter that we want to estimate. On the other hand, while the sample size itself is important, a more pressing concern is whether the sample is representative of the overall population.

If, additionally to (MR.1) - (MR.5), condition (MR.6) holds true, the conditional distribution of the OLS estimators is: \[ \widehat{\boldsymbol{\beta}} | \mathbf{X} \sim \mathcal{N} \left(\boldsymbol{\beta}, \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \right) \] This allows us to calculate the confidence intervals for our parameters, their predicted values and the mean response.

Example 4.9 For our example dataset, the variance can be easily estimated from the residuals:
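A sketch, continuing with x_mat, y_log and beta_est from the previous step:

```r
# Residuals and the estimator of the error variance
e_hat      <- y_log - x_mat %*% beta_est
k_plus_1   <- ncol(x_mat)                      # number of estimated parameters, k + 1
sigma2_est <- sum(e_hat^2) / (N - k_plus_1)
# Estimated variance-covariance matrix of the OLS estimators
beta_vcov  <- sigma2_est * solve(t(x_mat) %*% x_mat)
print(sigma2_est)
```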

4.2.3 Confidence Intervals

We will begin by examining the confidence intervals for a single parameter. Then, we will generalize to a linear combination of parameters. Finally, we will look at the mean response confidence intervals and prediction intervals for the multiple regression model.

4.2.3.1 Parameter Confidence Intervals

The \(100 \cdot (1 - \alpha)\%\) interval estimate for parameter \(\beta_i\), \(i = 0,...,k\) is calculated in the same way as for the simple univariate regression: \[ \left[ \widehat{\beta}_i - t_c \cdot \text{se}(\widehat{\beta}_i),\ \widehat{\beta}_i + t_c \cdot \text{se}(\widehat{\beta}_i) \right] \] where \(t_c = t_{(1-\alpha/2, N-(k + 1))}\) and \(\text{se}(\widehat{\beta}_i) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widehat{\beta}_i)}\). If this interval estimator is used in many samples from the population, then \(100 \cdot (1 - \alpha)\%\) of them will contain the true parameter \(\beta_i\).

In general, if an interval estimate is uninformative because it is too wide, there is nothing immediate that can be done. A narrower interval can only be obtained by reducing the variance of the estimator - this could be done by obtaining additional (and higher quality) data, which exhibits more variation. On the other hand, we cannot say in general what constitutes a wide interval, as this depends on the problem being investigated.

Alternatively, we might introduce some kind of non-sample information on the coefficients in the form of linear restrictions.

4.2.3.2 Interval Estimation for a Linear Combination of Coefficients

In general, a linear combination of coefficients can be specified as: \[ \sum_{j = 0}^k c_j \widehat{\beta}_j = c_0 \widehat{\beta}_0 + ... + c_k \widehat{\beta}_k = \widehat{r} \] and has the following variance: \[ \begin{aligned} \widehat{\mathbb{V}{\rm ar}}\left(\widehat{r} \right) &= \widehat{\mathbb{V}{\rm ar}}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right) = \sum_{j = 0}^k c_j^2 \cdot \widehat{\mathbb{V}{\rm ar}}\left( \widehat{\beta}_j \right) + 2\cdot \sum_{i < j} c_i c_j \cdot \widehat{\mathbb{C}{\rm ov}}\left( \widehat{\beta}_i,\ \widehat{\beta}_j \right) \end{aligned} \] This allows estimating the confidence interval of the specified linear combination as: \[ \left[ \widehat{r} - t_c \cdot \text{se}(\widehat{r}),\ \widehat{r} + t_c \cdot \text{se}(\widehat{r}) \right] \] where \(t_c = t_{(1 - \alpha/2, N-(k+1))}\)

We are interested in different parameter linear combinations when we are considering the mean response \(\mathbb{E} (Y | \mathbf{X} = \mathbf{X}_0)\) for some explanatory variables, \(\mathbf{X}_0\). Alternatively, we may be interested in the effect of changing two or more explanatory variables simultaneously.

Finally, parameter linear combinations are especially relevant if the effect of an explanatory variable depends on two or more parameters - i.e. in models with polynomial variables, or models with interaction terms.

4.2.3.3 Mean Response Confidence Intervals

The mean response estimator \(\widehat{\mathbb{E}}(\mathbf{Y} | \mathbf{X}= \mathbf{X}_0) = \widehat{\mathbf{Y}} = \mathbf{X}_0 \widehat{\boldsymbol{\beta}}\) follows the same distribution as highlighted in the simple univariate regression: \[ \left(\widehat{\mathbf{Y}}|\mathbf{X}_0, \mathbf{X}\right) \sim \mathcal{N}\left( \mathbf{X}_0\boldsymbol{\beta},\quad \sigma^2 \mathbf{X}_0 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}_0^\top\right) \] which means that the \(100 \cdot (1 - \alpha)\%\) confidence intervals for the mean response are: \[ \left[ \widehat{Y}_i - t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widehat{Y}_i) ,\ \widehat{Y}_i + t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widehat{Y}_i) \right] \] where \(\text{se}(\widehat{Y}_i) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widehat{Y}_i)}\) is the square root of the corresponding \(i\)-th diagonal element of \(\widehat{\mathbb{V}{\rm ar}} (\widehat{\mathbf{Y}}) = \widehat{\sigma}^2 \mathbf{X}_0 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}_0^\top\), where \(\widehat{\sigma}^2\) is estimated for the multiple regression model case as presented in this section.

4.2.3.4 Prediction Intervals

Following the simple univariate regression case, the \(100 \cdot (1 - \alpha) \%\) prediction interval for \(\widehat{Y}_i\) is: \[ \left[ \widehat{Y}_i - t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widetilde{e}_i) ,\ \widehat{Y}_i + t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widetilde{e}_i) \right] \] where the standard forecast error \(\text{se}(\widetilde{e}_i) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widetilde{e}_i)}\) is the square root of the corresponding \(i\)-th diagonal element of \(\widehat{\mathbb{V}{\rm ar}} (\widetilde{\boldsymbol{e}}) = \widehat{\sigma}^2 \left( \mathbf{I} + \mathbf{X}_0 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}_0^\top\right)\)

4.2.4 Hypothesis Testing

We will begin by presenting hypothesis testing for a single parameter. We will then examine testing the joint hypothesis on a number of parameters, as well as hypotheses on a linear combination of parameters.

4.2.4.1 Testing For Significance of a Single Parameter

If we wanted to test a hypothesis for \(\beta_j\): \[ \begin{cases} H_0&: \beta_j = c\\ H_1&: \beta_j \neq c\quad \text{(or } < c, \quad \text{or } >c\text{)} \end{cases} \] we first need to calculate the \(t\)-statistic: \[ t_j = \dfrac{\widehat{\beta}_j - c}{\text{s.e}(\widehat{\beta}_j)} \sim t_{(N-(k+1))} \] where \(\text{s.e}(\widehat{\beta}_j) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widehat{\beta}_j)}\) and \(\widehat{\mathbb{V}{\rm ar}} (\widehat{\beta}_j)\) is the corresponding diagonal element from \(\widehat{\mathbb{V}{\rm ar}} (\widehat{\boldsymbol{\beta}}) = \widehat{\sigma}^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}\).

While the formula for the \(t\)-statistic remains the same, its distribution now depends on the number of estimated parameters: a \(t\) distribution with \(N-(k+1)\) degrees of freedom.

The critical value \(t_c\) also depends on the number of estimated parameters:

  • If the alternative is \(H_1: \beta_j > c\), we reject \(H_0\) and accept the alternative \(H_1\) if \(t_j \geq t_c\), where \(t_c = t_{(1 - \alpha, N-(k+1))}\);
  • If the alternative is \(H_1: \beta_j < c\), we reject \(H_0\) and accept the alternative \(H_1\) if \(t_j \leq t_c\), where \(t_c = t_{(\alpha, N-(k+1))}\);
  • If the alternative is \(H_1: \beta_j \neq c\), we reject \(H_0\) and accept the alternative \(H_1\) if \(t_j \leq t_{(\alpha/2, N-(k+1))}\), or \(t_j \geq t_{(1 - \alpha/2, N-(k+1))}\);
  • We can also calculate the associated \(p\)-value: if \(p \leq \alpha\), we reject \(H_0\); if \(p > \alpha\), we do not reject \(H_0\).
Example 4.10 Continuing our example, we want to test the two-tail hypothesis \(H_0:\beta_j = 0\) against the alternative \(H_1: \beta_j \neq 0\) for each coefficient \(j = 0, ..., 7\) at the \(\alpha = 0.05\) significance level. Note that we have been using matrix notation throughout - this is helpful when we want to calculate the results for all coefficients at once:
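A sketch of the calculation, reusing the matrices defined in the previous examples:

```r
# t-statistics for H0: beta_j = 0 and the associated two-tail p-values
t_stat <- as.vector(beta_est) / sqrt(diag(beta_vcov))
p_val  <- 2 * pt(abs(t_stat), df = N - k_plus_1, lower.tail = FALSE)
out    <- cbind(t_stat, p_val)
rownames(out) <- rownames(beta_est)
print(out)
```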
##                      t_stat         p_val
## intercept        326.510512  0.000000e+00
## x1               169.415700  0.000000e+00
## log_x2          -502.842822  0.000000e+00
## married           12.035132  3.091084e-31
## age_gr1            4.776168  2.055881e-06
## age_gr2           -8.620667  2.598470e-17
## age_gr2_x1        30.619696 1.744506e-145
## married_age_gr1   -4.054822  5.410364e-05

We see that all of the specified coefficients are statistically significantly different from zero, since we reject the null hypothesis \(H_0: \beta_j = 0\) for each of the coefficients.

Example 4.11 Additionally, we want to test the following hypothesis:

\[ \text{A one percent increase in } X_{2} \text{ would result in a } 3\% \text{ reduction in } Y \] which can be written as the following hypothesis: \[ \begin{cases} H_0&: \beta_2 = -3\\ H_1&: \beta_2 \neq -3 \end{cases} \]
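A sketch of the manual test, reusing beta_est and beta_vcov (log_x2 is the name given to the \(\log(X_2)\) column of the design matrix):

```r
# t-statistic and two-tail p-value for H0: beta_2 = -3
t_stat <- (beta_est["log_x2", ] - (-3)) / sqrt(beta_vcov["log_x2", "log_x2"])
p_val  <- 2 * pt(abs(t_stat), df = N - k_plus_1, lower.tail = FALSE)
print(c(t_stat = t_stat, p_val = p_val))
```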

Since the \(p-value > 0.05\), we have no grounds to reject the null hypothesis and conclude that \(\beta_2\) is not statistically significantly different from \(-3\).

4.2.4.2 Joint Hypothesis Test for Multiple Coefficient Significance

The hypothesis tests with \(t\)-statistics allow for testing of a single equality in the null hypothesis. But what if we want to test a joint hypothesis, where several parameters are simultaneously equal to some specified values?

One of the more popular types of joint hypothesis tests involves checking whether a group of variables is statistically significantly different from zero in a particular model.

If we wanted to test whether \(M\) coefficients with indices \(i_1, ..., i_M \in \{0, 1, ..., k \}\) are statistically significantly different from zero, we would specify the following hypothesis: \[ \begin{cases} H_0&: \beta_{i_1} = 0,\ \beta_{i_2} = 0,\ ...,\ \beta_{i_M} = 0 \\ H_1&: \beta_{i_j} \neq 0, \quad \text{for some } j \end{cases} \] We can test this hypothesis with an \(F\)-test, which evaluates whether the reduction in the residual sum of squares (RSS) obtained by adding the additional variables is statistically significant. If adding the additional variables does not significantly reduce the residual sum of squares, then those variables contribute little to the explanation of the variation in the dependent variable, and we would not reject the null hypothesis.

Define the following RSS:

  • \(\text{RSS}_{UR}\) - the residual sum of squares of the unrestricted (i.e. the full) model under the alternative hypothesis. The coefficient of determination in the unrestricted model is \(R^2_{UR}\);
  • \(\text{RSS}_R\) - the residual sum of squares of the restricted model under the null hypothesis (i.e. when some of the parameters are not statistically significantly different from zero). The coefficient of determination in the restricted model is \(R^2_R\);

Then, the \(F\)-statistic is given by: \[ F = \dfrac{(\text{RSS}_R - \text{RSS}_{UR}) / M}{\text{RSS}_{UR} / (N-(k+1))} = \dfrac{(R^2_{UR} - R^2_{R}) / M}{(1 - R^2_{UR}) / (N-(k+1))} \sim F_{(M, N-(k+1))} \]

We then select the significance level \(\alpha\) and calculate the critical value \(F_c = F_{(1 - \alpha, M, N-(k+1))}\).

If \(F \geq F_c\), we reject the null hypothesis and conclude that at least one of the coefficients in the null is not zero. We can also calculate the associated \(p\)-value.

Example 4.12 We again turn to our example data, where we will estimate an unrestricted model with two additional coefficients. Out of interest, we also calculate the \(t\)-statistics and \(p\)-values for tests of individual coefficient significance:

\[ \begin{aligned} \log(Y_i) = \beta_0 &+ \beta_1 X_{1i} + \beta_2 \log(X_{2i}) + \beta_3 \text{MARRIED}_{i} + \beta_4 \text{AGE}\_\text{GROUP}_{1i} + \beta_5 \text{AGE}\_\text{GROUP}_{2i} \\ &+ \beta_6 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \beta_7 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{1i}) \\ &+ \beta_8 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{2i}) + \beta_9 (\text{AGE}\_\text{GROUP}_{1i} \times X_{1i}) + \epsilon_i \end{aligned} \]

We begin by creating the appropriate design matrix:
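A sketch, extending the previous design matrix with the two additional interaction columns:

```r
# Unrestricted model design matrix: add married * age_gr2 and age_gr1 * x1
x_mat_UR <- cbind(x_mat,
                  married_age_gr2 = data_mat$married * data_mat$age_gr2,
                  age_gr1_x1      = data_mat$age_gr1 * data_mat$x1)
```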

we then estimate the model coefficients and their variance-covariance matrix:
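Using the same formulas as before (a sketch):

```r
beta_est_UR  <- solve(t(x_mat_UR) %*% x_mat_UR) %*% t(x_mat_UR) %*% y_log
e_hat_UR     <- y_log - x_mat_UR %*% beta_est_UR
sigma2_UR    <- sum(e_hat_UR^2) / (N - ncol(x_mat_UR))
beta_vcov_UR <- sigma2_UR * solve(t(x_mat_UR) %*% x_mat_UR)
```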

and finally perform significance testing separately for each estimated parameter:
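A sketch of the individual significance tests for the unrestricted model:

```r
t_stat_UR <- as.vector(beta_est_UR) / sqrt(diag(beta_vcov_UR))
p_val_UR  <- 2 * pt(abs(t_stat_UR), df = N - ncol(x_mat_UR), lower.tail = FALSE)
coef_tbl  <- cbind(coef_est = as.vector(beta_est_UR), t_stat = t_stat_UR, p_val = p_val_UR)
rownames(coef_tbl) <- colnames(x_mat_UR)
print(round(coef_tbl, 5))
```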

##                 coef_est     t_stat    p_val
## intercept        4.02536  258.76200  0.00000
## x1               0.15772  119.40341  0.00000
## log_x2          -3.00231 -502.99025  0.00000
## married          0.04293    7.83446  0.00000
## age_gr1         -0.00368   -0.18757  0.85125
## age_gr2         -0.16403   -8.21466  0.00000
## age_gr2_x1       0.05174   27.37776  0.00000
## married_age_gr1 -0.02278   -2.97211  0.00303
## married_age_gr2  0.00702    0.90482  0.36578
## age_gr1_x1       0.00266    1.41740  0.15668
##                  coef_est     t_stat    p_val
## intercept         3.99175  241.50412  0.00000
## x1                0.15954  112.10817  0.00000
## log_x2           -2.99432 -490.24452  0.00000
## married           0.05546    9.83853  0.00000
## age_gr1           0.04103    1.94595  0.05194
## age_gr2          -0.15242   -7.32694  0.00000
## age_gr2_x1        0.05055   25.24035  0.00000
## married_age_gr1  -0.03966   -5.06695  0.00000
## married_age_gr2   0.00258    0.32976  0.74165
## age_gr1_x1       -0.00172   -0.83967  0.40130

We will treat the model above, mdl_UR_out, as the unrestricted model.

The restricted model is the true model, which we have previously estimated:

##                 coef_est     t_stat p_val
## intercept        4.01018  326.51051 0e+00
## x1               0.15906  169.41570 0e+00
## log_x2          -3.00237 -502.84282 0e+00
## married          0.04664   12.03513 0e+00
## age_gr1          0.02473    4.77617 0e+00
## age_gr2         -0.14683   -8.62067 0e+00
## age_gr2_x1       0.05037   30.61970 0e+00
## married_age_gr1 -0.02679   -4.05482 5e-05
##                  coef_est     t_stat  p_val
## intercept         3.99941  305.86296    0.0
## x1                0.15870  155.52553    0.0
## log_x2           -2.99430 -490.54593    0.0
## married           0.05682   14.55234    0.0
## age_gr1           0.02463    4.80698    0.0
## age_gr2          -0.15953   -8.93295    0.0
## age_gr2_x1        0.05138   29.54908    0.0
## married_age_gr1  -0.04120   -6.16706    0.0

In other words, we want to test the hypothesis with two restrictions: \[ \begin{cases} H_0&: \beta_{8} = 0,\ \beta_{9} = 0 \\ H_1&: \beta_{8} \neq 0,\quad \text{or} \quad \beta_{9} \neq 0,\quad \text{or both} \end{cases} \] So, we can calculate the unrestricted and restricted residual sum of squares and the \(F\)-statistic along with the associated \(p\)-value:
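A sketch of the manual calculation, treating the 8-parameter model as the restricted model and the 10-parameter model as the unrestricted one:

```r
# Residual sums of squares of the restricted and the unrestricted models
RSS_R  <- sum((y_log - x_mat    %*% beta_est)^2)
RSS_UR <- sum((y_log - x_mat_UR %*% beta_est_UR)^2)
M      <- 2                           # number of restrictions
df_UR  <- N - ncol(x_mat_UR)          # N - (k + 1) of the unrestricted model
F_stat <- ((RSS_R - RSS_UR) / M) / (RSS_UR / df_UR)
p_val  <- pf(F_stat, df1 = M, df2 = df_UR, lower.tail = FALSE)
print(c(F_stat = F_stat, p_val = p_val))
```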

In this case, the \(p\)-value is greater than 0.05, so we do not reject the null hypothesis that both coefficients are not statistically significantly different from zero. (Note that if we were to reject the null hypothesis - it would mean that at least one coefficient is statistically significantly different from zero - however, we would not know which one, or if both of the coefficients are significant.)

Alternatively, we can specify the models with the built-in OLS estimation functions and carry out an ANOVA (Analysis of variance) test:
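In R this can be done with lm() and anova(); mdl_UR_out is the unrestricted model mentioned above, while the name mdl_R_out for the restricted model is an illustrative assumption (in Python, statsmodels' anova_lm() plays the analogous role):

```r
# Restricted (true) model and unrestricted model via the built-in OLS estimation
mdl_R_out  <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
                   age_gr2 * x1 + married * age_gr1, data = data_mat)
mdl_UR_out <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
                   age_gr2 * x1 + married * age_gr1 +
                   married * age_gr2 + age_gr1 * x1, data = data_mat)
anova(mdl_R_out, mdl_UR_out)
```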

## Analysis of Variance Table
## 
## Model 1: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * 
##     x1 + married * age_gr1
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * 
##     x1 + married * age_gr1 + married * age_gr2 + age_gr1 * x1
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    992 2.4314                           
## 2    990 2.4247  2 0.0066927 1.3663 0.2555
##    df_resid       ssr  df_diff   ss_diff         F    Pr(>F)
## 0     992.0  2.482998      0.0       NaN       NaN       NaN
## 1     990.0  2.480965      2.0  0.002033  0.405591  0.666693

In case of a RuntimeWarning: these specific RuntimeWarnings come from scipy.stats.distributions and are "by design"; in statsmodels they should not cause any problems.

Both approaches give us the same \(F\)-statistics and \(p\)-values as the manual calculations.

4.2.4.3 Testing for a Single Linear Restriction

Suppose that we are interested in testing the hypothesis that a linear combination of parameters: \[ \sum_{j = 0}^k c_j \widehat{\beta}_j = c_0 \widehat{\beta}_0 + ... + c_k \widehat{\beta}_k = \widehat{r} \] is equal to \(r\): \[ \begin{cases} H_0&: \widehat{r} = r \\ H_1&: \widehat{r} \neq r \end{cases} \] Then, the associated \(t\)-statistic is calculated by: \[ t_r = \dfrac{\widehat{r} - r}{\text{s.e.}(\widehat{r})} = \dfrac{\sum_{j = 0}^k c_j \widehat{\beta}_j - \sum_{j = 0}^k c_j \beta_j}{\text{s.e.}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right)} \sim t_{(N-(k+1))} \] where \(\text{s.e.}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right) = \sqrt{ \widehat{\mathbb{V}{\rm ar}}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right)}\), and: \[ \begin{aligned} \widehat{\mathbb{V}{\rm ar}}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right) &= \sum_{j = 0}^k c_j^2 \cdot \widehat{\mathbb{V}{\rm ar}}\left( \widehat{\beta}_j \right) + 2\cdot \sum_{i < j} c_i c_j \cdot \widehat{\mathbb{C}{\rm ov}}\left( \widehat{\beta}_i,\ \widehat{\beta}_j \right) \end{aligned} \] Note that we can get the relevant values from the variance-covariance matrix estimate, \(\widehat{\mathbb{V}{\rm ar}} (\widehat{\boldsymbol{\beta}}) = \widehat{\sigma}^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}\).

If the sample size is large, then \(\widehat{r}\) (and hence the test statistic) will be approximately normally distributed, even if the errors themselves are not.

Since the \(t\)-statistic has the same distribution as when testing for a single parameter, we can use the equivalent \(t_c\) values when testing either one-tail or two-tail hypothesis.

Note that testing this linear constraint is equivalent to testing the following constraint on the parameter vector: \[ \mathbf{L} \boldsymbol{\beta} = \begin{bmatrix} c_0 & c_1 & ... & c_k \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} = r \] We will talk about linear constraints in the next subsection.

Example 4.13 We may be interested in testing whether one coefficient is eight times the magnitude of another coefficient. For example, if our null hypothesis is:

\[ \begin{aligned} \text{A unit increase in } X_{1} \left( \text{ for a person from } \text{AGE}\_\text{GROUP}_{OTHER} \right) \text{ has an 8 times larger effect on } \\ \text{ the change in } Y \text{ as the fact that a person is between 20 and 30 years old i.e. from } \text{AGE}\_\text{GROUP}_{1} \end{aligned} \] which can be written as the following hypothesis: \[ \begin{cases} H_0&: \beta_1 = 8\cdot\beta_4\\ H_1&: \beta_1 \neq 8\cdot\beta_4 \end{cases} \] or, alternatively: \[ \begin{cases} H_0&: \beta_1 - 8\cdot\beta_4 = 0\\ H_1&: \beta_1 - 8\cdot\beta_4 \neq 0 \end{cases} \]

Then, our test \(t\)-statistic is: \[ t = \dfrac{\widehat{\beta}_1 - 8\cdot\widehat{\beta}_4 - 0}{\text{s.e.}\left( \widehat{\beta}_1 - 8\cdot\widehat{\beta}_4 \right)}= \dfrac{\widehat{\beta}_1 - 8\cdot\widehat{\beta}_4}{\text{s.e.}\left( \widehat{\beta}_1 - 8\cdot\widehat{\beta}_4 \right)} \] We can then calculate the critical value \(t_c\) and test the hypothesis as we would for the single parameter case.

In our example dataset this can be done with the following code:
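A sketch of the manual calculation, using the coefficient and variance-covariance estimates from before (the weight vector follows the column order of the design matrix):

```r
# Linear combination: beta_1 - 8 * beta_4
c_vec  <- c(0, 1, 0, 0, -8, 0, 0, 0)
r_est  <- sum(c_vec * beta_est)
# Variance of the linear combination: c' Var(beta_hat) c
r_var  <- as.numeric(t(c_vec) %*% beta_vcov %*% c_vec)
t_stat <- r_est / sqrt(r_var)
p_val  <- 2 * pt(abs(t_stat), df = N - k_plus_1, lower.tail = FALSE)
print(c(t_stat = t_stat, p_val = p_val))
```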

So, we do not reject the null hypothesis that the coefficient \(\beta_1\) (i.e. the coefficient of \(X_1\)) is eight times the coefficient \(\beta_4\) (i.e. the coefficient of \(\text{AGE}\_\text{GROUP}_{1}\)).

Thankfully, we can also carry out this test automatically:
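In R, one option is glht() from the multcomp package (a sketch, assuming the lm model mdl_R_out fitted above; in Python, the t_test() method of a fitted statsmodels model plays a similar role):

```r
library(multcomp)
# General linear hypothesis: beta_1 - 8 * beta_4 = 0
summary(glht(mdl_R_out, linfct = c("x1 - 8*age_gr1 = 0")))
```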

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + 
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
## 
## Linear Hypotheses:
##                       Estimate Std. Error t value Pr(>|t|)
## x1 - 8 * age_gr1 == 0 -0.03881    0.04145  -0.936    0.349
## (Adjusted p values reported -- single-step method)
##                              Test for Constraints                             
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0            -0.0383      0.041     -0.935      0.350      -0.119       0.042
## ==============================================================================

We note that the values are identical to our manual calculation. While there is usually at least one package available for most types of tests in a given language, the same test may not be implemented across different software. In such cases, it is good to have an example of a manual calculation that works across multiple software packages, and a built-in method in at least one of them for verification.

Example 4.14 Additionally, we want to test the following hypothesis that we have tested before:

\[ \begin{cases} H_0&: \beta_2 = -3\\ H_1&: \beta_2 \neq -3 \end{cases} \] But this time, we formulate it as a linear restriction.
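A sketch using glht() again; because of the issue with log(x2) in coefficient names noted below, the model is re-fitted with a pre-computed log_x2 column (as in the Fit line of the output that follows), and the name mdl_R_out2 is an illustrative assumption:

```r
# Re-fit the model using a pre-computed log_x2 column, so multcomp can parse the name
data_mat$log_x2 <- log(data_mat$x2)
mdl_R_out2 <- lm(log(y) ~ x1 + log_x2 + married + age_gr1 + age_gr2 +
                   age_gr2 * x1 + married * age_gr1, data = data_mat)
summary(glht(mdl_R_out2, linfct = c("log_x2 = -3")))
```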

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = log(y) ~ x1 + log_x2 + married + age_gr1 + age_gr2 + 
##     age_gr2 * x1 + married * age_gr1, data = data.frame(y = data_mat$y, 
##     x_mat_1))
## 
## Linear Hypotheses:
##               Estimate Std. Error t value Pr(>|t|)
## log_x2 == -3 -3.002369   0.005971  -0.397    0.692
## (Adjusted p values reported -- single-step method)
##                              Test for Constraints                             
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0            -2.9943      0.006      0.934      0.350      -3.006      -2.982
## ==============================================================================

Since the \(p-value > 0.05\), we have no grounds to reject the null hypothesis and conclude that \(\beta_2\) is not statistically significantly different from \(-3\).

Note that this is identical to what we have manually calculated:

##            t_stat     p_val
## log_x2 -0.3967519 0.6916357
##      t_stat     p_val
## 0  0.934279  0.350387

We note that in R, using log(x2) instead of log_x2 produces an error (though this may be different for different package versions):

## Error: multcomp:::expression2coef::walkCode::eval: the expression 'log(x2)' did not evaluate to a real valued constant. Result is '0.896317877324181''1.09065413148445''0.813291492402008''1.45549580493005' ... (the remainder of the error message, which prints the value of log(x2) for every observation, is omitted here)

So knowing how to manually calculate this hypothesis test is much more useful than simply applying some black-box functions.

Example 4.15 We want to test the following hypothesis that one coefficient is two times larger than the other:

\[ \begin{cases} H_0&: \beta_1 = 2\cdot\beta_6\\ H_1&: \beta_1 \neq 2\cdot\beta_6 \end{cases} \]
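A sketch: we first check the order of the coefficient names in the fitted model, and then specify the restriction as a one-row matrix (since the interaction coefficient name contains a colon):

```r
# Coefficient order in the fitted model:
# (Intercept), x1, log(x2), married, age_gr1, age_gr2, x1:age_gr2, married:age_gr1
names(coef(mdl_R_out))
# Restriction beta_1 - 2 * beta_6 = 0, specified as a 1 x 8 restriction matrix
K <- matrix(c(0, 1, 0, 0, 0, 0, -2, 0), nrow = 1)
summary(glht(mdl_R_out, linfct = K))
```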

## [1] "(Intercept)"     "x1"              "log(x2)"         "married"         "age_gr1"         "age_gr2"         "x1:age_gr2"      "married:age_gr1"
## ['Intercept', 'x1', 'np.log(x2)', 'married', 'age_gr1', 'age_gr2', 'age_gr2:x1', 'married:age_gr1']
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + 
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
## 
## Linear Hypotheses:
##                          Estimate Std. Error t value Pr(>|t|)    
## x1 - 2 * x1:age_gr2 == 0 0.058333   0.003902   14.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##                              Test for Constraints                             
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0             0.0559      0.004     13.447      0.000       0.048       0.064
## ==============================================================================

In this case, we reject the null hypothesis and conclude that \(\beta_1\) is not twice the magnitude of \(\beta_6\) - the true ratio is either greater than, or less than, \(2\).

On the other hand, we know that the true parameter values are \(\beta_1 = 0.16\) and \(\beta_6 = 0.05\), so the true magnitude is \(3.2\). If we specify the hypothesis with the true magnitude: \[ \begin{cases} H_0&: \beta_1 = 3.2\cdot\beta_6\\ H_1&: \beta_1 \neq 3.2\cdot\beta_6 \end{cases} \] then:
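A sketch of the corresponding call, using the same one-row restriction matrix approach as above:

```r
# Restriction beta_1 - 3.2 * beta_6 = 0
K <- matrix(c(0, 1, 0, 0, 0, 0, -3.2, 0), nrow = 1)
summary(glht(mdl_R_out, linfct = K))
```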

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + 
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
## 
## Linear Hypotheses:
##                             Estimate Std. Error t value Pr(>|t|)
## x1 - 3.2 * x1:age_gr2 == 0 -0.002106   0.005850   -0.36    0.719
## (Adjusted p values reported -- single-step method)
##                              Test for Constraints                             
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0            -0.0057      0.006     -0.921      0.357      -0.018       0.006
## ==============================================================================

We would not reject the null hypothesis that \(\beta_1\) is \(3.2\) times larger than \(\beta_6\).

Suppose we were to estimate a model of the effects of \(price\) and \(advertising\) expenditure (in dollars) on \(revenue\). In such a case \(\beta_{price}\) would be the change in revenue from a one dollar increase in price, and \(\beta_{advertising}\) would be the change in revenue from a one dollar increase in advertising. So, one hypothesis could be formulated as: \[ \begin{aligned} &\text{reducing the price by 10 cents would have the same effect on revenue as}\\ &\text{increasing the advertising by 100 dollars,} \end{aligned} \] which would translate to the following hypothesis: \(H_0: -0.1 \beta_{price} = 100 \beta_{advertising}\). The alternative could be that \(H_1: -0.1 \beta_{price} > 100 \beta_{advertising}\), i.e.: \[ \begin{aligned} &\text{reducing the price by 10 cents would be more effective than}\\ &\text{increasing the advertising by 100 dollars.} \end{aligned} \]

Finally, it is usually more common to use the tests provided in the next subsection, as they can be used for more than one linear restriction. In our first example, under the null hypothesis \(\beta_1 = 8 \cdot \beta_4\) (i.e. \(\beta_4 = \beta_1 / 8\)) our regression could be re-written as the following restricted regression: \[ \begin{aligned} \log(Y_i) = \beta_0 &+ \beta_1 \left(X_{1i} + \tfrac{1}{8} \cdot \text{AGE}\_\text{GROUP}_{1i}\right) + \beta_2 \log(X_{2i}) + \beta_3 \text{MARRIED}_{i} + \beta_5 \text{AGE}\_\text{GROUP}_{2i} \\ &+ \beta_6 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \beta_7 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{1i}) + \epsilon_i \end{aligned} \] Then, we could compare the residual sum of squares from this model with that of the unrestricted regression (i.e. the regression under the alternative) via the \(F\)-test.

4.2.4.4 Testing for Multiple Linear Restrictions

If we want to test \(M < k + 1\) different linear restrictions on the coefficients, then we can define \(\mathbf{L}\) and the value vector of the linear restrictions as: \[ \mathbf{L} = \begin{bmatrix} c_{10} & c_{11} & ... & c_{1k} \\ c_{20} & c_{21} & ... & c_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ c_{M0} & c_{M1} & ... & c_{Mk} \\ \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} r_1\\ r_2\\ \vdots \\ r_M \end{bmatrix} \] We want to test the following hypothesis: \[ \begin{cases} H_0&: \mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}\\ H_1&: \mathbf{L} \boldsymbol{\beta} \neq \boldsymbol{r} \end{cases} \]

The distribution of \(\mathbf{L} \widehat{\boldsymbol{\beta}}\) is: \[ \mathbf{L} \widehat{\boldsymbol{\beta}} \sim \mathcal{N}\left( \mathbf{L} \boldsymbol{\beta},\ \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right) \] where: \[ \begin{aligned} \mathbf{L} \boldsymbol{\beta} &= \boldsymbol{r} \\ \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top &= \sigma^2 \mathbf{L} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{L}^\top \end{aligned} \] Where the variances of the linearly restricted parameters are the diagonal elements: \[ \begin{aligned} \text{diag} \left( \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right)&= \left \{ \sum_{j = 0}^k c_{i, j}^2 \cdot \mathbb{V}{\rm ar}\left( \widehat{\beta}_j \right) + \sum_{j_1 \neq j_2} c_{i, j_1} c_{i, j_2} \cdot \mathbb{C}{\rm ov}\left( \widehat{\beta}_{j_1},\ \widehat{\beta}_{j_2} \right) \right \}_{i = 1,...,M} \\ &= \left \{ \sum_{j = 0}^k c_{i, j}^2 \cdot \mathbb{V}{\rm ar}\left( \widehat{\beta}_j \right) + 2 \cdot\sum_{j_1 < j_2} c_{i, j_1} c_{i, j_2} \cdot \mathbb{C}{\rm ov}\left( \widehat{\beta}_{j_1},\ \widehat{\beta}_{j_2} \right) \right \}_{i = 1,...,M} \end{aligned} \]

In practice, we replace \(\sigma^2\) with \(\widehat{\sigma}^2\).

Since we have more than one restriction, a \(t\)-test is not applicable. Nevertheless, there are a number of alternative test statistics that we can use.

4.2.4.4.1 Wald test for Multiple Linear Restrictions
One can calculate the Wald test statistic, which is applicable to large samples:

\[ W = \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right)^\top \left[ \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right]^{-1} \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right) \sim \chi^2_M \]

Note that we do not know the true \(\mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}})\). If the sample size is large, then the estimated \(\widehat{\sigma}^2\) is close to the true population variance and the Wald test may be applicable.

4.2.4.4.2 \(F\)-test for Multiple Linear Restrictions
In practice, if we replace \(\sigma^2\) with \(\widehat{\sigma}^2\) and divide the statistic by the number of restrictions, \(M\), we get the following \(F\)-statistic, which is also applicable to smaller samples:

\[ F = \dfrac{1}{M} \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right)^\top \left[ \mathbf{L} \widehat{\mathbb{V}{\rm ar}} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right]^{-1} \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right) \sim F_{(M, N - (k+1))} \]

or, alternatively: \[ F = \dfrac{(\text{RSS}_R - \text{RSS}_{UR}) / M}{\text{RSS}_{UR} / (N-(k+1))} = \dfrac{(R^2_{UR} - R^2_{R}) / M}{(1 - R^2_{UR}) / (N-(k+1))} \sim F_{(M, N-(k+1))} \] where \(\text{RSS}_R\) is the residual sum of squares of the restricted model (i.e. under the null hypothesis), and \(\text{RSS}_{UR}\) is the residual sum of squares of the unrestricted model (i.e. under the alternative hypothesis).

Take note that, regardless of the restrictions, the \(R\)-squared must still be calculated for the same dependent variable. E.g. setting a restriction that \(\beta_j = 1\) would allow us to create a restricted model on \(Y_i - X_{j,i}\), instead of \(Y_i\). This may be software-dependent, so it may sometimes be a good idea to examine the fitted-values, or compare to manually calculated estimates, to make sure that the same dependent variable is estimated in both cases.

Regardless of the chosen formula, we then need to calculate the relevant \(F_c = F_{(1 - \alpha, M, N-(k+1))}\) and the associated \(p\)-value.

Example 4.16 If we have the following multiple regression:

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + \epsilon_i \] We may be interested in testing two restrictions, each involving a pair of parameters: \[ \begin{cases} H_0&: \beta_2 = \beta_3, \quad \beta_4 = 2 \cdot \beta_5\\ H_1&: \beta_2 - \beta_3 \neq 0\quad \text{or}\quad \beta_4 - 2 \cdot \beta_5 \neq 0\quad \text{or both} \end{cases} \] (Note: we have written the alternative hypothesis differently to highlight how we are going to specify the \(\mathbf{L}\) matrix).

In this case, our constraint matrix and the associated value vector is: \[ \mathbf{L} = \begin{bmatrix} 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -2 \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} 0 \\ 0\\ \end{bmatrix} \]

Example 4.17 The joint hypothesis test for the significance of all model parameters is equivalent to \(k\) restrictions:

\[ \begin{cases} H_0&: \beta_1 = ... = \beta_k = 0\\ H_1&: \beta_j \neq 0, \text{ for some } j \end{cases} \] Our constraint matrix and the associated value vector is: \[ \mathbf{L} = \begin{bmatrix} 0 & 1 & 0 & 0 & ... & 0 \\ 0 & 0 & 1 & 0 & ... & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & ... & 1 \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} 0 \\ 0\\ \vdots \\ 0 \end{bmatrix} \] Note that this is equivalent to the joint hypothesis test for multiple coefficient significance.

What happens if we do not reject the null hypothesis? This means that the data are consistent with the specified linear relationship among some of our model parameters.

We would like to try to incorporate this information into our coefficient estimator. We can do this by carrying out a Restricted Least Squares (or Constrained Least Squares) procedure.

If we ignore this information, then the OLS estimates are still unbiased, but they are not as efficient as the RLS estimates.

On the other hand, how do we even know what kind of restrictions we should be testing for? Generally, we may sometimes want to impose (close-to-)zero-value restrictions on some specific coefficients, which we cannot easily remove from the model specification.

Example 4.18 We can test the following hypothesis with an \(F\)-test, instead of a \(t\)-test:

\[ \begin{cases} H_0&: \beta_2 = -3\\ H_1&: \beta_2 \neq -3 \end{cases} \]
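In R this can be done with linearHypothesis() from the car package (a sketch; in Python, the f_test() method of a fitted statsmodels model is the analogue):

```r
library(car)
# F-test of the single restriction beta_2 = -3
linearHypothesis(mdl_R_out, c("log(x2) = -3"))
```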

## Linear hypothesis test
## 
## Hypothesis:
## log(x2) = - 3
## 
## Model 1: restricted model
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * 
##     x1 + married * age_gr1
## 
##   Res.Df    RSS Df  Sum of Sq      F Pr(>F)
## 1    993 2.4318                            
## 2    992 2.4314  1 0.00038582 0.1574 0.6916
## <F test: F=array([[0.87287713]]), p=0.3503873845345189, df_denom=992, df_num=1>

The advantage of the \(t\)-test is that we can do it directly from the usual regression output, even if we weren’t sure whether we would need to perform any hypothesis testing.

The relationship between a \(t\)-statistic and an \(F\)-statistic that has one degree of freedom in the numerator (i.e. one restriction) is: \[ F_{(1, N - (k+1))} = t^2_{(N - (k+1))} \]

If we look at the squared \(t\)-statistic from our previous \(t\)-test:
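For example (a sketch, reusing the manual \(t\)-statistic for \(H_0: \beta_2 = -3\)):

```r
# The squared t-statistic equals the F-statistic of the single restriction
t_stat_sq <- ((beta_est["log_x2", ] - (-3)) / sqrt(beta_vcov["log_x2", "log_x2"]))^2
print(t_stat_sq)
```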

we see that it is the same as the \(F\)-statistic from our \(F\)-test.

Example 4.19 Say we want to test two linear restrictions:

\[ \begin{cases} H_0&: \beta_2 = -3,\ \beta_3 + \beta_6 = 0.09, \\ H_1&: \beta_2 \neq -3, \text{ or } \beta_3 + \beta_6 \neq 0.09, \text{ or both} \end{cases} \]

We will first show how to carry out this test manually. We begin by specifying the linear restriction matrix and the value vector for \(M = 2\) restrictions: \[ \mathbf{L} = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} -3 \\ 0.09\\ \end{bmatrix} \]
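A sketch of this specification in R (the columns follow the order of the manually constructed design matrix):

```r
# Restriction matrix L (M = 2 restrictions on 8 coefficients) and value vector r
L <- rbind(c(0, 0, 1, 0, 0, 0, 0, 0),   # beta_2          = -3
           c(0, 0, 0, 1, 0, 0, 1, 0))   # beta_3 + beta_6 = 0.09
r <- c(-3, 0.09)
print(r)
```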

## [1] -3.00  0.09
## [-3.    0.09]

Next, since we have already estimated the variance-covariance matrix of our beta_est variable, we can calculate the \(F\)-statistic:
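A sketch of the manual \(F\)-statistic calculation:

```r
M       <- nrow(L)
discrep <- L %*% beta_est - r
F_stat  <- as.numeric(t(discrep) %*% solve(L %*% beta_vcov %*% t(L)) %*% discrep) / M
p_val   <- pf(F_stat, df1 = M, df2 = N - k_plus_1, lower.tail = FALSE)
print(c(F_stat = F_stat, p_val = p_val))
```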

We can also do this automatically:
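In R, linearHypothesis() accepts several restrictions at once (a sketch):

```r
linearHypothesis(mdl_R_out, c("log(x2) = -3", "married + x1:age_gr2 = 0.09"))
```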

## Linear hypothesis test
## 
## Hypothesis:
## log(x2) = - 3
## married  + x1:age_gr2 = 0.09
## 
## Model 1: restricted model
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * 
##     x1 + married * age_gr1
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    994 2.4386                           
## 2    992 2.4314  2 0.0072291 1.4747 0.2293

We have no grounds to reject the null hypothesis (\(p-value > 0.05\)).

## <F test: F=array([[9.39003313]]), p=9.121826098238056e-05, df_denom=992, df_num=2>

We reject the null hypothesis (\(p-value < 0.05\)), however, specifying with \(r_2 = 0.1\) (which is the true value of the sum \(\beta_3 + \beta_6\)), instead of \(0.09\):

## <F test: F=array([[2.21166713]]), p=0.1100576650116352, df_denom=992, df_num=2>

yields \(p-value > 0.05\), so we have no grounds to reject the null hypothesis: \(H_0: \beta_2 = -3,\ \beta_3 + \beta_6 = 0.1\).

We can also carry out the Wald test:
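A sketch of the manual Wald statistic (note that with \(\widehat{\sigma}^2\) in place of \(\sigma^2\) it equals \(M\) times the \(F\)-statistic above):

```r
W_stat <- as.numeric(t(discrep) %*% solve(L %*% beta_vcov %*% t(L)) %*% discrep)
p_val  <- pchisq(W_stat, df = nrow(L), lower.tail = FALSE)
print(c(W_stat = W_stat, p_val = p_val))
```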

and we can compare it to the built-in functions:
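In R, linearHypothesis() can report a chi-squared (Wald-type) statistic via its test argument (a sketch; in Python the wald_test() method of a fitted statsmodels model is the analogue):

```r
linearHypothesis(mdl_R_out, c("log(x2) = -3", "married + x1:age_gr2 = 0.09"),
                 test = "Chisq")
```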

## Linear hypothesis test
## 
## Hypothesis:
## log(x2) = - 3
## married  + x1:age_gr2 = 0.09
## 
## Model 1: restricted model
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * 
##     x1 + married * age_gr1
## 
##   Res.Df    RSS Df Sum of Sq  Chisq Pr(>Chisq)
## 1    994 2.4386                               
## 2    992 2.4314  2 0.0072291 2.9494     0.2288
## <Wald test (chi2): statistic=[[18.78006626]], p-value=8.355268804528794e-05, df_denom=2>

In this example, the Wald test results give the same conclusions as the \(F\)-test.

4.2.5 Built-in Estimation Functions

Finally, for comparison, we can use the built-in functions to estimate the relevant model:

4.2.5.1 Parameter OLS Estimation, Significance Testing
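A sketch of the built-in estimation in R (the same specification as before; in Python the analogous call would be smf.ols(...).fit() followed by .summary()):

```r
mdl_R_out <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
                  age_gr2 * x1 + married * age_gr1, data = data_mat)
summary(mdl_R_out)
```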

## 
## Call:
## lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + 
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.129248 -0.034729  0.001534  0.034763  0.163707 
## 
## Coefficients:
##                   Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)      4.0101765  0.0122819  326.511  < 2e-16 ***
## x1               0.1590640  0.0009389  169.416  < 2e-16 ***
## log(x2)         -3.0023689  0.0059708 -502.843  < 2e-16 ***
## married          0.0466433  0.0038756   12.035  < 2e-16 ***
## age_gr1          0.0247338  0.0051786    4.776 2.06e-06 ***
## age_gr2         -0.1468251  0.0170318   -8.621  < 2e-16 ***
## x1:age_gr2       0.0503657  0.0016449   30.620  < 2e-16 ***
## married:age_gr1 -0.0267870  0.0066062   -4.055 5.41e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04951 on 992 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969 
## F-statistic: 4.647e+04 on 7 and 992 DF,  p-value: < 2.2e-16
##                             OLS Regression Results                            
## ==============================================================================
## Dep. Variable:              np.log(y)   R-squared:                       0.997
## Model:                            OLS   Adj. R-squared:                  0.997
## Method:                 Least Squares   F-statistic:                 4.224e+04
## Date:                Tue, 13 Oct 2020   Prob (F-statistic):               0.00
## Time:                        21:40:50   Log-Likelihood:                 1580.2
## No. Observations:                1000   AIC:                            -3144.
## Df Residuals:                     992   BIC:                            -3105.
## Df Model:                           7                                         
## Covariance Type:            nonrobust                                         
## ===================================================================================
##                       coef    std err          t      P>|t|      [0.025      0.975]
## -----------------------------------------------------------------------------------
## Intercept           3.9994      0.013    305.863      0.000       3.974       4.025
## x1                  0.1587      0.001    155.526      0.000       0.157       0.161
## np.log(x2)         -2.9943      0.006   -490.546      0.000      -3.006      -2.982
## married             0.0568      0.004     14.552      0.000       0.049       0.064
## age_gr1             0.0246      0.005      4.807      0.000       0.015       0.035
## age_gr2            -0.1595      0.018     -8.933      0.000      -0.195      -0.124
## age_gr2:x1          0.0514      0.002     29.549      0.000       0.048       0.055
## married:age_gr1    -0.0412      0.007     -6.167      0.000      -0.054      -0.028
## ==============================================================================
## Omnibus:                        0.209   Durbin-Watson:                   1.934
## Prob(Omnibus):                  0.901   Jarque-Bera (JB):                0.269
## Skew:                          -0.029   Prob(JB):                        0.874
## Kurtosis:                       2.944   Cond. No.                         137.
## ==============================================================================
## 
## Warnings:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
## 0.0025

We can also extract the variance-covariance matrix of the parameters:
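
In statsmodels this is available via the cov_params() method of the fitted results object (in R, the analogous function is vcov()). A minimal sketch, reusing the hypothetical mdl_fit object from above:

```python
# Variance-covariance matrix of the estimated parameters:
print(mdl_fit.cov_params())
```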

##                   (Intercept)            x1       log(x2)       married       age_gr1       age_gr2    x1:age_gr2 married:age_gr1
## (Intercept)      1.508456e-04 -8.758739e-06 -4.316298e-05 -9.329513e-06 -9.713703e-06 -8.840137e-05  8.300376e-06    5.680298e-06
## x1              -8.758739e-06  8.815287e-07  7.508965e-09  1.431190e-07 -7.153594e-08  8.664538e-06 -8.803845e-07    5.500742e-08
## log(x2)         -4.316298e-05  7.508965e-09  3.565033e-05 -1.114429e-07 -1.090457e-06 -3.808553e-06  3.122693e-07    1.503668e-06
## married         -9.329513e-06  1.431190e-07 -1.114429e-07  1.502027e-05  8.035502e-06  6.299750e-07 -3.112915e-08   -1.499250e-05
## age_gr1         -9.713703e-06 -7.153594e-08 -1.090457e-06  8.035502e-06  2.681785e-05  6.419738e-06  1.221773e-07   -2.313111e-05
## age_gr2         -8.840137e-05  8.664538e-06 -3.808553e-06  6.299750e-07  6.419738e-06  2.900805e-04 -2.727918e-05    1.165977e-06
## x1:age_gr2       8.300376e-06 -8.803845e-07  3.122693e-07 -3.112915e-08  1.221773e-07 -2.727918e-05  2.705623e-06   -1.542676e-07
## married:age_gr1  5.680298e-06  5.500742e-08  1.503668e-06 -1.499250e-05 -2.313111e-05  1.165977e-06 -1.542676e-07    4.364213e-05
##                  Intercept            x1    np.log(x2)       married       age_gr1       age_gr2    age_gr2:x1  married:age_gr1
## Intercept         0.000171 -1.034165e-05 -4.488210e-05 -8.991604e-06 -1.301488e-05 -1.129952e-04  1.050144e-05     9.725488e-06
## x1               -0.000010  1.041272e-06 -9.393588e-08 -2.159422e-08  1.568162e-07  1.046432e-05 -1.040926e-06    -2.008797e-07
## np.log(x2)       -0.000045 -9.393588e-08  3.725894e-05  1.148208e-06 -3.884450e-07 -2.451767e-07 -1.344392e-08     7.167193e-08
## married          -0.000009 -2.159422e-08  1.148208e-06  1.524325e-05  7.781495e-06  3.619783e-07 -4.256679e-08    -1.520171e-05
## age_gr1          -0.000013  1.568162e-07 -3.884450e-07  7.781495e-06  2.624813e-05  9.775666e-06 -1.868810e-07    -2.213004e-05
## age_gr2          -0.000113  1.046432e-05 -2.451767e-07  3.619783e-07  9.775666e-06  3.189139e-04 -3.030426e-05    -2.575233e-06
## age_gr2:x1        0.000011 -1.040926e-06 -1.344392e-08 -4.256679e-08 -1.868810e-07 -3.030426e-05  3.023966e-06     2.615093e-07
## married:age_gr1   0.000010 -2.008797e-07  7.167193e-08 -1.520171e-05 -2.213004e-05 -2.575233e-06  2.615093e-07     4.462778e-05

We can compare it to the manually calculated variance-covariance matrix:
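
The manual calculation is \(\widehat{\sigma}^2 (\mathbf{X}^\top \mathbf{X})^{-1}\); a condensed Python sketch of it, reusing the hypothetical mdl_fit object so that the design matrix and residuals do not have to be rebuilt by hand:

```python
import numpy as np

# Manual calculation of sigma^2_hat * (X'X)^(-1):
X = mdl_fit.model.exog                    # design matrix (includes the intercept column)
e = np.asarray(mdl_fit.resid)             # OLS residuals
N, k = X.shape
sigma2_hat = (e @ e) / (N - k)            # unbiased error-variance estimate
vcov_manual = sigma2_hat * np.linalg.inv(X.T @ X)
print(vcov_manual)
```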

##                     intercept            x1        log_x2       married       age_gr1       age_gr2    age_gr2_x1 married_age_gr1
## intercept        1.508456e-04 -8.758739e-06 -4.316298e-05 -9.329513e-06 -9.713703e-06 -8.840137e-05  8.300376e-06    5.680298e-06
## x1              -8.758739e-06  8.815287e-07  7.508965e-09  1.431190e-07 -7.153594e-08  8.664538e-06 -8.803845e-07    5.500742e-08
## log_x2          -4.316298e-05  7.508965e-09  3.565033e-05 -1.114429e-07 -1.090457e-06 -3.808553e-06  3.122693e-07    1.503668e-06
## married         -9.329513e-06  1.431190e-07 -1.114429e-07  1.502027e-05  8.035502e-06  6.299750e-07 -3.112915e-08   -1.499250e-05
## age_gr1         -9.713703e-06 -7.153594e-08 -1.090457e-06  8.035502e-06  2.681785e-05  6.419738e-06  1.221773e-07   -2.313111e-05
## age_gr2         -8.840137e-05  8.664538e-06 -3.808553e-06  6.299750e-07  6.419738e-06  2.900805e-04 -2.727918e-05    1.165977e-06
## age_gr2_x1       8.300376e-06 -8.803845e-07  3.122693e-07 -3.112915e-08  1.221773e-07 -2.727918e-05  2.705623e-06   -1.542676e-07
## married_age_gr1  5.680298e-06  5.500742e-08  1.503668e-06 -1.499250e-05 -2.313111e-05  1.165977e-06 -1.542676e-07    4.364213e-05
##                  intercept            x1        log_x2       married       age_gr1       age_gr2    age_gr2_x1  married_age_gr1
## intercept         0.000171 -1.034165e-05 -4.488210e-05 -8.991604e-06 -1.301488e-05 -1.129952e-04  1.050144e-05     9.725488e-06
## x1               -0.000010  1.041272e-06 -9.393588e-08 -2.159422e-08  1.568162e-07  1.046432e-05 -1.040926e-06    -2.008797e-07
## log_x2           -0.000045 -9.393588e-08  3.725894e-05  1.148208e-06 -3.884450e-07 -2.451767e-07 -1.344392e-08     7.167193e-08
## married          -0.000009 -2.159422e-08  1.148208e-06  1.524325e-05  7.781495e-06  3.619783e-07 -4.256679e-08    -1.520171e-05
## age_gr1          -0.000013  1.568162e-07 -3.884450e-07  7.781495e-06  2.624813e-05  9.775666e-06 -1.868810e-07    -2.213004e-05
## age_gr2          -0.000113  1.046432e-05 -2.451767e-07  3.619783e-07  9.775666e-06  3.189139e-04 -3.030426e-05    -2.575233e-06
## age_gr2_x1        0.000011 -1.040926e-06 -1.344392e-08 -4.256679e-08 -1.868810e-07 -3.030426e-05  3.023966e-06     2.615093e-07
## married_age_gr1   0.000010 -2.008797e-07  7.167193e-08 -1.520171e-05 -2.213004e-05 -2.575233e-06  2.615093e-07     4.462778e-05

We can compare the relevant values: the estimated coefficients and their standard errors, the corresponding \(t\)-values of the individual significance tests, the residual standard error, the degrees of freedom, and the \(F\)-statistic for the null hypothesis that all of the slope coefficients (i.e. all coefficients except \(\beta_0\)) are jointly equal to zero, against the alternative that at least one of them is different from zero.
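
Each of these quantities can also be extracted individually from the fitted results object; a sketch, again using the hypothetical mdl_fit name:

```python
import numpy as np

# Quantities for a side-by-side comparison with the manual calculations:
print(mdl_fit.params)                     # estimated coefficients
print(mdl_fit.bse)                        # their standard errors
print(mdl_fit.tvalues)                    # t-statistics for H0: beta_j = 0
print(np.sqrt(mdl_fit.mse_resid))         # residual standard error
print(mdl_fit.df_resid)                   # residual degrees of freedom
print(mdl_fit.fvalue, mdl_fit.f_pvalue)   # F-statistic and its p-value
```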

4.2.5.2 Categorical Data Handling

Additionally, our data_mat dataset contains a categorical variable - the age_group variable. Instead of manually creating the dummy indicator variables, we can include it in the model directly:
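
A sketch of such a specification with statsmodels (mdl_full is an arbitrary name; the formula interface expands the factor into indicator variables and drops the base level automatically):

```python
import numpy as np
import statsmodels.formula.api as smf

# The categorical age_group column is used directly in the formula:
mdl_full = smf.ols(formula="np.log(y) ~ x1 + np.log(x2) + married + age_group"
                           " + age_group*x1 + married*age_group",
                   data=data_mat).fit()
print(mdl_full.summary2())
```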

## 
## Call:
## lm(formula = log(y) ~ x1 + log(x2) + married + age_group + age_group * 
##     x1 + married * age_group, data = data_mat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.130645 -0.033463  0.001745  0.034883  0.159463 
## 
## Coefficients:
##                              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                  4.025357   0.015556  258.762  < 2e-16 ***
## x1                           0.157716   0.001321  119.403  < 2e-16 ***
## log(x2)                     -3.002311   0.005969 -502.990  < 2e-16 ***
## married                      0.042931   0.005480    7.834 1.21e-14 ***
## age_groupaged_20_30         -0.003679   0.019615   -0.188  0.85125    
## age_groupaged_31_65         -0.164027   0.019968   -8.215 6.61e-16 ***
## x1:age_groupaged_20_30       0.002663   0.001879    1.417  0.15668    
## x1:age_groupaged_31_65       0.051739   0.001890   27.378  < 2e-16 ***
## married:age_groupaged_20_30 -0.022777   0.007664   -2.972  0.00303 ** 
## married:age_groupaged_31_65  0.007016   0.007754    0.905  0.36578    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04949 on 990 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969 
## F-statistic: 3.617e+04 on 9 and 990 DF,  p-value: < 2.2e-16
##                          Results: Ordinary least squares
## =================================================================================
## Model:                    OLS                  Adj. R-squared:         0.997     
## Dependent Variable:       np.log(y)            AIC:                    -3141.2307
## Date:                     2020-10-13 21:40     BIC:                    -3092.1532
## No. Observations:         1000                 Log-Likelihood:         1580.6    
## Df Model:                 9                    F-statistic:            3.281e+04 
## Df Residuals:             990                  Prob (F-statistic):     0.00      
## R-squared:                0.997                Scale:                  0.0025060 
## ---------------------------------------------------------------------------------
##                                  Coef.  Std.Err.     t     P>|t|   [0.025  0.975]
## ---------------------------------------------------------------------------------
## Intercept                        3.9918   0.0165  241.5041 0.0000  3.9593  4.0242
## age_group[T.aged_20_30]          0.0410   0.0211    1.9460 0.0519 -0.0003  0.0824
## age_group[T.aged_31_65]         -0.1524   0.0208   -7.3269 0.0000 -0.1932 -0.1116
## x1                               0.1595   0.0014  112.1082 0.0000  0.1567  0.1623
## age_group[T.aged_20_30]:x1      -0.0017   0.0020   -0.8397 0.4013 -0.0057  0.0023
## age_group[T.aged_31_65]:x1       0.0505   0.0020   25.2403 0.0000  0.0466  0.0545
## np.log(x2)                      -2.9943   0.0061 -490.2445 0.0000 -3.0063 -2.9823
## married                          0.0555   0.0056    9.8385 0.0000  0.0444  0.0665
## married:age_group[T.aged_20_30] -0.0397   0.0078   -5.0669 0.0000 -0.0550 -0.0243
## married:age_group[T.aged_31_65]  0.0026   0.0078    0.3298 0.7417 -0.0128  0.0179
## ---------------------------------------------------------------------------------
## Omnibus:                   0.170              Durbin-Watson:                1.929
## Prob(Omnibus):             0.919              Jarque-Bera (JB):             0.224
## Skew:                      -0.027             Prob(JB):                     0.894
## Kurtosis:                  2.950              Condition No.:                212  
## =================================================================================

On the other hand, if we want to remove the insignificant interaction terms and keep only the significant ones, we need to use our manually created indicator variables:
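
A sketch of this mixed specification, combining the age_group factor with the manually created age_gr2 and age_gr1 indicator columns (mdl_reduced is an arbitrary name):

```python
import numpy as np
import statsmodels.formula.api as smf

# age_group provides the group effects; only the significant interactions are
# kept, via the manually created indicator columns:
mdl_reduced = smf.ols(formula="np.log(y) ~ x1 + np.log(x2) + married + age_group"
                              " + age_gr2:x1 + married:age_gr1",
                      data=data_mat).fit()
print(mdl_reduced.summary2())
```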

## 
## Call:
## lm(formula = log(y) ~ x1 + log(x2) + married + age_group + age_gr2:x1 + 
##     married:age_gr1, data = data_mat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.129248 -0.034729  0.001534  0.034763  0.163707 
## 
## Coefficients:
##                       Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)          4.0101765  0.0122819  326.511  < 2e-16 ***
## x1                   0.1590640  0.0009389  169.416  < 2e-16 ***
## log(x2)             -3.0023689  0.0059708 -502.843  < 2e-16 ***
## married              0.0466433  0.0038756   12.035  < 2e-16 ***
## age_groupaged_20_30  0.0247338  0.0051786    4.776 2.06e-06 ***
## age_groupaged_31_65 -0.1468251  0.0170318   -8.621  < 2e-16 ***
## x1:age_gr2           0.0503657  0.0016449   30.620  < 2e-16 ***
## married:age_gr1     -0.0267870  0.0066062   -4.055 5.41e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04951 on 992 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969 
## F-statistic: 4.647e+04 on 7 and 992 DF,  p-value: < 2.2e-16
##                      Results: Ordinary least squares
## =========================================================================
## Model:                OLS                Adj. R-squared:       0.997     
## Dependent Variable:   np.log(y)          AIC:                  -3144.4117
## Date:                 2020-10-13 21:40   BIC:                  -3105.1497
## No. Observations:     1000               Log-Likelihood:       1580.2    
## Df Model:             7                  F-statistic:          4.224e+04 
## Df Residuals:         992                Prob (F-statistic):   0.00      
## R-squared:            0.997              Scale:                0.0025030 
## -------------------------------------------------------------------------
##                          Coef.  Std.Err.     t     P>|t|   [0.025  0.975]
## -------------------------------------------------------------------------
## Intercept                3.9994   0.0131  305.8630 0.0000  3.9738  4.0251
## age_group[T.aged_20_30]  0.0246   0.0051    4.8070 0.0000  0.0146  0.0347
## age_group[T.aged_31_65] -0.1595   0.0179   -8.9329 0.0000 -0.1946 -0.1245
## x1                       0.1587   0.0010  155.5255 0.0000  0.1567  0.1607
## np.log(x2)              -2.9943   0.0061 -490.5459 0.0000 -3.0063 -2.9823
## married                  0.0568   0.0039   14.5523 0.0000  0.0492  0.0645
## age_gr2:x1               0.0514   0.0017   29.5491 0.0000  0.0480  0.0548
## married:age_gr1         -0.0412   0.0067   -6.1671 0.0000 -0.0543 -0.0281
## -------------------------------------------------------------------------
## Omnibus:                 0.209           Durbin-Watson:             1.934
## Prob(Omnibus):           0.901           Jarque-Bera (JB):          0.269
## Skew:                    -0.029          Prob(JB):                  0.874
## Kurtosis:                2.944           Condition No.:             137  
## =========================================================================

4.2.5.2.1 Consequences of Removing Insignificant Indicator Variable Interaction Terms

Generally, if some interaction terms are insignificant as additional explanatory variables, removing them from the model changes the interpretation of the base category. For example, in the following model: \[ \log(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 \log(X_{2i}) + \beta_3 \text{AGE}\_\text{GROUP}_{1i} + \beta_4 \text{AGE}\_\text{GROUP}_{2i} + \beta_5 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \epsilon_i \] our base age category is \(\text{AGE}\_\text{GROUP}_{OTHER}\), so excluding the interaction term \((\text{AGE}\_\text{GROUP}_{1i} \times X_{1i})\) means that the interaction coefficient \(\beta_5\) is now measured against a different base group: \(\text{AGE}\_\text{GROUP}_{OTHER} + \text{AGE}\_\text{GROUP}_{1}\).

In other words:

  • coefficients \(\beta_3\) and \(\beta_4\) are the effects of \(\text{AGE}\_\text{GROUP}_{1}\) and \(\text{AGE}\_\text{GROUP}_{2}\), respectively, compared to the \(\text{AGE}\_\text{GROUP}_{OTHER}\) group;
  • coefficient \(\beta_5\) is the additional effect of a unit increase in \(X_1\) for people in \(\text{AGE}\_\text{GROUP}_{2}\), compared to the remaining two groups combined, \(\text{AGE}\_\text{GROUP}_{OTHER} + \text{AGE}\_\text{GROUP}_{1}\), as the expression below makes explicit.
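
To make this explicit, the marginal effect of \(X_1\) implied by the model above is \[ \dfrac{\partial \mathbb{E}[\log(Y_i)]}{\partial X_{1i}} = \begin{cases} \beta_1, & \text{for } \text{AGE}\_\text{GROUP}_{OTHER} \text{ and } \text{AGE}\_\text{GROUP}_{1}, \\ \beta_1 + \beta_5, & \text{for } \text{AGE}\_\text{GROUP}_{2}, \end{cases} \] i.e. \(\text{AGE}\_\text{GROUP}_{OTHER}\) and \(\text{AGE}\_\text{GROUP}_{1}\) share the same slope on \(X_1\), so \(\beta_5\) is measured relative to both of them combined.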

If we were to also include the interaction term \((\text{AGE}\_\text{GROUP}_{1i} \times X_{1i})\), then the coefficient of each age group interaction would measure the additional effect of a unit increase in \(X_1\) for people in that age group, compared only to the base age group \(\text{AGE}\_\text{GROUP}_{OTHER}\).

For this reason, if there are many insignificant interaction terms, or insignificant category levels, we need to decide on the following:

  • Re-categorize the data - combine some categories together, or combine insignificant categories into a new, other, category, as long as that grouping makes economic sense.
  • Leave the groups as they are, along with insignificant interaction terms and/or insignificant category levels. This makes interpretation consistent, since we will have the same base group for individual category effects and interaction effects.

Finally, the coefficient of the \(\text{MARRIED}\) indicator variable, which indicates whether a person is married, compares married people to the base group of unmarried people, regardless of their age. Consequently, the base age group \(\text{AGE}\_\text{GROUP}_{OTHER}\) ignores whether a person is single or married - it only indicates their age group.

In other words, the categorical variables \(\text{MARRIED}\) (with its two categories) and \(\text{AGE}\_\text{GROUP}\) (with its three categories) have base groups that are not necessarily interpreted in the same way - the base group for marriage status does not take age into account, and the base group for age does not take marriage status into account.