## 4.2 OLS Estimation, Confidence Intervals and Hypothesis Testing

Most of the properties and formula expressions presented in this chapter are identical to the simple univariate regression case in chapter 3.3.

Following 4.1.6, the multiple regression model in this section is specified in the following matrix form: $\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ We will further assume that the multiple regression assumptions (MR.1) - (MR.6) hold true.

Example 4.7 We will use the following model to aid our presented methodology:

\begin{aligned} \log(Y_i) = \beta_0 &+ \beta_1 X_{1i} + \beta_2 \log(X_{2i}) + \beta_3 \text{MARRIED}_{i} + \beta_4 \text{AGE}\_\text{GROUP}_{1i} + \beta_5 \text{AGE}\_\text{GROUP}_{2i} \\ &+ \beta_6 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \beta_7 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{1i}) + \epsilon_i \end{aligned} where $$\text{MARRIED}_i = 1$$ if the $$i$$-th person is married, and 0 otherwise; $$\text{AGE}\_\text{GROUP}_{ji}$$ are indicators for different age groups: $$j = 1$$ for ages $$20-30$$ and $$j = 2$$ for ages $$31-65$$; the base group, $$\text{AGE}\_\text{GROUP}_{OTHER}$$, consists of the people with ages in the remaining age brackets, not covered by $$j = 1,2$$.

In this example the specified model has several distinctions:

• The dependent variable $$Y$$ is log-transformed;
• Some independent variables are log-transformed;
• Inclusion of indicator variables;
• Cross-products (i.e. interaction terms) of some independent variables;
• Not all indicator variable cross-products have a significant effect - for example $$(\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{2i})$$ is not included, which means that married people, aged $$31-65$$ do not have any additional effects on $$\log(Y_i)$$, compared to non-married people in the base age group.

We begin by specifying the parameter vector and sample size:

set.seed(132)
N <- 1000
beta_vec <- c(4, 0.16, -3, 0.05, 0.02, -0.15, 0.05, -0.03)
import numpy as np
np.random.seed(123)
N = 1000
beta_vec = np.array([4, 0.16, -3, 0.05, 0.02, -0.15, 0.05, -0.03])

We then generate the variables in the following way:

e <- rnorm(mean = 0, sd = 0.05, n = N)
x1 <- rnorm(mean = 10, sd = 2, n = N)
x2 <- sample(seq(from = 2, to = 5, length.out = floor(N * 0.8)),
             size = N, replace = TRUE)
married <- sample(c(0, 1), size = N, replace = TRUE)
e = np.random.normal(loc = 0, scale = 0.05, size = N)
x1 = np.random.normal(loc = 10, scale = 2, size = N)
x2 = np.random.choice(np.linspace(start = 2, stop = 5, num = int(np.floor(N * 0.8))),
                      size = N, replace = True)
married = np.random.choice([0, 1], size = N, replace = True)

The different age groups can be generated randomly as well. We can further create separate indicator variables for two of the three groups. Doing it this way automatically classifies the remaining group of other ages as the base group and we will avoid the dummy variable trap:

age_group <- sample(c("other", "aged_20_30", "aged_31_65"),
size = N, replace = TRUE)
age_gr1 <- rep(0, N)
age_gr1[age_group %in% "aged_20_30"] <- 1
age_gr2 <- rep(0, N)
age_gr2[age_group %in% "aged_31_65"] <- 1
age_group = np.random.choice(["other", "aged_20_30", "aged_31_65"],
size = N, replace = True)
age_gr1 = np.array([0]*N)
age_gr1[age_group == "aged_20_30"] = 1
age_gr2 = np.array([0]*N)
age_gr2[age_group == "aged_31_65"] = 1

Finally, we can create our dependent variable and combine all the data into a single dataset:

x_mat <- cbind(1, x1, log(x2), married,
               age_gr1, age_gr2, age_gr2 * x1, married * age_gr1)
colnames(x_mat) <- c("intercept", "x1", "log_x2", "married",
                     "age_gr1", "age_gr2", "age_gr2_x1", "married_age_gr1")
y <- exp(x_mat %*% beta_vec + e)
data_mat <- data.frame(y, x1, x2, married, age_gr1, age_gr2, age_group)
head(data_mat)
##           y        x1       x2 married age_gr1 age_gr2  age_group
## 1 29.313052 10.206100 2.450563       1       0       1 aged_31_65
## 2 14.656237 10.164646 2.976220       0       0       1 aged_31_65
## 3 28.228720 10.816951 2.255319       1       0       0      other
## 4  7.741518 11.713026 4.286608       1       0       1 aged_31_65
## 5  9.333522  6.220738 2.514393       1       1       0 aged_20_30
## 6  2.165643  5.813722 4.237797       1       0       1 aged_31_65
import statsmodels.api as sm
import pandas as pd
x_mat = sm.add_constant(np.column_stack([x1, np.log(x2), married,
                                         age_gr1, age_gr2, age_gr2 * x1, married * age_gr1]))
x_col_names = ["intercept", "x1", "log_x2", "married",
               "age_gr1", "age_gr2", "age_gr2_x1", "married_age_gr1"]
y = np.exp(np.dot(x_mat, beta_vec) + e)
#
data_mat = pd.DataFrame(np.column_stack([y, x1, x2, married, age_gr1, age_gr2, age_group]),
columns = ["y", "x1", "x2", "married", "age_gr1", "age_gr2", "age_group"])
# convert data to float:
data_mat[data_mat.columns[:-1]] = data_mat[data_mat.columns[:-1]].astype(float)
#
print(data_mat.head())                       
##            y         x1        x2  married  age_gr1  age_gr2   age_group
## 0   5.710176   8.502345  3.655820      1.0      0.0      1.0  aged_31_65
## 1  10.572763  11.135189  3.182728      0.0      0.0      0.0       other
## 2  17.613645  11.436301  2.732165      1.0      1.0      0.0  aged_20_30
## 3   4.586854   8.001239  3.411765      0.0      0.0      0.0       other
## 4  10.625736  10.949797  3.085106      0.0      1.0      0.0  aged_20_30

We may also want to re-level the categorical (factor) variable so that the age group - other - would be the base level (this is equivalent to being the first level):

print(head(data_mat$age_group))
## [1] aged_31_65 aged_31_65 other      aged_31_65 aged_20_30 aged_31_65
## Levels: aged_20_30 aged_31_65 other
data_mat["age_group"] = pd.Categorical(data_mat["age_group"])
print(data_mat["age_group"].head())
## 0    aged_31_65
## 1         other
## 2    aged_20_30
## 3         other
## 4    aged_20_30
## Name: age_group, dtype: category
## Categories (3, object): [aged_20_30, aged_31_65, other]
data_mat$age_group <- relevel(data_mat$age_group, "other")
print(head(data_mat$age_group))
## [1] aged_31_65 aged_31_65 other      aged_31_65 aged_20_30 aged_31_65
## Levels: other aged_20_30 aged_31_65
data_mat["age_group"] = pd.Categorical(data_mat["age_group"],
categories = ["other", "aged_20_30", "aged_31_65"])
print(data_mat["age_group"].head())
## 0    aged_31_65
## 1         other
## 2    aged_20_30
## 3         other
## 4    aged_20_30
## Name: age_group, dtype: category
## Categories (3, object): [other, aged_20_30, aged_31_65]

Note that in R character variables are automatically converted to factors (i.e. treated as categorical variables rather than text strings) when creating a data frame, while in Python the categorical type needs to be set explicitly.
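If we prefer not to build the indicator columns by hand in Python, pandas can generate them from a `Categorical` directly. The following sketch (with hypothetical toy values, not the chapter's dataset) shows that with the categories ordered so that `"other"` comes first, `drop_first=True` drops that column, making it the base group and avoiding the dummy variable trap, mirroring the manual `age_gr1`/`age_gr2` construction above:

```python
import pandas as pd

# Hypothetical toy values for illustration only
age = pd.Categorical(["other", "aged_20_30", "aged_31_65", "aged_20_30"],
                     categories=["other", "aged_20_30", "aged_31_65"])
# drop_first=True removes the first category ("other"), which becomes the base group
dummies = pd.get_dummies(age, drop_first=True)
print(dummies.columns.tolist())  # ['aged_20_30', 'aged_31_65']
```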

### 4.2.1 OLS Estimation of Regression Parameters

As before, we want to minimize the sum of squared residuals: \begin{aligned} RSS(\boldsymbol{\beta}) &= \boldsymbol{\varepsilon}^\top \boldsymbol{\varepsilon} \\ &= \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right)^\top \left( \mathbf{Y} - \mathbf{X} \boldsymbol{\beta} \right) \\ &= \mathbf{Y} ^\top \mathbf{Y} - \boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{Y} - \mathbf{Y}^\top \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{X} \boldsymbol{\beta} \rightarrow \min_{\beta_0, \beta_1, ..., \beta_k} \end{aligned} Then, equating the partial derivative to zero: $\dfrac{\partial RSS(\widehat{\boldsymbol{\beta}})}{\partial \widehat{\boldsymbol{\beta}}} = -2 \mathbf{X}^\top \mathbf{Y} + 2 \mathbf{X}^\top \mathbf{X} \widehat{\boldsymbol{\beta}} = 0$ gives us the OLS estimator:

$\widehat{\boldsymbol{\beta}} = \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{Y}$
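As a quick sanity check on toy data (the simulated variables here are illustrative, not the chapter's dataset), the closed-form estimator can be compared against NumPy's least-squares solver; the solver is the numerically preferred route, since it avoids forming $\left( \mathbf{X}^\top \mathbf{X}\right)^{-1}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# Closed-form OLS: (X'X)^{-1} X'y
beta_closed = np.linalg.inv(X.T @ X) @ (X.T @ y)
# Numerically preferred: a least-squares solver, no explicit inverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_closed, beta_lstsq))  # True
```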

The Gauss-Markov Theorem for the multiple regression states that: if the conditions (MR.1)-(MR.5) hold true, the OLS estimators $$\widehat{\boldsymbol{\beta}}$$ are the Best Linear Unbiased Estimators and they are consistent (BLUE&C) with the true parameter values of the multiple regression model.

Note that the proofs are analogous to the proofs in the simple univariate regression case since we are using the same matrix notation as before. This means that the variance-covariance matrix of the OLS estimator vector is: \begin{aligned} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) = \begin{bmatrix} \mathbb{V}{\rm ar} (\widehat{\beta}_0) & \mathbb{C}{\rm ov} (\widehat{\beta}_0, \widehat{\beta}_1) & ... & \mathbb{C}{\rm ov} (\widehat{\beta}_0, \widehat{\beta}_k)\\ \mathbb{C}{\rm ov} (\widehat{\beta}_1, \widehat{\beta}_0) & \mathbb{V}{\rm ar} (\widehat{\beta}_1) & ... & \mathbb{C}{\rm ov} (\widehat{\beta}_1, \widehat{\beta}_k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbb{C}{\rm ov} (\widehat{\beta}_k, \widehat{\beta}_0) & \mathbb{C}{\rm ov} (\widehat{\beta}_k, \widehat{\beta}_1) & ... & \mathbb{V}{\rm ar} (\widehat{\beta}_k) \end{bmatrix} &= \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \end{aligned} The difference from the univariate case is in the estimation of the unknown error variance parameter $$\sigma^2$$.

Example 4.8 For our example data, we can estimate the coefficients in the following way:
x_mat_1 <- cbind(1, data_mat$x1, log(data_mat$x2), data_mat$married,
                 data_mat$age_gr1, data_mat$age_gr2,
                 data_mat$age_gr2 * data_mat$x1, data_mat$age_gr1 * data_mat$married)
colnames(x_mat_1) <- colnames(x_mat)
#
beta_est <- solve(t(x_mat_1) %*% x_mat_1) %*% t(x_mat_1) %*% log(data_mat$y)
colnames(beta_est) <- "coef_est"
print(beta_est)
##                    coef_est
## intercept        4.01017653
## x1               0.15906399
## log_x2          -3.00236892
## married          0.04664335
## age_gr1          0.02473384
## age_gr2         -0.14682506
## age_gr2_x1       0.05036566
## married_age_gr1 -0.02678704
x_mat_1 = np.column_stack([data_mat["x1"], np.log(data_mat["x2"]), data_mat["married"],
                           data_mat["age_gr1"], data_mat["age_gr2"],
                           data_mat["age_gr2"] * data_mat["x1"],
                           data_mat["married"] * data_mat["age_gr1"]])
x_mat_1 = sm.add_constant(x_mat_1)
#
beta_est = np.dot(np.linalg.inv(np.dot(np.transpose(x_mat_1), x_mat_1)),
                  np.dot(np.transpose(x_mat_1), np.log(data_mat["y"])))
beta_est = pd.DataFrame(beta_est, columns = ["coef_est"], index = x_col_names)
print(beta_est)
##                  coef_est
## intercept        3.999410
## x1               0.158702
## log_x2          -2.994297
## married          0.056816
## age_gr1          0.024628
## age_gr2         -0.159526
## age_gr2_x1       0.051385
## married_age_gr1 -0.041198

### 4.2.2 Estimation of The Error Variance Parameter

Define the residual vector as the difference between the actual and the fitted values: $\widehat{\boldsymbol{\varepsilon}} = \left[ \widehat{\epsilon}_1, ..., \widehat{\epsilon}_N\right]^\top = \mathbf{Y} - \widehat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X} \widehat{\boldsymbol{\beta}}$ An estimator of $$\sigma^2$$ that uses the information from $$\widehat{\epsilon}_i^2$$ is: $\widehat{\sigma}^2 = \dfrac{1}{N-(k+1)} \sum_{i = 1}^N \widehat{\epsilon}_i^2 = \dfrac{\widehat{\boldsymbol{\varepsilon}}^\top \widehat{\boldsymbol{\varepsilon}}}{N-(k+1)}$ where $$k+1$$ is the number of parameters in $$\widehat{\boldsymbol{\beta}} = \left[\widehat{\beta}_0, \widehat{\beta}_1,...,\widehat{\beta}_k \right]^\top$$. Having estimated the unknown parameters via OLS, the variance estimate allows us to calculate various confidence intervals, as discussed in sections 3.5 and 3.7 of the simple univariate regression case.
It should be noted that we require $$N \gt k + 1$$. That is, as long as we have more data points than unknown parameters, we will be able to carry out OLS estimation and calculate any relevant test statistics or confidence intervals.

• If $$N = k + 1$$, then our standard errors are very large. This is because standard errors reflect the degree of uncertainty of our estimates, and there is less information per parameter to get an accurate estimate (remember that $$\widehat{Y}$$ is the mean response).
• If $$N$$ is close to $$k + 1$$, then we would need to reduce the number of parameters in our model.

Another way to look at this is by examining section 4.1.6 - the case of $$N \lt k + 1$$ means that we have more unknown parameters, $$k+1$$, than we have equations, $$N$$ - hence our system of equations has infinitely many solutions (or, in a rare case, no solution). Ideally, as a rule of thumb, we would need at least 20 - 30 observations for every parameter that we want to estimate. On the other hand, while the sample size itself is important, a more pressing concern is whether the sample is representative of the overall population.

If, in addition to (MR.1) - (MR.5), condition (MR.6) holds true, the conditional distribution of the OLS estimators is: $\widehat{\boldsymbol{\beta}} | \mathbf{X} \sim \mathcal{N} \left(\boldsymbol{\beta}, \sigma^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \right)$ This allows us to calculate the confidence intervals for our parameters, their predicted values and the mean response.

Example 4.9 For our example dataset, the variance can be easily estimated from the residuals:

y_fit <- x_mat_1 %*% beta_est
resid <- log(data_mat$y) - y_fit
#
sigma2_est <- sum(resid^2) / (nrow(data_mat) - length(beta_est))
print(paste0("Estimated residual standard error: ", round(sqrt(sigma2_est), 5)))
## [1] "Estimated residual standard error: 0.04951"
y_fit = np.dot(x_mat_1, beta_est["coef_est"].values)
#
resid = np.array(np.log(data_mat["y"].values)) - y_fit
#
sigma2_est = np.sum(resid**2) / (len(data_mat.index) - len(beta_est.index))
print("Estimated residual variance: " + str(np.round(sigma2_est, 5)))
## Estimated residual variance: 0.0025

### 4.2.3 Confidence Intervals

We will begin by examining the confidence intervals for a single parameter. Then, we will generalize to a linear combination of parameters. Finally, we will look at the mean response confidence intervals and prediction intervals for the multiple regression model.

#### 4.2.3.1 Parameter Confidence Intervals

The $$100 \cdot (1 - \alpha)\%$$ interval estimate for parameter $$\beta_i$$, $$i = 0,...,k$$ is calculated in the same way as for the simple univariate regression: $\left[ \widehat{\beta}_i - t_c \cdot \text{se}(\widehat{\beta}_i),\ \widehat{\beta}_i + t_c \cdot \text{se}(\widehat{\beta}_i) \right]$ where $$t_c = t_{(1-\alpha/2, N-(k + 1))}$$ and $$\text{se}(\widehat{\beta}_i) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\mathbf{\widehat{\beta}_i})}$$. If this interval estimator is used in many samples from the population, then $$100 \cdot (1 - \alpha)\%$$ of them will contain the true parameter $$\beta_i$$.
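The interval above can be sketched in Python on toy data (a minimal sketch; the simulated dataset and variable names here are illustrative, not the chapter's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, k_plus_1 = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.2, size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - k_plus_1)
# se(beta_j) = sqrt of the j-th diagonal element of sigma2 * (X'X)^{-1}
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

alpha = 0.05
t_c = stats.t.ppf(1 - alpha / 2, df=N - k_plus_1)   # critical value t_{(1-alpha/2, N-(k+1))}
ci_lower, ci_upper = beta_hat - t_c * se, beta_hat + t_c * se
```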

In general, if an interval estimate is uninformative because it is too wide, there is nothing immediate that can be done. A narrower interval can only be obtained by reducing the variance of the estimator - this could be done by obtaining additional (and higher quality) data, which exhibits more variation. On the other hand, we cannot say what constitutes a wide interval in general, as this depends on the problem being investigated.

Alternatively, we might introduce some kind of non-sample information on the coefficients in the form of linear restrictions.

#### 4.2.3.2 Interval Estimation for a Linear Combination of Coefficients

In general, a linear combination of coefficients can be specified as: $\sum_{j = 0}^k c_j \widehat{\beta}_j = c_0 \widehat{\beta}_0 + ... + c_k \widehat{\beta}_k = \widehat{r}$ and has the following variance: \begin{aligned} \widehat{\mathbb{V}{\rm ar}}\left(\widehat{r} \right) &= \widehat{\mathbb{V}{\rm ar}}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right) = \sum_{j = 0}^k c_j^2 \cdot \widehat{\mathbb{V}{\rm ar}}\left( \widehat{\beta}_j \right) + 2\cdot \sum_{i < j} c_i c_j \cdot \widehat{\mathbb{C}{\rm ov}}\left( \widehat{\beta}_i,\ \widehat{\beta}_j \right) \end{aligned} This allows estimating the confidence interval of the specified linear combination as: $\left[ \widehat{r} - t_c \cdot \text{se}(\widehat{r}),\ \widehat{r} + t_c \cdot \text{se}(\widehat{r}) \right]$ where $$t_c = t_{(1 - \alpha/2, N-(k+1))}$$
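The summation formula above is simply the quadratic form $\mathbf{c}^\top \widehat{\mathbb{V}{\rm ar}}(\widehat{\boldsymbol{\beta}})\, \mathbf{c}$. A minimal sketch verifying this equivalence (the matrix `V` here is an arbitrary positive-definite stand-in for an estimated variance-covariance matrix, not one from the chapter's model):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
V = A @ A.T                        # stand-in positive-definite vcov matrix
c = np.array([1.0, 0.0, -8.0, 2.0])

# Quadratic form: c' V c
var_quadratic = c @ V @ c
# Summation form: sum of squared-weight variances plus twice the cross-covariances
var_summation = sum(c[j]**2 * V[j, j] for j in range(4)) \
    + 2 * sum(c[i] * c[j] * V[i, j] for i in range(4) for j in range(i + 1, 4))

print(np.isclose(var_quadratic, var_summation))  # True
```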

We are interested in different parameter linear combinations when we are considering the mean response $$\mathbb{E} (Y | \mathbf{X} = \mathbf{X}_0)$$ for some explanatory variables, $$\mathbf{X}_0$$. Alternatively, we may be interested in the effect of changing two or more explanatory variables simultaneously.

Finally, parameter linear combinations are especially relevant if the effect of an explanatory variable depends on two or more parameters - i.e. in models with polynomial variables, or models with interaction terms.

#### 4.2.3.3 Mean Response Confidence Intervals

The mean response estimator $$\widehat{\mathbb{E}}(\mathbf{Y} | \mathbf{X}= \mathbf{X}_0) = \widehat{\mathbf{Y}} = \mathbf{X}_0 \widehat{\boldsymbol{\beta}}$$ follows the same distribution as highlighted in the simple univariate regression: $\left(\widehat{\mathbf{Y}}|\mathbf{X}_0, \mathbf{X}\right) \sim \mathcal{N}\left( \mathbf{X}_0\boldsymbol{\beta},\quad \sigma^2 \mathbf{X}_0 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}_0^\top\right)$ which means that the $$100 \cdot (1 - \alpha)\%$$ confidence intervals for the mean response are: $\left[ \widehat{Y}_i - t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widehat{Y}_i) ,\ \widehat{Y}_i + t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widehat{Y}_i) \right]$ where $$\text{se}(\widehat{Y}_i) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widehat{Y}_i)}$$ is the square root of the corresponding $$i$$-th diagonal element of $$\widehat{\mathbb{V}{\rm ar}} (\widehat{\mathbf{Y}}) = \widehat{\sigma}^2 \mathbf{X}_0 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}_0^\top$$, where $$\widehat{\sigma}^2$$ is estimated for the multiple regression model case as presented in this section.

#### 4.2.3.4 Prediction Intervals

Following the simple univariate regression case, the $$100 \cdot (1 - \alpha) \%$$ prediction interval for $$\widehat{Y}_i$$ is: $\left[ \widehat{Y}_i - t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widetilde{e}_i) ,\ \widehat{Y}_i + t_{(1 - \alpha/2, N-(k + 1))} \cdot \text{se}(\widetilde{e}_i) \right]$ where the standard forecast error $$\text{se}(\widetilde{e}_i) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widetilde{e}_i)}$$ is the square root of the corresponding $$i$$-th diagonal element of $$\widehat{\mathbb{V}{\rm ar}} (\widetilde{\boldsymbol{e}}) = \widehat{\sigma}^2 \left( \mathbf{I} + \mathbf{X}_0 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}_0^\top\right)$$
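Both interval formulas can be sketched together on toy data (an illustrative sketch, not the chapter's dataset); note how the prediction interval only differs by the added $\mathbf{I}$ term, which makes it strictly wider than the mean-response interval:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N, k_plus_1 = 150, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.3, size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - k_plus_1)

X0 = np.column_stack([np.ones(3), np.array([-1.0, 0.0, 1.0])])  # new points
y0_hat = X0 @ beta_hat
H0 = X0 @ np.linalg.inv(X.T @ X) @ X0.T

se_mean = np.sqrt(sigma2_hat * np.diag(H0))         # mean response
se_pred = np.sqrt(sigma2_hat * (1 + np.diag(H0)))   # prediction: adds the I term

t_c = stats.t.ppf(0.975, df=N - k_plus_1)
ci_mean = np.column_stack([y0_hat - t_c * se_mean, y0_hat + t_c * se_mean])
ci_pred = np.column_stack([y0_hat - t_c * se_pred, y0_hat + t_c * se_pred])
```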

### 4.2.4 Hypothesis Testing

We will begin by presenting hypothesis testing for a single parameter. We will then examine testing the joint hypothesis on a number of parameters, as well as the hypothesis of a linear combination of parameters.

#### 4.2.4.1 Testing For Significance of a Single Parameter

If we wanted to test a hypothesis for $$\beta_j$$: $\begin{cases} H_0&: \widehat{\beta}_j = c\\\\ H_1&: \widehat{\beta}_j \neq c\quad \text{(or } < c, \quad \text{or } >c\text{)} \end{cases}$ we first need to calculate the $$t$$-statistic: $t_j = \dfrac{\widehat{\beta}_j - c}{\text{s.e}(\widehat{\beta}_j)} \sim t_{(N-(k+1))}$ where $$\text{s.e}(\widehat{\beta}_j) = \sqrt{\widehat{\mathbb{V}{\rm ar}} (\widehat{\beta}_j)}$$ and $$\widehat{\mathbb{V}{\rm ar}} (\widehat{\beta}_j)$$ is the corresponding diagonal element from $$\widehat{\mathbb{V}{\rm ar}} (\widehat{\boldsymbol{\beta}}) = \widehat{\sigma}^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}$$.

While the formula for the $$t$$-statistic remains the same, its distribution depends on the number of estimated parameters - it follows a $$t$$ distribution with $$N-(k+1)$$ degrees of freedom.

The critical value $$t_c$$ also depends on the number of estimated parameters:

• If the alternative is $$H_1: \widehat{\beta}_j > c$$, we reject $$H_0$$ and accept the alternative $$H_1$$ if $$t_j \geq t_c$$, where $$t_c = t_{(1 - \alpha, N-(k+1))}$$;
• If the alternative is $$H_1: \widehat{\beta}_j < c$$, we reject $$H_0$$ and accept the alternative $$H_1$$ if $$t_j \leq t_c$$, where $$t_c = t_{(\alpha, N-(k+1))}$$;
• If the alternative is $$H_1: \widehat{\beta}_j \neq c$$, we reject $$H_0$$ and accept the alternative $$H_1$$ if $$t_j \leq t_{(\alpha/2, N-(k+1))}$$ or $$t_j \geq t_{(1 - \alpha/2, N-(k+1))}$$;
• We can also calculate the associated $$p$$-value: if $$p \leq \alpha$$, we reject $$H_0$$; if $$p > \alpha$$, we do not reject $$H_0$$.
Example 4.10 Continuing our example, we want to test the two-tail hypothesis $$H_0:\beta_j = 0$$ against the alternative $$H_1: \beta_j \neq 0$$ for each coefficient $$j = 0, ..., 7$$ at the $$\alpha = 0.05$$ significance level. Note that we have used matrix notation throughout - this is helpful when we want to calculate the results for multiple coefficients at once:
#
#
#Parameter variance-covariance matrix
beta_vcov <- sigma2_est * solve(t(x_mat_1) %*% x_mat_1)
# Calculate the test statistic for each coefficient:
t_stat <- beta_est / sqrt(diag(beta_vcov))
# Calculate the associated p-value:
p_val <- 2 * pt(-abs(t_stat), df = nrow(data_mat) - length(beta_est), lower.tail = TRUE)
# Combine the output:
test_res <- cbind(t_stat, p_val)
colnames(test_res) <- c("t_stat", "p_val")
print(test_res)
##                      t_stat         p_val
## intercept        326.510512  0.000000e+00
## x1               169.415700  0.000000e+00
## log_x2          -502.842822  0.000000e+00
## married           12.035132  3.091084e-31
## age_gr1            4.776168  2.055881e-06
## age_gr2           -8.620667  2.598470e-17
## age_gr2_x1        30.619696 1.744506e-145
## married_age_gr1   -4.054822  5.410364e-05
import scipy.stats as stats
#
#Parameter variance-covariance matrix
beta_vcov = sigma2_est * np.linalg.inv(np.dot(np.transpose(x_mat_1), x_mat_1))
# Calculate the test statistic for each coefficient:
t_stat = beta_est["coef_est"].values / np.sqrt(np.diag(beta_vcov))
# Calculate the associated p-value:
p_val = 2 * stats.t.cdf(x = -np.abs(t_stat), df = len(data_mat.index) - len(beta_est.index))
# Combine the output:
test_res = np.column_stack([t_stat, p_val])
test_res = pd.DataFrame(test_res, columns = ["t_stat", "p_val"], index = x_col_names)
print(test_res)
##                      t_stat          p_val
## intercept        305.862960   0.000000e+00
## x1               155.525526   0.000000e+00
## log_x2          -490.545932   0.000000e+00
## married           14.552338   1.256024e-43
## age_gr1            4.806984   1.769267e-06
## age_gr2           -8.932949   1.977971e-18
## age_gr2_x1        29.549076  3.663531e-138
## married_age_gr1   -6.167061   1.011321e-09

We see that all of the specified coefficients are statistically significantly different from zero, since we reject the null hypothesis $$H_0: \beta_j = 0$$ for each of the coefficients.

Example 4.11 Additionally, we want to test the following hypothesis:

$\text{A one percent increase in } X_{2} \text{ would result in a } 3\% \text{ reduction in } Y$ which can be written as the following hypothesis: $\begin{cases} H_0&: \beta_2 = -3\\ H_1&: \beta_2 \neq -3 \end{cases}$

# Calculate the test statistic for beta[2]:
t_stat <- (beta_est["log_x2", ] - (-3)) / sqrt(diag(beta_vcov)["log_x2"])
# Calculate the associated p-value:
p_val <- 2 * pt(-abs(t_stat), df = nrow(data_mat) - length(beta_est), lower.tail = TRUE)
# combine the output
hyp_b2 <- cbind(t_stat, p_val)
print(hyp_b2)
##            t_stat     p_val
## log_x2 -0.3967519 0.6916357
# Calculate the test statistic for beta[2]:
t_stat = (beta_est.loc[["log_x2"], "coef_est"].values- (-3)) / np.sqrt(np.diag(beta_vcov)[np.array(x_col_names) == "log_x2"])
# Calculate the associated p-value:
p_val = 2 * stats.t.cdf(x = -np.abs(t_stat), df = len(data_mat.index) - len(beta_est.index))
# combine the output
hyp_b2 = pd.DataFrame(np.column_stack([t_stat, p_val]), columns = ["t_stat", "p_val"])
print(hyp_b2)
##      t_stat     p_val
## 0  0.934279  0.350387

Since the $$p$$-value is greater than 0.05, we have no grounds to reject the null hypothesis and conclude that $$\beta_2$$ is not statistically significantly different from $$-3$$.

#### 4.2.4.2 Joint Hypothesis Test for Multiple Coefficient Significance

The hypothesis tests with $$t$$-statistics allow for testing a single equality in the null hypothesis. But what if we want to test a joint hypothesis, where multiple parameters are simultaneously restricted to certain values?

One of the more popular types of joint hypothesis tests involves checking whether a group of variables is statistically significantly different from zero in a particular model.

If we wanted to test whether $$M$$ coefficients with indexes $$i_1, ..., i_M \in \{0, 1, ..., k \}$$ are statistically significantly different from zero, we would specify the following hypothesis: $\begin{cases} H_0&: \beta_{i_1} = 0,\ \beta_{i_2} = 0,\ ...,\ \beta_{i_M} = 0 \\ H_1&: \beta_{i_j} \neq 0, \quad \text{for some } j \end{cases}$ We can test this hypothesis with an $$F$$-test, which evaluates whether the reduction in the residual sum of squares (RSS) from including the additional variables is statistically significant. If adding the additional variables does not significantly reduce the residual sum of squares, then those variables contribute little to the explanation of the variation in the dependent variable, and we would not reject the null hypothesis. We define:

• $$\text{RSS}_{UR}$$ - the residual sum of squares of the unrestricted (i.e. the full) model under the alternative hypothesis. The coefficient of determination in the unrestricted model is $$R^2_{UR}$$;
• $$\text{RSS}_R$$ - the residual sum of squares of the restricted model under the null hypothesis (i.e. when some of the parameters are not statistically significantly different from zero). The coefficient of determination in the restricted model is $$R^2_R$$;

Then, the $$F$$-statistic is given by: $F = \dfrac{(\text{RSS}_R - \text{RSS}_{UR}) / M}{\text{RSS}_{UR} / (N-(k+1))} = \dfrac{(R^2_{UR} - R^2_{R}) / M}{(1 - R^2_{UR}) / (N-(k+1))} \sim F_{(M, N-(k+1))}$

We then select the significance level $$\alpha$$ and calculate the critical value $$F_c = F_{(1 - \alpha, M, N-(k+1))}$$.

If $$F \geq F_c$$, we reject the null hypothesis and conclude that at least one of the coefficients in the null is not zero. We can also calculate the associated $$p$$-value.
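The equivalence of the RSS-form and $$R^2$$-form of the $$F$$-statistic above can be verified on toy data (a sketch with hypothetical simulated variables, not the chapter's dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N = 120
x1, x2, x3 = rng.normal(size=(3, N))
y = 1.0 + 0.5 * x1 + rng.normal(scale=0.5, size=N)   # x2, x3 are irrelevant

X_UR = np.column_stack([np.ones(N), x1, x2, x3])     # unrestricted: k + 1 = 4
X_R = np.column_stack([np.ones(N), x1])              # restricted: M = 2 zero restrictions

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

RSS_UR, RSS_R = rss(X_UR, y), rss(X_R, y)
TSS = np.sum((y - y.mean())**2)
R2_UR, R2_R = 1 - RSS_UR / TSS, 1 - RSS_R / TSS

M, df2 = 2, N - 4
F_rss = ((RSS_R - RSS_UR) / M) / (RSS_UR / df2)      # RSS form
F_r2 = ((R2_UR - R2_R) / M) / ((1 - R2_UR) / df2)    # R^2 form
p_val = stats.f.sf(F_rss, dfn=M, dfd=df2)            # upper-tail p-value

print(np.isclose(F_rss, F_r2))  # True
```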

Example 4.12 We again turn to our example data, where we will estimate an unrestricted model with additional coefficients. For comparison, we also calculate the $$t$$-statistics and $$p$$-values for tests of individual coefficient significance:

\begin{aligned} \log(Y_i) = \beta_0 &+ \beta_1 X_{1i} + \beta_2 \log(X_{2i}) + \beta_3 \text{MARRIED}_{i} + \beta_4 \text{AGE}\_\text{GROUP}_{1i} + \beta_5 \text{AGE}\_\text{GROUP}_{2i} \\ &+ \beta_6 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \beta_7 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{1i}) \\ &+ \beta_8 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{2i}) + \beta_9 (\text{AGE}\_\text{GROUP}_{1i} \times X_{1i}) + \epsilon_i \end{aligned}

We begin by creating the appropriate design matrix:

x_mat_UR <- cbind(1, data_mat$x1, log(data_mat$x2), data_mat$married,
                  data_mat$age_gr1, data_mat$age_gr2,
                  data_mat$age_gr2 * data_mat$x1, data_mat$age_gr1 * data_mat$married,
                  data_mat$age_gr2 * data_mat$married, data_mat$age_gr1 * data_mat$x1)
colnames(x_mat_UR) <- c(colnames(x_mat), "married_age_gr2", "age_gr1_x1")
x_mat_UR = np.column_stack([data_mat["x1"], np.log(data_mat["x2"]), data_mat["married"],
                            data_mat["age_gr1"], data_mat["age_gr2"],
                            data_mat["age_gr2"] * data_mat["x1"],
                            data_mat["married"] * data_mat["age_gr1"],
                            data_mat["married"] * data_mat["age_gr2"],
                            data_mat["age_gr1"] * data_mat["x1"]])
x_mat_UR = sm.add_constant(x_mat_UR)
x_col_names_UR = np.append(x_col_names, ["married_age_gr2", "age_gr1_x1"])

We then estimate the model coefficients and their variance-covariance matrix:

# parameter estimation
beta_est_UR <- solve(t(x_mat_UR) %*% x_mat_UR) %*% t(x_mat_UR) %*% log(data_mat$y)
#
colnames(beta_est_UR) <- "coef_est"
# fitted value and error calculation
y_fit_UR <- x_mat_UR %*% beta_est_UR
resid_UR <- log(data_maty) - y_fit_UR sigma2_est_UR <- sum(resid_UR^2) / (nrow(data_mat) - length(beta_est_UR)) beta_vcov_UR <- sigma2_est_UR * solve(t(x_mat_UR) %*% x_mat_UR) # parameter estimation beta_est_UR = np.dot(np.linalg.inv(np.dot(np.transpose(x_mat_UR), x_mat_UR)), np.dot(np.transpose(x_mat_UR), np.log(data_mat["y"]))) beta_est_UR = pd.DataFrame(beta_est_UR, columns = ["coef_est"], index = x_col_names_UR) # fitted value and error calculation y_fit_UR = np.dot(x_mat_UR, beta_est_UR["coef_est"].values) resid_UR = np.array(np.log(data_mat["y"].values)) - y_fit_UR sigma2_est_UR = np.sum(resid_UR**2) / (len(data_mat.index) - len(beta_est_UR.index)) beta_vcov_UR = sigma2_est_UR * np.linalg.inv(np.dot(np.transpose(x_mat_UR), x_mat_UR)) and finally perform significance testing separately for each estimated parameter: # significance testing of individual parameters t_stat_UR <- beta_est_UR / sqrt(diag(beta_vcov_UR)) p_val_UR <- 2 * pt(-abs(t_stat_UR), df = nrow(data_mat) - length(beta_est_UR), lower = TRUE) # combined output mdl_UR_out <- cbind(beta_est_UR, t_stat_UR, p_val_UR) # print(round(mdl_UR_out, 5)) ## coef_est coef_est coef_est ## intercept 4.02536 258.76200 0.00000 ## x1 0.15772 119.40341 0.00000 ## log_x2 -3.00231 -502.99025 0.00000 ## married 0.04293 7.83446 0.00000 ## age_gr1 -0.00368 -0.18757 0.85125 ## age_gr2 -0.16403 -8.21466 0.00000 ## age_gr2_x1 0.05174 27.37776 0.00000 ## married_age_gr1 -0.02278 -2.97211 0.00303 ## married_age_gr2 0.00702 0.90482 0.36578 ## age_gr1_x1 0.00266 1.41740 0.15668 # significance testing of individual parameters t_stat_UR = beta_est_UR["coef_est"].values / np.sqrt(np.diag(beta_vcov_UR)) p_val_UR = 2 * stats.t.cdf(x = -np.abs(t_stat_UR), df = len(data_mat.index) - len(beta_est_UR.index)) # combine the output mdl_UR_out = np.column_stack([beta_est_UR, t_stat_UR, p_val_UR]) mdl_UR_out = pd.DataFrame(mdl_UR_out, columns = ["coef_est", "t_stat", "p_val"], index = x_col_names_UR) print(np.round(mdl_UR_out, 5)) ## coef_est 
t_stat p_val ## intercept 3.99175 241.50412 0.00000 ## x1 0.15954 112.10817 0.00000 ## log_x2 -2.99432 -490.24452 0.00000 ## married 0.05546 9.83853 0.00000 ## age_gr1 0.04103 1.94595 0.05194 ## age_gr2 -0.15242 -7.32694 0.00000 ## age_gr2_x1 0.05055 25.24035 0.00000 ## married_age_gr1 -0.03966 -5.06695 0.00000 ## married_age_gr2 0.00258 0.32976 0.74165 ## age_gr1_x1 -0.00172 -0.83967 0.40130 Again, we will consider, mdl_UR_out, the above as the unrestricted model. The restricted model will is the true model regression which we have previously estimated: # combined output mdl_R_out <- cbind(beta_est, test_res) # print(round(mdl_R_out, 5)) ## coef_est t_stat p_val ## intercept 4.01018 326.51051 0e+00 ## x1 0.15906 169.41570 0e+00 ## log_x2 -3.00237 -502.84282 0e+00 ## married 0.04664 12.03513 0e+00 ## age_gr1 0.02473 4.77617 0e+00 ## age_gr2 -0.14683 -8.62067 0e+00 ## age_gr2_x1 0.05037 30.61970 0e+00 ## married_age_gr1 -0.02679 -4.05482 5e-05 # combine the output mdl_R_out = np.column_stack([beta_est, test_res]) mdl_R_out = pd.DataFrame(mdl_R_out, columns = ["coef_est", "t_stat", "p_val"], index = beta_est.index) print(np.round(mdl_R_out, 5)) ## coef_est t_stat p_val ## intercept 3.99941 305.86296 0.0 ## x1 0.15870 155.52553 0.0 ## log_x2 -2.99430 -490.54593 0.0 ## married 0.05682 14.55234 0.0 ## age_gr1 0.02463 4.80698 0.0 ## age_gr2 -0.15953 -8.93295 0.0 ## age_gr2_x1 0.05138 29.54908 0.0 ## married_age_gr1 -0.04120 -6.16706 0.0 In other words, we want to test the hypothesis with two restrictions: $\begin{cases} H_0&: \beta_{8} = 0,\ \beta_{9} = 0 \\ H_1&: \beta_{8} \neq 0,\quad \text{or} \quad \beta_{9} \neq 0,\quad \text{or both} \end{cases}$ So, we can calculate the unrestricted and restricted residual sum of squares and the $$F$$-statistic along with the associated $$p$$-value: RSS_UR <- sum(resid_UR^2) RSS_R <- sum(resid^2) # F-statistic and its p-value F_stat <- ((RSS_R - RSS_UR) / 2) / (RSS_UR / (nrow(data_mat) - length(beta_est_UR))) p_val_F<- pf(F_stat, 
df1 = 2, df2 = nrow(data_mat) - length(beta_est_UR), lower.tail = FALSE) # print(cbind(F_stat, p_val_F)) ## F_stat p_val_F ## [1,] 1.366289 0.2555323 RSS_UR = np.sum(resid_UR**2) RSS_R = np.sum(resid**2) # F-statistic and its p-value F_stat = ((RSS_R - RSS_UR) / 2) / (RSS_UR / (len(data_mat.index) - len(beta_est_UR.index))) p_val_F= stats.f.sf(F_stat, dfn = 2, dfd = len(data_mat.index) - len(beta_est_UR.index)) # # print(pd.DataFrame(np.column_stack([F_stat, p_val_F]), columns = ["F_stat", "p_val_F"])) ## F_stat p_val_F ## 0 0.405591 0.666693 In this case, the $$p$$-value is greater than 0.05, so we do not reject the null hypothesis that both coefficients are not statistically significantly different from zero. (Note that if we were to reject the null hypothesis - it would mean that at least one coefficient is statistically significantly different from zero - however, we would not know which one, or if both of the coefficients are significant.) Alternatively, we can specify the models with the built-in OLS estimation functions and carry out an ANOVA (Analysis of variance) test: # # lm_UR <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * x1 + married * age_gr1 + married * age_gr2 + age_gr1 * x1, data = data_mat) lm_R <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * x1 + married * age_gr1, data = data_mat) # print(anova(lm_R, lm_UR)) ## Analysis of Variance Table ## ## Model 1: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * ## x1 + married * age_gr1 ## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * ## x1 + married * age_gr1 + married * age_gr2 + age_gr1 * x1 ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 992 2.4314 ## 2 990 2.4247 2 0.0066927 1.3663 0.2555 import statsmodels.formula.api as smf # lm_UR = smf.ols(formula = "np.log(y) ~ x1 + np.log(x2) + married + age_gr1 + age_gr2 +\ age_gr2 * x1 + married * age_gr1 + married * age_gr2 + age_gr1 * x1", data = data_mat).fit() lm_R = 
smf.ols(formula = 'np.log(y) ~ x1 + np.log(x2) + married + age_gr1 + age_gr2 +\
        age_gr2 * x1 + married * age_gr1', data = data_mat).fit()
print(sm.stats.anova_lm(lm_R, lm_UR))
##    df_resid       ssr  df_diff   ss_diff         F    Pr(>F)
## 0     992.0  2.482998      0.0       NaN       NaN       NaN
## 1     990.0  2.480965      2.0  0.002033  0.405591  0.666693

In case of a RuntimeWarning: these specific RuntimeWarnings come from scipy.stats.distributions, but are "by design" - in statsmodels these "invalid" RuntimeWarnings should not cause problems.

This gives us the same $$F$$-statistic and $$p$$-values.

#### 4.2.4.3 Testing for a Single Linear Restriction

Suppose that we are interested in testing the hypothesis that a linear combination of parameters: $\sum_{j = 0}^k c_j \widehat{\beta}_j = c_0 \widehat{\beta}_0 + ... + c_k \widehat{\beta}_k = \widehat{r}$ is equal to $$r$$: $\begin{cases} H_0&: \widehat{r} = r \\ H_1&: \widehat{r} \neq r \end{cases}$ Then, the associated $$t$$-statistic is calculated as: $t_r = \dfrac{\widehat{r} - r}{\text{s.e.}(\widehat{r})} = \dfrac{\sum_{j = 0}^k c_j \widehat{\beta}_j - \sum_{j = 0}^k c_j \beta_j}{\text{s.e.}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right)} \sim t_{(N-(k+1))}$ where $$\text{s.e.}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right) = \sqrt{ \widehat{\mathbb{V}{\rm ar}}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right)}$$, and: \begin{aligned} \widehat{\mathbb{V}{\rm ar}}\left( \sum_{j = 0}^k c_j \widehat{\beta}_j \right) &= \sum_{j = 0}^k c_j^2 \cdot \widehat{\mathbb{V}{\rm ar}}\left( \widehat{\beta}_j \right) + 2\cdot \sum_{i < j} c_i c_j \cdot \widehat{\mathbb{C}{\rm ov}}\left( \widehat{\beta}_i,\ \widehat{\beta}_j \right) \end{aligned} Note that we can get the relevant values from the variance-covariance matrix estimate, $$\widehat{\mathbb{V}{\rm ar}} (\widehat{\boldsymbol{\beta}}) = \widehat{\sigma}^2 \left( \mathbf{X}^\top \mathbf{X}\right)^{-1}$$. If the sample size is large, then the errors will be approximately normally distributed.
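
To see that the two forms of this variance agree, here is a small numerical check with numpy. The weight vector `c` and covariance matrix `V` below are made up for illustration - they are not the estimates from our example:

```python
import numpy as np

# Illustrative (made-up) weights c_j and covariance matrix of beta_hat:
c = np.array([0.0, 1.0, -8.0])
V = np.array([[0.040, 0.002, 0.001],
              [0.002, 0.030, 0.004],
              [0.001, 0.004, 0.020]])

# Matrix form: Var(sum_j c_j * beta_hat_j) = c' V c
var_quad = c @ V @ c

# Expanded form: sum_j c_j^2 Var(beta_hat_j) + 2 * sum_{i<j} c_i c_j Cov(beta_hat_i, beta_hat_j)
var_sum = np.sum(c**2 * np.diag(V))
for i in range(len(c)):
    for j in range(i + 1, len(c)):
        var_sum += 2 * c[i] * c[j] * V[i, j]

se = np.sqrt(var_quad)  # s.e. of the linear combination
print(var_quad, var_sum, se)
```

Both forms give the same value; in practice the quadratic form `c @ V @ c` is usually the more convenient one.
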
Since the $$t$$-statistic has the same distribution as when testing a single parameter, we can use the equivalent $$t_c$$ values when testing either one-tail or two-tail hypotheses. Note that testing this linear constraint is equivalent to testing the following constraint on the parameter vector: $\mathbf{L} \widehat{\boldsymbol{\beta}} = \begin{bmatrix} c_0 & c_1 & ... & c_k \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} = r$ We will talk about linear constraints in the next subsection.

Example 4.13 We may be interested in testing whether one coefficient is eight times the magnitude of another coefficient. For example, our null hypothesis may be: \begin{aligned} \text{A unit increase in } X_{1} \left( \text{ for a person from } \text{AGE}\_\text{GROUP}_{OTHER} \right) \text{ has an 8 times larger effect on } \\ \text{ the change in } Y \text{ than the fact that a person is between 20 and 30 years old, i.e. from } \text{AGE}\_\text{GROUP}_{1} \end{aligned} which can be written as the following hypothesis: $\begin{cases} H_0&: \beta_1 = 8\cdot\beta_4\\ H_1&: \beta_1 \neq 8\cdot\beta_4 \end{cases}$ or, alternatively: $\begin{cases} H_0&: \beta_1 - 8\cdot\beta_4 = 0\\ H_1&: \beta_1 - 8\cdot\beta_4 \neq 0 \end{cases}$ Then, our test $$t$$-statistic is: $t = \dfrac{\beta_1 - 8\cdot\beta_4 - 0}{\text{s.e.}\left( \beta_1 - 8\cdot\beta_4 \right)}= \dfrac{\beta_1 - 8\cdot\beta_4}{\text{s.e.}\left( \beta_1 - 8\cdot\beta_4 \right)}$ We can then calculate the critical value $$t_c$$ and test the hypothesis as we would for the single parameter case. In our example dataset this can be done with the following code:

# The variance of the linear restriction:
var_restr <- sum(diag(beta_vcov)[c(2, 5)] * c(1, -8)^2) + 2 * 1 * (-8) * beta_vcov[2, 5]
# Intermediate output:
print(paste0("beta[1] - 8 * beta[4] = ", round(beta_est[2] - 8 * beta_est[5], 5)))
## [1] "beta[1] - 8 * beta[4] = -0.03881"
print(paste0("The s.e.
of the difference = ", round(sqrt(var_restr), 5)))
## [1] "The s.e. of the difference = 0.04145"
# The variance of the linear restriction:
var_restr = np.sum(np.diag(beta_vcov)[np.isin(np.array(x_col_names), ["x1", "age_gr1"])] * np.array([1, -8])**2) + 2 * 1 * (-8) * beta_vcov[1, 4]
# Intermediate output:
print("beta[1] - 8 * beta[4] = " + str(np.round(beta_est.iloc[1,:].values - 8 * beta_est.iloc[4,:].values, 5)))
## beta[1] - 8 * beta[4] = [-0.03832]
print("The s.e. of the difference = " + str(np.round(np.sqrt(var_restr), 5)))
## The s.e. of the difference = 0.04097
# The t-statistic of the linear restriction:
t_stat_restr <- (beta_est[2] - 8 * beta_est[5] - 0) / sqrt(var_restr)
# The associated p-value:
p_val_restr <- 2 * pt(-abs(t_stat_restr), df = nrow(data_mat) - length(beta_est), lower = TRUE)
#
print(cbind(t_stat_restr, p_val_restr))
##      t_stat_restr p_val_restr
## [1,]   -0.9361572   0.3494201
# The t-statistic of the linear restriction:
t_stat_restr = (beta_est.iloc[1,:].values - 8 * beta_est.iloc[4,:].values - 0) / np.sqrt(var_restr)
# The associated p-value:
p_val_restr = 2 * stats.t.cdf(x = -np.abs(t_stat_restr), df = len(data_mat.index) - len(beta_est.index))
#
print(pd.DataFrame(np.column_stack([t_stat_restr, p_val_restr]), columns = ["t_stat", "p_val"]))
##      t_stat     p_val
## 0 -0.935311  0.349856

So, we do not reject the null hypothesis that the estimated coefficient $$\widehat{\beta}_1$$ (i.e. the coefficient of $$X_1$$) is eight times larger than the estimated coefficient $$\widehat{\beta}_4$$ (i.e. the coefficient of $$\text{AGE}\_\text{GROUP}_{1}$$).

Thankfully, we can also carry out this test automatically:

lin_rest <- multcomp::glht(lm_R, linfct = c("x1 - 8 * age_gr1 = 0"))
print(summary(lin_rest))
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
##
## Linear Hypotheses:
##                       Estimate Std.
Error t value Pr(>|t|)
## x1 - 8 * age_gr1 == 0 -0.03881    0.04145  -0.936    0.349
## (Adjusted p values reported -- single-step method)
#
print(lm_R.t_test("x1 - 8 * age_gr1 = 0"))
##                              Test for Constraints
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0            -0.0383      0.041     -0.935      0.350      -0.119       0.042
## ==============================================================================

We note that the values are identical to our manual calculation. While, in general, there may be at least one package available for most types of tests, this may not be the case across different software. In such cases, it is good to have an example of the manual calculation across multiple software, as well as a built-in method in at least one of them for checking.

Example 4.14 Additionally, we want to test the following hypothesis, which we have tested before: $\begin{cases} H_0&: \beta_2 = -3\\ H_1&: \beta_2 \neq -3 \end{cases}$ But this time, we formulate it as a linear restriction.

# We need to specify the formula WITHOUT any logarithm transformations to avoid possible errors:
lm_ht <- lm(log(y) ~ x1 + log_x2 + married + age_gr1 + age_gr2 +
            age_gr2 * x1 + married * age_gr1,
            data = data.frame(y = data_mat$y, x_mat_1))
#
summary(multcomp::glht(lm_ht, linfct = c("log_x2 = -3")))
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = log(y) ~ x1 + log_x2 + married + age_gr1 + age_gr2 +
##     age_gr2 * x1 + married * age_gr1, data = data.frame(y = data_mat$y,
##     x_mat_1))
##
## Linear Hypotheses:
##               Estimate Std. Error t value Pr(>|t|)
## log_x2 == -3 -3.002369   0.005971  -0.397    0.692
## (Adjusted p values reported -- single-step method)
# We can specify the formula with logarithms as functions:
lm_ht = smf.ols(formula = 'np.log(y) ~ x1 + np.log(x2) + married + age_gr1 + age_gr2 +\
                age_gr2 * x1 + married * age_gr1', data = data_mat).fit()
#
print(lm_ht.t_test("np.log(x2) = -3"))
##                              Test for Constraints
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0            -2.9943      0.006      0.934      0.350      -3.006      -2.982
## ==============================================================================

Since the $$p$$-value $$> 0.05$$, we have no grounds to reject the null hypothesis and conclude that $$\beta_2$$ is not statistically significantly different from $$-3$$. Note that this is identical to what we have manually calculated:

print(hyp_b2)
##            t_stat     p_val
## log_x2 -0.3967519 0.6916357
print(hyp_b2)
##      t_stat     p_val
## 0  0.934279  0.350387

We note that in R, using log(x2) instead of log_x2 produces an error (though this may differ between package versions):

lm_ht_v2 <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
               age_gr2 * x1 + married * age_gr1,
               data = data.frame(y = data_mat$y, x_mat_1))
#
summary(multcomp::glht(lm_ht_v2, linfct = c("log(x2) = -3")))
## Error: multcomp:::expression2coef::walkCode::eval: the expression 'log(x2)' did not evaluate to a real valued constant. Result is '0.896317877324181''1.09065413148445''0.813291492402008' ...
## <the remainder of the error message, which prints the entire vector of log(x2) values, is omitted>

So knowing how to manually calculate this hypothesis test is much more useful than simply applying some black-box functions.

Example 4.15 We want to test the following hypothesis that one coefficient is two times larger than the other:

$\begin{cases} H_0&: \beta_1 = 2\cdot\beta_6\\ H_1&: \beta_1 \neq 2\cdot\beta_6 \end{cases}$

# Print the coefficient names:
names(coef(lm_R))
## [1] "(Intercept)"     "x1"              "log(x2)"         "married"         "age_gr1"         "age_gr2"         "x1:age_gr2"      "married:age_gr1"
# Print the coefficient names:
print(lm_R.params.index.format())
## ['Intercept', 'x1', 'np.log(x2)', 'married', 'age_gr1', 'age_gr2', 'age_gr2:x1', 'married:age_gr1']
# Specify the linear restriction
lin_rest <- multcomp::glht(lm_R, linfct = c("x1 - 2 * x1:age_gr2 = 0"))
print(summary(lin_rest))
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
##
## Linear Hypotheses:
##                          Estimate Std. Error t value Pr(>|t|)
## x1 - 2 * x1:age_gr2 == 0 0.058333   0.003902   14.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
# Specify the linear restriction
#
print(lm_R.t_test("x1 - 2 * age_gr2:x1 = 0"))
##                              Test for Constraints
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0             0.0559      0.004     13.447      0.000       0.048       0.064
## ==============================================================================

In this case, we would reject the null hypothesis and conclude that the ratio of the two coefficients is statistically significantly different from $$2$$ - it is either greater, or lower, than $$2$$.

On the other hand, we know that the true parameter values are $$\beta_1 = 0.16$$ and $$\beta_6 = 0.05$$, so the true magnitude is $$3.2$$. If we specify the hypothesis with the true magnitude: $\begin{cases} H_0&: \beta_1 = 3.2\cdot\beta_6\\ H_1&: \beta_1 \neq 3.2\cdot\beta_6 \end{cases}$ then:

# Specify the linear restriction
lin_rest <- multcomp::glht(lm_R, linfct = c("x1 - 3.2 * x1:age_gr2 = 0"))
print(summary(lin_rest))
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
##
## Linear Hypotheses:
##                             Estimate Std. Error t value Pr(>|t|)
## x1 - 3.2 * x1:age_gr2 == 0 -0.002106   0.005850   -0.36    0.719
## (Adjusted p values reported -- single-step method)
# Specify the linear restriction
#
print(lm_R.t_test("x1 - 3.2 * age_gr2:x1 = 0"))
##                              Test for Constraints
## ==============================================================================
##                  coef    std err          t      P>|t|      [0.025      0.975]
## ------------------------------------------------------------------------------
## c0            -0.0057      0.006     -0.921      0.357      -0.018       0.006
## ==============================================================================

We would not reject the null hypothesis that $$\beta_1$$ is $$3.2$$ times larger than $$\beta_6$$.

Suppose we were to estimate a model of the effects of $$price$$ and $$advertising$$ expenditure (in dollars) on $$revenue$$. In such a case, $$\beta_{price}$$ would be the change in revenue from a one dollar increase in price, and $$\beta_{advertising}$$ would be the change in revenue from a one dollar increase in advertising. So, one hypothesis could be formulated as: \begin{aligned} &\text{reducing the price by 10 cents would have the same effect on revenue as}\\ &\text{increasing the advertising by 100 dollars,} \end{aligned} which would translate to the following hypothesis: $$H_0: -0.1 \beta_{price} = 100 \beta_{advertising}$$. The alternative could be that $$H_1: -0.1 \beta_{price} > 100 \beta_{advertising}$$, i.e.: \begin{aligned} &\text{reducing the price by 10 cents would be more effective than}\\ &\text{increasing the advertising by 100 dollars.} \end{aligned}
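
As a sketch of how such a one-sided test could be computed, the code below uses entirely hypothetical estimates, variances, and degrees of freedom - none of these numbers come from any dataset in this chapter:

```python
import numpy as np
from scipy import stats

# Hypothetical estimates and (co)variances - for illustration only:
b_price, b_adv = -120.0, 0.11            # beta_price, beta_advertising
var_p, var_a, cov_pa = 25.0, 1e-4, -0.01
df = 100                                 # N - (k + 1)

# H0: -0.1 * b_price - 100 * b_adv = 0  vs  H1: -0.1 * b_price - 100 * b_adv > 0
r_hat = -0.1 * b_price - 100 * b_adv
var_r = (-0.1)**2 * var_p + (-100)**2 * var_a + 2 * (-0.1) * (-100) * cov_pa
t_stat = r_hat / np.sqrt(var_r)

# One-tail p-value (upper tail, matching H1):
p_val = stats.t.sf(t_stat, df = df)
print(t_stat, p_val)
```
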

Finally, it is usually more common to use the tests provided in the next subsection, as they can be used for more than one linear restriction. In our first example, under the null hypothesis we have that our regression could be re-written as the following restricted regression: \begin{aligned} \log(Y_i) = \beta_0 &+ \beta_1 \left(X_{1i} + 8 \cdot \text{AGE}\_\text{GROUP}_{1i}\right) + \beta_2 \log(X_{2i}) + \beta_3 \text{MARRIED}_{i} + \beta_5 \text{AGE}\_\text{GROUP}_{2i} \\ &+ \beta_6 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \beta_7 (\text{MARRIED}_{i} \times \text{AGE}\_\text{GROUP}_{1i}) + \epsilon_i \end{aligned} Then, we could compare the residual sum of squares from this model with the ones from the unrestricted regression (i.e. the regression under the alternative) via the $$F$$-test.

#### 4.2.4.4 Testing for Multiple Linear Restrictions

If we want to test $$M < k + 1$$ different linear restrictions on the coefficients, then we can define $$\mathbf{L}$$ and the value vector of the linear restrictions as: $\mathbf{L} = \begin{bmatrix} c_{10} & c_{11} & ... & c_{1k} \\ c_{20} & c_{21} & ... & c_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ c_{M0} & c_{M1} & ... & c_{Mk} \\ \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} r_1\\ r_2\\ \vdots \\ r_M \end{bmatrix}$ We want to test the following hypothesis: $\begin{cases} H_0&: \mathbf{L} \boldsymbol{\beta} = \boldsymbol{r}\\ H_1&: \mathbf{L} \boldsymbol{\beta} \neq \boldsymbol{r} \end{cases}$

The distribution of $$\mathbf{L} \widehat{\boldsymbol{\beta}}$$ is: $\mathbf{L} \widehat{\boldsymbol{\beta}} \sim \mathcal{N}\left( \mathbf{L} \boldsymbol{\beta},\ \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right)$ where, under the null hypothesis: \begin{aligned} \mathbf{L} \boldsymbol{\beta} &= \boldsymbol{r} \\ \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top &= \sigma^2 \mathbf{L} \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{L}^\top \end{aligned} The variances of the linearly restricted parameters are the diagonal elements: \begin{aligned} \text{diag} \left( \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right)&= \left \{ \sum_{j = 0}^k c_{i, j}^2 \cdot \mathbb{V}{\rm ar}\left( \widehat{\beta}_j \right) + \sum_{j_1 \neq j_2} c_{i, j_1} c_{i, j_2} \cdot \mathbb{C}{\rm ov}\left( \widehat{\beta}_{j_1},\ \widehat{\beta}_{j_2} \right) \right \}_{i = 1,...,M} \\ &= \left \{ \sum_{j = 0}^k c_{i, j}^2 \cdot \mathbb{V}{\rm ar}\left( \widehat{\beta}_j \right) + 2 \cdot\sum_{j_1 < j_2} c_{i, j_1} c_{i, j_2} \cdot \mathbb{C}{\rm ov}\left( \widehat{\beta}_{j_1},\ \widehat{\beta}_{j_2} \right) \right \}_{i = 1,...,M} \end{aligned}

In practice, we replace $$\sigma^2$$ with $$\widehat{\sigma}^2$$.

Since we have more than one restriction, a $$t$$-test is not applicable. Nevertheless, there are a number of alternative test statistics that we can use.

##### 4.2.4.4.1 Wald test for Multiple Linear Restrictions
One can calculate the Wald test statistic, which is applicable to large samples:

$W = \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right)^\top \left[ \mathbf{L} \mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right]^{-1} \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right) \sim \chi^2_M$

Note that we do not know the true $$\mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}})$$. If the sample size is large, then the estimated $$\widehat{\sigma}^2$$ is close to the true population variance and the Wald test may be applicable.
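
A minimal numerical sketch of this quadratic form follows; the $$\mathbf{L}$$, $$\widehat{\boldsymbol{\beta}}$$, $$\boldsymbol{r}$$ and covariance matrix below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Made-up inputs for illustration:
beta_hat = np.array([1.2, 0.5, -0.4])
V = np.diag([0.04, 0.01, 0.02])           # estimated Var(beta_hat)
L = np.array([[0.0, 1.0, 0.0],            # M = 2 restrictions:
              [0.0, 0.0, 1.0]])           #   beta_1 = 0.6 and beta_2 = -0.4
r = np.array([0.6, -0.4])

diff = L @ beta_hat - r
W = diff @ np.linalg.inv(L @ V @ L.T) @ diff   # Wald statistic
p_val = stats.chi2.sf(W, df = L.shape[0])      # chi-squared with M degrees of freedom
print(W, p_val)
```
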

##### 4.2.4.4.2 $$F$$-test for Multiple Linear Restrictions
In practice, if we replace $$\sigma^2$$ with $$\widehat{\sigma}^2$$ and divide the statistic by the number of restrictions $$M$$ - which makes the test applicable for smaller samples as well - we get the following $$F$$-statistic:

$F = \dfrac{1}{M} \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right)^\top \left[ \mathbf{L} \widehat{\mathbb{V}{\rm ar}} (\widehat{\boldsymbol{\beta}}) \mathbf{L}^\top \right]^{-1} \left( \mathbf{L} \widehat{\boldsymbol{\beta}} - \boldsymbol{r} \right) \sim F_{(M, N - (k+1))}$

or, alternatively: $F = \dfrac{(\text{RSS}_R - \text{RSS}_{UR}) / M}{\text{RSS}_{UR} / (N-(k+1))} = \dfrac{(R^2_{UR} - R^2_{R}) / M}{(1 - R^2_{UR}) / (N-(k+1))} \sim F_{(M, N-(k+1))}$ where $$\text{RSS}_R$$ is the residual sum of squares of the restricted model (i.e. under the null hypothesis), and $$\text{RSS}_{UR}$$ is the residual sum of squares of the unrestricted model (i.e. under the alternative hypothesis).
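
Computing the RSS-based form of the $$F$$-statistic and its $$p$$-value is straightforward; for instance (the RSS values, sample size and number of restrictions below are made up for illustration):

```python
from scipy import stats

# Made-up inputs for illustration:
RSS_R, RSS_UR = 12.0, 11.4   # restricted / unrestricted residual sums of squares
M = 2                        # number of restrictions
N, k = 100, 4                # sample size; number of slope coefficients

F_stat = ((RSS_R - RSS_UR) / M) / (RSS_UR / (N - (k + 1)))
p_val = stats.f.sf(F_stat, dfn = M, dfd = N - (k + 1))
print(F_stat, p_val)
```
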

Take note that, regardless of the restrictions, the $$R$$-squared must still be calculated for the same dependent variable. E.g. setting a restriction that $$\beta_j = 1$$ would allow us to create a restricted model on $$Y_i - X_{j,i}$$, instead of $$Y_i$$. This may be software-dependent, so it may sometimes be a good idea to examine the fitted-values, or compare to manually calculated estimates, to make sure that the same dependent variable is estimated in both cases.

Regardless of the chosen formula, we then need to calculate the relevant $$F_c = F_{(1 - \alpha, M, N-(k+1))}$$ and the associated $$p$$-value.

Example 4.16 If we have the following multiple regression:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{i5} + \epsilon_i$ We may be interested in testing if two pairs of parameters have the same pair-wise effect on $$Y$$: $\begin{cases} H_0&: \beta_2 = \beta_3, \quad \beta_4 = 2 \cdot \beta_5\\ H_1&: \beta_2 - \beta_3 \neq 0\quad \text{or}\quad \beta_4 - 2 \cdot \beta_5 \neq 0\quad \text{or both} \end{cases}$ (Note: we have written the alternative hypothesis differently to highlight how we are going to specify the $$\mathbf{L}$$ matrix).

In this case, our constraint matrix and the associated value vector is: $\mathbf{L} = \begin{bmatrix} 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -2 \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} 0 \\ 0\\ \end{bmatrix}$

Example 4.17 The joint hypothesis test for model parameter significance is equivalent to $$k$$ different restrictions:

$\begin{cases} H_0&: \beta_1 = ... = \beta_k = 0\\ H_1&: \beta_j \neq 0, \text{ for some } j \end{cases}$ Our constraint matrix and the associated value vector is: $\mathbf{L} = \begin{bmatrix} 0 & 1 & 0 & 0 & ... & 0 \\ 0 & 0 & 1 & 0 & ... & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & ... & 1 \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} 0 \\ 0\\ \vdots \\ 0 \end{bmatrix}$ Note that this is equivalent to the joint hypothesis test for multiple coefficient significance.
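
This $$\mathbf{L}$$ matrix can be built for any $$k$$ as a zero column stacked next to an identity matrix; a quick numpy sketch (with an illustrative $$k = 4$$):

```python
import numpy as np

k = 4  # number of slope coefficients (illustrative)

# The first column (the intercept) is zero; each row selects one slope coefficient:
L = np.hstack([np.zeros((k, 1)), np.eye(k)])
r = np.zeros(k)
print(L)
```
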

What happens if we cannot reject the null hypothesis? This means that the specified linear restrictions on some of our model parameters are consistent with the data.

We would like to try to incorporate this information into our coefficient estimator. We can do this by carrying out a Restricted Least Squares (or Constrained Least Squares) procedure.

If we ignore this information, then the OLS estimates are still unbiased, but not as efficient as the RLS estimates.
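
One standard way to impose linear restrictions $$\mathbf{L}\boldsymbol{\beta} = \boldsymbol{r}$$ is the textbook constrained least squares correction to the OLS estimator, $$\widehat{\boldsymbol{\beta}}_{RLS} = \widehat{\boldsymbol{\beta}} - (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{L}^\top\left[\mathbf{L}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{L}^\top\right]^{-1}(\mathbf{L}\widehat{\boldsymbol{\beta}} - \boldsymbol{r})$$. A minimal sketch on simulated data (all values below are illustrative, not from our example dataset):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: y = 1 + 2*x1 + 2*x2 + noise, so the restriction b1 = b2 holds
N = 500
X = np.column_stack([np.ones(N), rng.normal(size = N), rng.normal(size = N)])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(scale = 0.5, size = N)

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y

# Impose the single restriction b1 - b2 = 0:
L = np.array([[0.0, 1.0, -1.0]])
r = np.array([0.0])

# RLS correction: shift b_OLS by the (weighted) amount the restriction is violated
adj = XtX_inv @ L.T @ np.linalg.inv(L @ XtX_inv @ L.T) @ (L @ b_ols - r)
b_rls = b_ols - adj

print(L @ b_rls)  # the restriction holds exactly for the RLS estimate
```
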

On the other hand, how do we even know what kind of restrictions we should be testing for? Generally, we may sometimes want to impose (close-to-)zero-value restrictions on some specific coefficients, which we cannot easily remove from the model specification.

Example 4.18 We can test the following hypothesis with an $$F$$-test, instead of a $$t$$-test:

$\begin{cases} H_0&: \beta_2 = -3\\ H_1&: \beta_2 \neq -3 \end{cases}$

car::linearHypothesis(lm_R, "log(x2) = -3")
## Linear hypothesis test
##
## Hypothesis:
## log(x2) = - 3
##
## Model 1: restricted model
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 *
##     x1 + married * age_gr1
##
##   Res.Df    RSS Df  Sum of Sq      F Pr(>F)
## 1    993 2.4318
## 2    992 2.4314  1 0.00038582 0.1574 0.6916
print(lm_R.f_test("np.log(x2) = -3"))
## <F test: F=array([[0.87287713]]), p=0.3503873845345189, df_denom=992, df_num=1>

The advantage of the $$t$$-test is that we can do it directly from the usual regression output, even if we weren’t sure whether we would need to perform any hypothesis testing.

The relationship between a $$t$$-statistic and an $$F$$-statistic that has one degree of freedom in the numerator (i.e. one restriction) is: $F_{(1, N - (k+1))} = t^2_{(N - (k+1))}$

If we look at the squared $$t$$-statistic from our previous $$t$$-test:

print(summary(multcomp::glht(lm_ht, linfct = c("log_x2 = -3")))$test$tstat^2)
##    log_x2
## 0.1574121
print(lm_ht.t_test("np.log(x2) = -3").statistic**2)
## [[0.87287713]]

we see that it is the same as the $$F$$-statistic from our $$F$$-test.
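
The relationship also holds for the tail probabilities of the two distributions, which we can verify directly with scipy, using the $$t$$-statistic and degrees of freedom from the R output above:

```python
from scipy import stats

# t-statistic and degrees of freedom from the R example above:
t_stat, df = -0.3967519, 992

p_t = 2 * stats.t.sf(abs(t_stat), df = df)      # two-tail t-test p-value
p_F = stats.f.sf(t_stat**2, dfn = 1, dfd = df)  # F-test p-value with statistic t^2
print(p_t, p_F)
```
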

Example 4.19 Say we want to test two linear restrictions:

$\begin{cases} H_0&: \beta_2 = -3,\ \beta_3 + \beta_6 = 0.09, \\ H_1&: \beta_2 \neq -3, \text{ or } \beta_3 + \beta_6 \neq 0.09, \text{ or both} \end{cases}$

We will first show how to carry out this test manually. We begin by specifying the linear restriction matrix and the value vector for $$M = 2$$ restrictions: $\mathbf{L} = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix},\quad \boldsymbol{r} = \begin{bmatrix} -3 \\ 0.09\\ \end{bmatrix}$

L_mat <- matrix(0, nrow = 2, ncol = length(beta_est))
L_mat[1, 3] <- 1
L_mat[2, c(4, 7)] <- 1
r_vec <- c(-3, 0.09)
#
print(L_mat)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    0    0    1    0    0    0    0    0
## [2,]    0    0    0    1    0    0    1    0
L_mat = np.vstack([[0]*len(beta_est.index), [0]*len(beta_est.index)])
L_mat[0][2] = 1
L_mat[1][[3, 6]] = 1
r_vec = np.array([-3, 0.09])
#
print(L_mat)
## [[0 0 1 0 0 0 0 0]
##  [0 0 0 1 0 0 1 0]]
print(r_vec)
## [1] -3.00  0.09
print(r_vec)
## [-3.    0.09]

Next, since we have already estimated the variance-covariance matrix of our beta_est variable, we can calculate the $$F$$-statistic:

M <- 2
F_stat_mult_lin <- 1 / M * t(L_mat %*% beta_est - r_vec)
F_stat_mult_lin <- F_stat_mult_lin %*% solve(L_mat %*% beta_vcov %*% t(L_mat))
F_stat_mult_lin <- F_stat_mult_lin %*% (L_mat %*% beta_est - r_vec)
F_stat_mult_lin <- c(F_stat_mult_lin)
#
p_val_F_mult_lin <- pf(F_stat_mult_lin,
                       df1 = M, df2 = nrow(data_mat) - length(beta_est),
                       lower.tail = FALSE)
print(cbind(F_stat_mult_lin, p_val_F_mult_lin))
##      F_stat_mult_lin p_val_F_mult_lin
## [1,]        1.474699        0.2293498
M = 2
F_stat_mult_lin = 1 / M * np.transpose(np.dot(L_mat, beta_est) - np.vstack(r_vec))
F_stat_mult_lin = np.dot(F_stat_mult_lin,
np.linalg.inv(np.dot(np.dot(L_mat, beta_vcov), np.transpose(L_mat))))
F_stat_mult_lin = np.dot(F_stat_mult_lin, np.dot(L_mat, beta_est) - np.vstack(r_vec))
#
p_val_F_mult_lin = stats.f.sf(F_stat_mult_lin,
dfn = M, dfd = len(data_mat.index) - len(beta_est.index))
print(pd.DataFrame(np.column_stack([F_stat_mult_lin, p_val_F_mult_lin]),
columns = ["F_stat_mult_lin", "p_val_F_mult_lin"]))
##    F_stat_mult_lin  p_val_F_mult_lin
## 0         9.390033          0.000091

We can also do this automatically:

car::linearHypothesis(lm_R, c("log(x2) = -3", "married + x1:age_gr2 = 0.09"))
## Linear hypothesis test
##
## Hypothesis:
## log(x2) = - 3
## married  + x1:age_gr2 = 0.09
##
## Model 1: restricted model
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 *
##     x1 + married * age_gr1
##
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    994 2.4386
## 2    992 2.4314  2 0.0072291 1.4747 0.2293

We have no grounds to reject the null hypothesis ($$p\text{-value} > 0.05$$).

print(lm_ht.f_test("np.log(x2) = -3, married + age_gr2:x1 = 0.09"))
## <F test: F=array([[9.39003313]]), p=9.121826098238056e-05, df_denom=992, df_num=2>

We reject the null hypothesis ($$p\text{-value} < 0.05$$). However, specifying $$r_2 = 0.1$$ (which is the true value of the sum $$\beta_3 + \beta_6$$) instead of $$0.09$$:

print(lm_ht.f_test("np.log(x2) = -3, married + age_gr2:x1 = 0.1"))
## <F test: F=array([[2.21166713]]), p=0.1100576650116352, df_denom=992, df_num=2>

yields a $$p\text{-value} > 0.05$$, so we have no grounds to reject the null hypothesis: $$H_0: \beta_2 = -3,\ \beta_3 + \beta_6 = 0.1$$.

We can also carry out the Wald test:

Wald_stat <- F_stat_mult_lin * M
#
p_val_Wald <- pchisq(Wald_stat, df = M, lower.tail = FALSE)
#
print(cbind(Wald_stat, p_val_Wald))
##      Wald_stat p_val_Wald
## [1,]  2.949399  0.2288475
Wald_stat = F_stat_mult_lin * M
#
p_val_Wald = stats.chi2.sf(Wald_stat, df = M)
print(pd.DataFrame(np.column_stack([Wald_stat, p_val_Wald]),
columns = ["Wald_stat", "p_val_Wald"]))
##    Wald_stat  p_val_Wald
## 0  18.780066    0.000084

and we can compare it to the built-in functions:

car::linearHypothesis(lm_R, c("log(x2) = -3", "married + x1:age_gr2 = 0.09"), test = "Chisq")
## Linear hypothesis test
##
## Hypothesis:
## log(x2) = - 3
## married  + x1:age_gr2 = 0.09
##
## Model 1: restricted model
## Model 2: log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 *
##     x1 + married * age_gr1
##
##   Res.Df    RSS Df Sum of Sq  Chisq Pr(>Chisq)
## 1    994 2.4386
## 2    992 2.4314  2 0.0072291 2.9494     0.2288
print(lm_ht.wald_test("np.log(x2) = -3, married + age_gr2:x1 = 0.09", use_f = False))
## <Wald test (chi2): statistic=[[18.78006626]], p-value=8.355268804528794e-05, df_denom=2>

In this example, the Wald test results give the same conclusions as the $$F$$-test.
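This agreement is expected: the Wald statistic is simply $$M$$ times the $$F$$-statistic, and its $$\chi^2_M$$ distribution is the limiting distribution of $$M \cdot F(M, df)$$ as the residual degrees of freedom grow. A short sketch, plugging in the $$F$$-statistic value from the R output of this example:

```python
import numpy as np
from scipy import stats

# The Wald statistic equals M * F. Its chi^2(M) p-value approaches the
# exact F(M, df) p-value as the residual degrees of freedom df grow:
M = 2
F_stat = 1.474699        # F-statistic from the example above
Wald_stat = M * F_stat   # Wald statistic

p_F = stats.f.sf(F_stat, dfn=M, dfd=992)  # exact F-test p-value
p_W = stats.chi2.sf(Wald_stat, df=M)      # asymptotic Wald p-value

print(np.round([p_F, p_W], 6))
```

With $$df = 992$$ the two $$p$$-values differ only in the fourth decimal place, so with a large sample both tests almost always agree.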

### 4.2.5 Built-in Estimation Functions

Finally, for comparison, we can use the built-in functions to estimate the relevant model:

#### 4.2.5.1 Parameter OLS Estimation, Significance Testing

mdl <- lm(log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 + age_gr2 * x1 + married * age_gr1,
data = data_mat)
print(summary(mdl))
##
## Call:
## lm(formula = log(y) ~ x1 + log(x2) + married + age_gr1 + age_gr2 +
##     age_gr2 * x1 + married * age_gr1, data = data_mat)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.129248 -0.034729  0.001534  0.034763  0.163707
##
## Coefficients:
##                   Estimate Std. Error  t value Pr(>|t|)
## (Intercept)      4.0101765  0.0122819  326.511  < 2e-16 ***
## x1               0.1590640  0.0009389  169.416  < 2e-16 ***
## log(x2)         -3.0023689  0.0059708 -502.843  < 2e-16 ***
## married          0.0466433  0.0038756   12.035  < 2e-16 ***
## age_gr1          0.0247338  0.0051786    4.776 2.06e-06 ***
## age_gr2         -0.1468251  0.0170318   -8.621  < 2e-16 ***
## x1:age_gr2       0.0503657  0.0016449   30.620  < 2e-16 ***
## married:age_gr1 -0.0267870  0.0066062   -4.055 5.41e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04951 on 992 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969
## F-statistic: 4.647e+04 on 7 and 992 DF,  p-value: < 2.2e-16
mdl = smf.ols(formula='np.log(y) ~ x1 + np.log(x2) + married + age_gr1 + age_gr2 + age_gr2 * x1 + married * age_gr1',
data = data_mat).fit()
print(mdl.summary())
##                             OLS Regression Results
## ==============================================================================
## Dep. Variable:              np.log(y)   R-squared:                       0.997
## Model:                            OLS   Adj. R-squared:                  0.997
## Method:                 Least Squares   F-statistic:                 4.224e+04
## Date:                Tue, 13 Oct 2020   Prob (F-statistic):               0.00
## Time:                        21:40:50   Log-Likelihood:                 1580.2
## No. Observations:                1000   AIC:                            -3144.
## Df Residuals:                     992   BIC:                            -3105.
## Df Model:                           7
## Covariance Type:            nonrobust
## ===================================================================================
##                       coef    std err          t      P>|t|      [0.025      0.975]
## -----------------------------------------------------------------------------------
## Intercept           3.9994      0.013    305.863      0.000       3.974       4.025
## x1                  0.1587      0.001    155.526      0.000       0.157       0.161
## np.log(x2)         -2.9943      0.006   -490.546      0.000      -3.006      -2.982
## married             0.0568      0.004     14.552      0.000       0.049       0.064
## age_gr1             0.0246      0.005      4.807      0.000       0.015       0.035
## age_gr2            -0.1595      0.018     -8.933      0.000      -0.195      -0.124
## age_gr2:x1          0.0514      0.002     29.549      0.000       0.048       0.055
## married:age_gr1    -0.0412      0.007     -6.167      0.000      -0.054      -0.028
## ==============================================================================
## Omnibus:                        0.209   Durbin-Watson:                   1.934
## Prob(Omnibus):                  0.901   Jarque-Bera (JB):                0.269
## Skew:                          -0.029   Prob(JB):                        0.874
## Kurtosis:                       2.944   Cond. No.                         137.
## ==============================================================================
##
## Warnings:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
print(np.round(mdl.scale, 5)) # In statsmodels, scale is the residual VARIANCE (not the std. deviation)
## 0.0025
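To make the meaning of scale concrete, here is a quick check on hypothetical simulated data (variable names and seed are illustrative): for a nonrobust OLS fit, mdl.scale equals $$RSS / (N - k)$$, i.e. the residual variance, whose square root is what R's summary() reports as the residual standard error.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data for illustration only:
rng = np.random.default_rng(123)
d = pd.DataFrame({"x": rng.normal(size=200)})
d["y"] = 1 + 2 * d["x"] + rng.normal(scale=0.5, size=200)

fit = smf.ols("y ~ x", data=d).fit()
sigma2_manual = fit.ssr / fit.df_resid  # RSS / (N - k), the residual variance

print(np.isclose(fit.scale, sigma2_manual))  # True
print(np.round(np.sqrt(fit.scale), 4))       # the residual standard error
```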

We can also extract the variance-covariance matrix of the parameters:

print(vcov(mdl))
##                   (Intercept)            x1       log(x2)       married       age_gr1       age_gr2    x1:age_gr2 married:age_gr1
## (Intercept)      1.508456e-04 -8.758739e-06 -4.316298e-05 -9.329513e-06 -9.713703e-06 -8.840137e-05  8.300376e-06    5.680298e-06
## x1              -8.758739e-06  8.815287e-07  7.508965e-09  1.431190e-07 -7.153594e-08  8.664538e-06 -8.803845e-07    5.500742e-08
## log(x2)         -4.316298e-05  7.508965e-09  3.565033e-05 -1.114429e-07 -1.090457e-06 -3.808553e-06  3.122693e-07    1.503668e-06
## married         -9.329513e-06  1.431190e-07 -1.114429e-07  1.502027e-05  8.035502e-06  6.299750e-07 -3.112915e-08   -1.499250e-05
## age_gr1         -9.713703e-06 -7.153594e-08 -1.090457e-06  8.035502e-06  2.681785e-05  6.419738e-06  1.221773e-07   -2.313111e-05
## age_gr2         -8.840137e-05  8.664538e-06 -3.808553e-06  6.299750e-07  6.419738e-06  2.900805e-04 -2.727918e-05    1.165977e-06
## x1:age_gr2       8.300376e-06 -8.803845e-07  3.122693e-07 -3.112915e-08  1.221773e-07 -2.727918e-05  2.705623e-06   -1.542676e-07
## married:age_gr1  5.680298e-06  5.500742e-08  1.503668e-06 -1.499250e-05 -2.313111e-05  1.165977e-06 -1.542676e-07    4.364213e-05
print(mdl.cov_params())
##                  Intercept            x1    np.log(x2)       married       age_gr1       age_gr2    age_gr2:x1  married:age_gr1
## Intercept         0.000171 -1.034165e-05 -4.488210e-05 -8.991604e-06 -1.301488e-05 -1.129952e-04  1.050144e-05     9.725488e-06
## x1               -0.000010  1.041272e-06 -9.393588e-08 -2.159422e-08  1.568162e-07  1.046432e-05 -1.040926e-06    -2.008797e-07
## np.log(x2)       -0.000045 -9.393588e-08  3.725894e-05  1.148208e-06 -3.884450e-07 -2.451767e-07 -1.344392e-08     7.167193e-08
## married          -0.000009 -2.159422e-08  1.148208e-06  1.524325e-05  7.781495e-06  3.619783e-07 -4.256679e-08    -1.520171e-05
## age_gr1          -0.000013  1.568162e-07 -3.884450e-07  7.781495e-06  2.624813e-05  9.775666e-06 -1.868810e-07    -2.213004e-05
## age_gr2          -0.000113  1.046432e-05 -2.451767e-07  3.619783e-07  9.775666e-06  3.189139e-04 -3.030426e-05    -2.575233e-06
## age_gr2:x1        0.000011 -1.040926e-06 -1.344392e-08 -4.256679e-08 -1.868810e-07 -3.030426e-05  3.023966e-06     2.615093e-07
## married:age_gr1   0.000010 -2.008797e-07  7.167193e-08 -1.520171e-05 -2.213004e-05 -2.575233e-06  2.615093e-07     4.462778e-05

We can compare it to the manually calculated variance-covariance matrix:

# manually calculated matrix
print(beta_vcov)
##                     intercept            x1        log_x2       married       age_gr1       age_gr2    age_gr2_x1 married_age_gr1
## intercept        1.508456e-04 -8.758739e-06 -4.316298e-05 -9.329513e-06 -9.713703e-06 -8.840137e-05  8.300376e-06    5.680298e-06
## x1              -8.758739e-06  8.815287e-07  7.508965e-09  1.431190e-07 -7.153594e-08  8.664538e-06 -8.803845e-07    5.500742e-08
## log_x2          -4.316298e-05  7.508965e-09  3.565033e-05 -1.114429e-07 -1.090457e-06 -3.808553e-06  3.122693e-07    1.503668e-06
## married         -9.329513e-06  1.431190e-07 -1.114429e-07  1.502027e-05  8.035502e-06  6.299750e-07 -3.112915e-08   -1.499250e-05
## age_gr1         -9.713703e-06 -7.153594e-08 -1.090457e-06  8.035502e-06  2.681785e-05  6.419738e-06  1.221773e-07   -2.313111e-05
## age_gr2         -8.840137e-05  8.664538e-06 -3.808553e-06  6.299750e-07  6.419738e-06  2.900805e-04 -2.727918e-05    1.165977e-06
## age_gr2_x1       8.300376e-06 -8.803845e-07  3.122693e-07 -3.112915e-08  1.221773e-07 -2.727918e-05  2.705623e-06   -1.542676e-07
## married_age_gr1  5.680298e-06  5.500742e-08  1.503668e-06 -1.499250e-05 -2.313111e-05  1.165977e-06 -1.542676e-07    4.364213e-05
# manually calculated matrix
print(pd.DataFrame(beta_vcov, columns = x_col_names, index = x_col_names))
##                  intercept            x1        log_x2       married       age_gr1       age_gr2    age_gr2_x1  married_age_gr1
## intercept         0.000171 -1.034165e-05 -4.488210e-05 -8.991604e-06 -1.301488e-05 -1.129952e-04  1.050144e-05     9.725488e-06
## x1               -0.000010  1.041272e-06 -9.393588e-08 -2.159422e-08  1.568162e-07  1.046432e-05 -1.040926e-06    -2.008797e-07
## log_x2           -0.000045 -9.393588e-08  3.725894e-05  1.148208e-06 -3.884450e-07 -2.451767e-07 -1.344392e-08     7.167193e-08
## married          -0.000009 -2.159422e-08  1.148208e-06  1.524325e-05  7.781495e-06  3.619783e-07 -4.256679e-08    -1.520171e-05
## age_gr1          -0.000013  1.568162e-07 -3.884450e-07  7.781495e-06  2.624813e-05  9.775666e-06 -1.868810e-07    -2.213004e-05
## age_gr2          -0.000113  1.046432e-05 -2.451767e-07  3.619783e-07  9.775666e-06  3.189139e-04 -3.030426e-05    -2.575233e-06
## age_gr2_x1        0.000011 -1.040926e-06 -1.344392e-08 -4.256679e-08 -1.868810e-07 -3.030426e-05  3.023966e-06     2.615093e-07
## married_age_gr1   0.000010 -2.008797e-07  7.167193e-08 -1.520171e-05 -2.213004e-05 -2.575233e-06  2.615093e-07     4.462778e-05

We can compare the relevant values: the estimated coefficients and their standard errors, the corresponding $$t$$-values of the significance tests, the residual standard error, the degrees of freedom, and the $$F$$-statistic for the null hypothesis that all the slope coefficients (i.e. all coefficients except $$\beta_0$$) are jointly zero, against the alternative that at least one of them is statistically significantly different from zero.
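As a sanity check of where these numbers come from, here is a small sketch on hypothetical simulated data (names and seed are illustrative): under assumptions (MR.1) - (MR.6), the nonrobust variance-covariance estimate is $$\widehat{\sigma}^2 (\mathbf{X}^\top \mathbf{X})^{-1}$$, which matches the output of cov_params().

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data for illustration only:
rng = np.random.default_rng(42)
d = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
d["y"] = 2 + 0.5 * d["x1"] - 1.5 * d["x2"] + rng.normal(scale=0.3, size=300)

fit = smf.ols("y ~ x1 + x2", data=d).fit()

# Design matrix in the same column order as the formula (Intercept, x1, x2):
X = np.column_stack([np.ones(len(d)), d["x1"], d["x2"]])
sigma2 = fit.ssr / fit.df_resid              # residual variance
vcov_manual = sigma2 * np.linalg.inv(X.T @ X)

print(np.allclose(vcov_manual, fit.cov_params().values))  # True
```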

#### 4.2.5.2 Categorical Data Handling

Additionally, our data_mat dataset contains a categorical variable - age_group. Instead of manually creating the dummy (indicator) variables, we can include it in the model directly:

# Specify the model with categorical variable and interaction terms:
mdl2 <- lm(log(y) ~ x1 + log(x2) + married + age_group + age_group * x1 + married * age_group,
data = data_mat)
print(summary(mdl2))
##
## Call:
## lm(formula = log(y) ~ x1 + log(x2) + married + age_group + age_group *
##     x1 + married * age_group, data = data_mat)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.130645 -0.033463  0.001745  0.034883  0.159463
##
## Coefficients:
##                              Estimate Std. Error  t value Pr(>|t|)
## (Intercept)                  4.025357   0.015556  258.762  < 2e-16 ***
## x1                           0.157716   0.001321  119.403  < 2e-16 ***
## log(x2)                     -3.002311   0.005969 -502.990  < 2e-16 ***
## married                      0.042931   0.005480    7.834 1.21e-14 ***
## age_groupaged_20_30         -0.003679   0.019615   -0.188  0.85125
## age_groupaged_31_65         -0.164027   0.019968   -8.215 6.61e-16 ***
## x1:age_groupaged_20_30       0.002663   0.001879    1.417  0.15668
## x1:age_groupaged_31_65       0.051739   0.001890   27.378  < 2e-16 ***
## married:age_groupaged_20_30 -0.022777   0.007664   -2.972  0.00303 **
## married:age_groupaged_31_65  0.007016   0.007754    0.905  0.36578
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04949 on 990 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969
## F-statistic: 3.617e+04 on 9 and 990 DF,  p-value: < 2.2e-16
# Specify the model with categorical variable and interaction terms:
mdl2 = smf.ols(formula = "np.log(y) ~ x1 + np.log(x2) + married + age_group + age_group * x1 + married * age_group",
data = data_mat).fit()
print(mdl2.summary2())
##                          Results: Ordinary least squares
## =================================================================================
## Model:                    OLS                  Adj. R-squared:         0.997
## Dependent Variable:       np.log(y)            AIC:                    -3141.2307
## Date:                     2020-10-13 21:40     BIC:                    -3092.1532
## No. Observations:         1000                 Log-Likelihood:         1580.6
## Df Model:                 9                    F-statistic:            3.281e+04
## Df Residuals:             990                  Prob (F-statistic):     0.00
## R-squared:                0.997                Scale:                  0.0025060
## ---------------------------------------------------------------------------------
##                                  Coef.  Std.Err.     t     P>|t|   [0.025  0.975]
## ---------------------------------------------------------------------------------
## Intercept                        3.9918   0.0165  241.5041 0.0000  3.9593  4.0242
## age_group[T.aged_20_30]          0.0410   0.0211    1.9460 0.0519 -0.0003  0.0824
## age_group[T.aged_31_65]         -0.1524   0.0208   -7.3269 0.0000 -0.1932 -0.1116
## x1                               0.1595   0.0014  112.1082 0.0000  0.1567  0.1623
## age_group[T.aged_20_30]:x1      -0.0017   0.0020   -0.8397 0.4013 -0.0057  0.0023
## age_group[T.aged_31_65]:x1       0.0505   0.0020   25.2403 0.0000  0.0466  0.0545
## np.log(x2)                      -2.9943   0.0061 -490.2445 0.0000 -3.0063 -2.9823
## married                          0.0555   0.0056    9.8385 0.0000  0.0444  0.0665
## married:age_group[T.aged_20_30] -0.0397   0.0078   -5.0669 0.0000 -0.0550 -0.0243
## married:age_group[T.aged_31_65]  0.0026   0.0078    0.3298 0.7417 -0.0128  0.0179
## ---------------------------------------------------------------------------------
## Omnibus:                   0.170              Durbin-Watson:                1.929
## Prob(Omnibus):             0.919              Jarque-Bera (JB):             0.224
## Skew:                      -0.027             Prob(JB):                     0.894
## Kurtosis:                  2.950              Condition No.:                212
## =================================================================================

On the other hand, if we want to remove the insignificant interaction terms and keep only the significant ones, we will need to use our indicator variables:

# Remove irrelevant interaction terms
mdl3 <- lm(log(y) ~ x1 + log(x2) + married + age_group + age_gr2:x1 + married:age_gr1,
data = data_mat)
print(summary(mdl3))
##
## Call:
## lm(formula = log(y) ~ x1 + log(x2) + married + age_group + age_gr2:x1 +
##     married:age_gr1, data = data_mat)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.129248 -0.034729  0.001534  0.034763  0.163707
##
## Coefficients:
##                       Estimate Std. Error  t value Pr(>|t|)
## (Intercept)          4.0101765  0.0122819  326.511  < 2e-16 ***
## x1                   0.1590640  0.0009389  169.416  < 2e-16 ***
## log(x2)             -3.0023689  0.0059708 -502.843  < 2e-16 ***
## married              0.0466433  0.0038756   12.035  < 2e-16 ***
## age_groupaged_20_30  0.0247338  0.0051786    4.776 2.06e-06 ***
## age_groupaged_31_65 -0.1468251  0.0170318   -8.621  < 2e-16 ***
## x1:age_gr2           0.0503657  0.0016449   30.620  < 2e-16 ***
## married:age_gr1     -0.0267870  0.0066062   -4.055 5.41e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04951 on 992 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969
## F-statistic: 4.647e+04 on 7 and 992 DF,  p-value: < 2.2e-16
# Remove irrelevant interaction terms
mdl3 = smf.ols(formula = "np.log(y) ~ x1 + np.log(x2) + married + age_group + age_gr2:x1 + married:age_gr1",
data = data_mat).fit()
print(mdl3.summary2())
##                      Results: Ordinary least squares
## =========================================================================
## Model:                OLS                Adj. R-squared:       0.997
## Dependent Variable:   np.log(y)          AIC:                  -3144.4117
## Date:                 2020-10-13 21:40   BIC:                  -3105.1497
## No. Observations:     1000               Log-Likelihood:       1580.2
## Df Model:             7                  F-statistic:          4.224e+04
## Df Residuals:         992                Prob (F-statistic):   0.00
## R-squared:            0.997              Scale:                0.0025030
## -------------------------------------------------------------------------
##                          Coef.  Std.Err.     t     P>|t|   [0.025  0.975]
## -------------------------------------------------------------------------
## Intercept                3.9994   0.0131  305.8630 0.0000  3.9738  4.0251
## age_group[T.aged_20_30]  0.0246   0.0051    4.8070 0.0000  0.0146  0.0347
## age_group[T.aged_31_65] -0.1595   0.0179   -8.9329 0.0000 -0.1946 -0.1245
## x1                       0.1587   0.0010  155.5255 0.0000  0.1567  0.1607
## np.log(x2)              -2.9943   0.0061 -490.5459 0.0000 -3.0063 -2.9823
## married                  0.0568   0.0039   14.5523 0.0000  0.0492  0.0645
## age_gr2:x1               0.0514   0.0017   29.5491 0.0000  0.0480  0.0548
## married:age_gr1         -0.0412   0.0067   -6.1671 0.0000 -0.0543 -0.0281
## -------------------------------------------------------------------------
## Omnibus:                 0.209           Durbin-Watson:             1.934
## Prob(Omnibus):           0.901           Jarque-Bera (JB):          0.269
## Skew:                    -0.029          Prob(JB):                  0.874
## Kurtosis:                2.944           Condition No.:             137
## =========================================================================

##### 4.2.5.2.1 Consequences of Removing Insignificant Indicator Variable Interaction Terms

Generally, if some interaction terms are insignificant as additional explanatory variables, removing them from the model changes the interpretation of the base category. For example, in the following model: $\log(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 \log(X_{2i}) + \beta_3 \text{AGE}\_\text{GROUP}_{1i} + \beta_4 \text{AGE}\_\text{GROUP}_{2i} + \beta_5 (\text{AGE}\_\text{GROUP}_{2i} \times X_{1i}) + \epsilon_i$ since our base age category is $$\text{AGE}\_\text{GROUP}_{OTHER}$$, excluding the interaction term $$(\text{AGE}\_\text{GROUP}_{1i} \times X_{1i})$$ implies that the coefficient $$\beta_5$$ of the interaction variable now compares to a different base group: $$\text{AGE}\_\text{GROUP}_{OTHER} + \text{AGE}\_\text{GROUP}_{1}$$.

In other words:

• coefficients $$\beta_3$$ and $$\beta_4$$ are the effects of different age groups compared to the $$\text{AGE}\_\text{GROUP}_{OTHER}$$ group;
• coefficient $$\beta_5$$ is the additional effect of a unit increase in $$X_1$$ from people in $$\text{AGE}\_\text{GROUP}_{2}$$, compared to the remaining two groups $$\text{AGE}\_\text{GROUP}_{OTHER} + \text{AGE}\_\text{GROUP}_{1}$$.

If we were to leave the interaction variable $$(\text{AGE}\_\text{GROUP}_{1i} \times X_{1i})$$, then $$\beta_5$$ would be the additional effect of a unit increase in $$X_1$$ from people in $$\text{AGE}\_\text{GROUP}_{1}$$, compared to the base age group $$\text{AGE}\_\text{GROUP}_{OTHER}$$.
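To see this pooling effect numerically, consider a hypothetical simulation (all names and values are illustrative) in which the base group and group 1 share the same slope on $$X_1$$, while group 2's slope differs by $$0.5$$. Including only the g2:x1 interaction recovers that difference relative to the pooled base $$\{\text{OTHER} + \text{GROUP}_1\}$$:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulation: groups 0 (OTHER) and 1 share slope 1.0 on x1,
# while group 2 has slope 1.5 (a difference of 0.5):
rng = np.random.default_rng(7)
n = 900
g = np.repeat([0, 1, 2], n // 3)
x1 = rng.normal(size=n)
slope = np.where(g == 2, 1.5, 1.0)
y = 3 + slope * x1 + 0.1 * rng.normal(size=n)

d = pd.DataFrame({"y": y, "x1": x1,
                  "g1": (g == 1).astype(int), "g2": (g == 2).astype(int)})
# Only the g2:x1 interaction is included, so its coefficient measures
# group 2's extra slope relative to the pooled base {OTHER + group 1}:
fit = smf.ols("y ~ x1 + g1 + g2 + g2:x1", data=d).fit()

print(np.round(fit.params["g2:x1"], 2))
```

The estimated interaction coefficient is close to $$0.5$$ here precisely because the omitted interaction was truly zero; if groups OTHER and 1 had different slopes, pooling them would bias both the x1 coefficient and the interaction estimate.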

For this reason, if there are many insignificant interaction terms, or insignificant category levels, we need to decide on the following:

• Re-categorize the data - combine some categories together, or combine insignificant categories into a new, other, category, as long as that grouping makes economic sense.
• Leave the groups as they are, along with insignificant interaction terms and/or insignificant category levels. This makes interpretation consistent, since we will have the same base group for individual category effects and interaction effects.

Finally, the inclusion of the $$\text{MARRIED}$$ indicator variable, indicating whether a person is married, compares against the base group of unmarried people, regardless of their age. Consequently, the base age group $$\text{AGE}\_\text{GROUP}_{OTHER}$$ ignores whether a person is single or married - it only indicates their age group.

In other words, the categorical variables $$\text{MARRIED}$$ (with its two categories) and $$\text{AGE}\_\text{GROUP}$$ (with its three categories) have base groups that are not necessarily interpreted the same way - marriage status does not take age into account, and age does not take marriage status into account.