3.6 Hypothesis Testing
We have seen how to calculate OLS estimates and how to construct and interpret their confidence intervals. However, in practice, we usually want to answer very specific questions about the effects of specific variables:
- Does income affect expenditure?
- Do more years in education lead to an increase in wage?
Hypothesis tests use the information about a parameter contained in the sample data to answer such yes/no questions (though not with absolute certainty).
Hypothesis Tests consist of:
- Specification of the null hypothesis, \(H_0\);
- Specification of the alternative hypothesis, \(H_1\);
- Specification of the test statistic and its distribution under the null hypothesis;
- Selection of the significance level \(\alpha\) in order to determine the rejection region;
- Calculation of the test statistic from the data sample;
- Conclusions, which are based on the test statistic and the rejection region.
We will look into each point separately.
As we have done throughout this chapter, assume that our linear regression model is of the following form: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon},\text{ where } \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \] and assume that (UR.1)-(UR.4) hold true.
3.6.1 The Null Hypothesis
The null hypothesis is denoted by \(H_0\), and for the univariate regression can be stated as: \[ H_0: \beta_i = c \] where \(c\) is a constant value, which we are interested in. When testing the null hypothesis, we may either reject or fail to reject the null hypothesis.
The null hypothesis is presumed to be true until the data provides sufficient evidence that it is not.
If we fail to reject the null hypothesis, it does not mean the null hypothesis is true. A hypothesis test does not determine which hypothesis is true, or which is most likely: it only assesses whether available evidence exists to reject the null hypothesis.
3.6.2 The Alternative Hypothesis
Once we state our null hypothesis, we must test it against an alternative hypothesis, denoted \(H_1\).
For the null hypothesis \(H_0: \beta_i = c\) we may specify the alternative hypothesis in three possible ways:
- \(H_1: \beta_i > c\) - rejecting \(H_0\), leads us to “accept” the conclusion that \(\beta_i > c\). Economic theory frequently provides information about the signs of the variable parameters. For example: economic theory strongly suggests that food expenditure will rise if income increases, so we would test \(H_0: \beta_{INCOME} = 0\) against \(H_1: \beta_{INCOME} > 0\).
- \(H_1: \beta_i < c\) - rejecting \(H_0\), leads us to “accept” the conclusion that \(\beta_i < c\).
- \(H_1: \beta_i \neq c\) - rejecting \(H_0\), leads us to “accept” the conclusion that \(\beta_i\) is either greater or smaller than \(c\).
We usually talk about hypothesis testing in terms of the null, i.e. we either reject or fail to reject the null - we never accept the null. As such, if we reject the null, then we “accept” (i.e. we are left with) the alternative.
3.6.3 The Test Statistic
The test statistic is calculated under the null hypothesis (i.e. assuming the null hypothesis is true). Under the null hypothesis the distribution of the statistic is known. Based on the value of the test statistic, we decide whether to reject, or fail to reject the null.
Under the null hypothesis \(H_0: \beta_i = c\) of our univariate regression model, we can calculate the following \(t\)-statistic: \[ t_i = \dfrac{\widehat{\beta}_i - c}{\text{se}(\widehat{\beta}_i)} \sim t_{(N-2)} \] If the null hypothesis is not true, then the \(t\)-statistic does not have a \(t\)-distribution with \(N-2\) degrees of freedom, but some other distribution.
3.6.4 The Rejection Regions
The rejection region consists of values that have low probability of occurring when the null hypothesis is true. The rejection region depends on the specification of the alternative hypothesis. If the calculated test statistic falls in the rejection region (i.e. an event that is unlikely to occur under the null), then it is unlikely that the null hypothesis holds.
The size of the rejection region is determined by choosing a level of significance \(\alpha\) - the probability of such an unlikely event - usually \(0.01\), \(0.05\) or \(0.1\).
To determine whether to reject the null hypothesis or not, we compare the calculated \(t\)-statistic \(t_i\) to the critical value \(t_c\).
3.6.4.1 Type I and Type II Errors
When deciding whether to reject the null hypothesis or not, we may commit one of two types of errors:
- Type I error - rejecting the null hypothesis when it is true. The probability of committing a Type I error is \(\mathbb{P}(H_0 \text{ rejected} | H_0 \text{ is true}) = \alpha\). Any time we reject the null hypothesis, it is possible that we have made such an error. We can specify the amount of Type I error that we are willing to tolerate by setting the level of significance \(\alpha\): if we want to avoid making a Type I error, we set \(\alpha\) to a very small value.
- Type II error - failing to reject the null hypothesis when it is false. We cannot directly calculate the probability of this type of error, since it depends on the unknown parameter \(\beta_i\). However, we do know that by making \(\alpha\) smaller we increase the probability of a Type II error.
A Type I error is generally considered more severe; hence, it is recommended to keep the probability \(\alpha\) small.
3.6.5 The \(p\)-value
When reporting the outcome of statistical hypothesis tests, we usually report the \(p\)-value of the test. The \(p\)-value is defined as the probability, under the null hypothesis, of obtaining a result equal to, or more extreme than, the one actually observed.
Having the \(p\)-value allows us to determine the outcome of the test more easily, as we do not need to compare the test statistic with the critical value directly:
- If \(p \leq \alpha\), we reject \(H_0\).
- If \(p > \alpha\), we do not reject \(H_0\).
3.6.6 One Tail Tests
One-tail tests test the null hypothesis \(H_0: \beta_i = c\) against an alternative \(H_1\) under which the true regression parameter is either greater than, or less than, the specified constant \(c\).
3.6.6.1 Alternative, \(\gt\)
We are testing the null hypothesis \(H_0: \beta_i = c\) against the alternative \(H_1: \beta_i > c\):
\[ \begin{cases} H_0&: \beta_i = c \\ H_1&: \beta_i > c \end{cases} \]We reject \(H_0\) and accept the alternative \(H_1\), if \(t_i \geq t_{(1 - \alpha, N-2)}\), where \(t_c = t_{(1 - \alpha, N-2)}\).
Regarding the \(p\)-value - it is the probability to the right of the calculated \(t\)-statistic and can be calculated as: \[ p\text{-value} = \mathbb{P}(T \geq t_i) = \mathbb{P}(T > t_i) = 1 - \mathbb{P}(T \leq t_i) = 1 - \int_{-\infty}^{t_i} p(x) dx \] where \(p(x)\) is the density function of the distribution of the \(t\)-statistic under the null hypothesis. For the univariate regression, under the null hypothesis this is Student's \(t\)-distribution with \(N-2\) degrees of freedom.
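The critical value \(t_c = t_{(1 - \alpha, N-2)}\) can be computed with the quantile function of the \(t\)-distribution. A minimal sketch in R at \(\alpha = 0.05\), assuming a sample size of \(N = 100\) (the value used in the simulated example later in this section); the second value printed below is the Python equivalent, stats.t.ppf(1 - 0.05, df = N - 2):
N <- 100  # assumed sample size
t_crit <- qt(p = 1 - 0.05, df = N - 2)
print(t_crit)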
## [1] 1.660551
## 1.6605512170440568
The \(p\)-value is:
# Our calculated t-statistic in this example:
t_stat = 2.5
# The density function of the statistic under the null
dF <- function(x, df){
dt(x, df)
}
# The associated p-value:
p_val <- 1 - integrate(dF, lower = -Inf, upper = t_stat, df = N - 2)$value
print(p_val)
## [1] 0.007039878
import numpy as np
import scipy.integrate as integrate
import scipy.stats as stats
# Our calculated t-statistic in this example:
t_stat = 2.5
# The density function of the statistic under the null:
def dF(x, df):
    return stats.t.pdf(x, df = df)
# The associated p-value:
p_val = 1 - integrate.quad(dF, a = -np.inf, b = t_stat, args = N - 2)[0]
print(p_val)
## 0.00703987768735348
We can verify this with the built-in functions:
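In R this is a one-liner (a sketch; the Python analogue is stats.t.sf(t_stat, df = N - 2)):
pt(t_stat, df = N - 2, lower.tail = FALSE)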
## [1] 0.007039878
## 0.007039877687385987
This can also be graphically visualized as:
x <- seq(-4, 4, length = 1e5)
hx <- dt(x, df = N - 2)
# Plot the density
#
plot(x, hx, type = "l", yaxs = "i", ylim = c(0, 0.41),
main = expression("p-value for the right-tail test; "~H[1]~":"~beta[1]>0))
#
# Shade the probability alpha
polygon(c(t_crit, x[x >= t_crit], max(x)), border = "red", lwd = 2,
c(0, hx[x >= t_crit], 0),
col = rgb(1, 0, 0, alpha = 0.2))
# plot the critical value t_c:
abline(v = t_crit, col = "red", lty = 2)
# Shade the probability p-value
polygon(c(t_stat, x[x >= t_stat], max(x)),
c(0, hx[x >= t_stat], 0), border = NA,
col = rgb(0, 1, 0, alpha = 0.4))
# plot the observed t-statistic:
abline(v = t_stat, col = "darkgreen", lty = 2)
axis(side = 1, at = t_crit,
labels = expression(t[c]), col.axis = "red")
axis(side = 1, at = t_stat,
labels = expression(t[stat]), col.axis ="darkgreen")
import matplotlib.pyplot as plt
import matplotlib.colors as clrs
#
x = np.linspace(start = -4, stop = 4, num = int(1e5))
hx = stats.t.pdf(x, df = N - 2)
# Plot the density
fig, ax = plt.subplots(num = 0, figsize = (10, 8))
_ = plt.plot(x, hx, color = "black")
_ = plt.margins(y = 0)
_ = plt.ylim((0, 0.41))
# Shade the probability alpha
_ = plt.fill_between(x[x >= t_crit], hx[x >= t_crit],
edgecolor = "red", linestyle = "-", linewidth = 2,
facecolor = clrs.to_rgba(c = (1, 0, 0, 0.2)), zorder = 10)
# plot the critical value t_c:
_ = plt.axvline(x = t_crit, color = "red", linestyle = "--")
# Shade the probability p-value
_ = plt.fill_between(x[x >= t_stat], hx[x >= t_stat], edgecolor = None,
color = clrs.to_rgba(c = (0, 1, 0, 0.4)))
# plot the observed t-statistic:
_ = plt.axvline(x = t_stat, color = "darkgreen", linestyle = "--")
#
xt = np.append(ax.get_xticks(), [t_crit, t_stat])
xtl = xt.tolist()
xtl[-2:] = ["$t_c$", "$t_{stat}$"]
_ = ax.set_xticks(xt)
_ = ax.set_xticklabels(xtl)
ax.get_xticklabels()[-2].set_color("red")
ax.get_xticklabels()[-1].set_color("darkgreen")
_ = plt.title("p-value for the right-tail test; $H_1: \\beta_1 > 0$")
plt.show()
We see that the \(p\)-value is less than the \(5\%\) significance level (equivalently, the calculated \(t\)-statistic is greater than the critical value), so we reject the null hypothesis and conclude that \(\beta_1\) is statistically significantly greater than zero.
3.6.6.2 Alternative, \(\lt\)
We are testing the null hypothesis \(H_0: \beta_i = c\) against the alternative \(H_1: \beta_i < c\):
\[ \begin{cases} H_0&: \beta_i = c \\ H_1&: \beta_i < c \end{cases} \]
We reject \(H_0\) and accept the alternative \(H_1\), if \(t_i \leq t_{(\alpha, N-2)}\), where \(t_c = t_{(\alpha, N-2)}\).
Regarding the \(p\)-value - it is the probability to the left of the calculated \(t\)-statistic and can be calculated as: \[ p\text{-value} = \mathbb{P}(T \leq t_i) = \int_{-\infty}^{t_i} p(x) dx \] where \(p(x)\) is the density function of the distribution of the \(t\)-statistic under the null hypothesis. For the univariate regression, under the null hypothesis this is Student's \(t\)-distribution with \(N-2\) degrees of freedom.
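The \(5\%\) critical value \(t_c = t_{(\alpha, N-2)}\) is now in the left tail. A sketch in R (the Python equivalent is stats.t.ppf(0.05, df = N - 2)):
t_crit <- qt(p = 0.05, df = N - 2)
print(t_crit)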
## [1] -1.660551
## -1.6605512170440575
The \(p\)-value is:
# Our calculated t-statistic in this example:
t_stat = -2.5
# The density function of the statistic under the null
dF <- function(x, df){
dt(x, df)
}
# The associated p-value:
p_val <- integrate(dF, lower = -Inf, upper = t_stat, df = N - 2)$value
print(p_val)
## [1] 0.007039878
# Our calculated t-statistic in this example:
t_stat = -2.5
# The density function of the statistic under the null:
def dF(x, df):
    return stats.t.pdf(x, df = df)
#
# The associated p-value:
p_val = integrate.quad(dF, a = -np.inf, b = t_stat, args = N - 2)[0]
print(p_val)
## 0.0070398776873863665
We can verify this with the built-in functions:
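As before, a one-line sketch in R (the Python analogue is stats.t.cdf(t_stat, df = N - 2)):
pt(t_stat, df = N - 2, lower.tail = TRUE)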
## [1] 0.007039878
## 0.007039877687385987
This can also be graphically visualized as:
x <- seq(-4, 4, length = 1e5)
hx <- dt(x, df = N - 2)
# Plot the density
#
plot(x, hx, type = "l", yaxs = "i", ylim = c(0, 0.41),
main = expression("p-value for the left-tail test; "~H[1]~":"~beta[1]<0))
#
# Shade the probability alpha
polygon(c(min(x), x[x <= t_crit], t_crit), border = "red", lwd = 2,
c(0, hx[x <= t_crit], 0),
col = rgb(1, 0, 0, alpha = 0.2))
# plot the critical value t_c:
abline(v = t_crit, col = "red", lty = 2)
# Shade the probability p-value
polygon(c(min(x), x[x <= t_stat], t_stat),
c(0, hx[x <= t_stat], 0), border = NA,
col = rgb(0, 1, 0, alpha = 0.4))
# plot the observed t-statistic:
abline(v = t_stat, col = "darkgreen", lty = 2)
axis(side = 1, at = t_crit,
labels = expression(t[c]), col.axis = "red")
axis(side = 1, at = t_stat,
labels = expression(t[stat]), col.axis ="darkgreen")
x = np.linspace(start = -4, stop = 4, num = int(1e5))
hx = stats.t.pdf(x, df = N - 2)
# Plot the density
fig, ax = plt.subplots(num = 1, figsize = (10, 8))
_ = plt.plot(x, hx, color = "black")
_ = plt.margins(y = 0)
_ = plt.ylim((0, 0.41))
# Shade the probability alpha
_ = plt.fill_between(x[x <= t_crit], hx[x <= t_crit],
edgecolor = "red", linestyle = "-", linewidth = 2,
facecolor = clrs.to_rgba(c = (1, 0, 0, 0.2)), zorder = 10)
# plot the critical value t_c:
_ = plt.axvline(x = t_crit, color = "red", linestyle = "--")
# Shade the probability p-value
_ = plt.fill_between(x[x <= t_stat], hx[x <= t_stat], edgecolor = None,
color = clrs.to_rgba(c = (0, 1, 0, 0.4)))
# plot the observed t-statistic:
_ = plt.axvline(x = t_stat, color = "darkgreen", linestyle = "--")
#
xt = np.append(ax.get_xticks(), [t_crit, t_stat])
xtl = xt.tolist()
xtl[-2:] = ["$t_c$", "$t_{stat}$"]
_ = ax.set_xticks(xt)
_ = ax.set_xticklabels(xtl)
ax.get_xticklabels()[-2].set_color("red")
ax.get_xticklabels()[-1].set_color("darkgreen")
_ = plt.title("p-value for the left-tail test; $H_1: \\beta_1 < 0$")
plt.show()
We see that the \(p\)-value is less than the \(5\%\) significance level (equivalently, the calculated \(t\)-statistic is less than the critical value), so we reject the null hypothesis and conclude that \(\beta_1\) is statistically significantly less than zero.
3.6.7 Two Tail Tests, \(\neq\)
We are testing the null hypothesis \(H_0: \beta_i = c\) against the alternative \(H_1: \beta_i \neq c\):
\[ \begin{cases} H_0&: \beta_i = c \\ H_1&: \beta_i \neq c \end{cases} \]
We reject \(H_0\) and accept the alternative \(H_1\), if \(t_i \leq t_{(\alpha/2, N-2)}\) or if \(t_i \geq t_{(1 - \alpha/2, N-2)}\).
Regarding the \(p\)-value - it is the sum of probabilities to the right of \(|t_i|\) and to the left of \(-|t_i|\) and can be calculated as: \[ \begin{aligned} p\text{-value} &= \mathbb{P}(T \leq -|t_i|) + \mathbb{P}(T \geq |t_i|) \\ &= 1 - \mathbb{P}( -|t_i| \leq T \leq |t_i| )= 1 - \int_{-|t_i|}^{|t_i|} p(x) dx\\ &= 2 \cdot \mathbb{P}(T \leq -|t_i|) = 2 \cdot \int_{-\infty}^{-|t_i|} p(x) dx \\ &= 2 \cdot(1 - \mathbb{P}(T \leq |t_i|)) = 2 \cdot \left( 1 - \int_{-\infty}^{|t_i|} p(x) dx \right) \end{aligned} \] where \(p(x)\) is the density function of the distribution of the \(t\)-statistic under the null hypothesis. For the univariate regression, under the null hypothesis this is Student's \(t\)-distribution with \(N-2\) degrees of freedom.
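The \(5\%\) two-tail critical values \(t_{(1 - \alpha/2, N-2)}\) and \(t_{(\alpha/2, N-2)}\) can be computed as follows (a sketch in R; the Python equivalent uses stats.t.ppf with the same probabilities):
t_crit <- qt(p = 1 - 0.05 / 2, df = N - 2)
print(t_crit)
print(-t_crit)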
## [1] 1.984467
## [1] -1.984467
## 1.984467454426692
## -1.9844674544266925
The \(p\)-value can be computed either way:
# Our calculated t-statistic in this example:
t_stat = -1
# The density function of the statistic under the null
dF <- function(x, df){
dt(x, df)
}
# The associated p-value:
p_val <- 1 - integrate(dF, lower = -abs(t_stat), upper = abs(t_stat), df = N - 2)$value
print(p_val)
## [1] 0.3197733
## [1] 0.3197733
## [1] 0.3197733
# Our calculated t-statistic in this example:
t_stat = -1
# The density function of the statistic under the null:
def dF(x, df):
    return stats.t.pdf(x, df = df)
#
# The associated p-value:
p_val = 1 - integrate.quad(dF, a = -np.abs(t_stat), b = np.abs(t_stat), args = N - 2)[0]
print(p_val)
## 0.3197732875085676
## 0.3197732875085983
p_val = 2 * integrate.quad(dF, a = -np.inf, b = -np.abs(t_stat), args = N - 2)[0]
print(p_val)
## 0.3197732875085983
We can verify this with the built-in functions:
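A one-line sketch in R (the same value also comes from 2 * (1 - pt(abs(t_stat), df = N - 2))):
2 * pt(-abs(t_stat), df = N - 2)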
## [1] 0.3197733
## 0.3197732875085884
## [1] 0.3197733
## 0.31977328750858836
This can also be graphically visualized as:
x <- seq(-4, 4, length = 1e5)
hx <- dt(x, df = N - 2)
# Plot the density
#
plot(x, hx, type = "l", yaxs = "i", xaxt = "n", ylim = c(0, 0.41),
main = expression("p-value for the two-tail test; "~H[1]~":"~beta[1]!=0))
#
# Shade the probability p-value
polygon(c(min(x), x[x <= t_stat], t_stat),
c(0, hx[x <= t_stat], 0), border = NA,
col = rgb(0, 1, 0, alpha = 0.4))
# plot the observed t-statistic:
abline(v = t_stat, col = "darkgreen", lty = 2)
# Shade the probability alpha / 2
polygon(c(min(x), x[x <= -abs(t_crit)], -abs(t_crit)), border = "red", lwd = 2,
c(0, hx[x <= -abs(t_crit)], 0), col = rgb(1, 0, 0, alpha = 0.1))
# Shade the probability 1 - alpha / 2
polygon(c(abs(t_crit), x[x >= abs(t_crit)], max(x)), border = "red", lwd = 2,
c(0, hx[x >= abs(t_crit)], 0), col = rgb(1, 0, 0, alpha = 0.4))
# plot the critical value -|t_c| and t_c|:
abline(v = -abs(t_crit), col = "red", lty = 2)
abline(v = abs(t_crit), col = "red", lty = 2)
axis(side = 1, at = -abs(t_crit),
labels = expression("-|"~t[c]~"|"), col.axis = "red")
axis(side = 1, at = t_stat,
labels = expression(t[stat]), col.axis ="darkgreen")
axis(side = 1, at = abs(t_crit),
labels = expression("|"~t[c]~"|"), col.axis = "red")
x = np.linspace(start = -4, stop = 4, num = int(1e5))
hx = stats.t.pdf(x, df = N - 2)
# Plot the density
fig, ax = plt.subplots(num = 2, figsize = (10, 8))
_ = plt.plot(x, hx, color = "black")
_ = plt.margins(y = 0)
_ = plt.ylim((0, 0.41))
# Shade the probability p-value
_ = plt.fill_between(x[x <= t_stat], hx[x <= t_stat], edgecolor = None,
color = clrs.to_rgba(c = (0, 1, 0, 0.4)))
# plot the observed t-statistic:
_ = plt.axvline(x = t_stat, color = "darkgreen", linestyle = "--")
# Shade the probability alpha / 2
_ = plt.fill_between(x[x <= -np.abs(t_crit)], hx[x <= -np.abs(t_crit)],
edgecolor = "red", linestyle = "-", linewidth = 2,
facecolor = clrs.to_rgba(c = (1, 0, 0, 0.1)), zorder = 10)
# Shade the probability 1 - alpha / 2
_ = plt.fill_between(x[x >= np.abs(t_crit)], hx[x >= np.abs(t_crit)],
edgecolor = "red", linestyle = "-", linewidth = 2,
facecolor = clrs.to_rgba(c = (1, 0, 0, 0.4)), zorder = 10)
# plot the critical value -|t_c| and t_c|:
_ = plt.axvline(x = -np.abs(t_crit), color = "red", linestyle = "--")
_ = plt.axvline(x = np.abs(t_crit), color = "red", linestyle = "--")
#
_ = ax.set_xticks([-np.abs(t_crit), t_stat, np.abs(t_crit)])
_ = ax.set_xticklabels(["$-|t_c|$", "$t_{stat}$", "$|t_c|$"])
ax.get_xticklabels()[0].set_color("red")
ax.get_xticklabels()[1].set_color("darkgreen")
ax.get_xticklabels()[2].set_color("red")
_ = plt.title("p-value for the two-tail test; $H_1: \\beta_1 \\neq 0$")
plt.show()
We see that the \(p\)-value is greater than the \(5\%\) significance level (equivalently, the calculated \(t\)-statistic falls between the two critical values, \(-|t_c| < t_i < |t_c|\)), so we have no grounds to reject the null hypothesis: \(\beta_1\) is not statistically significantly different from zero.
3.6.8 Conclusions of the Test
After completing the hypothesis test, we need to draw our conclusions - whether we reject, or fail to reject the null hypothesis.
Having estimated a model, our first concern is usually to determine, whether there is a relationship between the dependent variable \(Y\), and the independent variable \(X\). If \(\beta_1 = 0\) in a univariate regression, then there is no linear relationship between \(Y\) and \(X\).
In other words, we are usually interested in determining whether the coefficient of a predictor is significantly different from zero, that is, testing \(H_0: \beta_i = 0\). This is also called a test of significance.
- If we reject \(H_0\), then we say that \(\beta_i\) is statistically significantly different from zero. Usually we want our model to include only significant variables.
- If we fail to reject \(H_0\), then we say that \(\beta_i\) is not statistically significant. In such a case, we usually want to remove the insignificant variable (or replace it with another, hopefully significant, variable).
3.6.8.1 Statistical Significance versus Economic Significance
We have emphasized statistical significance throughout this section. However, we should also pay attention to the magnitude of the estimated coefficient. In other words:
- The statistical significance of a parameter \(\widehat{\beta}_i\) is determined by the size of the \(t\)-statistic \(t_i\).
- The economic (or practical) significance of a variable is related to the size and sign of \(\widehat{\beta}_i\).
For example, assume that we have estimated the following model of ice cream sales: \[ \underset{(\text{se})}{\widehat{Y}} = \underset{(200)}{2000} + \underset{(0.00001)}{0.0001}X \] where \(Y\) is the total value of ice cream sales in EUR, and \(X\) is the expenditure on advertising in EUR. Then the \(t\)-statistic for \(H_0: \beta_1 = 0\) against \(H_1:\beta_1 \neq 0\) is \(t_1 = \widehat{\beta}_1 / \text{se}(\widehat{\beta}_1) = 0.0001/0.00001 = 10\), which is much greater than any critical value at \(\alpha = 0.05\). The estimate says that an additional EUR spent on advertising results in \(0.0001\) EUR of additional ice cream sales. While this is statistically significantly different from zero, it may not be economically significantly different from zero: if advertising increases by \(10000\) EUR, the average value of ice cream sales increases by only \(1\) EUR - would it be worthwhile for an ice cream company to increase its advertising spending in this specific case?
As another example, consider the following data generating process: \[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \] where \(i = 1,...,N\), \(N = 100\), \(\beta_0 = 4\), \(\beta_1 = -2\), \(X_i \sim \mathcal{N}(2, 2^2)\), \(\epsilon_i \sim \mathcal{N}(0,1)\). Furthermore, let \(Z_i \sim \mathcal{N}(4, 3^2)\) be independent of \(\epsilon_i\) and \(X_i\), \(\forall i\).
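The data can be generated as follows (a sketch in R; the random seed is an assumption, since the seed used to produce the estimates below is not shown, so an actual draw may differ slightly):
# Simulate the data generating process:
set.seed(123) # assumed seed
N <- 100
x <- rnorm(mean = 2, sd = 2, n = N)
z <- rnorm(mean = 4, sd = 3, n = N)
e <- rnorm(mean = 0, sd = 1, n = N)
y <- 4 - 2 * x + e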
Next we estimate the model and calculate the standard errors of the parameters:
# Estimate the parameters:
x_mat <- cbind(1, x)
xtx <- t(x_mat) %*% x_mat
xty <- t(x_mat) %*% y
beta_mat_est <- solve(xtx) %*% xty
# Calculate model fit:
y_fit <- beta_mat_est[1] + beta_mat_est[2] * x
# Calculate Residuals:
resid <- y - y_fit
# Estimate the standard errors:
sigma2_est <- sum(resid^2) / (length(x) - 2)
var_beta <- sigma2_est * solve(t(x_mat) %*% x_mat)
std_err <- sqrt(diag(var_beta))
# Print the values:
print(data.frame(estimates = beta_mat_est,
se = std_err))
## estimates se
## 3.949669 0.15167298
## x -2.026236 0.05343931
import pandas as pd
#
# Estimate the parameters:
x_mat = np.column_stack((np.ones(len(x)), x))
xtx = np.dot(np.transpose(x_mat), x_mat)
xty = np.dot(np.transpose(x_mat), y)
beta_mat_est = np.dot(np.linalg.inv(xtx), xty)
# Calculate model fit:
y_fit = beta_mat_est[0] + beta_mat_est[1] * x
# Calculate the residuals:
resid = y - y_fit
# Estimate the standard errors:
sigma2_est = sum(resid**2) / (len(x) - 2)
var_beta = sigma2_est * np.linalg.inv(np.dot(np.transpose(x_mat), x_mat))
std_err = np.sqrt(np.diag(var_beta))
# Print the values:
print(pd.DataFrame(np.column_stack([beta_mat_est, std_err]),
columns = ["estimates", "se"], index = ["b0", "b1"]))
## estimates se
## b0 3.997507 0.132489
## b1 -2.008296 0.043418
Next, we want to calculate the \(t\)-statistics under the null hypothesis \(H_0: \beta_i = 0\) for each \(i = 0, 1\):
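In R this is element-wise division of the estimates by their standard errors (the Python analogue divides beta_mat_est by std_err in the same way):
# t-statistics under H0: beta_i = 0, i.e. c = 0:
t_stat <- beta_mat_est / std_err
print(t_stat)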
## [,1]
## 26.04069
## x -37.91658
## [ 30.17232824 -46.25488965]
Next, when specifying the alternative hypothesis and calculating the critical values, we will do so at the \(\alpha = 0.05\) (or simply \(5\%\)) significance level.
We will begin by examining the alternative hypothesis \(H_1: \beta_i > 0\) and comparing the critical value with the \(t\)-statistics:
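A sketch in R, mirroring the Python code below:
tc_g <- qt(p = 1 - 0.05, df = N - 2)
data.frame(reject_H0 = t_stat >= tc_g)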
## reject_H0
## TRUE
## x FALSE
tc_g = stats.t.ppf(q = 1 - 0.05, df = N-2)
print(pd.DataFrame(t_stat >= tc_g, columns = ["reject_H0"],
index = ["b0", "b1"]))
## reject_H0
## b0 True
## b1 False
Since \(t_0 \geq t_c\), we reject the null hypothesis at significance level \(0.05\), accept the alternative, and conclude that \(\beta_0 > 0\). However, we fail to reject the null hypothesis that \(\beta_1 = 0\), even though the true value \(\beta_1 = -2\) is clearly nonzero: the one-sided alternative \(H_1: \beta_1 > 0\) points in the wrong direction. This highlights one key aspect - it is very important to correctly specify the alternative hypothesis.
Additionally, we can verify that this is indeed the \(5\%\) critical value by calculating the probability to the right of it:
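A sketch in R (the Python output below carries a tiny numerical error, presumably from numerical integration of the density):
# P(T >= t_c) should equal the chosen significance level:
print(pt(tc_g, df = N - 2, lower.tail = FALSE))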
## [1] 0.05
## 0.050000000002184096
This also allows us to calculate the \(p\)-value for each estimated parameter:
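A sketch in R (the Python analogue is stats.t.sf(t_stat, df = N - 2)):
# Right-tail p-values for H0: beta_i = 0 against H1: beta_i > 0:
print(pt(t_stat, df = N - 2, lower.tail = FALSE))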
## [,1]
## 3.949205e-46
## x 1.000000e+00
## [1.04385101e-51 1.00000000e+00]
We come to the same conclusion - for \(\widehat{\beta}_0\) we have that \(p\text{-value} < 0.05\), so we reject the null hypothesis at the \(5\%\) significance level. However, for \(\widehat{\beta}_1\), the \(p\text{-value} = 1 > 0.05\), so we fail to reject \(H_0\).
Next, we will calculate the critical value under \(H_1: \beta_i < 0\) and compare it with the \(t\)-statistic, as well as calculate the corresponding \(p\)-value:
tc_l <- qt(0.05, N-2)
p_val_l <- pt(t_stat, df = N - 2, lower = TRUE)
data.frame(reject_H0 = t_stat <= tc_l,
p_value = p_val_l)
## reject_H0 p_value
## FALSE 1.000000e+00
## x TRUE 1.147452e-60
tc_l = stats.t.ppf(q = 0.05, df = N-2)
p_val_l = stats.t.cdf(x = t_stat, df = N-2)
print(pd.DataFrame([t_stat <= tc_l, p_val_l], index = ["reject_H0", "p_value"],
columns = ["b0", "b1"]).T)
## reject_H0 p_value
## b0 False 1
## b1 True 1.10985e-68
We see that this time we fail to reject \(H_0\) for \(\beta_0\), but reject \(H_0\) for \(\beta_1\).
(Note the difference when calculating the \(p\)-values: in R this is done via the argument lower = TRUE, while in Python the cdf() function is used instead of sf().)
Finally, we will calculate the critical value under \(H_1: \beta_1 \neq 0\):
# Two-tail test: compare |t| with the alpha/2 critical value
tc <- qt(1 - 0.05 / 2, df = N - 2)
r1 <- abs(t_stat) >= tc
r2 <- 2 * pt(-abs(t_stat), df = N - 2, lower.tail = TRUE)
data.frame(reject_H0 = r1,
           p_value = r2)
## reject_H0 p_value
## TRUE 7.898411e-46
## x TRUE 2.294904e-60
tc = stats.t.ppf(q = 1 - 0.05 / 2, df = N-2)
r1 = np.abs(t_stat) >= tc
r2 = 2 * stats.t.cdf(x = -np.abs(t_stat), df = N-2)
print(pd.DataFrame([r1, r2], index = ["reject_H0", "p_value"],
                   columns = ["b0", "b1"]).T)
## reject_H0 p_value
## b0 True 2.0877e-51
## b1 True 2.2197e-68
Since the \(p\)-value is less than the \(5\%\) significance level, we reject the null hypothesis and conclude that our estimated parameter \(\beta_0\) is statistically significantly different from zero. We arrive at the same conclusion for \(\beta_1\). In this case it is more convenient to use a two-tail test, since we are only interested in the significance of our parameters.
Most statistical software calculates the \(p\)-value under \(H_0: \beta_i = 0\) against \(H_1:\beta_i \neq 0\), since we are first and foremost interested whether our coefficients are statistically significantly different from zero.
Using the built-in functions leads to the following result:
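In R, a sketch using the built-in lm() function:
lm_fit <- lm(y ~ x)
print(summary(lm_fit)$coefficients)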
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.949669 0.15167298 26.04069 7.898411e-46
## x -2.026236 0.05343931 -37.91658 2.294904e-60
import statsmodels.api as sm
#
lm_fit = sm.OLS(y, x_mat).fit()
df = pd.DataFrame(lm_fit.summary().tables[1].data)
print(df.iloc[:, 0:5])
## 0 1 2 3 4
## 0 coef std err t P>|t|
## 1 const 3.9975 0.132 30.172 0.000
## 2 x1 -2.0083 0.043 -46.255 0.000
We see the rounded \(p\)-values, which are very close to zero. We can examine the values directly:
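In Python, for example, the exact values are stored in the pvalues attribute of the fitted statsmodels results object (a sketch):
print(lm_fit.pvalues)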
## [2.08770202e-51 2.21969545e-68]
In contrast, if we wanted to estimate how \(Z\) affects \(Y\):
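In R, a sketch:
lm_fit <- lm(y ~ z)
print(summary(lm_fit)$coefficients)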
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.2363001 0.6989195 -1.768874 0.0800262
## z 0.1758909 0.1343554 1.309146 0.1935462
lm_fit = sm.OLS(y, sm.add_constant(z)).fit()
df = pd.DataFrame(lm_fit.summary().tables[1].data)
print(df.iloc[:, 0:5])
## 0 1 2 3 4
## 0 coef std err t P>|t|
## 1 const -0.5050 0.758 -0.666 0.507
## 2 x1 0.1015 0.161 0.632 0.529
We see that \(p\text{-value} = 0.1935 > 0.05\) (from the R output; the Python draw gives a similarly insignificant result), so we fail to reject the null hypothesis \(H_0: \beta_1 = 0\) - the coefficient of \(Z\) is not statistically significantly different from zero. In other words, we find no evidence that the value of \(Z\) affects the value of \(Y\).
3.6.9 Null Hypothesis and The Parameter Confidence Interval
The two-tail tests and parameter confidence intervals are closely related. Assume that we want to test the following: \[ \begin{cases} H_0&: \beta_i = c \\ H_1&: \beta_i \neq c \end{cases} \] Looking back at section 3.5 for parameter confidence interval estimation, we have that: \[ \mathbb{P} \left(\widehat{\beta}_i - t_c \cdot \text{se}(\widehat{\beta}_i)\leq \beta_i \leq \widehat{\beta}_i + t_c \cdot \text{se}(\widehat{\beta}_i)\right) = 1 - \alpha \] which leads to the \(100 \cdot (1-\alpha)\%\) confidence interval: \[ \left[ \widehat{\beta}_i - t_c \cdot \text{se}(\widehat{\beta}_i);\quad \widehat{\beta}_i + t_c \cdot \text{se}(\widehat{\beta}_i) \right] \] Under the null hypothesis we would have \(\beta_i = c\).
So, if we test the null hypothesis against the two-tailed alternative, then we should check if \(c\) belongs to the confidence interval:
- If \(c \in \left[ \widehat{\beta}_i - t_c \cdot \text{se}(\widehat{\beta}_i);\quad \widehat{\beta}_i + t_c \cdot \text{se}(\widehat{\beta}_i) \right]\), we will (most likely) not reject \(H_0\) at the level of significance \(\alpha\).
- If \(c \notin \left[ \widehat{\beta}_i - t_c \cdot \text{se}(\widehat{\beta}_i);\quad \widehat{\beta}_i + t_c \cdot \text{se}(\widehat{\beta}_i) \right]\), we will (most likely) reject \(H_0\).
Note that this is not an alternative to hypothesis testing - you should still carry out the tests by calculating the critical values and \(t\)-statistics. This is a neat trick for getting a preliminary view of the likely outcome of the tests, as sketched below.
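As a quick illustration in R, using the estimates from the simulated example above, with \(c = -2\) (the true value of \(\beta_1\)):
# 95% confidence interval for beta_1:
tc <- qt(p = 1 - 0.05 / 2, df = N - 2)
ci <- beta_mat_est[2] + c(-1, 1) * tc * std_err[2]
print(ci)
# If c falls inside the interval, we would expect not to reject
# H0: beta_1 = c against the two-sided alternative:
print(ci[1] <= -2 & -2 <= ci[2])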
3.6.10 Null Hypothesis with inequalities: \(\leq\) or \(\geq\)
If we want to test \(H_0: \beta_1 = c\), we can come to similar conclusions. For example, say we want to answer the question whether a unit change in \(X\) results in an increase in \(Y\) of \(10\); then our null hypothesis in a univariate regression model is \(H_0: \beta_1 = 10\). In other words, we may want to estimate our model and answer the question of whether the coefficient is statistically significantly different from a specific value (provided it is statistically significantly different from zero).
If we do not reject the null of a one-sided test \(H_0: \beta_1 = 10\) with \(H_1: \beta_1 < 10\) but reject the null hypothesis in one-sided tests with \(H_1: \beta_1 > 10\), then we could make the case that a change in \(X\) results in a change in \(Y\) by at least \(10\).
On the other hand, if we do not reject a two-sided test with a null \(H_0: \beta_1 = 10\), then we may say that we do not reject the hypothesis that the coefficient is not statistically significantly different from \(10\).
We may sometimes want to specify our hypothesis test as \(H_0: \beta_1 \text{ is at least } c\): \[ \begin{aligned} H_0: \beta_1 \geq c \\ H_1: \beta_1 \lt c \end{aligned} \] or \(H_0: \beta_1 \text{ is no more than } c\): \[ \begin{aligned} H_0: \beta_1 \leq c \\ H_1: \beta_1 \gt c \end{aligned} \] The hypothesis testing procedure for testing the null hypothesis \(H_0\) with an inequality against an alternative hypothesis \(H_1\) with a strict inequality is exactly the same as testing the null \(\beta_1 = c\) against the alternative hypothesis with a strict inequality. The test statistic and rejection region are exactly the same.
In the classical approach, \(H_0\) is stated as an equality. The reason for using \(H_0: \beta_1 = c\) is that, among the values corresponding to \(\beta_1 \geq c\) (or \(\beta_1 \leq c\)), \(\beta_1 = c\) is the most conservative. The common practice is to choose the null hypothesis which gives the highest probability of a Type I error.
set.seed(123)
#
N <- 150
beta_0 <- 3
beta_1 <- 15
#
x <- seq(from = 0, to = 1, length.out = N)
e <- rnorm(mean = 0, sd = 1, n = N)
y <- beta_0 + beta_1 * x + e
# Print the model output:
lm_fit <- lm(y ~ x)
print(summary(lm_fit)$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.143908 0.1540132 20.41325 7.012387e-45
## x 14.663458 0.2663121 55.06117 1.773174e-100
np.random.seed(123)
#
N = 150
beta_0 = 3
beta_1 = 15
#
x = np.linspace(start = 0, stop = 1, num = N)
e = np.random.normal(loc = 0, scale = 1, size = N)
y = beta_0 + beta_1 * x + e
# Print the model output:
lm_fit = sm.OLS(y, sm.add_constant(x)).fit()
print(lm_fit.summary().tables[1])
## ==============================================================================
## coef std err t P>|t| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 2.9658 0.178 16.648 0.000 2.614 3.318
## x1 15.1841 0.308 49.293 0.000 14.575 15.793
## ==============================================================================
We will begin by testing the following hypothesis: \[ \begin{cases} H_0&: \beta_1 = 10 \\ H_1&: \beta_1 < 10 \end{cases} \]
# Get the standard errors:
b1se <- summary(lm(y ~ x))$coefficients[2, 2]
# Calculate one sided test statistic with H1: b1 < 10
t_stat <- (coef(lm(y ~ x))[2] - 10) / b1se
h1 <- pt(t_stat, df = N - 2, lower = TRUE)
print(paste0("with H1: b1 < 10; p-value = ", h1))
## [1] "with H1: b1 < 10; p-value = 1"
Because \(p\text{-value} > 0.05\), we have no grounds to reject the null hypothesis that \(\beta_1 = 10\).
On the other hand testing: \[ \begin{cases} H_0&: \beta_1 = 10 \\ H_1&: \beta_1 > 10 \end{cases} \]
t_stat <- (coef(lm(y ~ x))[2] - 10) / b1se
h1 <- pt(t_stat, df = N - 2, lower = FALSE)
print(paste0("with H1: b1 > 10; p-value = ", h1))
## [1] "with H1: b1 > 10; p-value = 3.39526879907719e-38"
b1se = lm_fit.bse[1]  # standard error of the slope estimate
t_stat = (lm_fit.params[1] - 10) / b1se
h1 = stats.t.sf(x = t_stat, df = N-2)
print("with H1: b1 > 10; p-value = " + str(h1))
## with H1: b1 > 10; p-value = 1.721815270410005e-36
gives us a \(p\text{-value} < 0.05\), which means we reject the null hypothesis and accept the alternative that \(\beta_1 > 10\).
Furthermore, testing the following hypothesis: \[ \begin{cases} H_0&: \beta_1 = 15 \\ H_1&: \beta_1 < 15 \end{cases} \]
t_stat <- (coef(lm(y ~ x))[2] - 15) / b1se
h1 <- pt(t_stat, df = N - 2, lower = TRUE)
print(paste0("with H1: b1 < 15; p-value = ", h1))
## [1] "with H1: b1 < 15; p-value = 0.104159748483423"
t_stat = (lm_fit.params[1] - 15) / b1se
h1 = stats.t.cdf(x = t_stat, df = N-2)
print("with H1: b1 < 15; p-value = " + str(h1))
## with H1: b1 < 15; p-value = 0.7244798922799103
gives a \(p\text{-value} > 0.05\), which means we have no grounds to reject the null hypothesis that \(\beta_1 = 15\).
On the other hand testing the following hypothesis yields: \[ \begin{cases} H_0&: \beta_1 = 16 \\ H_1&: \beta_1 < 16 \end{cases} \]
t_stat <- (coef(lm(y ~ x))[2] - 16) / b1se
h1 <- pt(t_stat, df = N - 2, lower = TRUE)
print(paste0("with H1: b1 < 16; p-value = ", h1))
## [1] "with H1: b1 < 16; p-value = 7.36635354565659e-07"
t_stat = (lm_fit.params[1] - 16) / b1se
h1 = stats.t.cdf(x = t_stat, df = N-2)
print("with H1: b1 < 16; p-value = " + str(h1))
## with H1: b1 < 16; p-value = 0.00447744628577782
Because \(p\text{-value} < 0.05\), we reject the null hypothesis that \(\beta_1 = 16\) in favor of \(\beta_1 < 16\). So, from these hypothesis tests, we could say that \(15 \leq \beta_1 < 16\).
Finally, we will test a number of different hypotheses: \[ \begin{cases} H_0&: \beta_1 = c \\ H_1&: \beta_1 > c \end{cases} \] where \(c = 10,...,20\).
# Calculate the p-values for different one sided tests for H1: b1 > i
for(i in 10:20){
t_stat <- (coef(lm(y ~ x))[2] - i) / b1se
h1 <- pt(t_stat, df = N - 2, lower = FALSE)
print(paste0("H0: b1 = ", i, " against H1: b1 > ", i, "; p-value = ", h1))
}
## [1] "H0: b1 = 10 against H1: b1 > 10; p-value = 3.39526879907719e-38"
## [1] "H0: b1 = 11 against H1: b1 > 11; p-value = 1.48139779974377e-28"
## [1] "H0: b1 = 12 against H1: b1 > 12; p-value = 1.30234384882019e-18"
## [1] "H0: b1 = 13 against H1: b1 > 13; p-value = 2.11519925719585e-09"
## [1] "H0: b1 = 14 against H1: b1 > 14; p-value = 0.00691614697341538"
## [1] "H0: b1 = 15 against H1: b1 > 15; p-value = 0.895840251516577"
## [1] "H0: b1 = 16 against H1: b1 > 16; p-value = 0.999999263364646"
## [1] "H0: b1 = 17 against H1: b1 > 17; p-value = 0.999999999999998"
## [1] "H0: b1 = 18 against H1: b1 > 18; p-value = 1"
## [1] "H0: b1 = 19 against H1: b1 > 19; p-value = 1"
## [1] "H0: b1 = 20 against H1: b1 > 20; p-value = 1"
# Calculate the p-values for different one sided tests for H1: b1 > i
for i in range(10, 21):
t_stat = (lm_fit.params[1] - i) / b1se
h1 = stats.t.sf(x = t_stat, df = N-2)
print("H0: b1 = "+ str(i) +
" against H1: b1 > " + str(i) + "; p-value = " + str(h1))
## H0: b1 = 10 against H1: b1 > 10; p-value = 1.721815270410005e-36
## H0: b1 = 11 against H1: b1 > 11; p-value = 4.2430845441520945e-28
## H0: b1 = 12 against H1: b1 > 12; p-value = 1.716716554084899e-19
## H0: b1 = 13 against H1: b1 > 13; p-value = 2.552032927419394e-11
## H0: b1 = 14 against H1: b1 > 14; p-value = 8.963697889092514e-05
## H0: b1 = 15 against H1: b1 > 15; p-value = 0.2755201077200897
## H0: b1 = 16 against H1: b1 > 16; p-value = 0.9955225537142222
## H0: b1 = 17 against H1: b1 > 17; p-value = 0.9999999878277424
## H0: b1 = 18 against H1: b1 > 18; p-value = 0.9999999999999998
## H0: b1 = 19 against H1: b1 > 19; p-value = 1.0
## H0: b1 = 20 against H1: b1 > 20; p-value = 1.0
This is another example of why we never accept, but rather fail to reject, the null hypothesis: not only does failing to reject give no concrete answer (in this case, multiple null hypotheses are compatible with our data sample), but the outcome also depends on the specification of both the null and the alternative hypothesis.