## 3.1 General Concepts

In this subsection we present some core definitions and concepts for the univariate cross-sectional data case. Most of these definitions and concepts will also be used (indirectly) in later chapters.

### 3.1.1 The Modelling Framework

Much of econometric analysis begins with the following premise: let $$Y$$ and $$X$$ be two variables, and suppose we are interested in explaining $$Y$$ in terms of $$X$$. However, no single $$X$$ will ever fully explain an economic variable $$Y$$. All other unobserved factors that influence $$Y$$ therefore manifest themselves as an unpredictable (i.e. random) disturbance $$\epsilon$$. Thus, we are looking for a function $$f$$ which describes this relationship: $Y = f(X, \epsilon)$ where:

• $$Y$$ is the variable of interest (e.g. total production, revenue, etc.);
• $$X$$ is a variable which helps us explain $$Y$$ (examples of $$X$$ include labor input, number of hours worked, capital input, equipment quality, number of employees, raw materials, etc.).

Furthermore:

• $$f(\cdot)$$ is called a regression function;
• $$X$$ is called the independent variable, the control variable, the explanatory variable, the predictor variable, or the regressor;
• $$Y$$ is called the dependent variable, the response variable, the explained variable, the predicted variable or the regressand;
• $$\epsilon$$ is called the random component, the error term, the disturbance or the (economic) shock.

We shall assume that $$X$$ is a random variable (r.v.), though this may not always be the case. In either case, $$Y$$ is an r.v., since it depends on the r.v. $$\epsilon$$ as well as on $$X$$. The simplest functional form of $$f$$ is linear:

$$$Y = \beta_0 + \beta_1 X + \epsilon \tag{3.1}$$$

If this functional form is appropriate for the data, our goal is to estimate its unknown coefficients $$\beta_0$$ and $$\beta_1$$ from the available data sample. Unless stated otherwise, we will assume that we have a finite random sample $$(X_1, Y_1), ..., (X_N, Y_N)$$ of independent identically distributed (i.i.d.) r.v.’s, with realizations (observed values) $$(x_1, y_1), ..., (x_N, y_N)$$. Unlike $$X$$ and $$Y$$, the error term $$\epsilon$$ collects unobserved r.v.’s, so we do not observe $$\epsilon$$.

We expect that, on average, the relationship in (3.1) holds. The two-dimensional r.v. $$(X, Y)$$ is fully determined by the two-dimensional r.v. $$(X, \epsilon)$$. In order to get good estimates of $$\beta_0$$ and $$\beta_1$$, we have to impose certain restrictions on the properties of $$\epsilon$$ and the interaction between $$X$$ and $$\epsilon$$.

Let $$\mathbb{E}(Y|X)$$ denote the conditional expectation of $$Y$$ given $$X$$, i.e. the expectation of $$Y$$ provided that we know the value of the r.v. $$X$$.

### 3.1.2 Conditional Expectation Properties

Proposition 3.1 Below we provide some properties of the conditional expectation which will be useful later on:
1. $$\mathbb{E}\left[ g(X) | X \right] = g(X)$$ for any function $$g(\cdot)$$;
2. $$\mathbb{E} \left[ a(X)Y + b(X) | X \right] = a(X) \cdot \mathbb{E} \left[ Y|X \right] + b(X)$$;
3. If $$X$$ and $$Y$$ are independent, then $$\mathbb{E} \left[ Y | X\right] = \mathbb{E} \left[ Y \right]$$ and $$\text{Cov}(X, Y) = 0$$;
4. Law of total expectation: $$\mathbb{E} \left[ \mathbb{E} \left[ Y|X \right]\right] = \mathbb{E} \left[ Y \right]$$;
5. Let $$Y$$, $$X$$ and $$\epsilon$$ follow (3.1). The conditional expectation is a linear operator: $$$\begin{split} \mathbb{E}(Y|X) &= \mathbb{E}(\beta_0 + \beta_1 X + \epsilon | X) \\ &= \mathbb{E}(\beta_0 | X) + \mathbb{E}(\beta_1 X | X) + \mathbb{E}(\epsilon | X) \\ &= \beta_0 + \beta_1 X + \mathbb{E}(\epsilon | X) \end{split} \tag{3.2}$$$
6. The conditional variance is: $\text{Var}(Y |X) = \mathbb{E} \left[(Y-\mathbb{E}[Y\mid X])^2\mid X \right] = \mathbb{E} \left[ Y^2 | X \right] - \left( \mathbb{E} \left[ Y|X \right] \right)^2.$ If $$X$$ and $$Y$$ are independent, then $$\text{Var}(Y|X) = \text{Var}(Y)$$. Additionally: $\text{Var}[X\mid X] = \text{Var}[X\mid X = x]=\mathbb{E}[(X-\mathbb{E}[X\mid X=x])^2\mid X=x] =\mathbb{E}[(X-x)^2\mid X=x] = 0$
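
Some of these properties can be checked numerically. The sketch below is only an illustration under stated assumptions: it simulates model (3.1) with illustrative values $$\beta_0 = 1$$, $$\beta_1 = 2$$, $$X \sim \mathcal{N}(3, 5^2)$$ and $$\epsilon \sim \mathcal{N}(0, 1)$$ drawn independently of $$X$$ (so that $$\mathbb{E}(\epsilon | X) = 0$$). The sample size, the seed and the variable names are arbitrary choices, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only

beta_0, beta_1 = 1.0, 2.0         # illustrative parameter values (assumption)
n = 200_000                       # large n so that sample averages are close to expectations

x = rng.normal(loc=3.0, scale=5.0, size=n)
eps = rng.normal(loc=0.0, scale=1.0, size=n)   # independent of x, hence E(eps | X) = 0
y = beta_0 + beta_1 * x + eps

# Property 5 with E(eps | X) = 0: E(Y | X) = beta_0 + beta_1 * X
cond_exp = beta_0 + beta_1 * x

# Property 4 (law of total expectation): the average of E(Y | X) approximates E(Y)
mean_of_cond_exp = cond_exp.mean()
mean_of_y = y.mean()

# Property 6: here Var(Y | X) = Var(eps | X) = 1 for every X, so the average
# squared deviation of Y from E(Y | X) should be close to 1
avg_cond_var = np.mean((y - cond_exp) ** 2)

print(mean_of_cond_exp, mean_of_y, avg_cond_var)
```

The two means differ only by the sample average of the simulated errors, which shrinks as `n` grows.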
Definition 3.1 Depending on whether $$X$$ and $$Y$$ are discrete or continuous, the conditional expected value is calculated as follows:
• If both $$X$$ and $$Y$$ are discrete r.v.’s, then $\mathbb{E}(Y|X = x) = \sum_y y \mathbb{P}_{Y|X}(Y = y | X = x) = \sum_y y \dfrac{\mathbb{P}_{X,Y}(Y = y, X = x)}{\mathbb{P}_{X}(X = x)}$
• If both $$X$$ and $$Y$$ are continuous r.v.’s, then $\mathbb{E}(Y|X = x) = \int_{-\infty}^{\infty} y f_{Y|X}(y|x) dy = \int_{-\infty}^{\infty} y \dfrac{f_{X,Y}(x,y)}{f_X(x)} dy,$ where $$f_{X,Y}$$ is the joint probability density function (pdf) of $$X$$ and $$Y$$, and $$f_X$$ is the (marginal) density of $$X$$ with $$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) dy$$;
• If $$X$$ is a discrete and $$Y$$ is a continuous r.v., then $\mathbb{E}(Y|X = x) = \int_{-\infty}^{\infty} y f_{Y|X}(y | x) dy = \int_{-\infty}^{\infty} y \dfrac{f_{X,Y}(x,y)}{\mathbb{P}_{X}(X = x)} dy$
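
For the discrete case, the formula can be applied directly to a table of joint probabilities. The following is a minimal sketch with a hypothetical joint distribution (the values of `x_vals`, `y_vals` and `joint` are made up for illustration):

```python
import numpy as np

# Hypothetical joint pmf P(X = x, Y = y): rows correspond to x, columns to y
x_vals = np.array([0, 1])
y_vals = np.array([1, 2, 3])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

# Marginal pmf of X: sum the joint probabilities over y
p_x = joint.sum(axis=1)

# Conditional pmf P(Y = y | X = x): divide each row by P(X = x)
cond_pmf = joint / p_x[:, None]

# Conditional expectation E(Y | X = x) = sum_y y * P(Y = y | X = x)
cond_exp = cond_pmf @ y_vals

# Law of total expectation: weighting E(Y | X = x) by P(X = x) recovers E(Y)
total_exp = cond_exp @ p_x
direct_exp = joint.sum(axis=0) @ y_vals

for x_val, ce in zip(x_vals, cond_exp):
    print(f"E(Y | X = {x_val}) = {ce:.4f}")
print(total_exp, direct_exp)
```

Here $$\mathbb{E}(Y | X = 0) = 2$$ and $$\mathbb{E}(Y | X = 1) = 11/6$$, and both routes to $$\mathbb{E}(Y)$$ give $$1.9$$.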
Corollary 3.1 Equation (3.2) implies that if $$\mathbb{E}(\epsilon | X) = 0$$, then the r.v. $$Y = \beta_0 + \beta_1 X + \epsilon$$ is on average $$\beta_0 + \beta_1 X$$:

$\mathbb{E}(\beta_0 + \beta_1 X + \epsilon | X) = \beta_0 + \beta_1 X$ and:

• $$\beta_0$$ - the intercept parameter, sometimes called the constant term - is the average value of $$Y$$ when $$X = 0$$. It often has no econometric meaning, especially if $$X$$ cannot take a zero value (e.g. if $$X$$ is wage);
• $$\beta_1$$ - the slope parameter - is the average change in $$Y$$ when $$X$$ increases by one unit.
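
Both interpretations follow from evaluating the conditional expectation at specific values of $$X$$. As a short worked step, assuming $$\mathbb{E}(\epsilon | X) = 0$$ and taking $$x$$ to be an arbitrary value of $$X$$:

$\begin{split} \mathbb{E}(Y | X = 0) &= \beta_0 + \beta_1 \cdot 0 = \beta_0 \\ \mathbb{E}(Y | X = x + 1) - \mathbb{E}(Y | X = x) &= \left[ \beta_0 + \beta_1 (x + 1) \right] - \left[ \beta_0 + \beta_1 x \right] = \beta_1 \end{split}$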

### 3.1.3 Relationship Between the Regression and the Sample Data

Assume that the relationship between $$Y$$ and $$X$$ (i.e. the data generating process) is a linear one as discussed before: $Y = \beta_0 + \beta_1 X + \epsilon$ where $$X$$ and $$\epsilon$$ are r.v.’s. For simplicity assume that:

• $$\beta_0 = 1$$, $$\beta_1 = 2$$;
• $$\epsilon \sim \mathcal{N}(0, 1)$$, $$X \sim \mathcal{N}(0, 5^2)$$;
• $$\mathbb{E}(\epsilon|X) = 0$$;

Observations (i.e. realizations) of the random sample $$(X_1, Y_1), ..., (X_N, Y_N)$$ with $$N = 50$$ can be generated as follows:

# R: simulate N = 50 observations from Y = beta_0 + beta_1 * X + eps
set.seed(1)
# True parameter values and sample size
beta_0 <- 1
beta_1 <- 2
N <- 50
# X ~ N(0, 5^2) and eps ~ N(0, 1)
x <- rnorm(mean = 0, sd = 5, n = N)
eps <- rnorm(mean = 0, sd = 1, n = length(x))
y <- beta_0 + beta_1 * x + eps
# Conditional expectation of Y:
y_ce <- beta_0 + beta_1 * x
# First six rows of the simulated sample
head(cbind(y_ce, y, x), 6)
##           y_ce         y          x
## [1,] -5.264538 -4.866432 -3.1322691
## [2,]  2.836433  2.224407  0.9182166
## [3,] -7.356286 -7.015166 -4.1781431
## [4,] 16.952808 15.823445  7.9764040
## [5,]  4.295078  5.728101  1.6475389
## [6,] -7.204684 -5.224284 -4.1023419

# Python: simulate N = 50 observations from Y = beta_0 + beta_1 * X + eps
import numpy as np
import pandas as pd
np.random.seed(1)
# True parameter values and sample size
beta_0 = 1
beta_1 = 2
N = 50
# X ~ N(0, 5^2) and eps ~ N(0, 1)
x = np.random.normal(loc = 0, scale = 5, size = N)
eps = np.random.normal(loc = 0, scale = 1, size = len(x))
y = beta_0 + beta_1 * x + eps
# Conditional expectation of Y:
y_ce = beta_0 + beta_1 * x
# First six rows of the simulated sample
print(pd.DataFrame({'y_ce':y_ce, 'y':y, 'x':x}).head(6))
##         y_ce          y          x
## 0  17.243454  17.543624   8.121727
## 1  -5.117564  -5.469814  -3.058782
## 2  -4.281718  -5.424236  -2.640859
## 3  -9.729686 -10.079029  -5.364843
## 4   9.654076   9.445182   4.327038
## 5 -22.015387 -21.428764 -11.507693

We have also calculated the conditional expectation $$\mathbb{E}(Y_i|X_i = x_i) = 1 + 2 x_i$$ for each point $$x_i$$.

To get a better understanding of how the conditional expectation relates to the observed sample, we can look at a scatter plot of the data:

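# R: scatter plot of the sample with the conditional expectation line
plot(x = x, y = y)
lines(x = x, y = y_ce, col = "red")
# Legend: red line for E(Y|X), open circles for the observed sample
legend(x = -10, y = 15,
       legend = c(expression(paste("E(Y|X) = ", beta[0] + beta[1] * X)),
                  "data sample"),
       lty = c(1, NA), lwd = c(1, NA), pch = c(NA, 1), col = c("red", "black"))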

# Python: scatter plot of the sample with the conditional expectation line
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
# Open circles for the observed sample, red line for E(Y|X)
_ = plt.figure(0)
_ = plt.plot(x, y, linestyle = "None", marker = "o", markerfacecolor = 'none');
_ = plt.plot(x, y_ce, linestyle = "-", color = "red");
# Build the legend entries by hand so that they match the plotted styles
legend_lines = [Line2D([0], [0], color = "red",
                       label = '$E(Y|X) = \\beta_0 + \\beta_1 X$'),
                Line2D([0], [0], linestyle = "None", markerfacecolor = "None",
                       marker = "o", label = 'sample data')]
_ = plt.legend(handles = legend_lines, loc = 'upper left')
plt.show()

Note that we could end a line with a semi-colon ; in order to suppress the unwanted output of the plt.plot() and plt.legend() functions. Alternatively, we can assign the result of the plotting call to a variable. In practice (at least when running Python code in RStudio) the semi-colon does not always suppress the output, so we will assign it to an irrelevant variable, aptly named _. See this answer on Stack Overflow. Another option is to use results='hold' in the code chunk options in RStudio, which withholds partial output, as seen here. However, this is also not guaranteed to suppress all output.

Note: In this particular example, we could alternatively have taken $$X$$ to be a non-random sequence of values. Since the order of the observations does not matter and the horizontal axis is always ordered, we would get a scatter plot similar to the one above. In practical applications this is not the case: in order to determine whether $$X$$ is random, we need to examine the definition of the variable (e.g. if $$X$$ is the number of holidays in a year, it is usually a non-random number).

We see that, on average, the conditional expectation captures the relationship between $$Y$$ and $$X$$. Because the other variables that influence $$Y$$ are not measured (and are therefore collected in $$\epsilon$$), the sample values are scattered around the conditional mean of the process.
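
The scatter around the conditional mean can also be inspected numerically. The sketch below regenerates the Python sample from this section (same seed and parameters) and looks at the vertical deviations $$y_i - \mathbb{E}(Y_i | X_i = x_i)$$. The name `deviation` is introduced here for illustration only.

```python
import numpy as np

np.random.seed(1)   # same seed as in the Python example of this section

beta_0, beta_1, N = 1, 2, 50
x = np.random.normal(loc=0, scale=5, size=N)
eps = np.random.normal(loc=0, scale=1, size=N)
y = beta_0 + beta_1 * x + eps
y_ce = beta_0 + beta_1 * x      # conditional expectation E(Y | X = x_i)

# Vertical distance of each observation from the conditional mean
deviation = y - y_ce

# By construction these deviations are exactly the simulated errors
print(np.allclose(deviation, eps))

# Their sample mean should be near 0 and their sample standard deviation near 1;
# with only N = 50 observations neither is expected to match exactly
print(deviation.mean(), deviation.std(ddof=1))
```

In real data the errors are not observed, so this comparison is only possible because the sample was simulated with known $$\beta_0$$ and $$\beta_1$$.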

### 3.1.4 Parameter Terminology

In general, we do not know the underlying true values of the coefficients $$\beta_0$$ and $$\beta_1$$. We will denote an estimate of the unknown parameter $$\beta_1$$ by $$\widehat{\beta}_1$$.

We note that we can talk about $$\widehat{\beta}_1$$ in two ways:

• $$\widehat{\beta}_1$$ as a concrete estimated value, or more generally, as a result. Then we say that $$\widehat{\beta}_1$$ is an estimate of $$\beta_1$$ based on an observed data sample;
• $$\widehat{\beta}_1$$ as a random variable, or more generally, as a rule for calculating an estimate based on observed data. Then we say that $$\widehat{\beta}_1$$ is an estimator of $$\beta_1$$.

When we are talking about $$\widehat{\beta}_1$$ as an estimator, we can also talk about its mean $$\mathbb{E}(\widehat{\beta}_1)$$, variance $$\text{Var}(\widehat{\beta}_1)$$ and distribution, which are very important when determining whether a particular estimation method is better than an alternative one.
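
The distinction between an estimate and an estimator can be illustrated by simulation: each new sample yields a different estimate, and the collection of estimates approximates the distribution of the estimator. The sketch below is an illustration under assumptions. It uses a least-squares line fit via `np.polyfit` as a stand-in estimation rule (estimation methods have not been introduced yet), and the function name `simulate_slope_estimates`, the number of replications and the seed are hypothetical choices.

```python
import numpy as np

def simulate_slope_estimates(n_rep=2000, n_obs=50, beta_0=1.0, beta_1=2.0, seed=123):
    """Return n_rep slope estimates, one per simulated sample of size n_obs."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_rep)
    for r in range(n_rep):
        # Draw a fresh sample from the data generating process of Section 3.1.3
        x = rng.normal(loc=0.0, scale=5.0, size=n_obs)
        eps = rng.normal(loc=0.0, scale=1.0, size=n_obs)
        y = beta_0 + beta_1 * x + eps
        # Assumption: least-squares line fit used as the estimation rule;
        # np.polyfit returns coefficients from highest degree down: [slope, intercept]
        slope, _intercept = np.polyfit(x, y, deg=1)
        estimates[r] = slope
    return estimates

slopes = simulate_slope_estimates()

# Each entry is one estimate; the whole array describes the estimator
print(slopes[:3])
print(slopes.mean(), slopes.var(ddof=1))
```

The average of the simulated estimates approximates $$\mathbb{E}(\widehat{\beta}_1)$$ and their sample variance approximates $$\text{Var}(\widehat{\beta}_1)$$ for this particular rule and sample size.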

We will talk about the estimation of these unknown parameters in the next chapter.