## 5.1 Generalized Linear Model (GLM)

This chapter is based in part on Source 1, Source 2, Source 3, Source 4, and Source 5.

In a general linear model: $\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathbb{E} \left( \boldsymbol{\varepsilon} | \mathbf{X}\right) = \boldsymbol{0},\quad \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \mathbb{E} \left( \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \right)= \mathbf{\Sigma} = \sigma_\varepsilon^2 \mathbf{\Omega}$ the dependent variable $$\mathbf{Y}$$ is described as a linear function of the explanatory variables plus an error term.

While this specification has been useful in describing various relationships up to this point, there are cases when it is not appropriate. For example:

• when the range of $$\mathbf{Y}$$ is restricted to binary values, or counts (i.e. non-negative integer-valued data);
• when the variance of $$\mathbf{Y}$$ depends on the mean;

A natural extension that deals with these cases is the class of generalized linear models (GLMs), which extend general linear models.

### 5.1.1 GLM Specification

A Generalized Linear Model consists of several elements:

1. A linear predictor: $\boldsymbol{\eta} = \mathbf{X} \boldsymbol{\beta}$
2. A link function, $$g$$, which describes how the mean of the process $$\mathbf{Y}$$ depends on the linear predictor: $\mathbb{E}(\mathbf{Y}) = \boldsymbol{\mu} = g^{-1}(\boldsymbol{\eta}) = g^{-1}(\mathbf{X} \boldsymbol{\beta})$ or, equivalently, $$g(\boldsymbol{\mu}) = \boldsymbol{\eta}$$.
3. A variance function, $$V$$, and a dispersion parameter, $$\boldsymbol{\phi}$$, which describe how the variance of the process $$\mathbf{Y}$$ depends on the mean: $\mathbb{V}{\rm ar}(\mathbf{Y}) = \boldsymbol{\phi}V(\boldsymbol{\mu}) = \boldsymbol{\phi}V\left(g^{-1}(\mathbf{X} \boldsymbol{\beta}) \right)$ Sometimes this property is instead described by assuming a specific distribution for the dependent variable. For example, we may assume that $$\mathbf{Y}$$ follows a probability distribution from the exponential family. All three components are illustrated in the sketch after this list.
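
To make the three components concrete, below is a minimal sketch in Python (assuming numpy and statsmodels are available) that simulates data from a Poisson GLM with a log link and then fits it; the variable names and simulated coefficients are purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
X = sm.add_constant(rng.normal(size=n))  # design matrix for the linear predictor
beta = np.array([0.5, 0.3])              # illustrative coefficients
eta = X @ beta                           # 1. linear predictor: eta = X beta
mu = np.exp(eta)                         # 2. inverse link: mu = g^{-1}(eta), here g = log
y = rng.poisson(mu)                      # 3. family variance: Var(Y) = mu for Poisson

# Fitting recovers beta; the Poisson family implies the log link and V(mu) = mu
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)
```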

The following assumptions are implied by the GLM:

• The relationship between the dependent and independent variables may be non-linear;
• The dependent variable can have a non-normal distribution;
• In order to estimate the unknown parameters, the maximum likelihood estimation method needs to be applied (see chapter 3.4 for an introductory example of the MLE for a simple univariate OLS regression); a minimal sketch of such direct likelihood maximization is given after this list;
• The errors are independent but can have a non-normal distribution;
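
As a sketch of what applying MLE amounts to, the following Python snippet (assuming numpy and scipy; the data and coefficients are simulated for illustration) maximizes a logistic-regression log-likelihood directly with a generic optimizer:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])                      # illustrative coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

def neg_loglik(beta):
    # Bernoulli log-likelihood: sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ]
    eta = X @ beta
    return -np.sum(y * eta - np.logaddexp(0.0, eta))   # logaddexp for stability

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # estimates should be close to beta_true
```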

### 5.1.2 Exponential Family

In a GLM, each $$Y_i$$ is assumed to be generated from a particular distribution in the exponential family, where the probability density function (pdf) is written as: $f(y_i) = \exp\left( \dfrac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right)$ where:

• $$\theta_i$$ is the location parameter (directly related to the mean);
• $$\phi$$ is the scale (i.e. standard deviation, or sometimes variance) parameter;
• $$a_i(\cdot)$$, $$b(\cdot)$$ and $$c(\cdot, \cdot)$$ are known functions.

Furthermore, $$\theta_i$$ is called the canonical parameter and $$b(\cdot)$$ is called the cumulant function.

It can be shown that if $$Y_i$$ has a distribution from the exponential family, then: \begin{aligned} \mathbb{E}(Y_i) &= \mu_i = b'(\theta_i)\\ \mathbb{V}{\rm ar}(Y_i) &= \sigma^2_i = b''(\theta_i) a_i(\phi) \end{aligned} It is sometimes assumed that $$a_i (\phi)$$ has the following form: $a_i(\phi) = \dfrac{\phi}{p_i},$ where $$p_i$$ is a known prior weight, usually 1.
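
These identities follow from standard likelihood theory. For a single observation the log-likelihood is $\ell(\theta_i) = \dfrac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi)$ and, under the usual regularity conditions, $\mathbb{E} \left( \dfrac{\partial \ell}{\partial \theta_i} \right) = 0, \quad \mathbb{E} \left( \dfrac{\partial^2 \ell}{\partial \theta_i^2} \right) + \mathbb{E} \left( \left( \dfrac{\partial \ell}{\partial \theta_i} \right)^2 \right) = 0$ The first condition gives $$\mathbb{E} \left( (Y_i - b'(\theta_i))/a_i(\phi) \right) = 0$$, i.e. $$\mathbb{E}(Y_i) = b'(\theta_i)$$; the second gives $$-b''(\theta_i)/a_i(\phi) + \mathbb{V}{\rm ar}(Y_i)/a_i^2(\phi) = 0$$, i.e. $$\mathbb{V}{\rm ar}(Y_i) = b''(\theta_i) a_i(\phi)$$.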

The exponential family includes various distributions such as:

• Gaussian (i.e. normal) distribution;
• Bernoulli distribution;
• Binomial distribution;
• Multinomial distribution;
• Poisson distribution;
• Exponential distribution;


Example 5.1 The Normal distribution has the following probability density function (pdf):

\begin{aligned} f(y_i) &= \dfrac{1}{\sqrt{2\pi \sigma^2}} \exp \left( -\dfrac{1}{2} \dfrac{(y_i - \mu_i)^2}{\sigma^2} \right) \\ &= \dfrac{1}{\sqrt{2\pi \sigma^2}} \exp \left( -\dfrac{1}{2} \dfrac{y_i^2 + \mu_i^2 - 2y_i\mu_i}{\sigma^2} \right) \\ &= \exp \left( \dfrac{y_i \mu_i - \dfrac{1}{2} \mu_i^2}{\sigma^2} - \dfrac{y_i^2}{2\sigma^2} - \dfrac{1}{2} \log(2\pi\sigma^2) \right) \end{aligned} From this expression it is clear that for the normal distribution case:

• $$\theta_i = \mu_i$$;
• $$\phi = \sigma^2$$;
• $$a_i(\phi) = \phi$$;
• $$b(\theta_i) = \dfrac{1}{2}\theta_i^2$$;
• $$c(y_i, \phi) = - \dfrac{y_i^2}{2\phi} - \dfrac{1}{2} \log(2\pi\phi)$$

Then the mean and variance are as we would expect of a normal distribution: \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = \theta_i = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = 1 \cdot \phi = \sigma^2 \end{aligned}
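
Since $$\theta_i = \mu_i$$, the canonical link for the normal distribution is the identity, so a Gaussian GLM reduces to the usual linear regression. A minimal check of this (assuming numpy and statsmodels; the data are simulated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = sm.add_constant(rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default
print(np.allclose(ols.params, glm.params))               # True: the estimates coincide
```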

Example 5.2 The Binomial distribution has the following probability distribution function (for discrete r.v.’s this is known as the probability mass function (pmf)):

$f_i(y_i) = \mathbb{P}(Y_i = y_i) = \binom{N_i}{y_i} p_i^{y_i}(1 - p_i)^{N_i - y_i}, \quad y_i = 0, 1,..., N_i$ where $$\mathbb{P}(Y_i = y_i)$$ is the probability of obtaining $$y_i$$ successes and $$N_i-y_i$$ failures, which can occur in $$\binom{N_i}{y_i}$$ different ways.

Taking the logarithm of both sides yields: \begin{aligned} \log \left( f_i(y_i) \right ) &= y_i \log (p_i) + (N_i - y_i) \log (1 - p_i) + \log \left( \binom{N_i}{y_i} \right) \\ &= y_i \log \left( \dfrac{p_i}{1-p_i}\right) + N_i \log (1 - p_i) + \log \left( \binom{N_i}{y_i} \right) \end{aligned} We see that:

• $$\theta_i = \log \left( \dfrac{p_i}{1-p_i}\right)$$;

Then, solving for $$p_i$$ yields:

• $$p_i = \dfrac{\exp\left( \theta_i \right)}{1 + \exp\left( \theta_i \right)}$$ and $$1 - p_i = \dfrac{1}{1 + \exp\left( \theta_i \right)}$$

Taking the log of $$1 - p_i$$ yields $$\log(1 - p_i) = -\log \left( 1 + \exp\left( \theta_i \right) \right)$$ which allows us to write $$b(\cdot)$$ as:

• $$b(\theta_i) = N_i \log \left( 1 + \exp\left( \theta_i \right) \right)$$

The remaining term is:

• $$c(y_i, \phi) = \log \left( \binom{N_i}{y_i} \right)$$

Finally, we can set:

• $$a_i(\phi) = \phi$$ and $$\phi = 1$$.

Consequently, the mean and variance: \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = N_i \dfrac{\exp\left( \theta_i \right)}{1 + \exp\left( \theta_i \right)} = N_i \cdot p_i = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = N_i \dfrac{\exp\left( \theta_i \right)}{(1 + \exp\left( \theta_i \right))^2} \cdot 1 = N_i \cdot p_i (1 - p_i) \end{aligned} are in line with what we expect of a binomial r.v.
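
A quick numeric sanity check of these formulas, as a minimal sketch assuming numpy; the values $$N_i = 20$$ and $$p_i = 0.3$$ are arbitrary illustrative choices:

```python
import numpy as np

N, p = 20, 0.3
theta = np.log(p / (1 - p))                            # canonical parameter
b_prime = N * np.exp(theta) / (1 + np.exp(theta))      # b'(theta)  = N p       = 6.0
b_double = N * np.exp(theta) / (1 + np.exp(theta))**2  # b''(theta) = N p (1-p) = 4.2

rng = np.random.default_rng(7)
y = rng.binomial(N, p, size=200_000)
print(b_prime, y.mean())   # both approximately 6.0
print(b_double, y.var())   # both approximately 4.2
```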

Example 5.3 The Poisson distribution has the following probability mass function:

$f_i(y_i) = \mathbb{P}(Y_i = y_i) = \dfrac{\exp\left( -\mu_i\right)\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, ...$ Taking the logarithm of both sides yields: $\log \left( f_i(y_i) \right ) = y_i \log (\mu_i) - \mu_i - \log(y_i!)$ from which we see that we can take:

• $$a_i(\phi) = \phi$$ and $$\phi = 1$$

Then:

• $$\theta_i = \log(\mu_i)$$

which yields:

• $$\mu_i = \exp \left( \theta_i \right)$$

Then, the second term in the log pdf is:

• $$b(\theta_i) = \exp \left( \theta_i \right)$$

Finally, the last term is:

• $$c(y_i, \phi) = -\log(y_i!)$$

Consequently, the mean and variance: \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = \exp \left( \theta_i \right) = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = \exp \left( \theta_i \right) \cdot 1 = \exp \left( \theta_i \right) = \mu_i \end{aligned} are just as we would expect of a Poisson distribution: the mean and the variance are equal.
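
The equality of the mean and the variance (equidispersion) is easy to check by simulation; a minimal sketch assuming numpy, with an arbitrary illustrative $$\mu_i = 3.5$$:

```python
import numpy as np

rng = np.random.default_rng(11)
mu = 3.5                            # mu = exp(theta), i.e. theta = log(mu)
y = rng.poisson(mu, size=200_000)
print(y.mean(), y.var())            # both approximately 3.5
```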

Example 5.4 The Exponential distribution has the following probability density function:

$f(y_i) = \lambda \exp \left( -\lambda y_i \right),\quad \lambda > 0,\ y_i \geq 0$ We can rewrite the above as: $\log(f(y_i)) = \log(\lambda) -\lambda y_i = -y_i \lambda + \log(\lambda)$ Then we can take:

• $$a_i(\phi) = \phi$$ and $$\phi = 1$$

and setting:

• $$\theta_i = -\lambda$$

yields:

• $$b(\theta_i) = -\log(-\theta_i)$$.

Finally, the last term is:

• $$c(y_i, \phi) = 0$$.

Consequently, the mean and variance: \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = - \dfrac{1}{\theta_i} = \dfrac{1}{\lambda} = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = \dfrac{1}{\theta_i^2} = \dfrac{1}{\lambda^2} = \mu_i^2 \end{aligned} match the known mean and variance of an exponential r.v.
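
Closing the loop on all four examples, the derivatives of the cumulant functions can be verified symbolically. A minimal sketch assuming sympy (recall that $$a_i(\phi) = \phi = \sigma^2$$ for the normal distribution and $$\phi = 1$$ for the others):

```python
import sympy as sp

theta = sp.Symbol("theta")
N = sp.Symbol("N", positive=True)

# cumulant functions b(theta) from Examples 5.1 - 5.4
families = {
    "normal":      theta**2 / 2,
    "binomial":    N * sp.log(1 + sp.exp(theta)),
    "poisson":     sp.exp(theta),
    "exponential": -sp.log(-theta),
}
for name, b in families.items():
    mean = sp.simplify(sp.diff(b, theta))    # E(Y)  = b'(theta)
    var = sp.simplify(sp.diff(b, theta, 2))  # Var(Y) = b''(theta) * a(phi)
    print(f"{name}: b' = {mean}, b'' = {var}")
```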