5.1 Generalized Linear Model (GLM)

This chapter version is based in part on Source 1, Source 2, Source 3 and Source 4, as well as Source 5.

In a general linear model: \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathbb{E} \left( \boldsymbol{\varepsilon} | \mathbf{X}\right) = \boldsymbol{0},\quad \mathbb{V}{\rm ar}\left( \boldsymbol{\varepsilon} | \mathbf{X} \right) = \mathbb{E} \left( \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\top \right)= \mathbf{\Sigma} = \sigma_\epsilon^2 \mathbf{\Omega} \] the dependent variable \(\mathbf{Y}\) is described as a linear function of the explanatory variables plus an error term.

While this specification has been useful for describing various relationships up until now, there are cases where it is not appropriate. For example:

  • when the range of \(\mathbf{Y}\) is restricted to binary values, or counts (i.e. non-negative integer-valued data);
  • when the variance of \(\mathbf{Y}\) depends on the mean;

A natural extension that deals with these cases is the class of Generalized Linear Models (GLMs), which extends the general linear model.

5.1.1 GLM Specification

A Generalized Linear Model consists of several elements:

  1. A linear predictor: \[ \boldsymbol{\eta} = \mathbf{X} \boldsymbol{\beta} \]
  2. A link function, \(g\), which describes how the mean of the process \(\mathbf{Y}\) depends on the linear predictor: \[ \mathbb{E}(\mathbf{Y}) = \boldsymbol{\mu} = g^{-1}(\boldsymbol{\eta}) = g^{-1}(\mathbf{X} \boldsymbol{\beta}) \]
  3. A variance function, \(V\), and a dispersion parameter, \(\boldsymbol{\phi}\), which describe how the variance of the process \(\mathbf{Y}\) depends on the mean: \[ \mathbb{V}{\rm ar}(\mathbf{Y}) = \boldsymbol{\phi}V(\boldsymbol{\mu}) = \boldsymbol{\phi}V\left(g^{-1}(\mathbf{X} \boldsymbol{\beta}) \right) \] Sometimes, this property is specified by assuming some kind of specific distribution for the dependent variable. For example, we may assume that \(\mathbf{Y}\) follows a probability distribution from the exponential family (a short code sketch illustrating these components is given after this list).
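To make these components concrete, below is a minimal sketch in Python using the statsmodels package (the data, variable names and coefficient values are purely illustrative). It simulates a count response with a log link, so that \(\boldsymbol{\mu} = \exp(\mathbf{X}\boldsymbol{\beta})\), and fits the corresponding GLM:

    import numpy as np
    import statsmodels.api as sm

    # Illustrative data: one explanatory variable and a count response
    rng = np.random.default_rng(42)
    x = rng.uniform(0, 2, size=200)
    X = sm.add_constant(x)              # design matrix with an intercept column
    eta = 0.5 + 1.2 * x                 # linear predictor: eta = X beta
    mu = np.exp(eta)                    # inverse link (log link): mu = g^{-1}(eta)
    y = rng.poisson(mu)                 # variance function V(mu) = mu, dispersion phi = 1

    # The family specifies the variance function V(mu); the log link is the default for Poisson
    model = sm.GLM(y, X, family=sm.families.Poisson())
    result = model.fit()
    print(result.params)                # estimated beta
    print(result.mu[:5])                # fitted means g^{-1}(X beta)

Here the chosen family determines the variance function, while the link determines how the mean relates to the linear predictor.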

The following assumptions are implied in the GLM:

  • The relationship between the dependent and independent variables may be non-linear;
  • The dependent variable can have a non-normal distribution;
  • In order to estimate the unknown parameters, the maximum likelihood estimation method needs to be applied (see chapter 3.4 for an introductory example of MLE for a simple univariate OLS regression, and the sketch after this list for a GLM);
  • The errors are independent but can have a non-normal distribution;
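As a minimal illustration of the maximum likelihood step (assuming the same illustrative Poisson setup with a log link as above; all names and values are hypothetical), the log-likelihood can be maximized numerically, e.g. with scipy:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gammaln

    # Illustrative Poisson-regression data with true beta = (0.5, 1.2)
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 2, size=500)
    X = np.column_stack([np.ones_like(x), x])
    y = rng.poisson(np.exp(0.5 + 1.2 * x))

    def neg_loglik(beta, X, y):
        """Negative Poisson log-likelihood with the (canonical) log link."""
        eta = X @ beta                          # linear predictor
        mu = np.exp(eta)                        # mean via the inverse link
        # log f(y_i) = y_i * eta_i - mu_i - log(y_i!)
        return -np.sum(y * eta - mu - gammaln(y + 1))

    fit = minimize(neg_loglik, x0=np.zeros(X.shape[1]), args=(X, y), method="BFGS")
    print(fit.x)                                # ML estimates, close to (0.5, 1.2)

In practice, GLM software maximizes the same likelihood, typically via iteratively reweighted least squares rather than a generic optimizer.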

5.1.2 Exponential Family

In a GLM, each \(Y_i\) is assumed to be generated from a particular distribution in the exponential family, where the probability density function (pdf) is written as: \[ f(y_i) = \exp\left( \dfrac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right) \] where:

  • \(\theta_i\) is the location parameter (it is related to the mean);
  • \(\phi\) is the scale (i.e. standard deviation, or sometimes variance) parameter;
  • \(a_i(\cdot)\), \(b(\cdot)\) and \(c(\cdot, \cdot)\) are known functions.

Furthermore, \(\theta_i\) is called the canonical parameter and \(b(\cdot)\) is called the cumulant function.

It can be shown that if \(Y_i\) has a distribution from the exponential family, then: \[ \begin{aligned} \mathbb{E}(Y_i) &= \mu_i = b'(\theta_i)\\ \mathbb{V}{\rm ar}(Y_i) &= \sigma^2_i = b''(\theta_i) a_i(\phi) \end{aligned} \] It is sometimes assumed that \(a_i (\phi)\) has the following form: \[ a_i(\phi) = \dfrac{\phi}{p_i}, \] where \(p_i\) is a known prior weight, usually 1.
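A sketch of why these two results hold: the log-density of a single observation is \[ \ell_i(\theta_i) = \dfrac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi), \] so that \[ \dfrac{\partial \ell_i}{\partial \theta_i} = \dfrac{Y_i - b'(\theta_i)}{a_i(\phi)}, \qquad \dfrac{\partial^2 \ell_i}{\partial \theta_i^2} = -\dfrac{b''(\theta_i)}{a_i(\phi)}. \] Under the usual regularity conditions, \(\mathbb{E}\left( \partial \ell_i / \partial \theta_i \right) = 0\) and \(\mathbb{E}\left( \partial^2 \ell_i / \partial \theta_i^2 \right) + \mathbb{E}\left[ \left( \partial \ell_i / \partial \theta_i \right)^2 \right] = 0\). The first identity gives \(\mathbb{E}(Y_i) = b'(\theta_i)\), and the second gives \(\mathbb{V}{\rm ar}(Y_i) / a_i^2(\phi) = b''(\theta_i)/a_i(\phi)\), i.e. \(\mathbb{V}{\rm ar}(Y_i) = b''(\theta_i) a_i(\phi)\).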

The exponential family includes various distributions such as:

  • Gaussian (i.e. normal) distribution;
  • Bernoulli distribution;
  • Binomial distribution;
  • Multinomial distribution;
  • Poisson distribution;
  • Exponential distribution;


Example 5.1 The Normal distribution has the following probability density function (pdf):

\[ \begin{aligned} f(y_i) &= \dfrac{1}{\sqrt{2\pi \sigma^2}} \exp \left( -\dfrac{1}{2} \dfrac{(y_i - \mu_i)^2}{\sigma^2} \right) = \dfrac{1}{\sqrt{2\pi \sigma^2}} \exp \left( -\dfrac{1}{2} \dfrac{y_i^2 + \mu_i^2 - 2y_i\mu_i}{\sigma^2} \right) = \exp \left( \dfrac{y_i \mu_i - \dfrac{1}{2} \mu_i^2}{\sigma^2} - \dfrac{y_i^2}{2\sigma^2} - \dfrac{1}{2} \log(2\pi\sigma^2) \right) \end{aligned} \] From this expression it is clear that for the normal distribution case:

  • \(\theta_i = \mu_i\);
  • \(\phi = \sigma^2\);
  • \(a_i(\phi) = \phi\);
  • \(b(\theta_i) = \dfrac{1}{2}\theta_i^2\);
  • \(c(y_i, \phi) = - \dfrac{y_i^2}{2\phi} - \dfrac{1}{2} \log(2\pi\phi)\)

Then the mean and variance is as we would expect of a normal distribution: \[ \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = \theta_i = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = 1 \cdot \phi = \sigma^2 \end{aligned} \]
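As a quick numerical sanity check (with arbitrary illustrative values of \(y_i\), \(\mu_i\) and \(\sigma^2\)), the exponential-family form above can be compared against the usual normal pdf:

    import numpy as np
    from scipy.stats import norm

    # Illustrative check of the exponential-family decomposition of the normal pdf
    y, mu, sigma2 = 1.3, 0.7, 2.0       # arbitrary values
    theta, phi = mu, sigma2             # theta_i = mu_i, phi = sigma^2
    b = 0.5 * theta**2                  # b(theta) = theta^2 / 2
    c = -y**2 / (2 * phi) - 0.5 * np.log(2 * np.pi * phi)   # c(y, phi)

    lhs = norm.pdf(y, loc=mu, scale=np.sqrt(sigma2))
    rhs = np.exp((y * theta - b) / phi + c)                  # exponential-family form
    print(np.isclose(lhs, rhs))         # True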

Example 5.2 The Binomial distribution has the following probability distribution function (for discrete r.v.’s this is known as the probability mass function (pmf)):

\[ f_i(y_i) = \mathbb{P}(Y_i = y_i) = \binom{N_i}{y_i} p_i^{y_i}(1 - p_i)^{N_i - y_i}, \quad y_i = 0, 1,..., N_i \] where \(\mathbb{P}(Y_i = y_i)\) is the probability of obtaining \(y_i\) successes and \(N_i-y_i\) failures, which can occur in \(\binom{N_i}{y_i}\) different ways.

Taking the logarithms of both sides yields: \[ \begin{aligned} \log \left( f_i(y_i) \right ) &= y_i \log (p_i) + (N_i - y_i) \log (1 - p_i) + \log \left( \binom{N_i}{y_i} \right) \\\\ &= y_i \log \left( \dfrac{p_i}{1-p_i}\right) + N_i \log (1 - p_i) + \log \left( \binom{N_i}{y_i} \right) \end{aligned} \] We see that:

  • \(\theta_i = \log \left( \dfrac{p_i}{1-p_i}\right)\);

Then, solving for \(p_i\) yields:

  • \(p_i = \dfrac{\exp\left( \theta_i \right)}{1 + \exp\left( \theta_i \right)}\) and \(1 - p_i = \dfrac{1}{1 + \exp\left( \theta_i \right)}\)

Taking the log of \(1 - p_i\) yields \(\log(1 - p_i) = -\log \left( 1 + \exp\left( \theta_i \right) \right)\) which allows us to write \(b(\cdot)\) as:

  • \(b(\theta_i) = N_i \log \left( 1 + \exp\left( \theta_i \right) \right)\)

The remaining term is:

  • \(c(y_i, \phi) = \log \left( \binom{N_i}{y_i} \right)\)

Finally, we can set:

  • \(a_i(\phi) = \phi\) and \(\phi = 1\).

Consequently, the mean and variance: \[ \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = N_i \dfrac{\exp\left( \theta_i \right)}{1 + \exp\left( \theta_i \right)} = N_i \cdot p_i = \mu_i\\\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = N_i \dfrac{\exp\left( \theta_i \right)}{(1 + \exp\left( \theta_i \right))^2} \cdot 1 = N_i \cdot p_i (1 - p_i) \end{aligned} \] are in line with what we expect of a binomial r.v.
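The same quantities can be checked numerically by differentiating the cumulant function \(b(\theta_i) = N_i \log(1 + e^{\theta_i})\) (the values of \(N_i\) and \(p_i\) below are arbitrary):

    import numpy as np

    # Illustrative check of E(Y) = b'(theta) and Var(Y) = b''(theta) * a(phi) for the binomial
    N, p = 10, 0.3                                  # arbitrary values
    theta = np.log(p / (1 - p))                     # canonical parameter
    b = lambda t: N * np.log1p(np.exp(t))           # cumulant function b(theta)

    h = 1e-4
    b1 = (b(theta + h) - b(theta - h)) / (2 * h)               # numerical b'(theta)
    b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2   # numerical b''(theta)

    print(np.isclose(b1, N * p))              # mean: N * p
    print(np.isclose(b2, N * p * (1 - p)))    # variance (phi = 1): N * p * (1 - p)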

Example 5.3 The Poisson distribution has the probability distribution function:

\[ f_i(y_i) = \mathbb{P}(Y_i = y_i) = \dfrac{\exp\left( -\mu_i\right)\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, ... \] Taking the logarithms of both sides yields: \[ \log \left( f_i(y_i) \right ) = y_i \log (\mu_i) - \mu_i - \log(y_i!) \] we see that we can take:

  • \(a_i(\phi) = \phi\) and \(\phi = 1\)

Then:

  • \(\theta_i = \log(\mu_i)\)

which yields:

  • \(\mu_i = \exp \left( \theta_i \right)\)

Then, the second term in the log pdf is:

  • \(b(\theta_i) = \exp \left( \theta_i \right)\)

Finally, the last term is:

  • \(c(y_i, \phi) = -\log(y_i!)\)

Consequently, the mean and variance: \[ \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = \exp \left( \theta_i \right) = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = \exp \left( \theta_i \right) \cdot 1 = \exp \left( \theta_i \right) = \mu_i \end{aligned} \] are just as we would expect of a Poisson distribution: the mean and the variance are equal.
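As with the normal example, the exponential-family form can be compared against the Poisson pmf directly (the values of \(y_i\) and \(\mu_i\) are arbitrary):

    import numpy as np
    from scipy.special import gammaln
    from scipy.stats import poisson

    # Illustrative check of the exponential-family decomposition of the Poisson pmf
    y, mu = 4, 2.5                      # arbitrary values
    theta = np.log(mu)                  # canonical parameter
    b = np.exp(theta)                   # b(theta) = exp(theta) = mu
    c = -gammaln(y + 1)                 # c(y, phi) = -log(y!)

    lhs = poisson.pmf(y, mu)
    rhs = np.exp(y * theta - b + c)     # exponential-family form with a(phi) = 1
    print(np.isclose(lhs, rhs))         # True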

Example 5.4 The Exponential distribution has the following probability density function (pdf):

\[ f(y_i) = \lambda \exp \left( -\lambda y_i \right),\quad y_i \geq 0, \ \lambda > 0 \] Taking the logarithm yields: \[ \log(f(y_i)) = \log(\lambda) -\lambda y_i = -y_i \lambda + \log(\lambda) \] Then we can take:

  • \(a_i(\phi) = \phi\) and \(\phi = 1\)

and setting:

  • \(\theta_i = -\lambda\)

yields:

  • \(b(\theta_i) = -\log(-\theta_i)\).

Finally, the last term is:

  • \(c(y_i, \phi) = 0\).

Consequently, the mean and variance: \[ \begin{aligned} \mathbb{E}(Y_i) &= b'(\theta_i) = - \dfrac{1}{\theta_i} = \dfrac{1}{\lambda} = \mu_i\\ \mathbb{V}{\rm ar}(Y_i) &= b''(\theta_i) a_i(\phi) = \dfrac{1}{\theta_i^2} = \dfrac{1}{\lambda^2} \end{aligned} \] are in line with the known mean \(1/\lambda\) and variance \(1/\lambda^2\) of an exponential r.v.
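These moments can also be checked by simulation (the rate \(\lambda\) below is arbitrary):

    import numpy as np

    # Illustrative check of the exponential-distribution moments
    rng = np.random.default_rng(123)
    lam = 1.5                               # arbitrary rate parameter
    theta = -lam                            # canonical parameter
    y = rng.exponential(scale=1 / lam, size=200_000)

    print(y.mean(), -1 / theta)             # both close to 1/lambda
    print(y.var(), 1 / theta**2)            # both close to 1/lambda^2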