There are some time series where the variance changes consistently over time. In the context of financial time series, this would be called increasing or decreasing volatility.
In time series where the variance changes in a systematic way over time, for example following an increasing trend, this property of the series is called heteroskedasticity. If the changes in variance are correlated over time, they can be modelled using an autoregressive process, such as ARCH.
Autoregressive Conditional Heteroskedasticity (ARCH) models explicitly allow modelling of the change in variance over time in a time series, something that classical ARMA models do not allow.
In this section we will present an example of estimating ARCH and GARCH models with Python. We will use the arch_model function from the arch package.
To install the package, follow the documentation: open your command prompt (assuming you selected the option to add Anaconda to your PATH during installation) and enter the following:
conda install -c bashtage arch
If you do not have Anaconda, use pip install arch instead.
Note that this package works on a 64-bit OS only.
#Import the required modules for vectors and matrix operations, data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for time series model estimation:
import statsmodels.tsa as smt
#Import Pandas
import pandas as pd
import arch as arch
You can read the documentation of the arch package with:
arch.doc()
Other useful functions:
def tsdisplay(y, figsize = (14, 8), title = "", lags = 20):
    tmp_data = pd.Series(y)
    fig = plt.figure(figsize = figsize)
    # Plot the time series:
    tmp_data.plot(ax = fig.add_subplot(311), title = "$Time\ Series\ " + title + "$", legend = False)
    # Plot the ACF:
    sm.graphics.tsa.plot_acf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(323))
    plt.xticks(np.arange(1, lags + 1, 1.0))
    # Plot the PACF:
    sm.graphics.tsa.plot_pacf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(324))
    plt.xticks(np.arange(1, lags + 1, 1.0))
    # Plot the QQ plot of the data:
    sm.qqplot(tmp_data, line = 's', ax = fig.add_subplot(325))
    plt.title("QQ Plot")
    # Plot the histogram of the data:
    fig.add_subplot(326).hist(tmp_data, bins = 40, density = True)
    plt.title("Histogram")
    # Fix the layout of the plots:
    plt.tight_layout()
    plt.show()
def tsdiag(y, figsize = (14, 8), title = "", lags = 10):
    # The data:
    tmp_data = pd.Series(y)
    # The Ljung-Box test results for the first k lags:
    tmp_acor = list(sm_stat.diagnostic.acorr_ljungbox(tmp_data, lags = lags, boxpierce = True))
    # Get the p-values:
    p_vals = pd.Series(tmp_acor[1])
    # Start the index from 1 instead of 0 (because the Ljung-Box test is for lag values from 1 to k):
    p_vals.index += 1
    fig = plt.figure(figsize = figsize)
    # Plot the p-values:
    p_vals.plot(ax = fig.add_subplot(313), linestyle = '', marker = 'o', title = "p-values for Ljung-Box statistic", legend = False)
    # Add the horizontal 0.05 critical value line:
    plt.axhline(y = 0.05, color = 'blue', linestyle = '--')
    # Annotate the p-value points above and to the left of the vertex:
    x = np.arange(p_vals.size) + 1
    for X, Y, Z in zip(x, p_vals, p_vals):
        plt.annotate(round(Z, 4), xy = (X, Y), xytext = (-5, 5), ha = 'left', textcoords = 'offset points')
    plt.show()
    # Return the statistics:
    col_index = ["Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value"]
    return pd.DataFrame(tmp_acor, index = col_index, columns = range(1, len(tmp_acor[0]) + 1))
# Suppress matplotlib's annoying deprecation warning
import warnings
import matplotlib.cbook
warnings.filterwarnings("ignore", category = matplotlib.cbook.mplDeprecation)
We can think of $\rm ARCH$ models as an $\rm AR$ model analogue applied to the error variance of a time series $r_t$ (where $r_t$ usually denotes the returns of some financial asset). More specifically, the model describes the variance at time $t$ as a function of past squared residual errors from a mean process.
For example, we define the following $\rm ARCH(1)$ process:
$ \begin{cases} r_t &= \mu + \epsilon_t \\ \epsilon_t &= \sigma_t w_t,\quad w_t \sim WN(0, 1)\\ \sigma_t^2 &= \omega + \alpha_1 \epsilon^2_{t-1} \end{cases} $
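For reference, if $\alpha_1 < 1$, this specification implies a constant unconditional variance of the shocks, $\mathrm{Var}(\epsilon_t) = \omega / (1 - \alpha_1)$, even though the conditional variance $\sigma_t^2$ changes over time (a standard $\rm ARCH(1)$ property, stated here for context).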
In general, a volatility model includes the following components:
$\mu$ - the mean model. This can be considered the first component in volatility model building. Examples include a constant mean or an $\rm ARMA$ equation for the mean.
$\sigma_t^2 = ...$ - a volatility model, which is added to the mean model to capture time-varying volatility. In addition to $\rm ARCH$, other specifications are available, such as constant variance, $\rm GARCH$, $\rm EGARCH$, $\rm IGARCH$, $\rm GJR-GARCH$, $\rm TGARCH$ and more.
$w_t \sim WN(0, 1)$ - the distribution of the standardized shocks $w_t$ is the final component of a volatility model. Examples include the standard normal, Student's $t$, GED (Generalized Error Distribution) and so on.
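In the arch package these three components map onto separate classes that are combined into a single model object. The following is a minimal schematic sketch (here y is a placeholder for a return series; we apply this same pattern to simulated data further below):
am = arch.univariate.ConstantMean(y)          # 1) the mean model: a constant mean
am.volatility = arch.univariate.ARCH(1)       # 2) the volatility model
am.distribution = arch.univariate.Normal()    # 3) the distribution of the shocks w_t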
We will simulate an $\rm ARCH(1)$ process as follows:
np.random.seed(123)
#Define the sample size:
n = 2000
# Define the coefficients:
mu = 1.0
omega = 2.0
alpha = 0.5
# Generate the random shocks
w = np.random.normal(size = n)
#
# Generate the ARCH process:
r_t = np.array(mu + np.sqrt(omega) * w[0], ndmin = 1)
for j in range(1, n):
    r_t = np.append(r_t, mu + np.sqrt(omega + alpha * ((r_t[j-1] - mu)**2)) * w[j])
We can examine the series:
tsdisplay(r_t)
tsdiag(r_t)
The first lags appear to be correlated. Now let's plot the squared time series:
tsdisplay(r_t**2)
We see a strong autocorrelation, which may indicate the possibility of ARCH effects.
If we wanted to estimate an ARCH model on this data, we would first have to create a mean model for the returns r_t. We should then inspect the residuals: if their ACF and PACF plots show no autocorrelation, but the ACF and PACF plots of the squared residuals do, we should create an ARCH model for the residuals.
We leave this as an exercise (see the sketch below) and instead opt to show the detailed modelling steps for the more general $\rm GARCH$ model. The same steps can also be applied to the $\rm ARCH$ model specification.
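One possible outline for this exercise (a sketch rather than a full solution; it assumes a constant mean is adequate, which the ACF and PACF plots above suggest):
# 1) Mean model: a constant mean, so the residuals are simply the demeaned series:
exercise_resid = r_t - np.mean(r_t)
# 2) The residuals should show no autocorrelation, but the squared residuals should:
tsdisplay(exercise_resid)
tsdisplay(exercise_resid**2)
# 3) If so, specify and estimate an ARCH(1) model:
exercise_mdl = arch.univariate.ConstantMean(r_t)
exercise_mdl.volatility = arch.univariate.ARCH(1)
exercise_mdl.distribution = arch.univariate.Normal()
print(exercise_mdl.fit().summary())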
A $\rm GARCH$ model can be thought of as an $\rm ARMA$ model analogue applied to the error variance of a time series.
We will consider the following $\rm GARCH(1, 1)$ example:
$ \begin{cases} r_t &= \mu + \epsilon_t \\ \epsilon_t &= \sigma_t w_t \\ \sigma_t^2 &= \omega + \alpha_1 \epsilon^2_{t-1} + \beta_1 \sigma^2_{t-1} \end{cases} $
where $\mu = 1$, $\omega = 0.2$, $\alpha = 0.5$ and $\beta = 0.3$ and $w_t \sim \mathcal{N}(0, 1)$.
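Since $\alpha_1 + \beta_1 = 0.8 < 1$, this process is covariance stationary and the implied unconditional variance of the shocks is $\omega / (1 - \alpha_1 - \beta_1) = 0.2 / 0.2 = 1$ (a standard $\rm GARCH(1, 1)$ property, stated here for reference).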
np.random.seed(1)
#Define the sample size:
n = 2000
# Define the coefficients:
mu = 1.0
omega = 0.2
alpha = 0.5
beta = 0.3
# Generate the random shocks
w = np.random.normal(size = n)
sigma_sq = np.zeros_like(w)
#
# Generate the GARCH process:
sigma_sq[0] = omega
r_t = np.array(mu + np.sqrt(sigma_sq[0]) * w[0], ndmin = 1)
for j in range(1, n):
    sigma_sq[j] = omega + alpha * (r_t[j-1] - mu)**2 + beta * sigma_sq[j-1]
    r_t = np.append(r_t, mu + np.sqrt(sigma_sq[j]) * w[j])
We will again examine the ACFs of the series:
tsdisplay(r_t, title = "of\ r_t")
If we test for significant autocorrelations:
tsdiag(r_t)
It would appear that our series is $\rm WN$.
However, note that if $\epsilon_t \sim WN$ (in the strict, i.i.d. sense), then the squared values should not be autocorrelated either. We will test this:
tsdisplay(r_t**2, title = "of\ r_t^2")
We see that the autocorrelation is significant, which indicates ARCH effects.
In comparison, a simple WN (in this case, a standard normal) process does not have this property:
tsdisplay(w**2, title = "w_t^2", figsize = (15, 8))
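A more formal check for $\rm ARCH$ effects is Engle's LM test, available in statsmodels. Below is a sketch using the sm_stat alias imported earlier; het_arch returns the LM statistic, its p-value and an F-test counterpart:
# Engle's ARCH-LM test on the demeaned series; small p-values indicate ARCH effects:
lm_stat, lm_pval, f_stat, f_pval = sm_stat.diagnostic.het_arch(r_t - np.mean(r_t))
print(pd.Series([lm_stat, lm_pval, f_stat, f_pval], index = ["LM statistic", "LM p-value", "F statistic", "F p-value"]))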
We will suppress additional warning messages to make it easier to read the output. These messages include:
FutureWarning: "Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result." This is a warning specific to statsmodels estimation, which will hopefully be addressed in a package update.
RuntimeWarning: "Invalid value encountered in true_divide newparams = ((1-np.exp(-params))/(1+np.exp(-params))).copy()". This is again a warning in statsmodels for some lag order combinations. Again, the hope is that it will be addressed in the future.
HessianInversionWarning: "Inverting hessian failed, no bse or cov_params available". This is again a warning in statsmodels for some lag order combinations.
ConvergenceWarning: "Maximum Likelihood optimization failed to converge. Check mle_retvals". This is again a warning in statsmodels for lag order combinations which do not lead to convergence of the maximum likelihood optimization.
warnings.filterwarnings("ignore", category = FutureWarning)
warnings.filterwarnings("ignore", category = RuntimeWarning)
warnings.filterwarnings("ignore", category = sm.tools.sm_exceptions.HessianInversionWarning)
warnings.filterwarnings("ignore", category = sm.tools.sm_exceptions.ConvergenceWarning)
We begin by creating the model for the mean by looping through all possible $\rm ARMA(p, q)$ model combinations with $p_\max=q_\max=1$.
# Specify the best AIC - default is infinite - so anything lower is better
best_aic = np.inf
# Specify the best order
best_order = None
# Specify the best model
best_mdl = None
# Loop through different (p, q) combinations, starting from ARMA(0, 0), to ARMA(1, 1)
pq_rng = range(2) # range(5) would be [0,1,2,3,4]
for i in pq_rng:
    for j in pq_rng:
        try:
            # print("Fitting an ARMA(" + str(i) + ", " + str(j) + ") model")
            tmp_mdl = smt.api.ARIMA(r_t, order = (i, 0, j)).fit(method = 'mle', trend = 'c')
            tmp_aic = tmp_mdl.aic
            if tmp_aic < best_aic:
                best_aic = tmp_aic
                best_order = (i, 0, j)
                best_mdl = tmp_mdl
                print("Fitted an ARMA(" + str(i) + ", " + str(j) + ") model - it was better")
            else:
                print("Fitted an ARMA(" + str(i) + ", " + str(j) + ") model - it was worse")
        except:
            continue
best_aic
print(best_mdl.summary().tables[1])
We could have used an already established function to automatically select the order:
mdl_auto = sm.tsa.stattools.arma_order_select_ic(r_t, ic = 'bic', trend = 'c', max_ar = 4, max_ma = 4)
print(mdl_auto)
print(mdl_auto.bic_min_order)
We chose our manual specification to highlight that we can always specify our own automated procedures. They may not be exactly the same as the ones readily provided in the packages, but we may not always need very generalized methods.
Note that sometimes having common roots results in NaN values for the estimated parameters.
As we have seen from the ACF and PACF plot of $r_t$, there is no need to create an $\rm ARMA$ model for the mean model and we got the same results from our $\rm ARMA$ order selection code.
In order to test for $\rm ARCH$ effects, we need to calculate the residuals $\widehat{e}_t = r_t - \widehat{r}_t$
resids = best_mdl.resid
tsdisplay(resids, title = "of\ residuals\ \widehat{e}_t")
tsdiag(resids)
The residuals do not exhibit any significant autocorrelation, but their squared values:
tsdisplay(resids**2, title = "of\ \widehat{e}_t^2")
exhibit significant autocorrelations. This can again be verified via the Ljung-Box test:
tsdiag(resids**2)
Looking at the PACF plot of the squared residuals, $\widehat{e}_t^2$, we see that the first (and possibly the second) lag is significantly different from zero, so we specify an $\rm ARCH(1)$ model.
We will further assume that the underlying shocks are standard normal, $w_t \sim \mathcal{N}(0, 1)$.
mdl_arch = arch.univariate.ConstantMean(r_t)
mdl_arch.volatility = arch.univariate.ARCH(1)
mdl_arch.distribution = arch.univariate.Normal()
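Equivalently, the arch_model convenience function mentioned at the start of this section can build what should be the same specification in a single call (a sketch; we continue with the component-based specification below):
# One-call equivalent of the component-based ARCH(1) specification above:
mdl_arch_alt = arch.arch_model(r_t, mean = 'Constant', vol = 'ARCH', p = 1, dist = 'normal')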
Now we can estimate the model coefficients:
mdl_arch_fit = mdl_arch.fit()
We can examine the model estimates:
print(mdl_arch_fit.summary())
$ \begin{cases} r_t &= 1.0141 + \epsilon_t \\ \epsilon_t &= \sigma_t w_t \\ \sigma_t^2 &= 0.2746 + 0.5270 \epsilon^2_{t-1} \end{cases} $
We can look at the coefficients and see that they are similar to the ones we used to generate the series (the true values were $\mu = 1$, $\omega = 0.2$, $\alpha = 0.5$):
print(mdl_arch_fit.params)
Note that we have estimated an $\rm ARCH$ model, so we have no $\beta$ parameter.
We can also view the $\rm AIC$ and $\rm BIC$ values:
print(pd.DataFrame([mdl_arch_fit.aic, mdl_arch_fit.bic], index = ["AIC", "BIC"], columns = ["ARCH"]).T)
Next, we specify a $\rm GARCH(1, 1)$ model for the same series:
mdl_garch = arch.univariate.ConstantMean(r_t)
mdl_garch.volatility = arch.univariate.GARCH(1, 0, 1)
mdl_garch.distribution = arch.univariate.Normal()
Then the estimated model is:
mdl_garch_fit = mdl_garch.fit()
print(mdl_garch_fit.summary())
$ \begin{cases} r_t &= 1.0155 + \epsilon_t \\ \epsilon_t &= \sigma_t w_t \\ \sigma_t^2 &= 0.1809 + 0.5118 \epsilon^2_{t-1} + 0.1924 \sigma^2_{t-1} \end{cases} $
The true values were: $\mu = 1$, $\omega = 0.2$, $\alpha = 0.5$ and $\beta = 0.3$. The $\widehat{\beta}$ estimate is not as close to the true value as $\widehat{\mu}$, $\widehat{\alpha}$ and $\widehat{\omega}$.
Note the $\rm AIC$ and $\rm BIC$ values:
print(pd.DataFrame([mdl_garch_fit.aic, mdl_garch_fit.bic], index = ["AIC", "BIC"], columns = ["GARCH"]).T)
Note that the $\rm AIC$ and $\rm BIC$ values are smaller than the ones for the $\rm ARCH$ model, indicating that this is a better model.
We can also add an autoregressive term to the mean equation by specifying an $AR(1)-GARCH(1, 1)$ model:
mdl_ar_garch = arch.univariate.ARX(r_t, lags = 1)
mdl_ar_garch.volatility = arch.univariate.GARCH(1, 0, 1)
mdl_ar_garch.distribution = arch.univariate.Normal()
Then the estimated model is:
mdl_ar_garch_fit = mdl_ar_garch.fit()
print(mdl_ar_garch_fit.summary())
We can extract the parameters and round them to 4 digits:
np.round(mdl_ar_garch_fit.params, 4)
So, the $AR(1)-GARCH(1,1)$ model is:
$ \begin{cases} r_t &= 1.0214 - 0.0061 r_{t-1} + \epsilon_t \\ \epsilon_t &= \sigma_t w_t \\ \sigma_t^2 &= 0.1812 + 0.5117 \epsilon^2_{t-1} + 0.1920 \sigma^2_{t-1} \end{cases} $
Note that the mean and volatility parameters are similar to the $\rm GARCH(1,1)$ model, with an additional $\widehat{\phi} = -0.0061$ autoregression coefficient in the mean equation.
If we look at the $\rm AIC$ and $\rm BIC$ values:
print(pd.DataFrame([mdl_ar_garch_fit.aic, mdl_ar_garch_fit.bic], index = ["AIC", "BIC"], columns = ["AR-GARCH"]).T)
We can compare these with the $\rm GARCH(1, 1)$ model:
print(pd.DataFrame([mdl_ar_garch_fit.aic - mdl_garch_fit.aic, mdl_ar_garch_fit.bic - mdl_garch_fit.bic], index = ["AIC", "BIC"], columns = ["Difference"]).T)
We see that the $\rm AIC$ is slightly lower, but $\rm BIC$ is larger. Furthermore, the increase in $\rm BIC$ is larger than the decrease in $\rm AIC$. Finally, since $\rm BIC$ penalizes model complexity more heavily than $\rm AIC$, we conclude that the $AR(1)-GARCH(1, 1)$ model is not better than the $\rm GARCH(1, 1)$ model.
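For convenience, we can also collect the information criteria of all three estimated models into a single table (a small helper snippet using the fitted model objects from above):
# Side-by-side AIC and BIC of the estimated models:
ic_table = pd.DataFrame({"ARCH": [mdl_arch_fit.aic, mdl_arch_fit.bic],
                         "GARCH": [mdl_garch_fit.aic, mdl_garch_fit.bic],
                         "AR-GARCH": [mdl_ar_garch_fit.aic, mdl_ar_garch_fit.bic]},
                        index = ["AIC", "BIC"]).T
print(ic_table)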
If we wanted, we could also specify other distributions for the shocks, for example Student's $t$ via arch.univariate.StudentsT().
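For example, the same $\rm GARCH(1, 1)$ specification with Student's $t$ shocks could look as follows (a sketch; only the distribution component changes):
mdl_garch_t = arch.univariate.ConstantMean(r_t)
mdl_garch_t.volatility = arch.univariate.GARCH(1, 0, 1)
mdl_garch_t.distribution = arch.univariate.StudentsT()
print(mdl_garch_t.fit().summary())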
Finally, we examine the standardized residuals to make sure that there are no more $\rm ARCH$ effects (nor any autocorrelation). The standardized residuals can be computed by dividing the residuals by the conditional volatility.
std_resid = mdl_arch_fit.resid / mdl_arch_fit.conditional_volatility
tsdisplay(std_resid, title = 'of\ ARCH\ standardized\ residuals')
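To double-check that no $\rm ARCH$ effects remain, a natural follow-up (using the helper functions defined above) is to inspect the squared standardized residuals as well:
tsdisplay(std_resid**2, title = "of\ squared\ standardized\ residuals")
tsdiag(std_resid**2)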