2.3 Example
Considerations
The following libraries will be useful for this exercise.
#Import the required modules for vectors and matrix operations, data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
#Import the required modules for time series data types
import pandas as pd
# Set maximum column width of a pd.DataFrame to PREVENT a line break of wide tables
pd.set_option('display.max_columns', None)
# pd.set_option('display.max_colwidth', -1)
pd.set_option('display.expand_frame_repr', False)
Note that in this example, when using automatic order detection and parameter estimation, we may sometimes get
HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
, or ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
. We will suppress them with:
We will simulate the following time series processes
- \((A)\) \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)
- \((B)\) \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)
- \((C)\) \((1 - 1.1L) Y_t = 1 + \epsilon_t\)
- \((D)\) \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
- \((E)\) \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)
We will assume that if \(t \leq 0\), then \(Y_t = \epsilon_t = 0\) and \(\epsilon_t \sim \mathcal{N}(0, 0.5^2)\), \(\forall t > 0\).
2.3.1 Exercise Set 1: Data Simulation and EDA
2.3.1.1 Task 1
We will begin by rewriting the processes in their expanded forms:
- \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
- \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
- \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
- \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
- \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
From the above functional forms, we see that we have the following processes:
- \((A)\) \(\rm MA(1)\)
- \((B)\) \(\rm MA(2)\)
- \((C)\) \(\rm AR(1)\)
- \((D)\) \(\rm AR(2)\)
- \((E)\) \(\rm ARMA(2, 1)\)
Considerations
To recap the theory:
- Pure MA models are always stationary (since they contain no AR terms).
- Pure MA models may or may not be invertible.
- Pure AR models are always invertible (since they contain no MA terms).
- Pure AR models may or may not be stationary.
This translates to checking the following:
- If all the roots of \(\Phi(z) = 0\) are outside the unit circle, then the \(AR\) process is stationary;
- If all the roots of \(\Theta(x) = 0\) are outside the unit circle, then the \(MA\) process is invertible;
We begin by checking the stationarity of \((C)\), \((D)\) and \((E)\).
- \((C)\) \(1 - 1.1 z = 0 \Rightarrow z = 1/1.1 < 1\), so the process in \((C)\) is not stationary;
- \((D)\) \(1 - 1.1 z + 0.2 z^2 = 0 \Rightarrow \mathcal{D} = b^2 - 4\cdot a \cdot c = (-1.1)^2 - 4 \cdot 1 \cdot 0.2 = 0.41\), \(z_{1,2} = \dfrac{-b \pm \sqrt{\mathcal{D}}}{2 \cdot a}\), which gives \(z_1 = 1.149\) and \(z_2 = 4.351\), so the process is stationary;
- \((E)\) \(1 - 1.1 z + 0.2 z^2 = 0\) - the same as in \((D)\), so the process is stationary.
We can do this in R
using polyroot
.
From the documentation - polyroot(c(1, 2, 3))
represents 1 + 2x + 3x^2
, so:
We can also do it via Python
using poly1d
.
From the documentation - an example of poly1d([1, 2, 3])
represents \(x^2 + 2x + 3\), so:
Considerations
We can also plot the unit circle with the roots:
root_1 <- polyroot(c(1, -1.1))
#
plot(root_1, col = 2, pch = 16,
xlim = c(-2, 2),
ylim = c(-2, 2),
main = "AR(1)")
#
#
symbols(0, 0, circles = 1,
add = T, inches = F)
#
abline(h = 0, col = "blue", lty = 2)
abline(v = 0, col = "blue", lty = 2)
roots_1 = np.poly1d([-1.1, 1], variable = 'z').roots
#
fig = plt.figure(figsize = (10, 8))
ax = fig.add_subplot(1, 1, 1)
circ = plt.Circle((0, 0), radius = 1, edgecolor = 'black', facecolor = 'None')
_ = ax.add_patch(circ)
_ = plt.scatter(x = roots_1.real, y = roots_1.imag, color = "r")
_ = plt.xlim(-1.25, 1.25)
_ = plt.ylim(-1.25, 1.25)
_ = plt.axhline(y = 0, linestyle = "--", color = "b")
_ = plt.axvline(x = 0, linestyle = "--", color = "b")
_ = plt.title("AR(1)")
plt.show()
Note that the real
part is on the horizontal axis and the imaginary
on the vertical axis.
## [1] "(D) and (E) 1 - 1.1*x + 0.2*x^2"
## [1] "Roots: 1.14921894064179+0i" "Roots: 4.35078105935821-0i"
The invertability concept concerns only moving-average and ARMA processes, but it is calculated in a similar way as before, only now we need to find the roots of the moving-average lag operator function \(\Theta(x)\):
- \((A)\) \(1 + 0.5x = 0\)
- \((B)\) \(1 + 1.3x - 0.4 x^2 = 0\)
- \((E)\) \(1 + 1.3x = 0\)
We will not repeat the manual calculations and instead immediately look at the results via R
and Python
:
We see that for \((A)\), the process is invertible (the roots are outside the unit circle), while in \((B)\) it is not since one of its roots is inside the unit circle. Finally, \((E)\) is also non-invertible.
So, to recap:
- \((A)\) is stationary and invertible;
- \((B)\) is stationary and non-invertible;
- \((C)\) is non-stationary and invertible;
- \((D)\) is stationary and invertible;
- \((E)\) is stationary and non-invertible;
We will remind that an MA model is invertible if it is equivalent to a converging infinite order AR model. By converging, we mean that the AR coefficients decrease to 0 as we move back in time.
2.3.1.2 Task 2
To make it easier to follow, we will simulate each process separately.
Considerations
Before beginning our data simulation it is important to mention a process known as ’burn-in. As defined here (which argues against using burn-in when applying MCMC methods):
The name “burn-in” comes from electronics (see the entry in the Jargon File). Many electronics components fail quickly. Those that don’t are a more reliable subset. So a burn-in is done at the factory to eliminate the worst ones.
— Charles J. Geyer, “Burn-In is Unnecessary”
Why is this important for time series simulation? To answer this question, we will analyse a simple \({\rm AR}(1)\) process: \[ Y_t = \phi_1 Y_{t-1} + \varepsilon_t \] Assuming that we have some starting value \(Y_0\), we can write the process as: \[ \begin{align} t = 1&: Y_1 = \phi_1 Y_0 + \varepsilon_1 \\ t = 2&: Y_2 = \phi_1 Y_1 + \varepsilon_2 = \phi_1^2 Y_0 + \varepsilon_2 + \phi_1 \varepsilon_1 \\ &\vdots\\ t = k&: Y_k = \phi_1 Y_{k-1} + \varepsilon_k = \phi_1^k Y_0 + \sum_{j=0}^{k-1}\phi_1^j \varepsilon_{k-j} \end{align} \] as \(t \longrightarrow \infty\). Assume that this is the true DGP, \(Y_t^{\rm DGP}\).
In contrast, when simulating the data, we do not have \(Y_0\), so we assume some starting value, e.g. \(Y_0 = 0\). With this assumption, the difference between the true data generating process, \(Y_t^{\rm DGP}\), and our simulated process, \(Y_t^{\rm sim}\), is: \[ Y_t^{\rm DGP} - Y_t^{\rm sim} = \phi_1^k Y_0 \] If our \({\rm AR}(1)\) process is stationary (\(|\phi_1| < 1\)), then \(\phi_1^k Y_0 \longrightarrow 0\), as \(k \longrightarrow \infty\). In other words, the initial values at the start of the series differ the most between \(Y_t^{\rm DGP}\) and \(Y_t^{\rm sim}\). This means that the process, which we are simulating may not be the same process that we want to simulate. However, as we continue to simulate the series, this difference diminishes, as \(t\) increases.
This means that we can apply the burn-in algorithm as follows:
- Simulate \(T + m\) observations, such that \(\phi_1^m Y_0 \approx 0\).
- Drop the first \(m\) observations from the series.
Unfortunately, the number of additional burn-in observations, \(m\), depend on the complexity of the model. Looking at the function code for arima.sim
in R
, we see that it depends on the number of parameters:
if (is.na(n.start)) n.start <- p + q + ifelse(p > 0, ceiling(6/log(minroots)), 0)
if (n.start < p + q) stop(“burn-in ‘n.start’ must be as long as ‘ar + ma’”)
—
arima.sim
function code inR
For our cases, we will take \(m = 50\). As an example, we will simulate the \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\) series and compare the beginning of the series, to the rest of the series:
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
temp = np.array(1 + epsilon[0], ndmin = 1)
temp = np.append(temp, 1 + 1.1 * temp[0] + epsilon[1])
for j in range(2, N + m):
Y_temp = 1 + 1.1 * temp[j - 1] - 0.2 * temp[j - 2] + epsilon[j]
temp = np.append(temp, Y_temp)
#
Lets compare \(t_1 = 1, ..., N\) and \(t_2 = m + 1, ..., N + m\):
par(mfrow = c(1, 2))
#
plot(temp[1:N], col = "cornflowerblue", type = "l", lwd = 2,
main = as.expression(bquote(t==1~"..."~N)))
lines(temp[1:m], col = "red", lwd = 2)
plot(temp[(m + 1):(N + m)], col = "cornflowerblue", type = "l", lwd = 2,
main = as.expression(bquote(t==m+1~"..."~N+m)))
lines((N - m):(N), temp[(N):(N + m)], col = "darkgreen", lwd = 2)
fig = plt.figure(figsize = (10, 8))
_ = fig.add_subplot(1, 2, 1).plot(temp[:N])
_ = plt.plot(list(range(0, m)), temp[:m], color = 'red')
_ = plt.title("$t = 1, ..., N$")
_ = fig.add_subplot(1, 2, 2).plot(temp[m:])
_ = plt.plot(list(range(N - m - 1, N)), temp[(N - 1):], color = 'darkgreen')
_ = plt.title("$t = m + 1,..., N+m$")
plt.show()
Comparing the beginning of the series (in red), we see that our starting value has a significant impact on how the series looks - as time increases, the series fluctuates around a mean, which is much higher than \(0\). This would mean, that we have to be very careful with our starting value seelection, if we do not want to use burn-in. On the other hand, if we drop the first \(m\) observations and continue simulating \(N + 1, ..., N + m\) additional observations, so that our series remains at length \(N\), we see that the series continues to fluctuate around \(10\).
Consequently, if we do not want to spend extra time selecting an appropriate starting value, it is best to apply the burn-in process.
2.3.1.2.1 \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
The logic behind generating this process is as follows:
- when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
- when \(t = 1\), we have \(Y_1 = 1 + \epsilon_1 + 0.5 \cdot \epsilon_0 = 1 + \epsilon_1\)
- when \(t = 2\), we have \(Y_2 = 1 + \epsilon_2 + 0.5 \cdot \epsilon_1\)
- when \(t = 3\), we have \(Y_3 = 1 + \epsilon_3 + 0.5 \cdot \epsilon_2\)
- etc.
We can carry out these calculations as follows:
2.3.1.2.2 \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
The logic behind generating this process is as follows:
- when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
- when \(t = 1\), we have \(Y_1 = 1 + \epsilon_1 + 1.3 \cdot \epsilon_0 - 0.4 \cdot \epsilon_{-1}= 1 + \epsilon_1\)
- when \(t = 2\), we have \(Y_2 = 1 + \epsilon_2 + 1.3 \cdot \epsilon_1 - 0.4 \cdot \epsilon_{0} = 1 + \epsilon_2 + 1.3 \epsilon_1\)
- when \(t = 3\), we have \(Y_3 = 1 + \epsilon_3 + 1.3 \cdot \epsilon_2 - 0.4 \cdot \epsilon_1\)
- when \(t = 4\), we have \(Y_4 = 1 + \epsilon_4 + 1.3 \cdot \epsilon_3 - 0.4 \cdot \epsilon_2\)
- etc.
We can carry out these calculations as follows:
set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
B_data_full <- 1 + epsilon[1]
B_data_full <- c(B_data_full, 1 + epsilon[2] + 1.3 * epsilon[1])
for(j in 3:(N + m)){
Y_temp <- 1 + epsilon[j] + 1.3 * epsilon[j - 1] - 0.4 * epsilon[j - 2]
B_data_full <- c(B_data_full, Y_temp)
}
B_data <- B_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
B_data_full = 1 + epsilon[0]
B_data_full = np.append(B_data_full, 1 + epsilon[1] + 1.3 * epsilon[0])
for j in range(2, N + m):
Y_temp = 1 + epsilon[j] + 1.3 * epsilon[j - 1] - 0.4 * epsilon[j - 2]
B_data_full = np.append(B_data_full, Y_temp)
#
B_data = B_data_full[m:]
We have now started the loop from the third
element instead of the second
, since the first two elements include \(\epsilon\) at values \(t \leq 0\), so they have a different formula. (Remember that arrays in Python
start at 0
, while in R
they start at 1
).
plot(B_data, type = "l", col = "cornflowerblue", lwd = 2)
lines(A_data, type = "l", col = "orange", lwd = 2)
title("Moving-Average Process")
legend("bottomleft",
legend = c(as.expression(bquote(Y[t]==1+epsilon[t]+0.5*epsilon[t-1]~"(invertible)")),
as.expression(bquote(Y[t]==1+epsilon[t]+1.3*epsilon[t-1]-0.4*epsilon[t-2]~"(non-invertible)"))),
lty = 1, col = c("cornflowerblue", "orange"), bg = "grey95")
fig = plt.figure(figsize = (10, 8))
#
_ = plt.plot(B_data, label = "$Y_t = 1 + \\epsilon_t + 0.5 \\epsilon_{t-1}$ (invertible)")
_ = plt.plot(A_data, label = "$Y_t = 1 + \\epsilon_t + 1.3 \\epsilon_{t-1} - 0.4 \\epsilon_{t-2}$ (non-invertible)")
_ = plt.title("Moving-Average Processes")
_ = plt.legend()
plt.show()
Notice that the plots do not exhibit large differences - you may not be able to tell whether a process is invertrible or not from the time series plots alone.
2.3.1.2.3 \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
The logic behind generating this process is as follows:
- when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
- when \(t = 1\), we have \(Y_1 = 1 + 1.1 \cdot Y_{0} + \epsilon_1 = 1 + \epsilon_1\)
- when \(t = 2\), we have \(Y_2 = 1 + 1.1 \cdot Y_{1} + \epsilon_2 = 1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2\)
- when \(t = 3\), we have \(Y_3 = 1+ 1.1 Y_{2} + \epsilon_3 = 1 + 1.1 \cdot (1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2) + \epsilon_3\)
- etc.
We can carry out these calculations as follows:
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
C_data_full = np.array(1 + epsilon[0], ndmin = 1)
for j in range(1, N + m):
Y_temp = 1 + 1.1 * C_data_full[j - 1] + epsilon[j]
C_data_full = np.append(C_data_full, Y_temp)
#
C_data = C_data_full[m:]
2.3.1.2.4 \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
The logic behind generating this process is as follows:
- when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
- when \(t = 1\), we have \(Y_1 = 1 + 1.1 \cdot Y_{0} - 0.2 \cdot Y_{-1} + \epsilon_1 = 1 + \epsilon_1\)
- when \(t = 2\), we have \(Y_2 = 1 + 1.1 \cdot Y_{1} - 0.2 \cdot Y_0 + \epsilon_2 = 1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2\)
- when \(t = 3\), we have \(Y_3 = 1+ 1.1 Y_{2} - 0.2 Y_{1} + \epsilon_3 = 1 + 1.1 \cdot (1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2) - 0.2 \cdot (1 + \epsilon_1)+ \epsilon_3\)
- etc.
We can carry out these calculations as follows:
set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
D_data_full <- 1 + epsilon[1]
D_data_full <- c(D_data_full, 1 + 1.1 * D_data_full[1] + epsilon[2])
for(j in 3:(N + m)){
Y_temp <- 1 + 1.1 * D_data_full[j - 1] - 0.2 * D_data_full[j - 2] + epsilon[j]
D_data_full <- c(D_data_full, Y_temp)
}
D_data <- D_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
D_data_full = np.array(1 + epsilon[0], ndmin = 1)
D_data_full = np.append(D_data_full, 1 + 1.1 * D_data_full[0] + epsilon[1])
for j in range(2, N + m):
Y_temp = 1 + 1.1 * D_data_full[j - 1] - 0.2 * D_data_full[j - 2] + epsilon[j]
D_data_full = np.append(D_data_full, Y_temp)
#
D_data = D_data_full[m:]
We have started the loop from 2
instead of 1
, since the first two elements include pas values of \(Y\) at values \(t \leq 0\), so they have a different formula.
par(mfrow = c(1, 2))
#
plot(C_data, col = "cornflowerblue", type = "l", lwd = 2,
main = as.expression(bquote(Y[t]==1+1.1*Y[t-1]+epsilon[t]~"(non-stationary)")))
plot(D_data, col = "cornflowerblue", type = "l", lwd = 2,
main = as.expression(bquote(Y[t]==1+1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]~"(stationary)")))
2.3.1.2.5 \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
The logic behind generating this process is as follows:
- when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
- when \(t = 1\), we have \(Y_1 = 1.1 \cdot Y_{0} - 0.2 \cdot Y_{-1} + \epsilon_1 + 1.3 \cdot \epsilon_0 = \epsilon_1\)
- when \(t = 2\), we have \(Y_2 = 1.1 \cdot Y_{1} - 0.2 \cdot Y_0 + \epsilon_2 + 1.3 \cdot \epsilon_1 = 1.1 \cdot (1 + \epsilon_1) + \epsilon_2 + 1.3 \cdot \epsilon_1\)
- when \(t = 3\), we have \(Y_3 = 1.1 Y_{2} - 0.2 Y_{1} + \epsilon_3 + 1.3 \cdot \epsilon_2 = 1.1 \cdot (1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2 + 1.3 \cdot \epsilon_1) - 0.2 \cdot (1 + \epsilon_1)+ \epsilon_3 + 1.3 \cdot \epsilon_2\)
- etc.
We can carry out these calculations as follows:
set.seed(123)
#
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
E_data_full <- epsilon[1]
E_data_full <- c(E_data_full, 1.1 * E_data_full[1] + epsilon[2] + 1.3* epsilon[1])
for(j in 3:(N + m)){
Y_temp <- 1.1 * E_data_full[j - 1] - 0.2 * E_data_full[j - 2] + epsilon[j] + 1.3 * epsilon[j - 1]
E_data_full <- c(E_data_full, Y_temp)
}
E_data <- E_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
E_data_full = np.array(epsilon[0], ndmin = 1)
E_data_full = np.append(E_data_full, 1.1 * E_data_full[0] + epsilon[1] + 1.3 * epsilon[0])
for j in range(2, N + m):
Y_temp = 1.1 * E_data_full[j - 1] - 0.2 * E_data_full[j - 2] + epsilon[j] + 1.3 * epsilon[j - 1]
E_data_full = np.append(E_data_full, Y_temp)
#
E_data = E_data_full[m:]
plot(E_data, type = "l", col = "cornflowerblue", lwd = 2, ylim = c(min(D_data, E_data), max(D_data, E_data)))
lines(D_data, type = "l", col = "orange", lwd = 2)
title("ARMA(2, 1) Process vs AR(2) process")
legend("bottomleft",
legend = c(as.expression(bquote(Y[t]==1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]+1.3*epsilon[t-1]~"(stationary non-invertible)")),
as.expression(bquote(Y[t]==1+1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]~"(stationary)"))),
lty = 1, col = c("cornflowerblue", "orange"), bg = "grey95")
fig = plt.figure(figsize = (10, 8))
#
_ = plt.plot(E_data, label = "$Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t + 1.3 \\epsilon_{t-1}$ (stationary non-invertible)")
_ = plt.plot(D_data, label = "$Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t$ (stationary)")
_ = plt.title("ARMA(2, 1) Process vs AR(2) process")
_ = plt.legend()
plt.show()
Note the difference in the mean - our \(ARMA\) process does not have an intercept term.
Considerations
Alternatively, we can use the built-in functions to simulate the models. We need to specify the \(\Phi(z)\) and \(\Theta(x)\) functions.
Note that for pure \(\rm AR\) models \(\Theta(x) = 1\) and for pure \(\rm MA\) models \(\Phi(z) = 1\).
The nice thing about this function is that we can examine these lag functions:
as well as get their calculated roots:
and check the stationarity/invertibility automatically:
We can compare the two simulation methods:
plot(E_data, type = "l", col = "cornflowerblue", lwd = 2)
lines(E_autogenr, col = "red", lty = 2, lwd = 2)
title("ARMA(2, 1) Process: Automatic vs Manual Simulation")
legend("bottomleft",
legend = c(as.expression(bquote(Y[t]==1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]+1.3*epsilon[t-1]~"(stationary non-invertible)")),
as.expression(bquote("autogenerated ARMA(2, 1) process with the same parameters"))),
lty = c(1, 2), lwd = 2, col = c("cornflowerblue", "red"), bg = "grey95")
In
R
the automatic simulation creates N + n.start
observations. Then the first n.start
observations are dropped as burn-in values.
fig = plt.figure(figsize = (10, 8))
#
_ = plt.plot(E_data, label = "$Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t + 1.3 \\epsilon_{t-1}$ (stationary non-invertible)", color = "blue")
_ = plt.plot(E_autogenr, label = "autogenerated ARMA(2, 1) process with the same parameters", linestyle = "-.", color = "red")
_ = plt.title("ARMA(2, 1) Process: Automatic vs Manual Simulation")
_ = plt.legend()
plt.show()
In
Python
the automatic simulation creates N + burnin
observations. Then the first burnin
observations are dropped as burn-in values.
Note - the same innovations were used to generate \((A)\) - \((E)\) processes (because for each of the process we have specified the same seed, and then we simulated the innovations using the same function and parameters), and the automated function simulated the innovations via the same specified seed.
#
#
#
par(mfrow = c(1, 2))
#
#
forecast::Acf(E_autogenr, lag.max = 20)
#
forecast::Pacf(E_autogenr, lag.max = 20)
fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(E_autogenr, lags = 20, ax = fig.add_subplot(121),
zero = False)
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(E_autogenr, lags = 20, ax = fig.add_subplot(122),
method = "ywm", zero = False)
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()
NOTE on a possible PACF error, need to specify either method = "ywm"
, or method = "ols"
Remember to exclude Lag = 0
from the ACF/PACF plots! This will ensure that the first bar in the plots is for the correlation between \(Y_t\) and \(Y_{t-1}\), instead of \(Y_t\) and itself!
2.3.1.3 Task 3
We remember that \(\epsilon_t \sim \mathcal{N}(0, 0.5^2)\).
- \((A)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + \epsilon_t + 0.5 \epsilon_{t-1}) = 1\)
- \((B)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}) = 1\)
- \((C)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + 1.1 Y_{t-1} + \epsilon_t) = 1 + 1.1 \mathbb{E} (Y_{t-1}) = 1 + 1.1 + (1.1)^2 \mathbb{E} (Y_{t-2}) = 1 + 1.1 + (1.1)^2 + (1.1)^3 \mathbb{E} (Y_{t-2}) = ... = \sum_{j = 0}^\infty (1.1)^j = \infty\).
Assuming that \(Y_t = 0\) for \(t \leq 0\), this can be expressed as \(\mathbb{E} (Y_t) =\sum_{j = 0}^{t-1} (1.1)^j\) for an \(AR(1)\) process. So, for \(t = 150\) this is roughly:
The above is an example of a non-stationary process with a non-constant mean.
- \((D)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t) = 1 + 1.1 \mathbb{E} (Y_{t-1}) - 0.2 \mathbb{E} (Y_{t-2})\). For a stationary process \(\mathbb{E} (Y_t) = \mu\), which gives \(\mu = 1 + 1.1 \mu - 0.2 \mu\), which gives \(\mu = \dfrac{1}{1 - 1.1 + 0.2} = \dfrac{1}{0.1} = 10\).
- \((E)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}) = 1.1 \mathbb{E} (Y_{t-1}) - 0.2 \mathbb{E} (Y_{t-2})\). Since \(Y_t\) is stationary and invertible, we can write it as an infinite MA process. Furthermore, for a stationary process \(\mathbb{E} (Y_t) = \mu\), which gives \(\mu = 1.1 \mu - 0.2 \mu \Longrightarrow \mu = 0\).
2.3.1.4 Task 4
- \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(A_data, lag.max = 20)
#
#
forecast::Pacf(A_data, lag.max = 20)
fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(A_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(A_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()
We see that the ACF is abruptly cutoff after the first lag. Regarding PACF - it appears to be somewhat declining, though it is difficult to say. We expect it to be an MA(1) process.
- \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(B_data, lag.max = 20)
#
#
forecast::Pacf(B_data, lag.max = 20)
fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(B_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(B_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()
We see that the ACF is abruptly cutoff after the second lag in R
. On the other hand - the ACF is abruptly cutoff after the first lag in Python
. Regarding PACF - it appears to be somewhat declining, though it is difficult to say. We would expect an \({\rm MA}(1)\) process in Python
(though in reality we know that it is an \({\rm MA}(2)\) process).
- \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(C_data, lag.max = 20)
#
#
forecast::Pacf(C_data, lag.max = 20)
fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(C_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(C_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()
The ACF appears to decline, while the PACF is abruptly cutoff after the first lag. We would expect an AR(1) process.
- \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(D_data, lag.max = 20)
#
#
forecast::Pacf(D_data, lag.max = 20)
fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(D_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(D_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()
The ACF appears to decline. The PACF is abruptly cutoff after the first lag in R
. We would expect an \({\rm AR}(1)\) process (though in reality it is an \({\rm AR}(2)\) process). In Python
, the ACF is abruptly cutoff after the second lag, indicating an \({\rm AR}(2)\) process.
Considerations
(Try to increase the sample size and re-examine the ACF/PACF plots - are they closer to the true theoretical cases?)
- \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(E_data, lag.max = 20)
#
#
forecast::Pacf(E_data, lag.max = 20)
fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(E_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(E_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()
We see both the ACF and PACF to be declining (it is more obvious in R
, but less so in Python
) - we believe that it is an \(\rm ARMA\) process. We cannot identify its order from these plots alone.
Considerations
Note that if we have also automatically generated the data using the built-in functions, we can examine the theoretical ACF/PACF plots:
acf_theor <- stats::ARMAacf(ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
acf_empir <- stats::acf(E_autogenr, plot = FALSE, lag.max = 19)$acf
pacf_theor <- stats::ARMAacf(pacf = TRUE, ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
pacf_empir <- stats::pacf(E_autogenr, plot = FALSE, lag.max = 19)$acf
#
par(mfrow = c(2, 1))
# ACF:
plot(0:19, acf_theor, type = "h", col = "red", pch = 21,
ylim = c(min(acf_theor, acf_empir), max(acf_theor, acf_empir)))
lines(0:19, acf_theor, col = "red")
points(0:19, acf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(0:19, acf_empir, type = "h", col = "darkgreen")
points(0:19, acf_empir, col = "darkgreen", pch = 19)
lines(0:19, acf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical ACF", "Empirical ACF"),
pch = 19, col = c("red", "darkgreen"))
# PACF:
plot(1:19, pacf_theor, type = "h", col = "red", pch = 21,
ylim = c(min(pacf_theor, pacf_empir), max(pacf_theor, pacf_empir)))
lines(1:19, pacf_theor, col = "red")
points(1:19, pacf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(1:19, pacf_empir, type = "h", col = "darkgreen")
points(1:19, pacf_empir, col = "darkgreen", pch = 19)
lines(1:19, pacf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical PACF", "Empirical PACF"),
pch = 19, col = c("red", "darkgreen"))
Note: The \(PACF\) in R
starts from Lag = 1
, instead of Lag = 0
.
#
#
#
#
#
fig = plt.figure(figsize = (10, 8))
# ACF:
ax1 = fig.add_subplot(211)
_ = ax1.stem(E_auto_spec.acf(lags = 20), use_line_collection = True,
label = "Theoretical ACF", basefmt = "white",
linefmt = "r-", markerfmt = "ro-")
#
_ = ax1.stem(smt.stattools.acf(E_autogenr, nlags = 19), use_line_collection = True,
label = "Empirical ACF", basefmt = "white",
linefmt = "g-", markerfmt = "go-")
#
_ = ax1.axhline(y = 0, color = "black")
#
_ = plt.legend()
#
# PACF:
_ = ax2 = fig.add_subplot(212)
_ = ax2.stem(E_auto_spec.pacf(lags = 20),
label = "Theoretical PACF", use_line_collection = True,
basefmt = "white", linefmt = "r-", markerfmt = "ro-")
#
_ = ax2.stem(smt.stattools.pacf(E_autogenr, nlags = 19, method = "ywm"),
label = "Empirical PACF", basefmt = "white", use_line_collection = True,
linefmt = "g-", markerfmt = "go-")
#
_ = ax2.axhline(y = 0, color = "black")
#
_ = plt.legend()
plt.show()
As a further example, we will re-generate the process but with a larger sample size of \(N = 1500\):
And examine the ACF/PACF plots of this larger sample, as well as compare to the theoretical values:
acf_theor <- stats::ARMAacf(ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
acf_empir <- stats::acf(E_autogenr_large, plot = FALSE, lag.max = 19)$acf
pacf_theor <- stats::ARMAacf(pacf = TRUE, ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
pacf_empir <- stats::pacf(E_autogenr_large, plot = FALSE, lag.max = 19)$acf
#
par(mfrow = c(2, 1))
# ACF:
plot(0:19, acf_theor, type = "h", col = "red", pch = 21,
ylim = c(min(acf_theor, acf_empir), max(acf_theor, acf_empir)))
lines(0:19, acf_theor, col = "red")
points(0:19, acf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(0:19, acf_empir, type = "h", col = "darkgreen")
points(0:19, acf_empir, col = "darkgreen", pch = 19)
lines(0:19, acf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical ACF", "Empirical ACF, sample size N = 1500"),
pch = 19, col = c("red", "darkgreen"))
# PACF:
plot(1:19, pacf_theor, type = "h", col = "red", pch = 21,
ylim = c(min(pacf_theor, pacf_empir), max(pacf_theor, pacf_empir)))
lines(1:19, pacf_theor, col = "red")
points(1:19, pacf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(1:19, pacf_empir, type = "h", col = "darkgreen")
points(1:19, pacf_empir, col = "darkgreen", pch = 19)
lines(1:19, pacf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical PACF", "Empirical PACF, sample size N = 1500"),
pch = 19, col = c("red", "darkgreen"))
#
#
#
#
#
fig = plt.figure(figsize = (10, 8))
# ACF:
ax1 = fig.add_subplot(211)
_ = ax1.stem(E_auto_spec.acf(lags = 20), use_line_collection = True,
label = "Theoretical ACF", basefmt = "white",
linefmt = "r-", markerfmt = "ro-")
#
_ = ax1.stem(smt.stattools.acf(E_autogenr_large, nlags = 19), use_line_collection = True,
label = "Empirical ACF, sample size N = 1500", basefmt = "white",
linefmt = "g-", markerfmt = "go-")
#
_ = ax1.axhline(y = 0, color = "black")
#
_ = plt.legend()
#
# PACF:
_ = ax2 = fig.add_subplot(212)
_ = ax2.stem(E_auto_spec.pacf(lags = 20),
label = "Theoretical PACF", use_line_collection = True,
basefmt = "white", linefmt = "r-", markerfmt = "ro-")
#
_ = ax2.stem(smt.stattools.pacf(E_autogenr_large, nlags = 19, method = "ywm"),
label = "Empirical PACF, sample size N = 1500", basefmt = "white", use_line_collection = True,
linefmt = "g-", markerfmt = "go-")
#
_ = ax2.axhline(y = 0, color = "black")
#
_ = plt.legend()
plt.show()
The following result is important to note: the larger the sample size, the closer the estimates to the true values.
2.3.2 Exercise Set 2: Model Estimation
2.3.2.1 Task 5
We will estimate the models with the exact order that we used to generate the data.
Note that in this task the focus is on whether the coefficients are close to the true values, if we were to somehow correctly identify the true underlying lag order. Unfortunately, in empirical applications we almost never know the true lag order.
- \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
## Series: A_data
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.4651 0.9829
## s.e. 0.0933 0.0566
##
## sigma^2 estimated as 0.2277: log likelihood=-100.98
## AIC=207.96 AICc=208.12 BIC=216.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -5.818588e-05 0.473997 0.3701291 3.276435 87.11847 0.7966185 -0.06266759
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -5.818588e-05 0.473997 0.3701291 3.276435 87.11847 0.7966185 -0.06266759
## ARMA Model Results
## ==============================================================================
## Dep. Variable: y No. Observations: 150
## Model: ARMA(0, 1) Log Likelihood -109.740
## Method: css-mle S.D. of innovations 0.502
## Date: Tue, 06 Apr 2021 AIC 225.481
## Time: 23:17:12 BIC 234.513
## Sample: 0 HQIC 229.150
##
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0058 0.062 16.178 0.000 0.884 1.128
## ma.L1.y 0.5190 0.081 6.384 0.000 0.360 0.678
## Roots
## =============================================================================
## Real Imaginary Modulus Frequency
## -----------------------------------------------------------------------------
## MA.1 -1.9267 +0.0000j 1.9267 0.5000
## -----------------------------------------------------------------------------
Note that, in order to reduce the output size, we can print the coefficients only:
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 0.465110 0.093316 4.9842 6.22e-07 ***
## intercept 0.982894 0.056585 17.3702 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since by default the summary()
function does not return the \(p\)-values associated with the parameter significance for \(ARMA\) models in R
, so we used lmtest::coeftest()
.
However, since the arima()
function in R
uses the maximum likelihood estimation, the coefficients are asymptotically normal. Consequently, the lmtest::coeftest()
returns the \(z\)-statistic and the associated \(p\)-values.
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0058 0.062 16.178 0.000 0.884 1.128
## ma.L1.y 0.5190 0.081 6.384 0.000 0.360 0.678
## ==============================================================================
To make it easier to focus on the output, we will only print the resulting coefficient values for the remaining cases in this task.
- \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 0.328716 0.078560 4.1843 2.861e-05 ***
## ma2 -0.326702 0.084348 -3.8733 0.0001074 ***
## intercept 0.980641 0.059636 16.4437 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0113 0.073 13.819 0.000 0.868 1.155
## ma.L1.y 0.3912 0.080 4.878 0.000 0.234 0.548
## ma.L2.y -0.2516 0.080 -3.127 0.002 -0.409 -0.094
## ==============================================================================
Note that the coefficients are not close to the true values (except for the constant and the overall coefficient signs). This is because our true process is not invertible. Hence, the software attempts to fit an invertible model, which results in different coefficients.
- \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
In R
, the forecast::Arima
function will return an error if the sample indicates that a process is non-stationary:
If we set method = "ML"
, then we can force R
to use MLE (maximum likelihood estimation) instead. This is a slower estimation method, however it always returns a stationary model and if the true process is stationary - may give better (in terms of coefficient efficiency) estimates.
## Warning in sqrt(diag(se)): NaNs produced
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 9.9936e-01 3.6535e-04 2735.4 < 2.2e-16 ***
## intercept 8.0685e+08 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.327e+08 3.04e+09 0.044 0.965 -5.82e+09 6.08e+09
## ar.L1.y 0.9996 0.001 682.744 0.000 0.997 1.002
## ==============================================================================
Note the coefficient values for the non-stationary process.
- \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.991885 0.081114 12.2283 <2e-16 ***
## ar2 -0.121086 0.081146 -1.4922 0.1356
## intercept 9.927723 0.287315 34.5535 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 10.1996 0.306 33.313 0.000 9.599 10.800
## ar.L1.y 1.1009 0.080 13.796 0.000 0.945 1.257
## ar.L2.y -0.2304 0.081 -2.849 0.004 -0.389 -0.072
## ==============================================================================
Note that the const
coefficient of the model is close to the process mean. This is not the intercept (even though it is labeled as intercept
in R
)!
- \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.849206 0.088156 9.6330 <2e-16 ***
## ar2 0.019357 0.088083 0.2198 0.8261
## ma1 0.955942 0.036991 25.8425 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ar.L1.y 1.0817 0.090 12.046 0.000 0.906 1.258
## ar.L2.y -0.2044 0.091 -2.249 0.025 -0.383 -0.026
## ma.L1.y 0.8161 0.049 16.537 0.000 0.719 0.913
## ==============================================================================
Note that that we used trend
to decide whether to include a constant or not. c
includes constant, nc
- no constant.
We can also see the roots for stationarity and invertibility. Again, the fitted model attempts to fit a stationary and invertible process, hence the MA coefficient is different for this process (which is generated from a non-invertible data generating process).
We can also extract specific information about the model:
## [1] "Number of AR parameters: 2"
## [1] "Number of MA parameters: 1"
as well as specific parameters:
## [1] "All of the parameters: 0.849205786332935, 0.0193571085494739, 0.95594224634259"
## [1] "All of the p-values: 5.80260158765536e-22, 0.826057697486593, 2.95779260381029e-147"
print(paste0("Only AR parameters: ", paste0(coef(E_mdl_1)[grepl("^ar", names(coef(E_mdl_1)))], collapse = ", ")))
## [1] "Only AR parameters: 0.849205786332935, 0.0193571085494739"
print(paste0("Only MA parameters: ", paste0(coef(E_mdl_1)[grepl("^ma", names(coef(E_mdl_1)))], collapse = ", ")))
## [1] "Only MA parameters: 0.95594224634259"
## All of the parameters: [ 1.08174983 -0.20442291 0.81605632]
## All of the p-values: [2.02459484e-33 2.45292519e-02 1.99072901e-61]
## Only AR parameters: [ 1.08174983 -0.20442291]
## Only MA parameters: [0.81605632]
Considerations
Interestingly, on the larger sample size the autoregressive parameters are indeed close to the true values:
## Series: E_autogenr_large
## ARIMA(2,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ma1
## 1.0790 -0.1854 0.7790
## s.e. 0.0305 0.0305 0.0201
##
## sigma^2 estimated as 0.4135: log likelihood=-1466.6
## AIC=2941.21 AICc=2941.23 BIC=2962.46
## ARMA Model Results
## ==============================================================================
## Dep. Variable: y No. Observations: 1500
## Model: ARMA(2, 1) Log Likelihood -1432.734
## Method: css-mle S.D. of innovations 0.628
## Date: Tue, 06 Apr 2021 AIC 2873.469
## Time: 23:17:14 BIC 2894.722
## Sample: 0 HQIC 2881.386
##
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ar.L1.y 1.1109 0.030 37.073 0.000 1.052 1.170
## ar.L2.y -0.2139 0.030 -7.157 0.000 -0.272 -0.155
## ma.L1.y 0.7735 0.020 39.262 0.000 0.735 0.812
## Roots
## =============================================================================
## Real Imaginary Modulus Frequency
## -----------------------------------------------------------------------------
## AR.1 1.1586 +0.0000j 1.1586 0.0000
## AR.2 4.0356 +0.0000j 4.0356 0.0000
## MA.1 -1.2929 +0.0000j 1.2929 0.5000
## -----------------------------------------------------------------------------
However, the moving-average parameters are not, since the process is non-invertible.
The estimated model, in Python
, is:
\[\widehat{Y}_t = 1.1109 \cdot \widehat{Y}_{t-1} - 0.2139 \cdot \widehat{Y}_{t-2} + e_t + 0.7735 \cdot e_{t-1},\]
where \(e_t = Y_t - 1.1109 \cdot \widehat{Y}_{t-1} + 0.2139 \cdot \widehat{Y}_{t-2} - 0.7735 \cdot e_{t-1}\) and \(e_0 = 0\).
We will see more on model fitting in the last task.
2.3.2.2 Task 6
We will now attempt to automatically determine the appropriate lag order and estimate the unknown parameters using the built-in functions in R
and Python
. For this example, we will simply use the \(AIC\).
Note that an \(AICc\), or a \(BIC\) information criterion would be better, as those put penalize models with more parameters.
In this specific task, we chose \(AIC\) to highlight just how much the model order could differ from the true one. In real empirical examples, it is recommended to use \(BIC\), or \(AICc\).
Note: for this task in R
, we will use stationary = TRUE
to only search for stationary models. Generally, if a process is found to be non-stationary, R
will try to differentiate the series (i.e. calculate \(\Delta Y_t\)) an then try to fit a stationary model for the differences. We will examine more about this in later chapters.
Note the documentation: This method can be used to tentatively identify the order of an ARMA process, provided that the time series is stationary and invertible.
- \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
The automatically selected model is:
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.747592 0.116093 -6.4396 1.198e-10 ***
## ma1 1.202935 0.155282 7.7468 9.425e-15 ***
## ma2 0.229716 0.140925 1.6301 0.1031
## intercept 0.982063 0.052107 18.8469 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A_mdl_auto = smt.arima_model.ARMA(A_data, order = A_est_auto.aic_min_order).fit(trend = 'c')
print(A_mdl_auto.summary().tables[1])
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0046 0.055 18.121 0.000 0.896 1.113
## ar.L1.y -1.0083 0.080 -12.614 0.000 -1.165 -0.852
## ar.L2.y -0.3666 0.113 -3.242 0.001 -0.588 -0.145
## ar.L3.y 0.1575 0.113 1.390 0.165 -0.065 0.379
## ar.L4.y -0.2547 0.080 -3.176 0.001 -0.412 -0.097
## ma.L1.y 1.6032 0.035 45.238 0.000 1.534 1.673
## ma.L2.y 0.9999 0.042 24.066 0.000 0.918 1.081
## ==============================================================================
Considerations
If we wanted, we could restrict the models by specifying the maximum order of the \(MA\) or the \(AR\) (or both) parts:
- \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
The automatically selected model is:
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.612249 0.077360 -7.9143 2.486e-15 ***
## ma1 0.964260 0.031071 31.0342 < 2.2e-16 ***
## intercept 0.980303 0.071183 13.7716 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
B_mdl_auto = smt.arima_model.ARMA(B_data, order = B_est_auto.aic_min_order).fit(trend = 'c')
print(B_mdl_auto.summary().tables[1])
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0099 0.071 14.219 0.000 0.871 1.149
## ar.L1.y -1.0880 0.136 -7.984 0.000 -1.355 -0.821
## ar.L2.y -0.5681 0.165 -3.447 0.001 -0.891 -0.245
## ar.L3.y -0.0052 0.123 -0.042 0.966 -0.246 0.235
## ar.L4.y -0.2750 0.091 -3.030 0.002 -0.453 -0.097
## ma.L1.y 1.5205 0.099 15.291 0.000 1.326 1.715
## ma.L2.y 0.8827 0.204 4.327 0.000 0.483 1.283
## ==============================================================================
Considerations
Even though our process is invertible - the suggested model, in some cases, may still result in an error when estimating in Python
. For example, if we did not use burn-in, and tried to fit the model on the first \(N\) observations:
B_est_auto_tmp = sm.tsa.stattools.arma_order_select_ic(B_data_full[:N], ic = 'aic', trend = 'c')
B_mdl_auto_tmp = smt.arima_model.ARMA(B_data_full[:N], order = B_est_auto_tmp.aic_min_order).fit(trend = 'c')
## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: The computed initial MA coefficients are not invertible
## You should induce invertibility, choose a different model order, or you can
## pass your own start_params.
##
## Detailed traceback:
## File "<string>", line 1, in <module>
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\statsmodels\tsa\arima_model.py", line 1016, in fit
## start_params = self._fit_start_params((k_ar, k_ma, k), method,
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\statsmodels\tsa\arima_model.py", line 608, in _fit_start_params
## start_params = self._fit_start_params_hr(order, start_ar_lags)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\statsmodels\tsa\arima_model.py", line 593, in _fit_start_params_hr
## raise ValueError("The computed initial MA coefficients are not "
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'B_mdl_auto_tmp' is not defined
##
## Detailed traceback:
## File "<string>", line 1, in <module>
To make the error more compact, we will wrap the function in a try-catch block.
Note that in R
we did not have any errors when searching for an appropriate model.
However, we did get an error, when trying to manually estimate a model for a subset of the data from \((C)\):
tryCatch({
forecast::Arima(C_data_full[1:N], order = c(1, 0, 0), include.constant = TRUE)
}, error = function(e){
print(e)
})
## <simpleError in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, method = method, ...): non-stationary AR part from CSS>
try:
B_mdl_auto_tmp = smt.arima_model.ARMA(B_data_full[:N], order = B_est_auto_tmp.aic_min_order).fit(trend = 'c')
print(B_mdl_auto_tmp.summary().tables[1])
except ValueError as err:
print('Handling run-time error:', err)
## Handling run-time error: The computed initial MA coefficients are not invertible
## You should induce invertibility, choose a different model order, or you can
## pass your own start_params.
We could try restricting the order of some of the lags. Since the error concerns invertibility - let’s restrict the maximum \(MA\) order to \(q_{\max} = 1\):
Note that in R
we can restrict the \(MA\) coefficients by specifying max.q
.
Note that now the \(AR\) order is also different - restricting \(MA\), or \(AR\) terms results in the rest of the model parameters having a different lag order in some cases. This is because the automatic order selection tries various model combinations to find the best one (in terms of some specified criterion).
B_mdl_auto_tmp = smt.arima_model.ARMA(B_data_full[:N], order = B_mdl_auto_tmp.aic_min_order).fit(trend = 'c')
print(B_mdl_auto_tmp.summary().tables[1])
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0542 0.089 11.817 0.000 0.879 1.229
## ar.L1.y -0.3954 0.101 -3.904 0.000 -0.594 -0.197
## ma.L1.y 0.8184 0.055 14.846 0.000 0.710 0.926
## ==============================================================================
- \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
Since our process is non-stationary, we might get an error when estimating. This depends on the package version, as well as the randomness of the data.
To make the error more compact, we will wrap the function in a try-catch block:
try:
C_mdl_auto = smt.arima_model.ARMA(C_data, order = C_est_auto.aic_min_order).fit(trend = 'c')
print(C_mdl_auto.summary().tables[1])
except ValueError as err:
print('Handling run-time error:', err)
## Handling run-time error: The computed initial AR coefficients are not stationary
## You should induce stationarity, choose a different model order, or you can
## pass your own start_params.
So our suggestions are to either (a) induce stationarity; or (b) choose a different model order.
Since we know that this process is non-stationary, we may try to specify an \(\rm ARMA(2, 2)\) model and hope that the \(\rm AR\) coefficients approximately stationary, or a pure \(\rm MA\) model. We will go with a pure \(MA\) process, since they are always stationary (though they may not be the best fit on the data). We opt to choose an order automatically.
Considerations
We will examine how to deal with non-stationary processes in later lectures (to be more specific, we will examine the unit root case).
Note that for larger orders the calculations take a while, so we restrict to \(q_\max = 6\) (default max in Python
is 2 for \(\rm MA\) and \(p_\max = 4\) for \(\rm AR\)).
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## intercept 147172811 23680140 6.215 5.131e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results are unchanged in R
, since our previous model was an \(ARMA(0,0)\). This would suggest that a constant-only model is the best fit for our data. However, there is no guarantee that it is stationary - we still need to check the residuals!
C_mdl_auto = smt.arima_model.ARMA(C_data, order = C_mdl_auto.aic_min_order).fit(trend = 'c')
print(C_mdl_auto.summary().tables[1])
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.327e+08 3.5e+07 3.789 0.000 6.4e+07 2.01e+08
## ma.L1.y 6.0000 nan nan nan nan nan
## ma.L2.y 15.0000 nan nan nan nan nan
## ma.L3.y 20.0000 nan nan nan nan nan
## ma.L4.y 15.0000 2.63e-05 5.71e+05 0.000 15.000 15.000
## ma.L5.y 6.0000 nan nan nan nan nan
## ma.L6.y 1.0000 3.54e-06 2.82e+05 0.000 1.000 1.000
## ==============================================================================
Note again that the non-stationarity of the process is quite a problem even looking at the results - some standard errors are nan
in Python
.
- \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
The automatically selected model is:
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.049504 0.069359 0.7137 0.4754
## ar2 0.718813 0.069363 10.3630 <2e-16 ***
## ma1 0.971401 0.027784 34.9626 <2e-16 ***
## intercept 9.917819 0.306341 32.3751 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
D_mdl_auto = smt.arima_model.ARMA(D_data, order = D_est_auto.aic_min_order).fit(trend = 'c')
print(D_mdl_auto.summary().tables[1])
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 10.1960 0.296 34.463 0.000 9.616 10.776
## ar.L1.y -0.3933 0.079 -4.980 0.000 -0.548 -0.239
## ar.L2.y 0.5553 0.063 8.813 0.000 0.432 0.679
## ar.L3.y 0.6931 0.064 10.766 0.000 0.567 0.819
## ar.L4.y -0.3052 0.080 -3.816 0.000 -0.462 -0.148
## ma.L1.y 1.5996 0.033 49.043 0.000 1.536 1.663
## ma.L2.y 0.9997 0.037 26.676 0.000 0.926 1.073
## ==============================================================================
- \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
For this dataset, we will attempt to find the best model without a constant
term. In practice, if we are unsure of whether we should include the mean or not - we could always try to include the mean and examine whether it is statistically significant.
Note that we would expect to have problems with \((B)\) and \((C)\), since the automatic model selection functions (as well as the parameter estimation assumptions) require stationarity and invertibility.
Note that in the case of \((A)\), in Python
specifying max_ar = 0
returns a result without any warnings. This is a current drawback of this method as it does not automatically distinguish when an order of 0
is appropriate for either MA, or AR components. Update: after a package update some warnings may have disappeared.
Either way, in all cases the final suggested models are ARMA models, instead of pure AR or pure MA models.
The automatically selected model is:
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.866539 0.039792 21.777 < 2.2e-16 ***
## ma1 0.952700 0.037476 25.422 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
E_mdl_auto = smt.arima_model.ARMA(E_data, order = E_est_auto.aic_min_order).fit(trend = 'nc')
print(E_mdl_auto.summary().tables[1])
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ar.L1.y 1.0817 0.090 12.046 0.000 0.906 1.258
## ar.L2.y -0.2044 0.091 -2.249 0.025 -0.383 -0.026
## ma.L1.y 0.8161 0.049 16.537 0.000 0.719 0.913
## ==============================================================================
Considerations
For the larger sample size:
The suggested process is actually the same order as the true underlying process. Note that because of the non-invertibility of the process this may not always be the case.
If the process was both stationary and invertible, then it is very likely that for a larger sample the automated order selection would suggest a model order, which is close to the true process, as long as the true underlying process is an \(\rm ARMA(p, q)\) process (which is stationary and invertible).
2.3.2.3 Task 7
We will show with \((A)\) since it is both stationary and invertible.
- \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
Rewriting as an \(AR(\infty)\) process yields \(Y_t = \alpha + \sum_{j = 1}^\infty (-1)^{j+1} \theta^j Y_{t-j} + \epsilon_t\). The first \(j = 1, ... , 10\) coefficients \((-1)^{j+1} \theta^j\) are equal to:
## [1] 0.5000000000 -0.2500000000 0.1250000000 -0.0625000000 0.0312500000 -0.0156250000 0.0078125000 -0.0039062500 0.0019531250 -0.0009765625
As we can see, the coefficients are decaying to zero. Hence after some lag \(j^*\), the coefficient \((-1)^{j+1} \approx 0\), \(\forall j > j^*\).
A manually specified arbitrary approximation could be, for example, an \(\rm AR(10)\) process:
## Series: A_data
## ARIMA(10,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8 ar9 ar10 mean
## 0.4194 -0.2873 0.2168 -0.2302 0.1488 -0.1227 0.1358 -0.0228 0.0180 0.0259 0.9826
## s.e. 0.0819 0.0890 0.0926 0.0952 0.0969 0.0994 0.0983 0.0965 0.0927 0.0846 0.0538
##
## sigma^2 estimated as 0.2316: log likelihood=-97.66
## AIC=219.32 AICc=221.59 BIC=255.44
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0001044864 0.4632639 0.3684176 3.473363 86.57543 0.7929348 0.004850416
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0001044864 0.4632639 0.3684176 3.473363 86.57543 0.7929348 0.004850416
## ARMA Model Results
## ==============================================================================
## Dep. Variable: y No. Observations: 150
## Model: ARMA(10, 0) Log Likelihood -104.547
## Method: css-mle S.D. of innovations 0.485
## Date: Tue, 06 Apr 2021 AIC 233.094
## Time: 23:17:42 BIC 269.222
## Sample: 0 HQIC 247.771
##
## ==============================================================================
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.419395 0.081882 5.1219 3.024e-07 ***
## ar2 -0.287303 0.089035 -3.2268 0.001252 **
## ar3 0.216785 0.092640 2.3401 0.019280 *
## ar4 -0.230217 0.095180 -2.4187 0.015574 *
## ar5 0.148844 0.096902 1.5360 0.124530
## ar6 -0.122722 0.099436 -1.2342 0.217133
## ar7 0.135801 0.098285 1.3817 0.167061
## ar8 -0.022780 0.096475 -0.2361 0.813335
## ar9 0.017985 0.092739 0.1939 0.846232
## ar10 0.025864 0.084602 0.3057 0.759827
## intercept 0.982574 0.053798 18.2642 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0092 0.053 19.167 0.000 0.906 1.112
## ar.L1.y 0.5036 0.081 6.213 0.000 0.345 0.662
## ar.L2.y -0.2746 0.092 -2.984 0.003 -0.455 -0.094
## ar.L3.y 0.1230 0.096 1.278 0.201 -0.066 0.312
## ar.L4.y -0.2126 0.096 -2.213 0.027 -0.401 -0.024
## ar.L5.y 0.1443 0.098 1.465 0.143 -0.049 0.337
## ar.L6.y -0.1727 0.099 -1.738 0.082 -0.368 0.022
## ar.L7.y 0.0480 0.099 0.486 0.627 -0.146 0.242
## ar.L8.y 0.0247 0.099 0.251 0.802 -0.168 0.218
## ar.L9.y -0.0801 0.095 -0.842 0.400 -0.266 0.106
## ar.L10.y 0.1472 0.086 1.708 0.088 -0.022 0.316
## ==============================================================================
We see that \(\widehat{\phi}_j\) for \(j \geq 5\) are statistically insignificant, since their \(p\)-values are less than 0.05.
We can also get the roots of the estimated \(ARMA\) model.
a_ar_mdl_roots <- polyroot(c(1, - A_mdl_2$model$phi))
tmp <- data.frame(Real = Re(a_ar_mdl_roots),
Imaginary = Im(a_ar_mdl_roots),
Modulus = Mod(a_ar_mdl_roots),
row.names = paste0("AR.", 1:length(A_mdl_2$model$phi)))
print(round(tmp, 4))
## Real Imaginary Modulus
## AR.1 0.9856 0.8600 1.3081
## AR.2 -0.9892 0.7831 1.2616
## AR.3 -0.9892 -0.7831 1.2616
## AR.4 0.9856 -0.8600 1.3081
## AR.5 0.5588 1.4510 1.5549
## AR.6 -0.2976 1.3156 1.3488
## AR.7 -0.2976 -1.3156 1.3488
## AR.8 1.2905 0.0000 1.2905
## AR.9 0.5588 -1.4510 1.5549
## AR.10 -2.5012 0.0000 2.5012
## Roots
## ==============================================================================
## Real Imaginary Modulus Frequency
## ------------------------------------------------------------------------------
## AR.1 -0.8607 -0.7460j 1.1390 -0.3863
## AR.2 -0.8607 +0.7460j 1.1390 0.3863
## AR.3 -1.3889 -0.0000j 1.3889 -0.5000
## AR.4 -0.2233 -1.2090j 1.2294 -0.2791
## AR.5 -0.2233 +1.2090j 1.2294 0.2791
## AR.6 0.4632 -1.1232j 1.2150 -0.1877
## AR.7 0.4632 +1.1232j 1.2150 0.1877
## AR.8 0.9515 -0.6509j 1.1529 -0.0955
## AR.9 0.9515 +0.6509j 1.1529 0.0955
## AR.10 1.2715 -0.0000j 1.2715 -0.0000
## ------------------------------------------------------------------------------
We can also try to automatically fit the best \(AR\) model:
## Series: A_data
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3575 -0.1772 0.9845
## s.e. 0.0805 0.0805 0.0476
##
## sigma^2 estimated as 0.2328: log likelihood=-102.08
## AIC=212.17 AICc=212.44 BIC=224.21
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.357521 0.080459 4.4435 8.85e-06 ***
## ar2 -0.177160 0.080533 -2.1998 0.02782 *
## intercept 0.984549 0.047585 20.6904 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Results: ARMA
## ==================================================================
## Model: ARMA BIC: 241.3021
## Dependent Variable: y Log-Likelihood: -110.63
## Date: 2021-04-06 23:17 Scale: 1.0000
## No. Observations: 150 Method: css-mle
## Df Model: 3 Sample: 0
## Df Residuals: 147 0
## Converged: 1.0000 S.D. of innovations: 0.506
## No. Iterations: 8.0000 HQIC: 234.152
## AIC: 229.2596
## --------------------------------------------------------------------
## Coef. Std.Err. t P>|t| [0.025 0.975]
## --------------------------------------------------------------------
## const 1.0056 0.0571 17.6259 0.0000 0.8938 1.1175
## ar.L1.y 0.4661 0.0800 5.8262 0.0000 0.3093 0.6229
## ar.L2.y -0.1890 0.0803 -2.3528 0.0186 -0.3464 -0.0316
## ---------------------------------------------------------------------------
## Real Imaginary Modulus Frequency
## ---------------------------------------------------------------------------
## AR.1 1.2331 -1.9418 2.3003 -0.1600
## AR.2 1.2331 1.9418 2.3003 0.1600
## ==================================================================
Note the different coefficients. This is because the lower level model is a sort of approximation of the higher order model.
We leave the remaining cases as exercises (might be added later).
Considerations
From our automatically simulated data, we can transform it into a pure AR, or pure MA process of different orders:
## [1] 2.40000 2.44000 2.20400 1.93640 1.68924
## [1] -2.400 3.320 -4.316 5.611 -7.294
For Python
- looking at the documentation (here and here), this should be used with caution as it (as of 2019) has minimal documentation and no examples.
2.3.3 Exercise Set 3: Residual Diagnostics
Remember that our underlying assumption is that \(\epsilon_t \sim WN\). In other words the residuals:
- must be zero-mean;
- must have a constant variance;
- must not be autocorrelated;
The zero-mean property is guaranteed by the estimation method (the proof is similar to the OLS estimation method, which minimizes the sum of squared residuals in the univariate cross-sectional regression case). We can examine the variance by checking for ARCH effects (a topic, which is covered separately). For the purpose of these exercises, we will examine the variance only visually.
We note that, unlike the forecast
package in R
, the statsmodels
package in Python
does not have the tsdisplay
function. Fortunately, we can always create one ourselves for simplicity:
def tsdisplay(y, figsize = (10, 8), title = "", lags = 20):
tmp_data = pd.Series(y)
fig = plt.figure(figsize = figsize)
#Plot the time series
_ = tmp_data.plot(ax = fig.add_subplot(311), title = "$Time\ Series\ " + title + "$", legend = False)
#Plot the ACF:
_ = sm.graphics.tsa.plot_acf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(323))
_ = plt.xticks(np.arange(1, lags + 1, 1.0))
#Plot the PACF:
_ = sm.graphics.tsa.plot_pacf(tmp_data, lags = 20, zero = False, ax = fig.add_subplot(324))
_ = plt.xticks(np.arange(1, lags + 1, 1.0))
#Plot the QQ plot of the data:
_ = sm.qqplot(tmp_data, line='s', ax = fig.add_subplot(325))
_ = plt.title("QQ Plot")
#Plot the residual histogram:
_ = fig.add_subplot(326).hist(tmp_data, bins = 40, density = 1)
_ = plt.title("Histogram")
#Fix the layout of the plots:
_ = plt.tight_layout()
plt.show()
Note that the stats::tsdiag()
does not take into account the reduction in degrees of freedom, which results from using the residuals of an estimated model. More similar issues can be found here. We can specify our own version of tsdiag()
:
Furthermore, the tsdiag()
function, which is available in R
, does not have an analog in statsmodels
. Nevertheless, we can create our own:
tsdiag2 <- function(y, title = "", lags = 30, df = 0){
rez <- matrix(0, nrow = 4, ncol = lags)
rownames(rez) <- c("Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value")
colnames(rez) <- 1:lags
suppressWarnings({
rez[1, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Ljung-Box")$statistic})
rez[2, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Ljung-Box")$p.value})
rez[3, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Box-Pierce")$statistic})
rez[4, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Box-Pierce")$p.value})
})
# Fix for when df - fitdf = 0, since Box.test(y, lag = 0) and Box.test(y, lag = 1, fitdf = 1) do not always return NA
rez[ c(2, 4), 1:lags == df] <- NA
print(rez)
# Plot Ljung-Box and Box-Pierce statistic p-values:
plot(1:lags, rez[2, ], pch = 19, col = "blue", main = paste0("Time Series, with model fitted df = ", df),
ylim = c(0, max(rez[c(2, 4),], 0.05, na.rm = TRUE)))
points(1:lags, rez[4, ], pch = 19, col = "darkgreen")
abline(h = 0.05, col = "red")
axis(1, at = 1:lags, labels = 1:lags)
legend("topleft", legend = c("Ljung-Box", "Box-Pierce", "5% critical value"),
lty = c(NA, NA, 1), pch = c(19, 19, NA), col = c("blue", "darkgreen", "red"))
# Return nothing:
#return(NULL)
}
def tsdiag(y, fig_size = (10, 8), title = "", lags = 30, df = 0):
tmp_data = pd.Series(y.copy()).reset_index(drop = True)
tmp_data.index = list(range(1, len(tmp_data) + 1))
# Carry out the Ljung-Box and Box-Pierce tests:
tmp_acor = list(sm_stat.diagnostic.acorr_ljungbox(tmp_data, lags = lags, model_df = df, boxpierce = True))
#
col_index = ["Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value"]
rez = pd.DataFrame(tmp_acor, index = col_index, columns = range(1, len(tmp_acor[0]) + 1))
print(rez)
# Plot Ljung-Box and Box-Pierce statistic p-values:
fig = plt.figure(figsize = fig_size)
_ = plt.plot(range(1, len(tmp_acor[0]) + 1), tmp_acor[1], 'bo', label = "Ljung-Box values")
_ = plt.plot(range(1, len(tmp_acor[0]) + 1), tmp_acor[3], 'go', label = "Box-Pierce values")
_ = plt.xticks(np.arange(1, len(tmp_acor[0]) + 1, 1.0))
_ = plt.axhline(y = 0.05, color = "red", label = "5% critical value")
_ = plt.title("$Time\ Series\ " + title + "$, with model fitted df = " + str(df))
_ = plt.legend()
plt.show()
# Return nothing
return None
where df
is passed to the model_df
in the acorr_ljungbox() function. As per the documentation, it is the number of degrees of freedom consumed by the model. In an \(ARMA(p, q)\) model, this value is usually equal to \(p + q\) (unless we also estimate an intercept). This value is subtracted from the degrees-of-freedom (dof) used in the test so that the adjusted dof for the statistics are ags - model_df
.
Note: If lags - model_df <= 0
, then NaN
is returned.
IMPORTANT: this requires at leaststatsmodels 0.11.0
! An older version of this package does not have the option to specify model_df
!
2.3.3.1 Task 8, Task 9 and Task 10
To make it easier to follow, we will carry out the residual diagnostic tasks for each model separately, instead of doing each of the task for all models at once.
- \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
The residuals of our manually specified model (with the true underlying lag order):
forecast::tsdisplay(A_mdl_1$residuals, main = as.expression(bquote("Time Series: Residuals of"~Y[t]==1+epsilon[t]+0.5*epsilon[t-1])))
## Traceback (most recent call last):
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\backends\backend_qt5.py", line 480, in _draw_idle
## self.draw()
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\backends\backend_agg.py", line 407, in draw
## self.figure.draw(self.renderer)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\artist.py", line 41, in draw_wrapper
## return draw(artist, renderer, *args, **kwargs)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\figure.py", line 1863, in draw
## mimage._draw_list_compositing_images(
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\image.py", line 131, in _draw_list_compositing_images
## a.draw(renderer)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\artist.py", line 41, in draw_wrapper
## return draw(artist, renderer, *args, **kwargs)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\cbook\deprecation.py", line 411, in wrapper
## return func(*inner_args, **inner_kwargs)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axes\_base.py", line 2707, in draw
## self._update_title_position(renderer)
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axes\_base.py", line 2636, in _update_title_position
## if (ax.xaxis.get_ticks_position() in ['top', 'unknown']
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axis.py", line 2207, in get_ticks_position
## self._get_ticks_position()]
## File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axis.py", line 1892, in _get_ticks_position
## major = self.majorTicks[0]
## IndexError: list index out of range
We now want to test the null hypothesis of no autocorrelation: \[ \begin{aligned} H_0 &: \rho(1) = ... = \rho(k) = 0\\ H_1 &: \exists j: \rho(j) \neq 0 \end{aligned} \]
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.6009447 1.3859901 2.4992287 4.8520838 5.0756260 5.2298353 5.5388280 6.2152166 6.4117377 7.1645985 7.4158427 10.1369025 10.6403396 10.8707988 13.6356228 15.5397188 15.7110603 16.0355597 17.1630659 20.1988345
## Ljung-Box: p-value NA 0.2390834 0.2866153 0.1829543 0.2796255 0.3884796 0.4767763 0.5148566 0.6012135 0.6199869 0.6856895 0.5181164 0.5599668 0.6216404 0.4771950 0.4132833 0.4733022 0.5213155 0.5119218 0.3827038
## Box-Pierce: X-squared 0.5890840 1.3534703 2.4300892 4.6900684 4.9033159 5.0494089 5.3401060 5.9719953 6.1542945 6.8477190 7.0774752 9.5479111 10.0016668 10.2078672 12.6634674 14.3420784 14.4920022 14.7738043 15.7455366 18.3419177
## Box-Pierce: p-value NA 0.2446724 0.2966968 0.1959504 0.2973625 0.4098801 0.5009887 0.5430231 0.6299533 0.6529705 0.7181111 0.5714513 0.6158144 0.6768544 0.5531768 0.4997593 0.5621108 0.6117704 0.6103011 0.4997182
tsdiag(A_mdl_1.resid, lags = 20, df = len(A_mdl_1.arparams) + len(A_mdl_1.maparams)) # len(A_mdl_1.params)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.111312 0.124826 0.131003 4.539037 4.819075 6.472439 6.919087 7.798845 9.293600 13.483090 13.483101 13.483240 14.289143 17.494535 17.677863 18.312421 21.129353 23.182770 26.608928 28.872834
## Ljung-Box: p-value NaN 0.723858 0.936598 0.208835 0.306370 0.262922 0.328395 0.350665 0.318137 0.141937 0.197901 0.262917 0.282627 0.177671 0.221849 0.246597 0.173592 0.143395 0.086628 0.068012
## Box-Pierce: X-squared 0.109115 0.122273 0.128247 4.362280 4.629421 6.195766 6.615968 7.437848 8.824429 12.683170 12.683180 12.683306 13.409679 16.277662 16.440486 16.999899 19.464714 21.247945 24.200752 26.136987
## Box-Pierce: p-value NaN 0.726582 0.937889 0.224908 0.327476 0.287633 0.357826 0.384756 0.357323 0.177474 0.241929 0.314527 0.339980 0.234461 0.287221 0.318870 0.245302 0.215396 0.148560 0.126417
Considerations
Note that the stats::tsdiag()
does not take into account the reduction in degrees of freedom, which come from using the residuals of an estimated model:
More similar issues can be found here. To account for this, as well as keep the \(Ljung-Box\) test p-value plots, we can use the astsa
package:
tmp <- astsa::sarima(A_data, p = forecast::arimaorder(A_mdl_1)["p"],
d = 0, q = forecast::arimaorder(A_mdl_1)["q"], details = TRUE, no.constant = FALSE)
## initial value -0.674107
## iter 2 value -0.736743
## iter 3 value -0.746494
## iter 4 value -0.746563
## iter 5 value -0.746564
## iter 6 value -0.746564
## iter 7 value -0.746564
## iter 8 value -0.746564
## iter 8 value -0.746564
## iter 8 value -0.746564
## final value -0.746564
## converged
## initial value -0.745731
## iter 2 value -0.745741
## iter 3 value -0.745742
## iter 3 value -0.745742
## iter 3 value -0.745742
## final value -0.745742
## converged
Although, it requires re-estimation of the model.
Alternatively, we have specified our own version of tsdiag()
, which we will be using.
In Pythohn
, for the first 16
lags we have that \(p-value > 0.05\), which means we have no grounds to reject the null hypothesis of no correlation. For larger lag values, \(p-value\) decreases and is very close to \(0.05\), though it occurs only at lag = 20
.
Looking at the automatically-selected order model residuals we see similar results:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.01572267 0.4110617 0.4820004 1.6843053 1.7666553 1.8495815 2.1803248 2.8017476 2.8269930 3.1224093 3.2555664 6.4365674 7.1547242 7.7270910 10.8373434 13.1606312 13.1936669 13.2041922 14.2118247 17.6055449
## Ljung-Box: p-value NaN NA 0.4875176 0.4307822 0.6222179 0.7633997 0.8236728 0.8332865 0.9005322 0.9264423 0.9532950 0.7773489 0.7864187 0.8060772 0.6244418 0.5139162 0.5873421 0.6577734 0.6520551 0.4819082
## Box-Pierce: X-squared 0.01541235 0.4003477 0.4689529 1.6237984 1.7023560 1.7809176 2.0920774 2.6726172 2.6960356 2.9681296 3.0898983 5.9779123 6.6251984 7.1373160 9.8997112 11.9478729 11.9767791 11.9859195 12.8543396 15.7568634
## Box-Pierce: p-value NaN NA 0.4934693 0.4440140 0.6364101 0.7759717 0.8362633 0.8486709 0.9116268 0.9363416 0.9606238 0.8171155 0.8285731 0.8484072 0.7021141 0.6104891 0.6807859 0.7449485 0.7458812 0.6095114
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.002332 0.087243 0.245451 0.449782 1.170380 1.545317 1.711803 1.794964 1.880164 2.396379 2.525659 2.893903 3.110470 3.189565 3.518778 4.076212 4.891414 4.918303 5.338921 6.441716
## Ljung-Box: p-value NaN NaN NaN NaN NaN NaN 0.190752 0.407595 0.597648 0.663282 0.772626 0.822041 0.874598 0.921904 0.940145 0.943843 0.936311 0.960652 0.966955 0.954101
## Box-Pierce: X-squared 0.002286 0.084962 0.237966 0.434232 1.121644 1.476848 1.633476 1.711166 1.790200 2.265661 2.383884 2.718211 2.913406 2.984175 3.276568 3.767990 4.481292 4.504644 4.867150 5.810329
## Box-Pierce: p-value NaN NaN NaN NaN NaN NaN 0.201223 0.425035 0.617070 0.687028 0.793872 0.843292 0.892896 0.935347 0.952314 0.957207 0.953671 0.972517 0.978002 0.971048
Considerations
In comparison, the original data had significant autocorrelation from the very first lag
(and further on):
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 1.401034e+01 1.477750e+01 14.787220416 15.647933460 15.694417642 15.70020082 16.21794556 17.04776059 17.08994640 17.37515405 18.08502158 20.69821210 20.70501565 20.9475634 21.9249994 22.8750211 23.5331092 23.5335174 25.9231975 31.27670053
## Ljung-Box: p-value 1.818077e-04 6.181695e-04 0.002007815 0.003529867 0.007772904 0.01545684 0.02319808 0.02961558 0.04732592 0.06646412 0.07962866 0.05497819 0.07897171 0.1029838 0.1097856 0.1171166 0.1326861 0.1709121 0.1323433 0.05164192
## Box-Pierce: X-squared 1.373382e+01 1.448079e+01 14.490192375 15.316929904 15.361273367 15.36675217 15.85384096 16.62906290 16.66819580 16.93088705 17.58004223 19.95254415 19.95867629 20.1756927 21.0438102 21.8813293 22.4571564 22.4575109 24.5170378 29.09569174
## Box-Pierce: p-value 2.106269e-04 7.170294e-04 0.002308455 0.004087068 0.008925224 0.01758820 0.02648576 0.03421193 0.05417273 0.07590733 0.09185068 0.06798901 0.09624546 0.1246988 0.1354354 0.1470647 0.1677709 0.2123048 0.1770609 0.08589220
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 23.728816 23.732151 24.554599 28.436812 29.235744 31.091103 31.769012 31.798655 31.798666 34.321337 35.070795 35.215160 35.244828 36.900743 37.345752 39.152065 41.268670 41.268753 41.696871 41.894049
## Ljung-Box: p-value 0.000001 0.000007 0.000019 0.000010 0.000021 0.000024 0.000045 0.000101 0.000216 0.000163 0.000241 0.000433 0.000776 0.000764 0.001126 0.001034 0.000854 0.001398 0.001947 0.002855
## Box-Pierce: X-squared 23.260484 23.263731 24.059125 27.788093 28.550232 30.307940 30.945710 30.973403 30.973413 33.296926 33.982285 34.113353 34.140094 35.621703 36.016941 37.609348 39.461378 39.461449 39.830419 39.999058
## Box-Pierce: p-value 0.000001 0.000009 0.000024 0.000014 0.000028 0.000034 0.000064 0.000142 0.000299 0.000243 0.000364 0.000647 0.001146 0.001188 0.001758 0.001720 0.001541 0.002470 0.003444 0.004997
- \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
The residuals of our manually specified model (with the true underlying lag order):
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.1123942 0.6441230 0.8498436 1.6158201 1.7093052 1.8330825 2.2325655 2.6810754 2.6948382 2.9021889 2.9542813 6.2092355 7.1384382 7.8094386 10.8527148 13.4033257 13.4856532 13.4865008 14.5921882 18.2662862
## Ljung-Box: p-value NaN NA 0.3565966 0.4457888 0.6348670 0.7664245 0.8161184 0.8476775 0.9117255 0.9403448 0.9660800 0.7973884 0.7877642 0.7998392 0.6231547 0.4950400 0.5648402 0.6369092 0.6248297 0.4382444
## Box-Pierce: X-squared 0.1101759 0.6279118 0.8268653 1.5626059 1.6517857 1.7690485 2.1448778 2.5638806 2.5766474 2.7676282 2.8152653 5.7704212 6.6079263 7.2082950 9.9112048 12.1597696 12.2318062 12.2325423 13.1854703 16.3277909
## Box-Pierce: p-value NaN NA 0.3631805 0.4578091 0.6477056 0.7781400 0.8287570 0.8612510 0.9212157 0.9480832 0.9711708 0.8341685 0.8298881 0.8435470 0.7011789 0.5934737 0.6614051 0.7278228 0.7236841 0.5696824
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.047104 0.590007 0.622402 4.638975 4.662342 5.887556 6.633527 7.703376 9.089350 13.733231 13.753438 13.792274 14.380635 17.884187 17.982541 18.651011 21.447878 23.417510 27.109541 29.309129
## Ljung-Box: p-value NaN NaN 0.430156 0.098324 0.198259 0.207704 0.249353 0.260650 0.246305 0.088989 0.131370 0.182679 0.212642 0.119253 0.158184 0.178717 0.123119 0.103021 0.056481 0.044730
## Box-Pierce: X-squared 0.046174 0.574790 0.606119 4.464144 4.486435 5.647164 6.348966 7.348429 8.634103 12.911362 12.929840 12.965100 13.495398 16.630156 16.717509 17.306818 19.754077 21.464547 24.646495 26.527721
## Box-Pierce: p-value NaN NaN 0.436253 0.107306 0.213503 0.227093 0.273728 0.289823 0.280004 0.114936 0.165807 0.225630 0.262182 0.164047 0.212542 0.240196 0.181571 0.161337 0.102906 0.088290
The residuals from the automated procedure:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.03282845 0.6722802 0.8143308 0.8278391 1.3408265 1.4466935 1.4749914 2.5459315 2.6915384 3.4039713 3.8030374 6.1417021 6.2763201 6.2880583 8.1509414 9.8296086 10.0302293 10.3042081 11.3544104 15.5378610
## Ljung-Box: p-value NaN NA 0.3668429 0.6610541 0.7194620 0.8360402 0.9159376 0.8632922 0.9119972 0.9065133 0.9238912 0.8032208 0.8543058 0.9008704 0.8336295 0.7745319 0.8178341 0.8502663 0.8375581 0.6247652
## Box-Pierce: X-squared 0.03218052 0.6548046 0.7921824 0.8051575 1.2945205 1.3948156 1.4214379 2.4219215 2.5569910 3.2131792 3.5781147 5.7013761 5.8227093 5.8332120 7.4877463 8.9676239 9.1431671 9.3810961 10.2862046 13.8641558
## Box-Pierce: p-value NaN NA 0.3734410 0.6685937 0.7304346 0.8450956 0.9219539 0.8771015 0.9227448 0.9202757 0.9369269 0.8396973 0.8849277 0.9242550 0.8752999 0.8331190 0.8699134 0.8968934 0.8911812 0.7379028
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.013248 0.326687 1.008425 1.474426 3.290342 3.542407 3.574536 4.113290 4.913923 5.223662 5.248268 6.297730 6.864674 6.986136 7.172068 8.321590 8.460358 8.746002 9.028129 10.331547
## Ljung-Box: p-value NaN NaN NaN NaN NaN NaN 0.058672 0.127882 0.178210 0.265109 0.386339 0.390678 0.443105 0.538129 0.619210 0.597456 0.671568 0.724459 0.770816 0.737564
## Box-Pierce: X-squared 0.012986 0.318177 0.977489 1.425096 3.157384 3.396183 3.426409 3.929719 4.672411 4.957697 4.980198 5.932999 6.443995 6.552672 6.717809 7.731203 7.852625 8.100684 8.343833 9.458598
## Box-Pierce: p-value NaN NaN NaN NaN NaN NaN 0.064161 0.140176 0.197418 0.291666 0.418302 0.430737 0.488964 0.585571 0.666473 0.655074 0.726442 0.777214 0.820501 0.800599
Both model residuals indicate similar results in terms of adequacy - the residuals do not exhibit any autocorrelation for lag values up to 20
in Python
.
- \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
The residuals of our manually specified model (with the true underlying lag order):
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 124.4212 227.6557 313.2688 384.2277 443.0014 491.6448 531.8679 565.0938 592.5071 615.0934 633.6730 648.9287 661.4285 671.6452 679.9722 686.7367 692.2109 696.6211 700.1558 702.9713
## Ljung-Box: p-value NA 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## Box-Pierce: X-squared 121.9656 222.4833 305.2802 373.4380 429.5051 475.5883 513.4298 544.4698 569.8992 590.7024 607.6929 621.5435 632.8098 641.9511 649.3468 655.3102 660.1001 663.9300 666.9763 669.3843
## Box-Pierce: p-value NA 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 97.187614 1.778538e+02 2.447787e+02 3.002756e+02 3.462693e+02 3.843617e+02 4.158857e+02 4.419505e+02 4.634793e+02 4.812403e+02 4.958727e+02 5.079086e+02 5.177905e+02 5.258869e+02 5.325043e+02 5.378975e+02 5.422788e+02 5.458245e+02 5.486813e+02 5.509712e+02
## Ljung-Box: p-value NaN 1.425706e-40 7.030240e-54 8.671976e-65 1.120608e-73 6.953902e-81 1.073011e-86 2.376093e-91 4.777773e-95 5.961432e-98 3.363506e-100 6.596815e-102 3.614881e-103 4.616077e-104 1.181550e-104 5.349797e-105 3.862663e-105 4.080848e-105 5.874505e-105 1.086193e-104
## Box-Pierce: X-squared 95.269437 1.738129e+02 2.385363e+02 2.918425e+02 3.357181e+02 3.718056e+02 4.014630e+02 4.258131e+02 4.457839e+02 4.621427e+02 4.755236e+02 4.864509e+02 4.953577e+02 5.026018e+02 5.084791e+02 5.132337e+02 5.170673e+02 5.201464e+02 5.226085e+02 5.245670e+02
## Box-Pierce: p-value NaN 1.087556e-39 1.594046e-52 5.796974e-63 2.124550e-71 3.525582e-78 1.354997e-83 6.916186e-88 2.959725e-91 7.262100e-94 7.465383e-96 2.481930e-97 2.155433e-98 4.100348e-99 1.477844e-99 8.954584e-100 8.266683e-100 1.072310e-99 1.828514e-99 3.880704e-99
The residuals clearly still exhibit autocorrelation - as indicated by the \(ACF\), \(PACF\) and \(Ljung-Box\) tests (p-value < 0.05
).
The automatically selected model exhibits the same problems of autocorrelated residuals:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 126.1612 230.8302 317.6241 389.5528 449.1213 498.4139 539.1657 572.8204 600.5798 623.4438 642.2449 657.6755 670.3123 680.6347 689.0419 695.8659 701.3830 705.8228 709.3763 712.2023
## Ljung-Box: p-value 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## Box-Pierce: X-squared 123.6711 225.5857 309.5246 378.6140 435.4392 482.1375 520.4763 551.9169 577.6674 598.7264 615.9194 629.9288 641.3185 650.5544 658.0213 664.0372 668.8646 672.7202 675.7828 678.1998
## Box-Pierce: p-value 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 29.404663 62.785773 79.709713 97.148617 108.045406 118.831454 1.261338e+02 1.332248e+02 1.382142e+02 1.430152e+02 1.464634e+02 1.497702e+02 1.521715e+02 1.544752e+02 1.561570e+02 1.577753e+02 1.589584e+02 1.601028e+02 1.609381e+02 1.617517e+02
## Ljung-Box: p-value NaN NaN NaN NaN NaN NaN 2.874316e-29 1.176553e-29 9.172601e-30 6.382885e-30 7.553504e-30 8.653132e-30 1.420255e-29 2.282704e-29 4.664508e-29 9.325086e-29 2.187137e-28 5.005214e-28 1.272958e-27 3.152625e-27
## Box-Pierce: X-squared 28.824307 61.326967 77.694199 94.444726 104.839688 115.058050 1.219280e+02 1.285525e+02 1.331808e+02 1.376029e+02 1.407561e+02 1.437583e+02 1.459227e+02 1.479838e+02 1.494776e+02 1.509042e+02 1.519395e+02 1.529332e+02 1.536531e+02 1.543490e+02
## Box-Pierce: p-value NaN NaN NaN NaN NaN NaN 2.393636e-28 1.216708e-28 1.115669e-28 9.199940e-29 1.235777e-28 1.612636e-28 2.913240e-28 5.162295e-28 1.131528e-27 2.428661e-27 5.983291e-27 1.439147e-26 3.780466e-26 9.676194e-26
- \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
The residuals of our manually specified model (with the true underlying lag order):
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.01139571 0.1081637 1.2117477 2.8405120 3.1638785 3.2318556 3.6763186 4.4847515 4.5588240 5.2242027 5.4799315 8.0742502 8.4542493 8.6588240 11.0516984 12.8440203 13.0673895 13.1704589 14.4733255 17.8223203
## Ljung-Box: p-value NaN NA 0.2709863 0.2416521 0.3670419 0.5198011 0.5968920 0.6113742 0.7136228 0.7333663 0.7906273 0.6215843 0.6721271 0.7317620 0.6064866 0.5388408 0.5970927 0.6602559 0.6333598 0.4674142
## Box-Pierce: X-squared 0.01117080 0.1053922 1.1726742 2.7371452 3.0456198 3.1100191 3.5281652 4.2834118 4.3521237 4.9649725 5.1988298 7.5541981 7.8966973 8.0797378 10.2049881 11.7850613 11.9805094 12.0700171 13.1928823 16.0571542
## Box-Pierce: p-value NaN NA 0.2788524 0.2544699 0.3846449 0.5395857 0.6191302 0.6383827 0.7384430 0.7613133 0.8166427 0.6722918 0.7225266 0.7788685 0.6770920 0.6235593 0.6805038 0.7391460 0.7231810 0.5885584
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.156269 0.643704 0.659390 3.471816 4.540305 5.178952 5.180896 7.187222 7.908971 11.238173 11.325804 11.331107 11.685370 13.430618 13.801404 14.378648 17.261784 18.323109 21.183329 23.395932
## Ljung-Box: p-value NaN NaN 0.416775 0.176240 0.208723 0.269424 0.394205 0.303880 0.340689 0.188566 0.254036 0.332310 0.387752 0.338533 0.387977 0.421899 0.303457 0.305319 0.218197 0.175830
## Box-Pierce: X-squared 0.153185 0.627793 0.642963 3.344372 4.363654 4.968688 4.970517 6.844847 7.514365 10.580735 10.660872 10.665686 10.984989 12.546527 12.875843 13.384729 15.907473 16.829151 19.294208 21.186566
## Box-Pierce: p-value NaN NaN 0.422640 0.187836 0.224779 0.290525 0.419489 0.335434 0.377354 0.226605 0.299672 0.384150 0.444522 0.402843 0.457447 0.496478 0.388228 0.396735 0.311909 0.270105
The residuals from the automated procedure:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.1658588 0.5270103 0.5397261 0.9390750 1.0245543 1.0641166 1.0970349 2.3821178 2.5312818 3.1678999 3.6720324 5.8889019 6.0394658 6.1248719 8.0431473 9.5391665 9.7066792 9.7068043 10.9558239 14.3053791
## Ljung-Box: p-value NaN NaN NA 0.3325158 0.5991297 0.7857428 0.8947424 0.7941341 0.8649504 0.8690478 0.8854360 0.7509839 0.8119367 0.8649038 0.7817495 0.7311139 0.7832988 0.8377662 0.8122038 0.6453798
## Box-Pierce: X-squared 0.1625852 0.5142328 0.5265303 0.9101154 0.9916582 1.0291383 1.0601075 2.2606454 2.3990147 2.9853735 3.4463893 5.4590735 5.5947791 5.6711951 7.3749265 8.6937856 8.8403592 8.8404678 9.9169255 12.7816766
## Box-Pierce: p-value NaN NaN NA 0.3400838 0.6090657 0.7942018 0.9005488 0.8120293 0.8795939 0.8863520 0.9033103 0.7925958 0.8480824 0.8943779 0.8318754 0.7956572 0.8411391 0.8857100 0.8709305 0.7506743
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.108974 0.504135 1.027775 1.028427 1.054902 1.099799 1.116354 1.395432 1.691489 1.994225 2.707352 2.967250 3.321214 3.364098 3.528036 3.658165 5.191096 5.192165 5.548731 6.473726
## Ljung-Box: p-value NaN NaN NaN NaN NaN NaN 0.290705 0.497721 0.638828 0.736821 0.744994 0.812946 0.853784 0.909477 0.939649 0.961453 0.921560 0.951250 0.961048 0.953104
## Box-Pierce: X-squared 0.106823 0.491585 0.998001 0.998626 1.023883 1.066417 1.081991 1.342709 1.617341 1.896176 2.548312 2.784272 3.103306 3.141676 3.287279 3.401997 4.743312 4.744240 5.051544 5.842658
## Box-Pierce: p-value NaN NaN NaN NaN NaN NaN 0.298253 0.511016 0.655464 0.754847 0.769202 0.835395 0.875284 0.925154 0.951809 0.970322 0.942958 0.965980 0.974041 0.970303
We see that the \(p-value\) of the Ljung-Box statistic is greater than 0.05
for the first 20
lags - we have no grounds to reject the null hypothesis of no autocorrelation in the residuals. So, both the specified underlying model and the automatically selected model is adequate.
Considerations
Note the histogram and Q-Q
plot - since the first value of the series is quite different from the rest - one value (and hence, one residual) appears to be an outlier:
As mentioned at the beginning, generally, when simulating data, a number of the first values generated in the series are treated as burn-in
observations - they are usually dropped from the beginning of the sample. For example, if we need to generate \(N\) observations, we generate \(N + m\) and drop the first \(m\) observations.
The more complex our model, the more burn-in
observations we may need.
- \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
The residuals of our manually specified model (with the true underlying lag order):
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.02072419 0.9847243 1.18818 1.2516771 1.4897355 1.5725842 1.5853641 2.8771704 2.9985536 3.5226249 3.9087507 6.1812945 6.4085660 6.5279449 8.4734930 10.1097035 10.2524077 10.2551564 11.3053080 15.0080867
## Ljung-Box: p-value NaN NaN NA 0.2632324 0.4747971 0.6656220 0.8114202 0.7189156 0.8090283 0.8328233 0.8652610 0.7216454 0.7798493 0.8359231 0.7471209 0.6849399 0.7435065 0.8033903 0.7902597 0.5948991
## Box-Pierce: X-squared 0.02031516 0.9589469 1.15571 1.2167006 1.4437957 1.5222840 1.5343071 2.7411262 2.8537251 3.3364224 3.6895243 5.7527548 5.9575982 6.0644109 7.7923648 9.2348135 9.3596797 9.3620667 10.2671316 13.4339818
## Box-Pierce: p-value NaN NaN NA 0.2700091 0.4858293 0.6771368 0.8205475 0.7398215 0.8269693 0.8522356 0.8839935 0.7643859 0.8188130 0.8690321 0.8011386 0.7549869 0.8073425 0.8578382 0.8523106 0.7066839
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 1.422860 2.118465 2.223336 5.791265 6.984317 7.318775 7.382975 9.448726 9.555712 11.510251 11.517790 11.785960 11.786103 12.706695 13.392113 13.801157 15.281182 15.667069 17.945506 20.219814
## Ljung-Box: p-value NaN NaN NaN 0.016106 0.030435 0.062402 0.116982 0.092450 0.144652 0.117859 0.174050 0.225647 0.299627 0.312927 0.341196 0.387995 0.359208 0.404517 0.327098 0.263141
## Box-Pierce: X-squared 1.394777 2.072077 2.173498 5.600588 6.738697 7.055551 7.115950 9.045797 9.145040 10.945273 10.952168 11.195638 11.195766 12.019455 12.628214 12.988818 14.283840 14.618952 16.582605 18.527737
## Box-Pierce: p-value NaN NaN NaN 0.017954 0.034412 0.070148 0.129886 0.107251 0.165584 0.141024 0.204424 0.262537 0.342471 0.362183 0.396632 0.448676 0.428786 0.479197 0.413090 0.356303
The residuals from the automated procedure:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.002864305 1.029821 1.1810809 1.2313281 1.4151872 1.5059632 1.5232033 2.8139283 2.9374130 3.4749329 3.8089809 6.1134070 6.3969261 6.5615942 8.6324907 10.3756389 10.4939449 10.4963693 11.4680730 15.1584310
## Ljung-Box: p-value NaN NA 0.2771361 0.5402820 0.7019787 0.8255846 0.9103767 0.8318122 0.8907310 0.9011264 0.9235290 0.8056469 0.8456120 0.8851752 0.8000989 0.7342367 0.7875740 0.8394545 0.8312233 0.6510614
## Box-Pierce: X-squared 0.002807773 1.002739 1.1490236 1.1972873 1.3726792 1.4586775 1.4748968 2.6807057 2.7952540 3.2903381 3.5958162 5.6879926 5.9435328 6.0908674 7.9301505 9.4668732 9.5703910 9.5724964 10.4099515 13.5661788
## Box-Pierce: p-value NaN NA 0.2837536 0.5495565 0.7119512 0.8339352 0.9159484 0.8477209 0.9032749 0.9148388 0.9359488 0.8407603 0.8771052 0.9114323 0.8481209 0.8000312 0.8458587 0.8879803 0.8855452 0.7569020
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 1.422860 2.118465 2.223336 5.791265 6.984317 7.318775 7.382975 9.448726 9.555712 11.510251 11.517790 11.785960 11.786103 12.706695 13.392113 13.801157 15.281182 15.667069 17.945506 20.219814
## Ljung-Box: p-value NaN NaN NaN 0.016106 0.030435 0.062402 0.116982 0.092450 0.144652 0.117859 0.174050 0.225647 0.299627 0.312927 0.341196 0.387995 0.359208 0.404517 0.327098 0.263141
## Box-Pierce: X-squared 1.394777 2.072077 2.173498 5.600588 6.738697 7.055551 7.115950 9.045797 9.145040 10.945273 10.952168 11.195638 11.195766 12.019455 12.628214 12.988818 14.283840 14.618952 16.582605 18.527737
## Box-Pierce: p-value NaN NaN NaN 0.017954 0.034412 0.070148 0.129886 0.107251 0.165584 0.141024 0.204424 0.262537 0.342471 0.362183 0.396632 0.448676 0.428786 0.479197 0.413090 0.356303
For the automatically selected model, the Box-Pierce test statistic of the first 20
lags is greater than 0.05
, indicating no autocorrelation in the residuals for R
. On the other hand, in Python
we see significant residual autocorrelation, both in the manually specified (true underlying) model, as well as in the automatically selected model for the first couple lags. This highlights the case, where even knowing the true model does not help in capturing all of the autocorrelation in the data sample.
Considerations
How would our Ljung-Box
and Box-Pierce
statistics look like, if we were to ignore the reduction in degrees of freedom from estimating the parameters?
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 0.002864305 1.0298206 1.1810809 1.2313281 1.4151872 1.5059632 1.5232033 2.8139283 2.9374130 3.4749329 3.8089809 6.1134070 6.3969261 6.5615942 8.6324907 10.3756389 10.4939449 10.4963693 11.4680730 15.1584310
## Ljung-Box: p-value 0.957318232 0.5975542 0.7575454 0.8729142 0.9226448 0.9590975 0.9814955 0.9454862 0.9667246 0.9679427 0.9751752 0.9102576 0.9305072 0.9502948 0.8959364 0.8462894 0.8816309 0.9144949 0.9070881 0.7672715
## Box-Pierce: X-squared 0.002807773 1.0027389 1.1490236 1.1972873 1.3726792 1.4586775 1.4748968 2.6807057 2.7952540 3.2903381 3.5958162 5.6879926 5.9435328 6.0908674 7.9301505 9.4668732 9.5703910 9.5724964 10.4099515 13.5661788
## Box-Pierce: p-value 0.957741134 0.6057006 0.7652547 0.8785450 0.9272784 0.9621921 0.9831660 0.9527715 0.9718622 0.9737426 0.9802867 0.9309863 0.9481839 0.9641489 0.9265525 0.8929493 0.9206532 0.9449705 0.9420980 0.8517933
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Ljung-Box: X-squared 1.422860 2.118465 2.223336 5.791265 6.984317 7.318775 7.382975 9.448726 9.555712 11.510251 11.517790 11.785960 11.786103 12.706695 13.392113 13.801157 15.281182 15.667069 17.945506 20.219814
## Ljung-Box: p-value 0.232933 0.346722 0.527367 0.215288 0.221809 0.292370 0.390122 0.305866 0.387634 0.319169 0.400957 0.463019 0.545257 0.549738 0.572039 0.613524 0.575236 0.615770 0.526081 0.444257
## Box-Pierce: X-squared 1.394777 2.072077 2.173498 5.600588 6.738697 7.055551 7.115950 9.045797 9.145040 10.945273 10.952168 11.195638 11.195766 12.019455 12.628214 12.988818 14.283840 14.618952 16.582605 18.527737
## Box-Pierce: p-value 0.237600 0.354858 0.537187 0.231028 0.240810 0.315740 0.416908 0.338447 0.423995 0.361798 0.447279 0.512231 0.594422 0.604740 0.630990 0.673575 0.646918 0.687943 0.618122 0.552686
We see that the \(p\)-values are higher. In the case when the \(p\)-values were close to, or less than, 0.05 - we see that ignoring the fact that we are examining residuals from an estimated model would result in, in Python
s case, of having no grounds to reject the null hypothesis of not autocorrelation!
In practice, this may also be the case for larger lag values as well. Consequently, we should always be mindful of how estimation of unknown parameters affects our test statistics (e.g. is it only the degrees of freedom, or maybe even the asymptotic distribution of the test statistic?)
2.3.4 Exercise Set 4: Model Forecasts
2.3.4.1 Task 11
We will compare the AIC and BIC values to select the best model.
The model with the lower AIC/BIC value is the preferred one. BIC penalizes model complexity more heavily.
- \((A)\) \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)
The automatically specified model is better in terms of it its lower AIC. On the other hand, the \(BIC\) is lower of our manually specified model.
In this case, we should examine the residuals and maybe even forecasts in order to determine, which model is more appropriate.
However, since we automatically selected the model order in terms of lower \(AIC\) values and since the task requires to compare in terms of \(AIC\), we will opt to use the auto specified model.
- \((B)\) \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)
\(AIC\) and \(BIC\) indicate that the automatically specified model is better in R
, while in Python
the automatically specified model is better in terms of \(AIC\).
- \((C)\) \((1 - 1.1L) Y_t = 1 + \epsilon_t\)
The manually specified model is better in R
since its \(AIC\) and \(BIC\) are lower. On the other hand, the automatically specified model is better in Python
.
- \((D)\) \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
Automatically specified model is better in R
because its \(AIC\) and \(BIC\) are lower. In Python
, the automatically specified model is better in terms of a lower \(AIC\) value.
- \((E)\) \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)
In R
, the automatically specified model is better in terms of \(AIC\) and \(BIC\). In Python
, the manually and automatically specified models are the same, so the results are equal.
In most cases, both the \(AIC\) suggest that the automatically specified model is better, except for \((A)\), where \(BIC\) was lower for the manually specified model, whereas the \(AIC\) was lower for the automatically specified model.
As we can see, we do not always get a clear-cut result of which model is better, especially if we have more than one criteria for model selection.
Since in practice we would not know the true lag order, the automatic model selection produced acceptable models.
Hence, we will analyse forecasts of the automatically specified models.
2.3.4.2 Task 12
From the results in the previous task - the automatically specified models have lower \(AIC\) and \(BIC\) values in most cases. Furthermore, from the residual diagnostics in tasks, the residuals of the automatically specified models are just as good (in terms of the significance of autocorrelation) as the residuals of the models with the true lag order specification.
- \((A)\) \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)
We will show how to get the predictions as values in this example. We will generally focus on the plots, so this is provided only as an example for this one instance.
The last fitted historical data point:
## Time Series:
## Start = 150
## End = 150
## Frequency = 1
## [1] 0.9071897
The forecasted values:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 151 0.7609154 0.1545052 1.367326 -0.16650894 1.688340
## 152 1.0884837 0.4290083 1.747959 0.07990305 2.097064
## 153 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 154 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 155 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 156 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 157 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 158 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 159 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 160 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 161 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 162 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 163 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 164 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 165 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 166 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 167 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 168 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 169 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 170 0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## array([1.15882944, 0.76989887, 0.96162531, 1.15440624, 0.88305414,
## 1.12529146, 0.96205815, 0.94601715, 1.12928241, 0.86298451,
## 1.10334808, 0.99156129, 0.92755158, 1.13874023, 0.87045166,
## 1.08193056, 1.01661081, 0.90891344, 1.14307541, 0.88231217,
## 1.05906824])
Note that if we start the forecast from a historical datapoint (remember that our sample size is \(N\) and indexes in Python
start from 0
, so effectively, our last observation is at index N-1
), we actually get the fitted value.
To get only the forecasts, we can use the .forecast(...)
method:
## array([0.76989887, 0.96162531, 1.15440624, 0.88305414, 1.12529146,
## 0.96205815, 0.94601715, 1.12928241, 0.86298451, 1.10334808,
## 0.99156129, 0.92755158, 1.13874023, 0.87045166, 1.08193056,
## 1.01661081, 0.90891344, 1.14307541, 0.88231217, 1.05906824])
A reminder of the model coefficients for this model:
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 0.427401 0.080232 5.3271 9.98e-08 ***
## ma2 -0.161204 0.086558 -1.8624 0.06255 .
## intercept 0.983262 0.048406 20.3127 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.0046 0.055 18.121 0.000 0.896 1.113
## ar.L1.y -1.0083 0.080 -12.614 0.000 -1.165 -0.852
## ar.L2.y -0.3666 0.113 -3.242 0.001 -0.588 -0.145
## ar.L3.y 0.1575 0.113 1.390 0.165 -0.065 0.379
## ar.L4.y -0.2547 0.080 -3.176 0.001 -0.412 -0.097
## ma.L1.y 1.6032 0.035 45.238 0.000 1.534 1.673
## ma.L2.y 0.9999 0.042 24.066 0.000 0.918 1.081
## ==============================================================================
We can also plot the fitted values and the forecasts along with their standard errors:
par(mfrow = c(2, 1))
plot(forecast::forecast(A_est_auto, h = 20), col = "orange", lwd = 2)
lines(A_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(A_est_auto, h = 20), col = "orange", lwd = 2,
xlim = c(tail(time(A_est_auto$fitted), 1)) + c(-20, 20))
lines(A_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")
We see that:
in
R
the initial forecast has some variation at \(T + 1\) and \(T + 2\), but for \(T + 3, ..., T+20\) the forecasts are equal to the process mean;in
Python
, there is quite a bit of variation in our forecasts, since our model includes multiple autoregressive and moving-average lag coefficients.\((B)\) \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)
par(mfrow = c(2, 1))
plot(forecast::forecast(B_est_auto, h = 20), col = "orange", lwd = 2)
lines(B_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(B_est_auto, h = 20), col = "orange", lwd = 2,
xlim = c(tail(time(B_est_auto$fitted), 1)) + c(-20, 20))
lines(B_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")
In
R
, the process there is some variation for the first couple of forecasts, but the later ones are close to the process mean (because the automatically specified model is not a pure \(\rm MA\) model).in
Python
, the results are similar as in \((A)\).\((C)\) \((1 - 1.1L) Y_t = 1 + \epsilon_t\)
par(mfrow = c(2, 1))
plot(forecast::forecast(C_est_auto, h = 20), col = "orange", lwd = 2)
lines(C_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(C_est_auto, h = 20), col = "orange", lwd = 2,
xlim = c(tail(time(C_est_auto$fitted), 1)) + c(-20, 20))
lines(C_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")
Graphically, the fitted values appear close to the true values for this non-stationary process in Python
. However, the forecast is declining, whereas the fitted values and the true values appear to grow exponentially. this does not appear to make any sense intuitively. Remember that we attempt to estimate a stationary model for the data, regardless whether this is true or not, so the forecast is quite poor. On the other hand, in R
, the best model was an \({\rm ARMA}(0, 0)\).
Considerations
Overall, these kinds of non-stationary processes need separate methods for dealing with non-stationarity and cannot simply be modelled by specifying an approximate stationary model on the initial data.
- \((D)\) \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
par(mfrow = c(2, 1))
plot(forecast::forecast(D_est_auto, h = 20), col = "orange", lwd = 2)
lines(D_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(D_est_auto, h = 20), col = "orange", lwd = 2,
xlim = c(tail(time(D_est_auto$fitted), 1)) + c(-20, 20))
lines(D_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")
Fitted values appear to fit the data quite well, the forecasts appear to be approaching the mean of the process, but it may take longer than 20 periods in the future.
- \((E)\) \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)
par(mfrow = c(2, 1))
plot(forecast::forecast(E_est_auto, h = 20), col = "orange", lwd = 2)
lines(E_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(E_est_auto, h = 20), col = "orange", lwd = 2,
xlim = c(tail(time(E_est_auto$fitted), 1)) + c(-20, 20))
lines(E_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")
The model appears to fit the data quite nicely. The forecasts are similar to the ones in \((D)\). For Python
, we see that the forecasts are less volatile - this is because the automatically selected model has fewer lags, compared to \((D)\) (see TASK 6
).
To sum up:
\((A)\) and \((B)\) are moving average processes, which are governed by random shocks, hence the fitted values are quite varying, even though the automatic selection selected \(ARMA\) models, which have both \(AR\) and \(MA\) components.
If the true underlying process has a significant autoregressive part, the automatically selected models appear to fit the data quite well, even if it isn’t stationary.
On the other hand, the forecast of a non-stationary process does not appear to make sense, indicating overall a poor model for \((C)\).
Considerations
Stationarity and invertibility problems when estimating simulated or empirical data
As we have seen, in case of non-stationary or non-invertible processes, we may still get an estimated model, with stationary and invertible coefficients. The model may not fit the data well, but is an attempt of fitting a good model (in terms of stationarity and invertibility). However, sometimes estimation may result in an error, stating that the non-stationarity and/or noninvertibility does not allow to estimate a model.
To make matters worse, sometimes even if we know that a process is stationary and/or invertible, we may still get an error, stating otherwise.
To give an example of when an error may occur, generate the following \(\rm AR(1)\) process:
We can verify that this process is both stationary and invertible:
However, if we try to estimate this model:
we would get an error:
Sometimes we may even get this kind of error if our coefficients are relatively small - for example the following \(\rm ARMA\) model:
We can verify that this process is both stationary and invertible:
Then if we specify:
and attempt to estimate the model:
which is confusing.
We could also get a different kind of error - that the \(AR\) coefficients are not stationary:
np.random.seed(751)
N = 150
ARMA_spec_test = smt.arima_process.ArmaProcess(ar = np.array([1, 0.5]), ma = [1, 0.5])
ARMA_bad_example = ARMA_spec_test.generate_sample(N, scale = 0.5)
spec_2 = smt.arima_model.ARMA(ARMA_bad_example, order = (1, 1))
try:
mdl_2 = spec_2.fit(trend = 'nc')
except ValueError as err:
print('Handling run-time error:', err)
## Handling run-time error: The computed initial AR coefficients are not stationary
## You should induce stationarity, choose a different model order, or you can
## pass your own start_params.
which makes no sense.
To bypass this, we can specify the starting parameters, start_params
, when fitting:
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ar.L1.y -0.6235 0.165 -3.770 0.000 -0.948 -0.299
## ma.L1.y 0.4341 0.181 2.398 0.016 0.079 0.789
## ==============================================================================
If our specified model has a constant:
Then the starting parameters have to include the constant term as well:
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 3.0258 0.035 86.311 0.000 2.957 3.094
## ar.L1.y -0.6221 0.166 -3.757 0.000 -0.947 -0.298
## ma.L1.y 0.4311 0.181 2.377 0.017 0.076 0.787
## ==============================================================================
We note that non-stationarity, even when the true underlying process is stationary, can be caused by:
- a small sample size;
- a random chance;
- combination of coefficients, which lead to lag function roots that are close to the unit circle;
- a specific lag order selection in the \(AR\), or \(MA\), may result in a non-stationary and/or non-invertible process. This may be remedied by attempting to specify a different order model.
- bugs in the econometric software (not as frequent);
Similarly, if the process is non-invertible, we may not always get the estimates. This, unfortunately, may also cause false error for invertible processes for similar reasons, as with non-stationarity:
We can verify the stationarity and invertibility as before:
We can calculate the roots:
And their absolute values:
To verify that they are outside the unit circle (i.e. their absolute values are > 1). We can also visualize them:
fig = plt.figure(figsize = (10, 8))
ax = fig.add_subplot(1, 1, 1)
circ = plt.Circle((0, 0), radius = 1, edgecolor = 'black', facecolor = 'None')
_ = ax.add_patch(circ)
_ = plt.scatter(x = MA_spec_test.maroots.real, y = MA_spec_test.maroots.imag, color = "r")
_ = plt.xlim(-max(np.abs(MA_spec_test.maroots)), max(np.abs(MA_spec_test.maroots)))
_ = plt.ylim(-max(np.abs(MA_spec_test.maroots)), max(np.abs(MA_spec_test.maroots)))
_ = plt.axhline(y = 0, linestyle = "--", color = "b")
_ = plt.axvline(x = 0, linestyle = "--", color = "b")
_ = plt.title("MA(1) roots")
plt.show()
However, if we try to estimate this model:
we would get an error:
Note that this time, the \(MA\) coefficient is non-invertible, even though the model specification and MA_spec_test.isinvertible
indicated that this process is invertible.
Similarly to before, we can specify the starting parameter values:
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ma.L1.y -1.5021 0.071 -21.023 0.000 -1.642 -1.362
## ma.L2.y 0.7023 0.063 11.119 0.000 0.579 0.826
## ==============================================================================
Careful!
If we want to avoid checking for stationarity and invertibility entirely, we might try to fit the model with transparams = False
:
## ==============================================================================
## coef std err z P>|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ma.L1.y -1.5021 0.071 -21.023 0.000 -1.642 -1.362
## ma.L2.y 0.7023 0.063 11.119 0.000 0.579 0.826
## ==============================================================================
NOTE: the results are entirely different! Furthermore, it may be possible that the final parameters are exactly the same as the ones in start_params
. Therefore, if is advised against using this option.