2.3 Example

Considerations

The following libraries will be useful for this exercise.

suppressPackageStartupMessages({
  suppressWarnings({
    suppressMessages({
      library(forecast)
      library(fma)
      library(astsa)
      library(polynom)
    })
  })
})      
#Import the required modules for vectors and matrix operations, data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
#Import the required modules for time series data types
import pandas as pd
# Show all columns of a pd.DataFrame and prevent line breaks of wide tables
pd.set_option('display.max_columns', None)
# pd.set_option('display.max_colwidth', -1)
pd.set_option('display.expand_frame_repr', False)

Note that in this example, when using automatic order detection and parameter estimation, we may sometimes get HessianInversionWarning: Inverting hessian failed, no bse or cov_params available, or ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals. We will suppress them with:

import warnings
#
warnings.filterwarnings("ignore", category = sm.tools.sm_exceptions.HessianInversionWarning)
warnings.filterwarnings("ignore", category = sm.tools.sm_exceptions.ConvergenceWarning)

We will simulate the following time series processes:

  • \((A)\) \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)
  • \((B)\) \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)
  • \((C)\) \((1 - 1.1L) Y_t = 1 + \epsilon_t\)
  • \((D)\) \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
  • \((E)\) \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)

We will assume that if \(t \leq 0\), then \(Y_t = \epsilon_t = 0\) and \(\epsilon_t \sim \mathcal{N}(0, 0.5^2)\), \(\forall t > 0\).

2.3.1 Exercise Set 1: Data Simulation and EDA

2.3.1.1 Task 1

We will begin by rewriting the processes in their expanded forms:

  • \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
  • \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
  • \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
  • \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
  • \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)

From the above functional forms, we see that we have the following processes:

  • \((A)\) \(\rm MA(1)\)
  • \((B)\) \(\rm MA(2)\)
  • \((C)\) \(\rm AR(1)\)
  • \((D)\) \(\rm AR(2)\)
  • \((E)\) \(\rm ARMA(2, 1)\)

Considerations

To recap the theory:

  • Pure MA models are always stationary (since they contain no AR terms).
  • Pure MA models may or may not be invertible.
  • Pure AR models are always invertible (since they contain no MA terms).
  • Pure AR models may or may not be stationary.

This translates to checking the following:

  • If all the roots of \(\Phi(z) = 0\) are outside the unit circle, then the \(AR\) process is stationary;
  • If all the roots of \(\Theta(x) = 0\) are outside the unit circle, then the \(MA\) process is invertible;

We begin by checking the stationarity of \((C)\), \((D)\) and \((E)\).

  • \((C)\) \(1 - 1.1 z = 0 \Rightarrow z = 1/1.1 < 1\), so the process in \((C)\) is not stationary;
  • \((D)\) \(1 - 1.1 z + 0.2 z^2 = 0 \Rightarrow \mathcal{D} = b^2 - 4\cdot a \cdot c = (-1.1)^2 - 4 \cdot 1 \cdot 0.2 = 0.41\), \(z_{1,2} = \dfrac{-b \pm \sqrt{\mathcal{D}}}{2 \cdot a}\), which gives \(z_1 = 1.149\) and \(z_2 = 4.351\), so the process is stationary;
  • \((E)\) \(1 - 1.1 z + 0.2 z^2 = 0\) - the same as in \((D)\), so the process is stationary.

We can do this in R using polyroot.

From the documentation - polyroot(c(1, 2, 3)) represents 1 + 2x + 3x^2, so:

We can also do it via Python using poly1d.

From the documentation - an example of poly1d([1, 2, 3]) represents \(x^2 + 2x + 3\), so:

print(paste0("(C) ", polynom::polynomial(coef = c(1, -1.1))))
## [1] "(C) 1 - 1.1*x"
print("(C)" + str(np.poly1d([-1.1, 1], variable = 'z')))
## (C) 
## -1.1 z + 1
sprintf("Roots: %s", polyroot(c(1, -1.1)))
## [1] "Roots: 0.909090909090909+0i"
print("Roots: " + str(np.poly1d([-1.1, 1], variable = 'z').roots))
## Roots: [0.90909091]

Considerations

We can also plot the unit circle with the roots:

root_1 <- polyroot(c(1, -1.1))
#
plot(root_1, col = 2, pch = 16, 
     xlim = c(-2, 2), 
     ylim = c(-2, 2), 
     main = "AR(1)")
#
#
symbols(0, 0, circles = 1, 
        add = T, inches = F)
#
abline(h = 0, col = "blue", lty = 2)
abline(v = 0, col = "blue", lty = 2)

roots_1 = np.poly1d([-1.1, 1], variable = 'z').roots
#
fig = plt.figure(figsize = (10, 8))
ax = fig.add_subplot(1, 1, 1)
circ = plt.Circle((0, 0), radius = 1, edgecolor = 'black', facecolor = 'None')
_ = ax.add_patch(circ)
_ = plt.scatter(x = roots_1.real, y = roots_1.imag, color = "r")
_ = plt.xlim(-1.25, 1.25)
_ = plt.ylim(-1.25, 1.25)
_ = plt.axhline(y = 0, linestyle = "--", color = "b")
_ = plt.axvline(x = 0, linestyle = "--", color = "b")
_ = plt.title("AR(1)")
plt.show()

Note that the real part is on the horizontal axis and the imaginary on the vertical axis.

print(paste0("(D) and (E) ", polynom::polynomial(coef = c(1, -1.1, 0.2))))
## [1] "(D) and (E) 1 - 1.1*x + 0.2*x^2"
print("(D) and (E)\n" + str(np.poly1d([0.2, -1.1, 1], variable = 'z')))
## (D) and (E)
##      2
## 0.2 z - 1.1 z + 1
sprintf("Roots: %s", polyroot(c(1, -1.1, 0.2)))
## [1] "Roots: 1.14921894064179+0i" "Roots: 4.35078105935821-0i"
print("Roots: " + str(np.poly1d([0.2, -1.1, 1], variable = 'z').roots))
## Roots: [4.35078106 1.14921894]

The invertibility concept concerns only moving-average and ARMA processes, but it is checked in a similar way as before - only now we need to find the roots of the moving-average lag operator function \(\Theta(x)\):

  • \((A)\) \(1 + 0.5x = 0\)
  • \((B)\) \(1 + 1.3x - 0.4 x^2 = 0\)
  • \((E)\) \(1 + 1.3x = 0\)

We will not repeat the manual calculations and instead immediately look at the results via R and Python:

print(paste0("(A) ", polynom::polynomial(coef = c(1, 0.5))))
## [1] "(A) 1 + 0.5*x"
print("(A)" + str(np.poly1d([0.5, 1], variable = 'x')))
## (A) 
## 0.5 x + 1
sprintf("Roots: %s", polyroot(c(1, 0.5)))
## [1] "Roots: -2+0i"
print("Roots: " + str(np.poly1d([0.5, 1], variable = 'x').roots))
## Roots: [-2.]
print(paste0("(B) ", polynom::polynomial(coef = c(1, 1.3, -0.4))))
## [1] "(B) 1 + 1.3*x - 0.4*x^2"
print("(B)\n" + str(np.poly1d([-0.4, 1.3, 1], variable = 'x')))
## (B)
##       2
## -0.4 x + 1.3 x + 1
print(polyroot(c(1, 1.3, -0.4)))
## [1] -0.6422946+0i  3.8922946-0i
print("Roots: " + str(np.poly1d([-0.4, 1.3, 1], variable = 'x').roots))
## Roots: [ 3.89229464 -0.64229464]
print(paste0("(E) ", polynom::polynomial(coef = c(1, 1.3))))
## [1] "(E) 1 + 1.3*x"
print("(E)\n" + str(np.poly1d([1.3, 1], variable = 'x')))
## (E)
##  
## 1.3 x + 1
print(polyroot(c(1, 1.3)))
## [1] -0.7692308+0i
print("Roots: " + str(np.poly1d([1.3, 1], variable = 'x').roots))
## Roots: [-0.76923077]

We see that for \((A)\), the process is invertible (the roots are outside the unit circle), while in \((B)\) it is not since one of its roots is inside the unit circle. Finally, \((E)\) is also non-invertible.

So, to recap:

  • \((A)\) is stationary and invertible;
  • \((B)\) is stationary and non-invertible;
  • \((C)\) is non-stationary and invertible;
  • \((D)\) is stationary and invertible;
  • \((E)\) is stationary and non-invertible;

Recall that an MA model is invertible if it is equivalent to a converging infinite-order AR model. By converging, we mean that the AR coefficients decrease to \(0\) as we move back in time.
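For instance, an \({\rm MA}(1)\) process \(Y_t = (1 + \theta L)\epsilon_t\) has the AR(\(\infty\)) representation \(Y_t = -\sum_{j=1}^{\infty} (-\theta)^j Y_{t-j} + \epsilon_t\), so the AR coefficients have magnitude \(|\theta|^j\). A minimal numpy sketch (our own illustration, not part of the original exercise), comparing \(\theta = 0.5\) (invertible) with \(\theta = 1.3\) (non-invertible):

import numpy as np
# Magnitudes of the AR(infinity) coefficients, |theta|^j, of an MA(1) process
j = np.arange(1, 11)
print(np.abs(-0.5) ** j)  # theta = 0.5: the coefficients decay towards 0 - invertible
print(np.abs(-1.3) ** j)  # theta = 1.3: the coefficients explode - non-invertible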

2.3.1.2 Task 2

To make it easier to follow, we will simulate each process separately.

Considerations

Before beginning our data simulation it is important to mention a procedure known as 'burn-in'. As defined here (which argues against using burn-in when applying MCMC methods):

The name “burn-in” comes from electronics (see the entry in the Jargon File). Many electronics components fail quickly. Those that don’t are a more reliable subset. So a burn-in is done at the factory to eliminate the worst ones.

— Charles J. Geyer, “Burn-In is Unnecessary”

Why is this important for time series simulation? To answer this question, we will analyse a simple \({\rm AR}(1)\) process: \[ Y_t = \phi_1 Y_{t-1} + \varepsilon_t \] Assuming that we have some starting value \(Y_0\), we can write the process as: \[ \begin{align} t = 1&: Y_1 = \phi_1 Y_0 + \varepsilon_1 \\ t = 2&: Y_2 = \phi_1 Y_1 + \varepsilon_2 = \phi_1^2 Y_0 + \varepsilon_2 + \phi_1 \varepsilon_1 \\ &\vdots\\ t = k&: Y_k = \phi_1 Y_{k-1} + \varepsilon_k = \phi_1^k Y_0 + \sum_{j=0}^{k-1}\phi_1^j \varepsilon_{k-j} \end{align} \] and so on for any \(t = k\). Assume that this is the true DGP, \(Y_t^{\rm DGP}\).

In contrast, when simulating the data, we do not have \(Y_0\), so we assume some starting value, e.g. \(Y_0 = 0\). With this assumption, the difference between the true data generating process, \(Y_t^{\rm DGP}\), and our simulated process, \(Y_t^{\rm sim}\), at time \(t = k\) is: \[ Y_k^{\rm DGP} - Y_k^{\rm sim} = \phi_1^k Y_0 \] If our \({\rm AR}(1)\) process is stationary (\(|\phi_1| < 1\)), then \(\phi_1^k Y_0 \longrightarrow 0\), as \(k \longrightarrow \infty\). In other words, the initial values at the start of the series differ the most between \(Y_t^{\rm DGP}\) and \(Y_t^{\rm sim}\). This means that the process which we are simulating may not be the same as the process that we want to simulate. However, as we continue to simulate the series, this difference diminishes as \(t\) increases.
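As a rough numeric illustration (our own sketch; the values \(\phi_1 = 0.9\) and \(Y_0 = 10\) are purely illustrative assumptions), we can check how quickly the discrepancy \(\phi_1^k Y_0\) vanishes and how large a burn-in would be needed for a given tolerance:

import numpy as np
phi, Y0 = 0.9, 10.0                   # illustrative values, not taken from the exercise
k = np.arange(0, 101, 10)
print(np.round(phi ** k * Y0, 6))     # the discrepancy phi^k * Y0 shrinks towards 0
# smallest k such that phi^k * Y0 < 1e-6 - a rough candidate for the burn-in length m
print(int(np.ceil(np.log(1e-6 / Y0) / np.log(phi))))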

This means that we can apply the burn-in algorithm as follows:

  • Simulate \(T + m\) observations, such that \(\phi_1^m Y_0 \approx 0\).
  • Drop the first \(m\) observations from the series.

Unfortunately, the number of additional burn-in observations, \(m\), depends on the complexity of the model. Looking at the function code for arima.sim in R, we see that it depends on the number of parameters:

if (is.na(n.start)) n.start <- p + q + ifelse(p > 0, ceiling(6/log(minroots)), 0)

if (n.start < p + q) stop("burn-in 'n.start' must be as long as 'ar + ma'")

arima.sim function code in R

For our cases, we will take \(m = 50\). As an example, we will simulate the \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\) series and compare the beginning of the series to the rest of the series:

set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
temp <- 1 + epsilon[1]
temp <- c(temp, 1 + 1.1 * temp[1] + epsilon[2])
for(j in 3:(N + m)){
  Y_temp <- 1 + 1.1 * temp[j - 1] - 0.2 * temp[j - 2] + epsilon[j]
  temp <- c(temp, Y_temp)
}
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
temp = np.array(1 + epsilon[0], ndmin = 1)
temp = np.append(temp, 1 + 1.1 * temp[0] + epsilon[1])
for j in range(2, N + m):
    Y_temp = 1 + 1.1 * temp[j - 1] - 0.2 * temp[j - 2] + epsilon[j]
    temp = np.append(temp, Y_temp)
#      

Let's compare \(t_1 = 1, ..., N\) and \(t_2 = m + 1, ..., N + m\):

par(mfrow = c(1, 2))
#
plot(temp[1:N], col = "cornflowerblue", type = "l", lwd = 2,
     main = as.expression(bquote(t==1~"..."~N)))
lines(temp[1:m], col = "red", lwd = 2)
plot(temp[(m + 1):(N + m)], col = "cornflowerblue", type = "l", lwd = 2,
     main = as.expression(bquote(t==m+1~"..."~N+m)))
lines((N - m):(N), temp[(N):(N + m)], col = "darkgreen", lwd = 2)

fig = plt.figure(figsize = (10, 8))
_ = fig.add_subplot(1, 2, 1).plot(temp[:N])
_ = plt.plot(list(range(0, m)), temp[:m], color = 'red')
_ = plt.title("$t = 1, ..., N$")
_ = fig.add_subplot(1, 2, 2).plot(temp[m:])
_ = plt.plot(list(range(N - m - 1, N)), temp[(N - 1):], color = 'darkgreen')
_ = plt.title("$t = m + 1,..., N+m$")
plt.show()

Comparing the beginning of the series (in red), we see that our starting value has a significant impact on how the series looks - as time increases, the series fluctuates around a mean which is much higher than \(0\). This means that we have to be very careful with our starting value selection if we do not want to use burn-in. On the other hand, if we drop the first \(m\) observations and continue simulating \(N + 1, ..., N + m\) additional observations, so that our series remains at length \(N\), we see that the series continues to fluctuate around \(10\).

Consequently, if we do not want to spend extra time selecting an appropriate starting value, it is best to apply the burn-in process.

2.3.1.2.1 \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)

The logic behind generating this process is as follows:

  • when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
  • when \(t = 1\), we have \(Y_1 = 1 + \epsilon_1 + 0.5 \cdot \epsilon_0 = 1 + \epsilon_1\)
  • when \(t = 2\), we have \(Y_2 = 1 + \epsilon_2 + 0.5 \cdot \epsilon_1\)
  • when \(t = 3\), we have \(Y_3 = 1 + \epsilon_3 + 0.5 \cdot \epsilon_2\)
  • etc.

We can carry out these calculations as follows:

set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
A_data_full <- 1 + epsilon[1]
for(j in 2:(N + m)){
  Y_temp <- 1 + epsilon[j] + 0.5 * epsilon[j - 1]
  A_data_full <- c(A_data_full, Y_temp)
}
A_data <- A_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
A_data_full = 1 + epsilon[0]
for j in range(1, N + m):
    Y_temp = 1 + epsilon[j] + 0.5 * epsilon[j - 1]
    A_data_full = np.append(A_data_full, Y_temp)
#
A_data = A_data_full[m:]
2.3.1.2.2 \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)

The logic behind generating this process is as follows:

  • when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
  • when \(t = 1\), we have \(Y_1 = 1 + \epsilon_1 + 1.3 \cdot \epsilon_0 - 0.4 \cdot \epsilon_{-1}= 1 + \epsilon_1\)
  • when \(t = 2\), we have \(Y_2 = 1 + \epsilon_2 + 1.3 \cdot \epsilon_1 - 0.4 \cdot \epsilon_{0} = 1 + \epsilon_2 + 1.3 \epsilon_1\)
  • when \(t = 3\), we have \(Y_3 = 1 + \epsilon_3 + 1.3 \cdot \epsilon_2 - 0.4 \cdot \epsilon_1\)
  • when \(t = 4\), we have \(Y_4 = 1 + \epsilon_4 + 1.3 \cdot \epsilon_3 - 0.4 \cdot \epsilon_2\)
  • etc.

We can carry out these calculations as follows:

set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
B_data_full <- 1 + epsilon[1]
B_data_full <- c(B_data_full, 1 + epsilon[2] + 1.3 * epsilon[1])
for(j in 3:(N + m)){
  Y_temp <- 1 + epsilon[j] + 1.3 * epsilon[j - 1] - 0.4 * epsilon[j - 2]
  B_data_full <- c(B_data_full, Y_temp)
}
B_data <- B_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
B_data_full = 1 + epsilon[0]
B_data_full = np.append(B_data_full, 1 + epsilon[1] + 1.3 * epsilon[0])
for j in range(2, N + m):
    Y_temp = 1 + epsilon[j] + 1.3 * epsilon[j - 1] - 0.4 * epsilon[j - 2]
    B_data_full = np.append(B_data_full, Y_temp)
#
B_data = B_data_full[m:]

We have now started the loop from the third element instead of the second, since the first two elements include \(\epsilon\) at values \(t \leq 0\), so they have a different formula. (Remember that arrays in Python start at 0, while in R they start at 1).

plot(B_data, type = "l", col = "cornflowerblue", lwd = 2)
lines(A_data, type = "l", col = "orange", lwd = 2)
title("Moving-Average Process")
legend("bottomleft", 
       legend = c(as.expression(bquote(Y[t]==1+epsilon[t]+0.5*epsilon[t-1]~"(invertible)")),
                  as.expression(bquote(Y[t]==1+epsilon[t]+1.3*epsilon[t-1]-0.4*epsilon[t-2]~"(non-invertible)"))),
       lty = 1, col = c("cornflowerblue", "orange"), bg = "grey95")

fig = plt.figure(figsize = (10, 8))
#
_ = plt.plot(B_data, label = "$Y_t = 1 + \\epsilon_t + 1.3 \\epsilon_{t-1} - 0.4 \\epsilon_{t-2}$ (non-invertible)")
_ = plt.plot(A_data, label = "$Y_t = 1 + \\epsilon_t + 0.5 \\epsilon_{t-1}$ (invertible)")
_ = plt.title("Moving-Average Processes")
_ = plt.legend()
plt.show()

Notice that the plots do not exhibit large differences - you may not be able to tell whether a process is invertible or not from the time series plots alone.

2.3.1.2.3 \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)

The logic behind generating this process is as follows:

  • when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
  • when \(t = 1\), we have \(Y_1 = 1 + 1.1 \cdot Y_{0} + \epsilon_1 = 1 + \epsilon_1\)
  • when \(t = 2\), we have \(Y_2 = 1 + 1.1 \cdot Y_{1} + \epsilon_2 = 1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2\)
  • when \(t = 3\), we have \(Y_3 = 1+ 1.1 Y_{2} + \epsilon_3 = 1 + 1.1 \cdot (1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2) + \epsilon_3\)
  • etc.

We can carry out these calculations as follows:

set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
C_data_full <- 1 + epsilon[1]
for(j in 2:(N + m)){
  Y_temp <- 1 + 1.1 * C_data_full[j - 1] + epsilon[j]
  C_data_full <- c(C_data_full, Y_temp)
}
C_data <- C_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
C_data_full = np.array(1 + epsilon[0], ndmin = 1)
for j in range(1, N + m):
    Y_temp = 1 + 1.1 * C_data_full[j - 1] + epsilon[j]
    C_data_full = np.append(C_data_full, Y_temp)
#
C_data = C_data_full[m:]
2.3.1.2.4 \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)

The logic behind generating this process is as follows:

  • when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
  • when \(t = 1\), we have \(Y_1 = 1 + 1.1 \cdot Y_{0} - 0.2 \cdot Y_{-1} + \epsilon_1 = 1 + \epsilon_1\)
  • when \(t = 2\), we have \(Y_2 = 1 + 1.1 \cdot Y_{1} - 0.2 \cdot Y_0 + \epsilon_2 = 1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2\)
  • when \(t = 3\), we have \(Y_3 = 1+ 1.1 Y_{2} - 0.2 Y_{1} + \epsilon_3 = 1 + 1.1 \cdot (1 + 1.1 \cdot (1 + \epsilon_1) + \epsilon_2) - 0.2 \cdot (1 + \epsilon_1)+ \epsilon_3\)
  • etc.

We can carry out these calculations as follows:

set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
D_data_full <- 1 + epsilon[1]
D_data_full <- c(D_data_full, 1 + 1.1 * D_data_full[1] + epsilon[2])
for(j in 3:(N + m)){
  Y_temp <- 1 + 1.1 * D_data_full[j - 1] - 0.2 * D_data_full[j - 2] + epsilon[j]
  D_data_full <- c(D_data_full, Y_temp)
}
D_data <- D_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
D_data_full = np.array(1 + epsilon[0], ndmin = 1)
D_data_full = np.append(D_data_full, 1 + 1.1 * D_data_full[0] + epsilon[1])
for j in range(2, N + m):
    Y_temp = 1 + 1.1 * D_data_full[j - 1] - 0.2 * D_data_full[j - 2] + epsilon[j]
    D_data_full = np.append(D_data_full, Y_temp)
# 
D_data = D_data_full[m:]

We have started the loop from the third element instead of the second, since the first two elements include past values of \(Y\) at \(t \leq 0\), so they have a different formula.

par(mfrow = c(1, 2))
#
plot(C_data, col = "cornflowerblue", type = "l", lwd = 2,
     main = as.expression(bquote(Y[t]==1+1.1*Y[t-1]+epsilon[t]~"(non-stationary)")))
plot(D_data, col = "cornflowerblue", type = "l", lwd = 2,
     main = as.expression(bquote(Y[t]==1+1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]~"(stationary)")))

fig = plt.figure(figsize = (10, 8))
_ = fig.add_subplot(1, 2, 1).plot(C_data)
_ = plt.title("$Y_t = 1 + 1.1 Y_{t-1} + \\epsilon_t$ (non-stationary)")
_ = fig.add_subplot(1, 2, 2).plot(D_data)
_ = plt.title("$Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t$ (stationary)")
plt.show()

2.3.1.2.5 \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)

The logic behind generating this process is as follows:

  • when \(t = 0\), we have \(\epsilon_0 = 0\), \(Y_0 = 0\)
  • when \(t = 1\), we have \(Y_1 = 1.1 \cdot Y_{0} - 0.2 \cdot Y_{-1} + \epsilon_1 + 1.3 \cdot \epsilon_0 = \epsilon_1\)
  • when \(t = 2\), we have \(Y_2 = 1.1 \cdot Y_{1} - 0.2 \cdot Y_0 + \epsilon_2 + 1.3 \cdot \epsilon_1 = 1.1 \cdot \epsilon_1 + \epsilon_2 + 1.3 \cdot \epsilon_1 = \epsilon_2 + 2.4 \cdot \epsilon_1\)
  • when \(t = 3\), we have \(Y_3 = 1.1 Y_{2} - 0.2 Y_{1} + \epsilon_3 + 1.3 \cdot \epsilon_2 = 1.1 \cdot (\epsilon_2 + 2.4 \cdot \epsilon_1) - 0.2 \cdot \epsilon_1 + \epsilon_3 + 1.3 \cdot \epsilon_2\)
  • etc.

We can carry out these calculations as follows:

set.seed(123)
#
m <- 50
N <- 150
epsilon <- rnorm(mean = 0, sd = 0.5, n = N + m)
#
E_data_full <- epsilon[1]
E_data_full <- c(E_data_full, 1.1 * E_data_full[1] + epsilon[2] + 1.3* epsilon[1])
for(j in 3:(N + m)){
  Y_temp <- 1.1 * E_data_full[j - 1] - 0.2 * E_data_full[j - 2] + epsilon[j] + 1.3 * epsilon[j - 1]
  E_data_full <- c(E_data_full, Y_temp)
}
E_data <- E_data_full[-c(1:m)]
np.random.seed(123)
#
m = 50
N = 150
epsilon = np.random.normal(loc = 0, scale = 0.5, size = N + m)
#
E_data_full = np.array(epsilon[0], ndmin = 1)
E_data_full = np.append(E_data_full, 1.1 * E_data_full[0] + epsilon[1] + 1.3 * epsilon[0])
for j in range(2, N + m):
    Y_temp = 1.1 * E_data_full[j - 1] - 0.2 * E_data_full[j - 2] + epsilon[j] + 1.3 * epsilon[j - 1]
    E_data_full = np.append(E_data_full, Y_temp)
#
E_data = E_data_full[m:]
plot(E_data, type = "l", col = "cornflowerblue", lwd = 2, ylim = c(min(D_data, E_data), max(D_data, E_data)))
lines(D_data, type = "l", col = "orange", lwd = 2)
title("ARMA(2, 1) Process vs AR(2) process")
legend("bottomleft", 
       legend = c(as.expression(bquote(Y[t]==1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]+1.3*epsilon[t-1]~"(stationary non-invertible)")),
                  as.expression(bquote(Y[t]==1+1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]~"(stationary)"))),
       lty = 1, col = c("cornflowerblue", "orange"), bg = "grey95")

fig = plt.figure(figsize = (10, 8))
#
_ = plt.plot(E_data, label = "$Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t + 1.3 \\epsilon_{t-1}$ (stationary non-invertible)")
_ = plt.plot(D_data, label = "$Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t$ (stationary)")
_ = plt.title("ARMA(2, 1) Process vs AR(2) process")
_ = plt.legend()
plt.show()

Note the difference in the mean - our \(ARMA\) process does not have an intercept term.

Considerations

Alternatively, we can use the built-in functions to simulate the models. We need to specify the \(\Phi(z)\) and \(\Theta(x)\) functions.

set.seed(123)
E_autogenr <- arima.sim(model = list(ar = c(1.1, -0.2), ma = 1.3), 
                        n = N, sd = 0.5, n.start = 50)

Note that in R a burn-in (n.start) must be specified, and it must be at least as long as the number of AR plus MA parameters. This is to ensure that all of the observations follow the specified \(ARMA\) equation.

np.random.seed(123)
E_auto_spec = smt.arima_process.ArmaProcess(ar = np.array([1, -1.1, 0.2]), ma = [1, 1.3])
E_autogenr =  E_auto_spec.generate_sample(N, scale = 0.5, burnin = 50)

Note that for pure \(\rm AR\) models \(\Theta(x) = 1\) and for pure \(\rm MA\) models \(\Phi(z) = 1\).
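For example (a small sketch of our own using the same ArmaProcess class; the variable names A_spec_tmp and B_spec_tmp are ours), a pure \(\rm MA\) specification simply sets the AR polynomial to \(1\), which also lets us verify the invertibility of \((A)\) and \((B)\):

# Pure MA processes: the AR lag polynomial is simply 1
A_spec_tmp = smt.arima_process.ArmaProcess(ar = [1], ma = [1, 0.5])
B_spec_tmp = smt.arima_process.ArmaProcess(ar = [1], ma = [1, 1.3, -0.4])
print(A_spec_tmp.isstationary, A_spec_tmp.isinvertible)  # True, True  - (A) is stationary and invertible
print(B_spec_tmp.isstationary, B_spec_tmp.isinvertible)  # True, False - (B) is stationary but non-invertible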

The nice thing about this function is that we can examine these lag functions:

#
print(E_auto_spec.arpoly)
## poly([ 1.  -1.1  0.2])
#
print(E_auto_spec.mapoly)
## poly([1.  1.3])

as well as get their calculated roots:

#
print(E_auto_spec.arroots)
## [1.14921894 4.35078106]
#
print(E_auto_spec.maroots)
## [-0.76923077]

and check the stationarity/invertibility automatically:

#
print("Stationarity: " + str(E_auto_spec.isstationary))
## Stationarity: True
#
print("Invertibility: " + str(E_auto_spec.isinvertible))
## Invertibility: False

We can compare the two simulation methods:

plot(E_data, type = "l", col = "cornflowerblue", lwd = 2)
lines(E_autogenr, col = "red", lty = 2, lwd = 2)
title("ARMA(2, 1) Process: Automatic vs Manual Simulation")
legend("bottomleft", 
       legend = c(as.expression(bquote(Y[t]==1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]+1.3*epsilon[t-1]~"(stationary non-invertible)")),
                  as.expression(bquote("autogenerated ARMA(2, 1) process with the same parameters"))),
       lty = c(1, 2), lwd = 2, col = c("cornflowerblue", "red"), bg = "grey95")

In R the automatic simulation creates N + n.start observations. Then the first n.start observations are dropped as burn-in values.

fig = plt.figure(figsize = (10, 8))
#
_ = plt.plot(E_data, label = "$Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \\epsilon_t + 1.3 \\epsilon_{t-1}$ (stationary non-invertible)", color = "blue")
_ = plt.plot(E_autogenr, label = "autogenerated ARMA(2, 1) process with the same parameters", linestyle = "-.", color = "red")
_ = plt.title("ARMA(2, 1) Process: Automatic vs Manual Simulation")
_ = plt.legend()
plt.show()

In Python the automatic simulation creates N + burnin observations. Then the first burnin observations are dropped as burn-in values.

Note: the same innovations were used to generate processes \((A)\)-\((E)\) (for each process we specified the same seed and then simulated the innovations using the same function and parameters), and the automated function simulated its innovations using the same specified seed.

#
#
#
par(mfrow = c(1, 2))
#
#
forecast::Acf(E_autogenr, lag.max = 20)
#
forecast::Pacf(E_autogenr, lag.max = 20)

fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(E_autogenr, lags = 20, ax = fig.add_subplot(121),
                             zero = False)
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(E_autogenr, lags = 20, ax = fig.add_subplot(122), 
                              method = "ywm", zero = False)
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()

Note: to avoid a possible PACF error, we need to specify either method = "ywm", or method = "ols".

Remember to exclude Lag = 0 from the ACF/PACF plots! This will ensure that the first bar in the plots is for the correlation between \(Y_t\) and \(Y_{t-1}\), instead of \(Y_t\) and itself!

2.3.1.3 Task 3

We remember that \(\epsilon_t \sim \mathcal{N}(0, 0.5^2)\).

  • \((A)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + \epsilon_t + 0.5 \epsilon_{t-1}) = 1\)
print(mean(A_data))
## [1] 0.9846655
print(A_data.mean())
## 1.0045623781868285
  • \((B)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}) = 1\)
print(mean(B_data))
## [1] 0.9797011
print(B_data.mean())
## 1.006202206163214
  • \((C)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + 1.1 Y_{t-1} + \epsilon_t) = 1 + 1.1 \mathbb{E} (Y_{t-1}) = 1 + 1.1 + (1.1)^2 \mathbb{E} (Y_{t-2}) = 1 + 1.1 + (1.1)^2 + (1.1)^3 \mathbb{E} (Y_{t-3}) = ... = \sum_{j = 0}^\infty (1.1)^j = \infty\).

Assuming that \(Y_t = 0\) for \(t \leq 0\), this can be expressed as \(\mathbb{E} (Y_t) =\sum_{j = 0}^{t-1} (1.1)^j\) for an \(AR(1)\) process. So, for \(t = 150\) this is roughly:

tmp_sum <- 0
for(i in 1:N){
  tmp_sum <- tmp_sum + 1.1^i
}
print(tmp_sum)
## [1] 17794885
tmp_sum = 0
for i in range(0, N):
    tmp_sum = tmp_sum + (1.1)**i
#    
print(tmp_sum)
## 16177168.357762082
print(mean(C_data))
## [1] 147172811
print(C_data.mean())
## 132651310.6781233

The above is an example of a non-stationary process with a non-constant mean.
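As a quick sanity check (a sketch of our own, using the closed form of the geometric sum), \(\sum_{j=0}^{N-1} 1.1^j = \dfrac{1.1^{N} - 1}{0.1}\), which matches the Python loop above; the R loop starts at \(j = 1\) rather than \(j = 0\), which is why its value is larger by a factor of \(1.1\):

N = 150
print((1.1 ** N - 1) / 0.1)        # closed-form value of sum_{j=0}^{N-1} 1.1^j, approx. 16177168.36
print(1.1 * (1.1 ** N - 1) / 0.1)  # sum_{j=1}^{N} 1.1^j, approx. the value printed by the R loop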

  • \((D)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t) = 1 + 1.1 \mathbb{E} (Y_{t-1}) - 0.2 \mathbb{E} (Y_{t-2})\). For a stationary process \(\mathbb{E} (Y_t) = \mu\), which gives \(\mu = 1 + 1.1 \mu - 0.2 \mu\), which gives \(\mu = \dfrac{1}{1 - 1.1 + 0.2} = \dfrac{1}{0.1} = 10\).
print(mean(D_data))
## [1] 9.957957
print(D_data.mean())
## 10.212536830480733
  • \((E)\) \(\mathbb{E} (Y_t) = \mathbb{E} (1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}) = 1.1 \mathbb{E} (Y_{t-1}) - 0.2 \mathbb{E} (Y_{t-2})\). Since \(Y_t\) is stationary, we can write it as an infinite MA process. Furthermore, for a stationary process \(\mathbb{E} (Y_t) = \mu\), which gives \(\mu = 1.1 \mu - 0.2 \mu \Longrightarrow \mu = 0\).
print(mean(E_data))
## [1] -0.0836812
print(E_data.mean())
## 0.5263666256423345

2.3.1.4 Task 4

  • \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(A_data, lag.max = 20)
#
#
forecast::Pacf(A_data, lag.max = 20)

fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(A_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(A_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()

We see that the ACF cuts off abruptly after the first lag. Regarding the PACF - it appears to be somewhat declining, though it is difficult to say. We would expect an MA(1) process.

  • \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(B_data, lag.max = 20)
#
#
forecast::Pacf(B_data, lag.max = 20)

fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(B_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(B_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()

We see that the ACF cuts off abruptly after the second lag in R, while in Python it cuts off after the first lag. Regarding the PACF - it appears to be somewhat declining, though it is difficult to say. Based on the Python plots we would expect an \({\rm MA}(1)\) process (though in reality we know that it is an \({\rm MA}(2)\) process).

  • \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(C_data, lag.max = 20)
#
#
forecast::Pacf(C_data, lag.max = 20)

fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(C_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(C_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()

The ACF appears to decline, while the PACF cuts off abruptly after the first lag. We would expect an AR(1) process.

  • \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(D_data, lag.max = 20)
#
#
forecast::Pacf(D_data, lag.max = 20)

fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(D_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(D_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()

The ACF appears to decline. In R, the PACF cuts off abruptly after the first lag, so we would expect an \({\rm AR}(1)\) process (though in reality it is an \({\rm AR}(2)\) process). In Python, the PACF cuts off abruptly after the second lag, indicating an \({\rm AR}(2)\) process.

Considerations

(Try to increase the sample size and re-examine the ACF/PACF plots - are they closer to the true theoretical cases?)

  • \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
par(mfrow = c(1, 2))
#
#
forecast::Acf(E_data, lag.max = 20)
#
#
forecast::Pacf(E_data, lag.max = 20)

fig = plt.figure(figsize = (10, 8))
_ = sm.tsa.graphics.plot_acf(E_data, lags = 20, zero = False, ax = fig.add_subplot(121))
_ = plt.ylabel('ACF')
_ = sm.tsa.graphics.plot_pacf(E_data, lags = 20, zero = False, ax = fig.add_subplot(122), method = "ywm")
_ = plt.ylabel('PACF')
_ = plt.tight_layout()
plt.show()

We see that both the ACF and the PACF appear to decline (this is more obvious in R, less so in Python) - we believe that it is an \(\rm ARMA\) process. We cannot identify its order from these plots alone.

Considerations

Note that if we have also automatically generated the data using the built-in functions, we can examine the theoretical ACF/PACF plots:

acf_theor <- stats::ARMAacf(ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
acf_empir <- stats::acf(E_autogenr, plot = FALSE, lag.max = 19)$acf
pacf_theor <- stats::ARMAacf(pacf = TRUE, ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
pacf_empir <- stats::pacf(E_autogenr, plot = FALSE, lag.max = 19)$acf
#
par(mfrow = c(2, 1))
# ACF:
plot(0:19, acf_theor, type = "h", col = "red", pch = 21, 
     ylim = c(min(acf_theor, acf_empir), max(acf_theor, acf_empir)))
lines(0:19, acf_theor, col = "red")
points(0:19, acf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(0:19, acf_empir, type = "h", col = "darkgreen")
points(0:19, acf_empir, col = "darkgreen", pch = 19)
lines(0:19, acf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical ACF", "Empirical ACF"), 
       pch = 19, col = c("red", "darkgreen"))
# PACF:
plot(1:19, pacf_theor, type = "h", col = "red", pch = 21, 
     ylim = c(min(pacf_theor, pacf_empir), max(pacf_theor, pacf_empir)))
lines(1:19, pacf_theor, col = "red")
points(1:19, pacf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(1:19, pacf_empir, type = "h", col = "darkgreen")
points(1:19, pacf_empir, col = "darkgreen", pch = 19)
lines(1:19, pacf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical PACF", "Empirical PACF"), 
       pch = 19, col = c("red", "darkgreen"))

Note: The \(PACF\) in R starts from Lag = 1, instead of Lag = 0.

#
#
#
#
#
fig = plt.figure(figsize = (10, 8))
# ACF:
ax1 = fig.add_subplot(211)
_ = ax1.stem(E_auto_spec.acf(lags = 20), use_line_collection = True,
             label = "Theoretical ACF", basefmt = "white", 
             linefmt = "r-", markerfmt = "ro-")
#             
_ = ax1.stem(smt.stattools.acf(E_autogenr, nlags = 19), use_line_collection = True,
             label = "Empirical ACF", basefmt = "white", 
             linefmt = "g-", markerfmt = "go-")
#
_ = ax1.axhline(y = 0, color = "black")
#
_ = plt.legend()
#
# PACF:
_ = ax2 = fig.add_subplot(212)
_ = ax2.stem(E_auto_spec.pacf(lags = 20), 
             label = "Theoretical PACF", use_line_collection = True,
             basefmt = "white", linefmt = "r-", markerfmt = "ro-")
#             
_ = ax2.stem(smt.stattools.pacf(E_autogenr, nlags = 19, method = "ywm"), 
             label = "Empirical PACF", basefmt = "white", use_line_collection = True,
             linefmt = "g-", markerfmt = "go-")
#
_ = ax2.axhline(y = 0, color = "black")
#
_ = plt.legend()
plt.show()

As a further example, we will re-generate the process but with a larger sample size of \(N = 1500\):

set.seed(123)
E_autogenr_large <- arima.sim(model = list(ar = c(1.1, -0.2), ma = 1.3),
                              n = 1500, sd = 0.5, n.start = 50)
np.random.seed(123)
#
E_autogenr_large =  E_auto_spec.generate_sample(1500, scale = 0.5, burnin = 50)

And examine the ACF/PACF plots of this larger sample, as well as compare to the theoretical values:

acf_theor <- stats::ARMAacf(ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
acf_empir <- stats::acf(E_autogenr_large, plot = FALSE, lag.max = 19)$acf
pacf_theor <- stats::ARMAacf(pacf = TRUE, ar = c(1.1, -0.2), ma = 1.3, lag.max = 19)
pacf_empir <- stats::pacf(E_autogenr_large, plot = FALSE, lag.max = 19)$acf
#
par(mfrow = c(2, 1))
# ACF:
plot(0:19, acf_theor, type = "h", col = "red", pch = 21, 
     ylim = c(min(acf_theor, acf_empir), max(acf_theor, acf_empir)))
lines(0:19, acf_theor, col = "red")
points(0:19, acf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(0:19, acf_empir, type = "h", col = "darkgreen")
points(0:19, acf_empir, col = "darkgreen", pch = 19)
lines(0:19, acf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical ACF", "Empirical ACF, sample size N = 1500"), 
       pch = 19, col = c("red", "darkgreen"))
# PACF:
plot(1:19, pacf_theor, type = "h", col = "red", pch = 21, 
     ylim = c(min(pacf_theor, pacf_empir), max(pacf_theor, pacf_empir)))
lines(1:19, pacf_theor, col = "red")
points(1:19, pacf_theor, col = "red", pch = 19)
#
abline(h = 0, lty = 2)
#
lines(1:19, pacf_empir, type = "h", col = "darkgreen")
points(1:19, pacf_empir, col = "darkgreen", pch = 19)
lines(1:19, pacf_empir, col = "darkgreen")
#
legend("topright", legend = c("Theoretical PACF", "Empirical PACF, sample size N = 1500"), 
       pch = 19, col = c("red", "darkgreen"))

#
#
#
#
#
fig = plt.figure(figsize = (10, 8))
# ACF:
ax1 = fig.add_subplot(211)
_ = ax1.stem(E_auto_spec.acf(lags = 20), use_line_collection = True,
             label = "Theoretical ACF", basefmt = "white", 
             linefmt = "r-", markerfmt = "ro-")
#             
_ = ax1.stem(smt.stattools.acf(E_autogenr_large, nlags = 19), use_line_collection = True,
             label = "Empirical ACF, sample size N = 1500", basefmt = "white", 
             linefmt = "g-", markerfmt = "go-")
#
_ = ax1.axhline(y = 0, color = "black")
#
_ = plt.legend()
#
# PACF:
_ = ax2 = fig.add_subplot(212)
_ = ax2.stem(E_auto_spec.pacf(lags = 20), 
             label = "Theoretical PACF", use_line_collection = True,
             basefmt = "white", linefmt = "r-", markerfmt = "ro-")
#             
_ = ax2.stem(smt.stattools.pacf(E_autogenr_large, nlags = 19, method = "ywm"), 
             label = "Empirical PACF, sample size N = 1500", basefmt = "white", use_line_collection = True,
             linefmt = "g-", markerfmt = "go-")
#
_ = ax2.axhline(y = 0, color = "black")
#
_ = plt.legend()
plt.show()

The following result is important to note: the larger the sample size, the closer the estimates to the true values.

2.3.2 Exercise Set 2: Model Estimation

2.3.2.1 Task 5

We will estimate the models with the exact order that we used to generate the data.

Note that in this task the focus is on whether the coefficients are close to the true values, if we were to somehow correctly identify the true underlying lag order. Unfortunately, in empirical applications we almost never know the true lag order.

  • \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
#
A_mdl_1 <- forecast::Arima(A_data, order = c(0, 0, 1), include.constant = TRUE)
A_spec_1 = smt.arima_model.ARMA(A_data, order = (0, 1))
A_mdl_1 = A_spec_1.fit(trend = 'c')
print(summary(A_mdl_1))
## Series: A_data 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##          ma1    mean
##       0.4651  0.9829
## s.e.  0.0933  0.0566
## 
## sigma^2 estimated as 0.2277:  log likelihood=-100.98
## AIC=207.96   AICc=208.12   BIC=216.99
## 
## Training set error measures:
##                         ME     RMSE       MAE      MPE     MAPE      MASE        ACF1
## Training set -5.818588e-05 0.473997 0.3701291 3.276435 87.11847 0.7966185 -0.06266759
##                         ME     RMSE       MAE      MPE     MAPE      MASE        ACF1
## Training set -5.818588e-05 0.473997 0.3701291 3.276435 87.11847 0.7966185 -0.06266759
print(A_mdl_1.summary())
##                               ARMA Model Results                              
## ==============================================================================
## Dep. Variable:                      y   No. Observations:                  150
## Model:                     ARMA(0, 1)   Log Likelihood                -109.740
## Method:                       css-mle   S.D. of innovations              0.502
## Date:                Tue, 06 Apr 2021   AIC                            225.481
## Time:                        23:17:12   BIC                            234.513
## Sample:                             0   HQIC                           229.150
##                                                                               
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0058      0.062     16.178      0.000       0.884       1.128
## ma.L1.y        0.5190      0.081      6.384      0.000       0.360       0.678
##                                     Roots                                    
## =============================================================================
##                   Real          Imaginary           Modulus         Frequency
## -----------------------------------------------------------------------------
## MA.1           -1.9267           +0.0000j            1.9267            0.5000
## -----------------------------------------------------------------------------

Note that, in order to reduce the output size, we can print the coefficients only:

print(lmtest::coeftest(A_mdl_1))
## 
## z test of coefficients:
## 
##           Estimate Std. Error z value  Pr(>|z|)    
## ma1       0.465110   0.093316  4.9842  6.22e-07 ***
## intercept 0.982894   0.056585 17.3702 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since, by default, the summary() function in R does not return the \(p\)-values associated with parameter significance for \(ARMA\) models, we used lmtest::coeftest().

Furthermore, since the arima() function in R uses maximum likelihood estimation, the coefficient estimators are asymptotically normal. Consequently, lmtest::coeftest() returns \(z\)-statistics and the associated \(p\)-values.
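As an illustration (a sketch of our own, assuming scipy is available - it is installed alongside statsmodels), the \(z\)-statistic and \(p\)-value for the ma1 coefficient reported above can be reproduced as:

from scipy import stats
est, se = 0.465110, 0.093316   # estimate and standard error of ma1 from the R output above
z = est / se                   # z-statistic
p = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the standard normal distribution
print(z, p)                    # approx. 4.984 and 6.2e-07, matching the coeftest output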

print(A_mdl_1.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0058      0.062     16.178      0.000       0.884       1.128
## ma.L1.y        0.5190      0.081      6.384      0.000       0.360       0.678
## ==============================================================================

To make it easier to focus on the output, we will only print the resulting coefficient values for the remaining cases in this task.

  • \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
#
B_mdl_1 <- forecast::Arima(B_data, order = c(0, 0, 2), include.constant = TRUE)
B_spec_1 = smt.arima_model.ARMA(B_data, order = (0, 2))
B_mdl_1 = B_spec_1.fit(trend = 'c')
print(lmtest::coeftest(B_mdl_1))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ma1        0.328716   0.078560  4.1843 2.861e-05 ***
## ma2       -0.326702   0.084348 -3.8733 0.0001074 ***
## intercept  0.980641   0.059636 16.4437 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(B_mdl_1.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0113      0.073     13.819      0.000       0.868       1.155
## ma.L1.y        0.3912      0.080      4.878      0.000       0.234       0.548
## ma.L2.y       -0.2516      0.080     -3.127      0.002      -0.409      -0.094
## ==============================================================================

Note that the coefficients are not close to the true values (except for the constant and the overall coefficient signs). This is because our true process is not invertible. Hence, the software attempts to fit an invertible model, which results in different coefficients.
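To see why, we can construct an invertible representation with the same autocorrelation structure as \((B)\): flip the root of \(\Theta(x)\) lying inside the unit circle to its reciprocal and rebuild the polynomial (the innovation variance changes accordingly). A rough numpy sketch (our own illustration; the fitted coefficients will only loosely resemble these values because of sampling error):

# MA polynomial of (B): 1 + 1.3x - 0.4x^2 (numpy expects the highest power first)
roots = np.poly1d([-0.4, 1.3, 1]).roots
# flip any root inside the unit circle to its reciprocal
flipped = np.where(np.abs(roots) < 1, 1 / roots, roots)
# rebuild the polynomial from the flipped roots and rescale so that the constant term is 1
coefs = np.poly(flipped)
coefs = coefs / coefs[-1]
print(coefs)  # highest power first: approx. [-0.165, 0.385, 1], i.e. 1 + 0.385x - 0.165x^2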

  • \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)

In R, the forecast::Arima function will return an error if the sample indicates that a process is non-stationary:

C_mdl_1 <- forecast::Arima(C_data, order = c(1, 0, 0), include.constant = TRUE)

If we set method = "ML", then we can force R to use MLE (maximum likelihood estimation) instead. This is a slower estimation method, however it always returns a stationary model and if the true process is stationary - may give better (in terms of coefficient efficiency) estimates.

#
C_mdl_1 <- forecast::Arima(C_data, order = c(1, 0, 0), include.constant = TRUE, method = "ML")
C_spec_1 = smt.arima_model.ARMA(C_data, order = (1, 0))
C_mdl_1 = C_spec_1.fit(trend = 'c')
print(lmtest::coeftest(C_mdl_1))
## Warning in sqrt(diag(se)): NaNs produced
## 
## z test of coefficients:
## 
##             Estimate Std. Error z value  Pr(>|z|)    
## ar1       9.9936e-01 3.6535e-04  2735.4 < 2.2e-16 ***
## intercept 8.0685e+08         NA      NA        NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(C_mdl_1.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const       1.327e+08   3.04e+09      0.044      0.965   -5.82e+09    6.08e+09
## ar.L1.y        0.9996      0.001    682.744      0.000       0.997       1.002
## ==============================================================================

Note the coefficient values for the non-stationary process.

  • \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
#
D_mdl_1 <- forecast::Arima(D_data, order = c(2, 0, 0), include.constant = TRUE)
D_spec_1 = smt.arima_model.ARMA(D_data, order = (2, 0))
D_mdl_1 = D_spec_1.fit(trend = 'c')
print(lmtest::coeftest(D_mdl_1))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value Pr(>|z|)    
## ar1        0.991885   0.081114 12.2283   <2e-16 ***
## ar2       -0.121086   0.081146 -1.4922   0.1356    
## intercept  9.927723   0.287315 34.5535   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(D_mdl_1.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const         10.1996      0.306     33.313      0.000       9.599      10.800
## ar.L1.y        1.1009      0.080     13.796      0.000       0.945       1.257
## ar.L2.y       -0.2304      0.081     -2.849      0.004      -0.389      -0.072
## ==============================================================================

Note that the const coefficient of the model is close to the process mean. This is not the intercept (even though it is labeled as intercept in R)!
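A quick numeric check (our own sketch, using the R estimates printed above): for a stationary \({\rm AR}(2)\), the intercept equals \(\mu (1 - \phi_1 - \phi_2)\), where \(\mu\) is the process mean:

mu, phi1, phi2 = 9.927723, 0.991885, -0.121086  # "intercept" (i.e. the mean), ar1 and ar2 from the R output
print(mu * (1 - phi1 - phi2))                   # approx. 1.28 - roughly comparable to the true intercept of 1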

  • \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)
#
E_mdl_1 <- forecast::Arima(E_data, order = c(2, 0, 1), include.constant = FALSE)
E_spec_1 = smt.arima_model.ARMA(E_data, order = (2, 1))
E_mdl_1 = E_spec_1.fit(trend = 'nc')
print(lmtest::coeftest(E_mdl_1))
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)    
## ar1 0.849206   0.088156  9.6330   <2e-16 ***
## ar2 0.019357   0.088083  0.2198   0.8261    
## ma1 0.955942   0.036991 25.8425   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(E_mdl_1.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## ar.L1.y        1.0817      0.090     12.046      0.000       0.906       1.258
## ar.L2.y       -0.2044      0.091     -2.249      0.025      -0.383      -0.026
## ma.L1.y        0.8161      0.049     16.537      0.000       0.719       0.913
## ==============================================================================

Note that we used trend to decide whether to include a constant or not: 'c' includes a constant, 'nc' - no constant.

We can also check the roots for stationarity and invertibility. Again, the estimation attempts to fit a stationary and invertible model, hence the MA coefficient differs for this process (which is generated from a non-invertible data generating process).

We can also extract specific information about the model:

#
print("Included trend (this is actually the constant): ", E_mdl_1.k_trend)
## Included trend (this is actually the constant):  0
print("Included constant: ", E_mdl_1.k_constant)
## Included constant:  0
print(paste0("Number of AR parameters: ", E_mdl_1$arma[1]))
## [1] "Number of AR parameters: 2"
print(paste0("Number of MA parameters: ", E_mdl_1$arma[2]))
## [1] "Number of MA parameters: 1"
print("Number of AR parameters: ", E_mdl_1.k_ar)
## Number of AR parameters:  2
print("Number of MA parameters: " , E_mdl_1.k_ma)
## Number of MA parameters:  1
#
print("Included exogenous variables: ", E_mdl_1.k_exog)
## Included exogenous variables:  0

as well as specific parameters:

E_mdl_1_coefmat <- lmtest::coeftest(E_mdl_1)[, 1:dim(lmtest::coeftest(E_mdl_1))[2]]
print(paste0("All of the parameters: ", paste0(E_mdl_1_coefmat[, "Estimate"], collapse = ", ")))
## [1] "All of the parameters: 0.849205786332935, 0.0193571085494739, 0.95594224634259"
print(paste0("All of the p-values: ", paste0(E_mdl_1_coefmat[, "Pr(>|z|)"], collapse = ", ")))
## [1] "All of the p-values: 5.80260158765536e-22, 0.826057697486593, 2.95779260381029e-147"
print(paste0("Only AR parameters: ", paste0(coef(E_mdl_1)[grepl("^ar", names(coef(E_mdl_1)))], collapse = ", ")))
## [1] "Only AR parameters: 0.849205786332935, 0.0193571085494739"
print(paste0("Only MA parameters: ", paste0(coef(E_mdl_1)[grepl("^ma", names(coef(E_mdl_1)))], collapse = ", ")))
## [1] "Only MA parameters: 0.95594224634259"
print("All of the parameters: ", E_mdl_1.params)
## All of the parameters:  [ 1.08174983 -0.20442291  0.81605632]
print("All of the p-values: ", E_mdl_1.pvalues)
## All of the p-values:  [2.02459484e-33 2.45292519e-02 1.99072901e-61]
print("Only AR parameters: ", E_mdl_1.arparams)
## Only AR parameters:  [ 1.08174983 -0.20442291]
print("Only MA parameters: ", E_mdl_1.maparams)
## Only MA parameters:  [0.81605632]

Considerations

Interestingly, on the larger sample size the autoregressive parameters are indeed close to the true values:

print(forecast::Arima(E_autogenr_large, order = c(2, 0, 1), include.constant = FALSE))
## Series: E_autogenr_large 
## ARIMA(2,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ma1
##       1.0790  -0.1854  0.7790
## s.e.  0.0305   0.0305  0.0201
## 
## sigma^2 estimated as 0.4135:  log likelihood=-1466.6
## AIC=2941.21   AICc=2941.23   BIC=2962.46
print(smt.arima_model.ARMA(E_autogenr_large, order = (2, 1)).fit(trend = 'nc').summary())
##                               ARMA Model Results                              
## ==============================================================================
## Dep. Variable:                      y   No. Observations:                 1500
## Model:                     ARMA(2, 1)   Log Likelihood               -1432.734
## Method:                       css-mle   S.D. of innovations              0.628
## Date:                Tue, 06 Apr 2021   AIC                           2873.469
## Time:                        23:17:14   BIC                           2894.722
## Sample:                             0   HQIC                          2881.386
##                                                                               
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## ar.L1.y        1.1109      0.030     37.073      0.000       1.052       1.170
## ar.L2.y       -0.2139      0.030     -7.157      0.000      -0.272      -0.155
## ma.L1.y        0.7735      0.020     39.262      0.000       0.735       0.812
##                                     Roots                                    
## =============================================================================
##                   Real          Imaginary           Modulus         Frequency
## -----------------------------------------------------------------------------
## AR.1            1.1586           +0.0000j            1.1586            0.0000
## AR.2            4.0356           +0.0000j            4.0356            0.0000
## MA.1           -1.2929           +0.0000j            1.2929            0.5000
## -----------------------------------------------------------------------------

However, the moving-average parameters are not, since the process is non-invertible.

The estimated model, in Python, is: \[\widehat{Y}_t = 1.1109 \cdot \widehat{Y}_{t-1} - 0.2139 \cdot \widehat{Y}_{t-2} + e_t + 0.7735 \cdot e_{t-1},\]

where \(e_t = Y_t - 1.1109 \cdot \widehat{Y}_{t-1} + 0.2139 \cdot \widehat{Y}_{t-2} - 0.7735 \cdot e_{t-1}\) and \(e_0 = 0\).
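A minimal sketch (our own illustration, using the rounded coefficient estimates from the Python output above and treating \(\widehat{Y}_{t-j}\) as the lagged observed values) of how these residuals can be computed recursively:

phi1, phi2, theta1 = 1.1109, -0.2139, 0.7735  # rounded estimates from the summary above
y = E_autogenr_large                          # the series simulated earlier
e = np.zeros(len(y))                          # e_0 = 0; the first two residuals are left at 0 as an approximation
for t in range(2, len(y)):
    e[t] = y[t] - phi1 * y[t - 1] - phi2 * y[t - 2] - theta1 * e[t - 1]
print(np.round(e[:5], 4))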

We will see more on model fitting in the last task.

2.3.2.2 Task 6

We will now attempt to automatically determine the appropriate lag order and estimate the unknown parameters using the built-in functions in R and Python. For this example, we will simply use the \(AIC\).

Note that an \(AICc\), or a \(BIC\) information criterion would be better, as those penalize models with more parameters more heavily.

In this specific task, we chose \(AIC\) to highlight just how much the model order could differ from the true one. In real empirical examples, it is recommended to use \(BIC\), or \(AICc\).
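For example (a sketch of our own; both functions accept other criteria), in Python the order could be selected by \(BIC\) instead:

# Select the (p, q) order by BIC instead of AIC
A_est_bic = sm.tsa.stattools.arma_order_select_ic(A_data, ic = 'bic', trend = 'c')
print(A_est_bic.bic_min_order)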

Note: for this task in R, we will use stationary = TRUE to only search for stationary models. Generally, if a process is found to be non-stationary, R will try to difference the series (i.e. calculate \(\Delta Y_t\)) and then try to fit a stationary model to the differences. We will examine more about this in later chapters.

Note the documentation: This method can be used to tentatively identify the order of an ARMA process, provided that the time series is stationary and invertible.

  • \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)
A_est_auto <- forecast::auto.arima(A_data, ic = "aic", allowmean = TRUE, stationary = TRUE)
A_est_auto = sm.tsa.stattools.arma_order_select_ic(A_data, ic = 'aic', trend = 'c')
print(paste0("(A) process order: ", paste0(forecast::arimaorder(A_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(A) process order: 1, 2"
print("(A) process order: ", A_est_auto.aic_min_order)
## (A) process order:  (4, 2)

The automatically selected model is:

#
print(lmtest::coeftest(A_est_auto))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ar1       -0.747592   0.116093 -6.4396 1.198e-10 ***
## ma1        1.202935   0.155282  7.7468 9.425e-15 ***
## ma2        0.229716   0.140925  1.6301    0.1031    
## intercept  0.982063   0.052107 18.8469 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A_mdl_auto = smt.arima_model.ARMA(A_data, order = A_est_auto.aic_min_order).fit(trend = 'c')
print(A_mdl_auto.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0046      0.055     18.121      0.000       0.896       1.113
## ar.L1.y       -1.0083      0.080    -12.614      0.000      -1.165      -0.852
## ar.L2.y       -0.3666      0.113     -3.242      0.001      -0.588      -0.145
## ar.L3.y        0.1575      0.113      1.390      0.165      -0.065       0.379
## ar.L4.y       -0.2547      0.080     -3.176      0.001      -0.412      -0.097
## ma.L1.y        1.6032      0.035     45.238      0.000       1.534       1.673
## ma.L2.y        0.9999      0.042     24.066      0.000       0.918       1.081
## ==============================================================================

Considerations

If we wanted, we could restrict the models by specifying the maximum order of the \(MA\) or the \(AR\) (or both) parts:

A_est_auto <- forecast::auto.arima(A_data, ic = "aic", allowmean = TRUE, max.p = 0, stationary = TRUE)
A_est_auto = sm.tsa.stattools.arma_order_select_ic(A_data, ic = 'aic', trend = 'c', max_ar = 0)
print(paste0("(A) process order: ", paste0(forecast::arimaorder(A_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(A) process order: 0, 2"
print("(A) process order: ", A_est_auto.aic_min_order)
## (A) process order:  (0, 1)
  • \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)
B_est_auto <- forecast::auto.arima(B_data, ic = "aic", allowmean = TRUE, stationary = TRUE)
B_est_auto = sm.tsa.stattools.arma_order_select_ic(B_data, ic = 'aic', trend = 'c')
print(paste0("(B) process order: ", paste0(forecast::arimaorder(B_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(B) process order: 1, 1"
print("(B) process order: ", B_est_auto.aic_min_order)
## (B) process order:  (4, 2)

The automatically selected model is:

#
print(lmtest::coeftest(B_est_auto))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ar1       -0.612249   0.077360 -7.9143 2.486e-15 ***
## ma1        0.964260   0.031071 31.0342 < 2.2e-16 ***
## intercept  0.980303   0.071183 13.7716 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
B_mdl_auto = smt.arima_model.ARMA(B_data, order = B_est_auto.aic_min_order).fit(trend = 'c')
print(B_mdl_auto.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0099      0.071     14.219      0.000       0.871       1.149
## ar.L1.y       -1.0880      0.136     -7.984      0.000      -1.355      -0.821
## ar.L2.y       -0.5681      0.165     -3.447      0.001      -0.891      -0.245
## ar.L3.y       -0.0052      0.123     -0.042      0.966      -0.246       0.235
## ar.L4.y       -0.2750      0.091     -3.030      0.002      -0.453      -0.097
## ma.L1.y        1.5205      0.099     15.291      0.000       1.326       1.715
## ma.L2.y        0.8827      0.204      4.327      0.000       0.483       1.283
## ==============================================================================

Considerations

Since our process in \((B)\) is not invertible, the suggested model may, in some cases, result in an error when estimating it in Python. For example, if we did not use a burn-in period and tried to fit the model on the first \(N\) observations:

B_est_auto_tmp = sm.tsa.stattools.arma_order_select_ic(B_data_full[:N], ic = 'aic', trend = 'c')
B_mdl_auto_tmp = smt.arima_model.ARMA(B_data_full[:N], order = B_est_auto_tmp.aic_min_order).fit(trend = 'c')
## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: The computed initial MA coefficients are not invertible
## You should induce invertibility, choose a different model order, or you can
## pass your own start_params.
## 
## Detailed traceback: 
##   File "<string>", line 1, in <module>
##   File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\statsmodels\tsa\arima_model.py", line 1016, in fit
##     start_params = self._fit_start_params((k_ar, k_ma, k), method,
##   File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\statsmodels\tsa\arima_model.py", line 608, in _fit_start_params
##     start_params = self._fit_start_params_hr(order, start_ar_lags)
##   File "C:\Users\Andrius\AppData\Local\Programs\Python\Python38\lib\site-packages\statsmodels\tsa\arima_model.py", line 593, in _fit_start_params_hr
##     raise ValueError("The computed initial MA coefficients are not "
print(B_mdl_auto_tmp.summary().tables[1])
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'B_mdl_auto_tmp' is not defined
## 
## Detailed traceback: 
##   File "<string>", line 1, in <module>

To make the error more compact, we will wrap the function in a try-catch block.

Note that in R we did not have any errors when searching for an appropriate model.

However, we did get an error when trying to manually estimate a model for a subset of the data from \((C)\):

tryCatch({
  forecast::Arima(C_data_full[1:N], order = c(1, 0, 0), include.constant = TRUE)
}, error = function(e){
  print(e)
})
## <simpleError in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean,     method = method, ...): non-stationary AR part from CSS>
try:
    B_mdl_auto_tmp = smt.arima_model.ARMA(B_data_full[:N], order = B_est_auto_tmp.aic_min_order).fit(trend = 'c')
    print(B_mdl_auto_tmp.summary().tables[1])
except ValueError as err:
    print('Handling run-time error:', err)
## Handling run-time error: The computed initial MA coefficients are not invertible
## You should induce invertibility, choose a different model order, or you can
## pass your own start_params.

We could try restricting the order of some of the lags. Since the error concerns invertibility, let's restrict the maximum \(MA\) order to \(q_{\max} = 1\):

Note that in R we can restrict the \(MA\) coefficients by specifying max.q.

B_mdl_auto_tmp = sm.tsa.stattools.arma_order_select_ic(B_data_full[:N], ic = 'aic', trend = 'c', max_ma = 1)
print("(B) process order: ", B_mdl_auto_tmp.aic_min_order)
## (B) process order:  (1, 1)

Note that now the \(AR\) order is also different: restricting the \(MA\) (or \(AR\)) terms may change the selected lag order of the remaining parameters. This is because the automatic order selection compares various model combinations and selects the best one according to the specified criterion.
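
The full comparison can also be inspected: the object returned by arma_order_select_ic() stores the criterion values for every combination that was tried, as a pd.DataFrame whose rows correspond to the \(AR\) order \(p\) and whose columns correspond to the \(MA\) order \(q\). A minimal check:

# The full AIC table behind the automatic order selection (B_mdl_auto_tmp still holds
# the order-selection result at this point, before it is overwritten by the fitted model):
print(B_mdl_auto_tmp.aic)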

B_mdl_auto_tmp = smt.arima_model.ARMA(B_data_full[:N], order = B_mdl_auto_tmp.aic_min_order).fit(trend = 'c')
print(B_mdl_auto_tmp.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0542      0.089     11.817      0.000       0.879       1.229
## ar.L1.y       -0.3954      0.101     -3.904      0.000      -0.594      -0.197
## ma.L1.y        0.8184      0.055     14.846      0.000       0.710       0.926
## ==============================================================================
  • \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)
C_est_auto <- forecast::auto.arima(C_data, ic = "aic", allowmean = TRUE, stationary = TRUE)
C_est_auto = sm.tsa.stattools.arma_order_select_ic(C_data, ic='aic', trend='c')
print(paste0("(C) process order: ", paste0(forecast::arimaorder(C_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(C) process order: 0, 0"
print("(C) process order: ", C_est_auto.aic_min_order)
## (C) process order:  (2, 2)

Since our process is non-stationary, we might get an error when estimating. This depends on the package version, as well as the randomness of the data.

To make the error more compact, we will wrap the function in a try-catch block:

#
try:
    C_mdl_auto = smt.arima_model.ARMA(C_data, order = C_est_auto.aic_min_order).fit(trend = 'c')
    print(C_mdl_auto.summary().tables[1])
except ValueError as err:
    print('Handling run-time error:', err)
## Handling run-time error: The computed initial AR coefficients are not stationary
## You should induce stationarity, choose a different model order, or you can
## pass your own start_params.

So our suggestions are to either (a) induce stationarity; or (b) choose a different model order.

Since we know that this process is non-stationary, we may try to specify an \(\rm ARMA(2, 2)\) model and hope that the estimated \(\rm AR\) coefficients are approximately stationary, or we may specify a pure \(\rm MA\) model. We will go with a pure \(\rm MA\) process, since such processes are always stationary (though they may not be the best fit for the data). We opt to choose the order automatically.

Considerations

We will examine how to deal with non-stationary processes in later lectures (to be more specific, we will examine the unit root case).

Note that for larger orders the calculations take a while, so we restrict the search to \(q_{\max} = 6\) (the Python defaults are \(q_{\max} = 2\) for the \(\rm MA\) part and \(p_{\max} = 4\) for the \(\rm AR\) part).

C_est_auto <- forecast::auto.arima(C_data, ic = "aic", allowmean = TRUE, stationary = TRUE, max.p = 0, max.q = 6)
C_mdl_auto = sm.tsa.stattools.arma_order_select_ic(C_data, ic='aic', trend='c', max_ar = 0, max_ma = 6)
print(paste0("(C) process order: ", paste0(forecast::arimaorder(C_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(C) process order: 0, 0"
print("(C) process order: ", C_mdl_auto.aic_min_order)
## (C) process order:  (0, 6)
#
print(lmtest::coeftest(C_est_auto))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## intercept 147172811   23680140   6.215 5.131e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results are unchanged in R, since our previous model was already an \(\rm ARMA(0,0)\). This would suggest that a constant-only model is the best fit for our data. However, there is no guarantee that this is adequate for our (non-stationary) data: we still need to check the residuals!

C_mdl_auto = smt.arima_model.ARMA(C_data, order = C_mdl_auto.aic_min_order).fit(trend = 'c')
print(C_mdl_auto.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const       1.327e+08    3.5e+07      3.789      0.000     6.4e+07    2.01e+08
## ma.L1.y        6.0000        nan        nan        nan         nan         nan
## ma.L2.y       15.0000        nan        nan        nan         nan         nan
## ma.L3.y       20.0000        nan        nan        nan         nan         nan
## ma.L4.y       15.0000   2.63e-05   5.71e+05      0.000      15.000      15.000
## ma.L5.y        6.0000        nan        nan        nan         nan         nan
## ma.L6.y        1.0000   3.54e-06   2.82e+05      0.000       1.000       1.000
## ==============================================================================

Note again that the non-stationarity of the process remains a problem even in the estimation results: some of the standard errors are nan in Python.

  • \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)
D_est_auto <- forecast::auto.arima(D_data, ic = "aic", allowmean = TRUE, stationary = TRUE)
D_est_auto = sm.tsa.stattools.arma_order_select_ic(D_data, ic = 'aic', trend = 'c')
print(paste0("(D) process order: ", paste0(forecast::arimaorder(D_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(D) process order: 2, 1"
print("(D) process order: ", D_est_auto.aic_min_order)
## (D) process order:  (4, 2)

The automatically selected model is:

#
print(lmtest::coeftest(D_est_auto))
## 
## z test of coefficients:
## 
##           Estimate Std. Error z value Pr(>|z|)    
## ar1       0.049504   0.069359  0.7137   0.4754    
## ar2       0.718813   0.069363 10.3630   <2e-16 ***
## ma1       0.971401   0.027784 34.9626   <2e-16 ***
## intercept 9.917819   0.306341 32.3751   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
D_mdl_auto = smt.arima_model.ARMA(D_data, order = D_est_auto.aic_min_order).fit(trend = 'c')
print(D_mdl_auto.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const         10.1960      0.296     34.463      0.000       9.616      10.776
## ar.L1.y       -0.3933      0.079     -4.980      0.000      -0.548      -0.239
## ar.L2.y        0.5553      0.063      8.813      0.000       0.432       0.679
## ar.L3.y        0.6931      0.064     10.766      0.000       0.567       0.819
## ar.L4.y       -0.3052      0.080     -3.816      0.000      -0.462      -0.148
## ma.L1.y        1.5996      0.033     49.043      0.000       1.536       1.663
## ma.L2.y        0.9997      0.037     26.676      0.000       0.926       1.073
## ==============================================================================
  • \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)

For this dataset, we will attempt to find the best model without a constant term. In practice, if we are unsure whether we should include the mean, we can always include it and then examine whether it is statistically significant.
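
For instance, a minimal sketch in Python of that check (assuming the true order \((2, 1)\) and the same older statsmodels ARMA API used throughout this section):

# Fit (E) with a constant term and inspect the p-values: a large p-value for 'const'
# would suggest that the mean term is not needed
E_mdl_with_const = smt.arima_model.ARMA(E_data, order = (2, 1)).fit(trend = 'c')
print(E_mdl_with_const.pvalues)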

E_est_auto <- forecast::auto.arima(E_data, ic = "aic", allowmean = FALSE, stationary = TRUE)
E_est_auto = sm.tsa.stattools.arma_order_select_ic(E_data, ic = 'aic', trend = 'nc')
print(paste0("(E) process order: ", paste0(forecast::arimaorder(E_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(E) process order: 1, 1"
print("(E) process order: ", E_est_auto.aic_min_order)
## (E) process order:  (2, 1)

Note that we would expect to have problems with \((B)\) and \((C)\), since the automatic model selection functions (as well as the parameter estimation assumptions) require stationarity and invertibility.

Note that in the case of \((A)\), specifying max_ar = 0 in Python returns a result without any warnings. This is a current drawback of this method: it does not automatically determine when an order of 0 is appropriate for either the \(MA\) or the \(AR\) component. Update: after a package update, some of these warnings may have disappeared.

Either way, in all cases the final suggested models are ARMA models, instead of pure AR or pure MA models.

The automatically selected model is:

#
print(lmtest::coeftest(E_est_auto))
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1 0.866539   0.039792  21.777 < 2.2e-16 ***
## ma1 0.952700   0.037476  25.422 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
E_mdl_auto = smt.arima_model.ARMA(E_data, order = E_est_auto.aic_min_order).fit(trend = 'nc')
print(E_mdl_auto.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## ar.L1.y        1.0817      0.090     12.046      0.000       0.906       1.258
## ar.L2.y       -0.2044      0.091     -2.249      0.025      -0.383      -0.026
## ma.L1.y        0.8161      0.049     16.537      0.000       0.719       0.913
## ==============================================================================

Considerations

For the larger sample size:

E_large_est_auto <- forecast::auto.arima(E_autogenr_large, ic = "aic", allowmean = FALSE, stationary = TRUE)
print(paste0("(E) process with larger sample order: ", 
             paste0(forecast::arimaorder(E_large_est_auto)[c("p", "q")], collapse = ", ")))
## [1] "(E) process with larger sample order: 2, 1"
E_large_est_auto = sm.tsa.stattools.arma_order_select_ic(E_autogenr_large, ic='aic', trend='nc')
print("(E) process with larger sample order: ", 
      E_large_est_auto.aic_min_order)
## (E) process with larger sample order:  (2, 1)

The suggested order actually matches that of the true underlying process. Note that, because of the non-invertibility of the process, this may not always be the case.

If the process were both stationary and invertible, then for a larger sample the automated order selection would very likely suggest a model order close to that of the true process, as long as the true underlying process is indeed a (stationary and invertible) \(\rm ARMA(p, q)\) process.

2.3.2.3 Task 7

We will illustrate this with \((A)\), since it is both stationary and invertible.

  • \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)

Rewriting as an \(AR(\infty)\) process yields \(Y_t = \alpha + \sum_{j = 1}^\infty (-1)^{j+1} \theta^j Y_{t-j} + \epsilon_t\). The first \(j = 1, ... , 10\) coefficients \((-1)^{j+1} \theta^j\) are equal to:

theta_coef <- 0.5
print((-1)^(1:10 + 1)*theta_coef^(1:10))
##  [1]  0.5000000000 -0.2500000000  0.1250000000 -0.0625000000  0.0312500000 -0.0156250000  0.0078125000 -0.0039062500  0.0019531250 -0.0009765625
theta_coef = 0.5
print(np.array(-1)**range(1 + 1, 11 + 1)*np.array(theta_coef)**range(1, 11))
## [ 0.5        -0.25        0.125      -0.0625      0.03125    -0.015625
##   0.0078125  -0.00390625  0.00195312 -0.00097656]

As we can see, the coefficients decay to zero. Hence, after some lag \(j^*\), the coefficients \((-1)^{j+1} \theta^j \approx 0\), \(\forall j > j^*\).
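
For example, a minimal sketch to find the first lag at which the (absolute) weight drops below an arbitrary tolerance of 0.01:

# |(-1)^(j+1) * theta^j| = 0.5^j, so the weights fall below 0.01 from lag 7 onwards
coefs = (-1.0)**(np.arange(1, 51) + 1) * theta_coef**np.arange(1, 51)
print(np.min(np.where(np.abs(coefs) < 0.01)) + 1)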

A manually specified arbitrary approximation could be, for example, an \(\rm AR(10)\) process:

#
A_mdl_2 <- forecast::Arima(A_data, order = c(10, 0, 0), include.constant = TRUE)
A_spec_2 = smt.arima_model.ARMA(A_data, order = (10, 0))
A_mdl_2 = A_spec_2.fit(trend = 'c')
print(summary(A_mdl_2))
## Series: A_data 
## ARIMA(10,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6     ar7      ar8     ar9    ar10    mean
##       0.4194  -0.2873  0.2168  -0.2302  0.1488  -0.1227  0.1358  -0.0228  0.0180  0.0259  0.9826
## s.e.  0.0819   0.0890  0.0926   0.0952  0.0969   0.0994  0.0983   0.0965  0.0927  0.0846  0.0538
## 
## sigma^2 estimated as 0.2316:  log likelihood=-97.66
## AIC=219.32   AICc=221.59   BIC=255.44
## 
## Training set error measures:
##                         ME      RMSE       MAE      MPE     MAPE      MASE        ACF1
## Training set -0.0001044864 0.4632639 0.3684176 3.473363 86.57543 0.7929348 0.004850416
print(A_mdl_2.summary().tables[0])
##                               ARMA Model Results                              
## ==============================================================================
## Dep. Variable:                      y   No. Observations:                  150
## Model:                    ARMA(10, 0)   Log Likelihood                -104.547
## Method:                       css-mle   S.D. of innovations              0.485
## Date:                Tue, 06 Apr 2021   AIC                            233.094
## Time:                        23:17:42   BIC                            269.222
## Sample:                             0   HQIC                           247.771
##                                                                               
## ==============================================================================
print(lmtest::coeftest(A_mdl_2))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ar1        0.419395   0.081882  5.1219 3.024e-07 ***
## ar2       -0.287303   0.089035 -3.2268  0.001252 ** 
## ar3        0.216785   0.092640  2.3401  0.019280 *  
## ar4       -0.230217   0.095180 -2.4187  0.015574 *  
## ar5        0.148844   0.096902  1.5360  0.124530    
## ar6       -0.122722   0.099436 -1.2342  0.217133    
## ar7        0.135801   0.098285  1.3817  0.167061    
## ar8       -0.022780   0.096475 -0.2361  0.813335    
## ar9        0.017985   0.092739  0.1939  0.846232    
## ar10       0.025864   0.084602  0.3057  0.759827    
## intercept  0.982574   0.053798 18.2642 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(A_mdl_2.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0092      0.053     19.167      0.000       0.906       1.112
## ar.L1.y        0.5036      0.081      6.213      0.000       0.345       0.662
## ar.L2.y       -0.2746      0.092     -2.984      0.003      -0.455      -0.094
## ar.L3.y        0.1230      0.096      1.278      0.201      -0.066       0.312
## ar.L4.y       -0.2126      0.096     -2.213      0.027      -0.401      -0.024
## ar.L5.y        0.1443      0.098      1.465      0.143      -0.049       0.337
## ar.L6.y       -0.1727      0.099     -1.738      0.082      -0.368       0.022
## ar.L7.y        0.0480      0.099      0.486      0.627      -0.146       0.242
## ar.L8.y        0.0247      0.099      0.251      0.802      -0.168       0.218
## ar.L9.y       -0.0801      0.095     -0.842      0.400      -0.266       0.106
## ar.L10.y       0.1472      0.086      1.708      0.088      -0.022       0.316
## ==============================================================================

We see that \(\widehat{\phi}_j\) for \(j \geq 5\) are statistically insignificant, since their \(p\)-values are greater than 0.05.

We can also get the roots of the estimated \(ARMA\) model.

a_ar_mdl_roots <- polyroot(c(1, - A_mdl_2$model$phi))
tmp <- data.frame(Real = Re(a_ar_mdl_roots),
                  Imaginary = Im(a_ar_mdl_roots),
                  Modulus = Mod(a_ar_mdl_roots),
                  row.names = paste0("AR.", 1:length(A_mdl_2$model$phi)))
print(round(tmp, 4))
##          Real Imaginary Modulus
## AR.1   0.9856    0.8600  1.3081
## AR.2  -0.9892    0.7831  1.2616
## AR.3  -0.9892   -0.7831  1.2616
## AR.4   0.9856   -0.8600  1.3081
## AR.5   0.5588    1.4510  1.5549
## AR.6  -0.2976    1.3156  1.3488
## AR.7  -0.2976   -1.3156  1.3488
## AR.8   1.2905    0.0000  1.2905
## AR.9   0.5588   -1.4510  1.5549
## AR.10 -2.5012    0.0000  2.5012
#
#
#
#
#
print(A_mdl_2.summary().tables[2])
##                                     Roots                                     
## ==============================================================================
##                    Real          Imaginary           Modulus         Frequency
## ------------------------------------------------------------------------------
## AR.1            -0.8607           -0.7460j            1.1390           -0.3863
## AR.2            -0.8607           +0.7460j            1.1390            0.3863
## AR.3            -1.3889           -0.0000j            1.3889           -0.5000
## AR.4            -0.2233           -1.2090j            1.2294           -0.2791
## AR.5            -0.2233           +1.2090j            1.2294            0.2791
## AR.6             0.4632           -1.1232j            1.2150           -0.1877
## AR.7             0.4632           +1.1232j            1.2150            0.1877
## AR.8             0.9515           -0.6509j            1.1529           -0.0955
## AR.9             0.9515           +0.6509j            1.1529            0.0955
## AR.10            1.2715           -0.0000j            1.2715           -0.0000
## ------------------------------------------------------------------------------
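
If we wanted the moduli directly, rather than reading them off the summary table, a minimal sketch using numpy on the Python model's estimated AR coefficients:

# Roots of 1 - phi_1 * z - ... - phi_10 * z^10 = 0; np.roots() expects the highest power first.
# The moduli should match the 'Modulus' column of the summary table above.
ar_poly = np.r_[1, -A_mdl_2.arparams][::-1]
print(np.abs(np.roots(ar_poly)))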

We can also try to automatically fit the best \(AR\) model:

A_mdl_2 <- forecast::auto.arima(A_data, ic = 'aic', allowmean = TRUE, stationary = TRUE, max.q = 0)
print(paste0("(A) process order: ", paste0(forecast::arimaorder(A_mdl_2)[c("p", "q")], collapse = ", ")))
## [1] "(A) process order: 2, 0"
A_est_auto = sm.tsa.stattools.arma_order_select_ic(A_data, ic = 'aic', trend = 'c', max_ma = 0)
print("(A) process order: ", A_est_auto.aic_min_order)
## (A) process order:  (2, 0)
#
A_mdl_2 <- forecast::auto.arima(A_data, ic = 'aic', allowmean = TRUE, stationary = TRUE, max.q = 0)
A_spec_2 = smt.arima_model.ARMA(A_data, order = A_est_auto.aic_min_order)
A_mdl_2 = A_spec_2.fit(trend = 'c')
print(A_mdl_2)
## Series: A_data 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2    mean
##       0.3575  -0.1772  0.9845
## s.e.  0.0805   0.0805  0.0476
## 
## sigma^2 estimated as 0.2328:  log likelihood=-102.08
## AIC=212.17   AICc=212.44   BIC=224.21
print(lmtest::coeftest(A_mdl_2))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ar1        0.357521   0.080459  4.4435  8.85e-06 ***
## ar2       -0.177160   0.080533 -2.1998   0.02782 *  
## intercept  0.984549   0.047585 20.6904 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(A_mdl_2.summary2())
##                           Results: ARMA
## ==================================================================
## Model:              ARMA             BIC:                 241.3021
## Dependent Variable: y                Log-Likelihood:      -110.63 
## Date:               2021-04-06 23:17 Scale:               1.0000  
## No. Observations:   150              Method:              css-mle 
## Df Model:           3                Sample:              0       
## Df Residuals:       147                                   0       
## Converged:          1.0000           S.D. of innovations: 0.506   
## No. Iterations:     8.0000           HQIC:                234.152 
## AIC:                229.2596                                      
## --------------------------------------------------------------------
##             Coef.    Std.Err.      t      P>|t|     [0.025    0.975]
## --------------------------------------------------------------------
## const       1.0056     0.0571   17.6259   0.0000    0.8938    1.1175
## ar.L1.y     0.4661     0.0800    5.8262   0.0000    0.3093    0.6229
## ar.L2.y    -0.1890     0.0803   -2.3528   0.0186   -0.3464   -0.0316
## ---------------------------------------------------------------------------
##                Real           Imaginary          Modulus          Frequency
## ---------------------------------------------------------------------------
## AR.1          1.2331            -1.9418           2.3003            -0.1600
## AR.2          1.2331             1.9418           2.3003             0.1600
## ==================================================================

Note the different coefficients. This is because the lower-order model is only an approximation of the higher-order model.

We leave the remaining cases as exercises (might be added later).

Considerations

Using the specification of our automatically simulated \((E)\) data, we can also compute its pure \(AR\) and pure \(MA\) representations up to a chosen order:

print(stats::ARMAtoMA(ar = c(1.1, -0.2), ma = 1.3, lag.max = 5))
## [1] 2.40000 2.44000 2.20400 1.93640 1.68924
print(E_auto_spec.arma2ma(lags = 6))
## [1.      2.4     2.44    2.204   1.9364  1.68924]
print(astsa::ARMAtoAR(ar = c(1.1, -0.2), ma = 1.3, lag.max = 5))
## [1] -2.400  3.320 -4.316  5.611 -7.294
print(E_auto_spec.arma2ar(lags = 6))
## [ 1.      -2.4      3.32    -4.316    5.6108  -7.29404]

For Python, looking at the documentation (here and here), these functions should be used with caution, as (at the time of writing, in 2019) they had minimal documentation and no examples.
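
For completeness, an object like E_auto_spec above can be constructed directly from the lag-polynomial coefficients; a minimal sketch (note that statsmodels expects the zero-lag coefficient to be included and the AR coefficients to carry the signs of the lag polynomial):

# (1 - 1.1 L + 0.2 L^2) Y_t = (1 + 1.3 L) epsilon_t
E_spec_manual = sm.tsa.ArmaProcess(ar = np.array([1, -1.1, 0.2]), ma = np.array([1, 1.3]))
print(E_spec_manual.arma2ma(lags = 6))   # should reproduce the psi-weights printed above
print(E_spec_manual.arma2ar(lags = 6))   # and the pi-weights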

2.3.3 Exercise Set 3: Residual Diagnostics

Remember that our underlying assumption is that \(\epsilon_t \sim WN\). In other words, the residuals:

  • must be zero-mean;
  • must have a constant variance;
  • must not be autocorrelated;

The zero-mean property is guaranteed by the estimation method (the argument is similar to the one for OLS, which minimizes the sum of squared residuals in the univariate cross-sectional regression case). We can examine the variance by checking for ARCH effects (a topic covered separately). For the purpose of these exercises, we will examine the variance only visually.
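
For such a visual check, a minimal sketch would be a rolling standard deviation of the residuals (with an arbitrary window of 20 observations, applied to the manually estimated model of \((A)\) from the previous exercise set):

# For WN residuals the rolling standard deviation should stay roughly constant over time
resid = pd.Series(A_mdl_1.resid)
_ = resid.rolling(window = 20).std().plot(title = "Rolling standard deviation of the residuals")
plt.show()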

We note that, unlike the forecast package in R, the statsmodels package in Python does not have the tsdisplay function. Fortunately, we can always create one ourselves for simplicity:

def tsdisplay(y, figsize = (10, 8), title = "", lags = 20):
    tmp_data = pd.Series(y)
    fig = plt.figure(figsize = figsize)
    #Plot the time series
    _ = tmp_data.plot(ax = fig.add_subplot(311), title = "$Time\ Series\ " + title + "$", legend = False)
    #Plot the ACF:
    _ = sm.graphics.tsa.plot_acf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(323))
    _ = plt.xticks(np.arange(1,  lags + 1, 1.0))
    #Plot the PACF:
    _ = sm.graphics.tsa.plot_pacf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(324))
    _ = plt.xticks(np.arange(1,  lags + 1, 1.0))
    #Plot the QQ plot of the data:
    _ = sm.qqplot(tmp_data, line='s', ax = fig.add_subplot(325)) 
    _ = plt.title("QQ Plot")
    #Plot the residual histogram:
    _ = fig.add_subplot(326).hist(tmp_data, bins = 40, density = 1)
    _ = plt.title("Histogram")
    #Fix the layout of the plots:
    _ = plt.tight_layout()
    plt.show()

Note that stats::tsdiag() does not take into account the reduction in degrees of freedom that results from using the residuals of an estimated model (more issues of this kind can be found here), so we will specify our own version of tsdiag().

Furthermore, the tsdiag() function available in R does not have an analog in statsmodels, so we create our own Python version as well:

tsdiag2 <- function(y, title = "", lags = 30, df = 0){
  rez <- matrix(0, nrow = 4, ncol = lags)
  rownames(rez) <- c("Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value")
  colnames(rez) <- 1:lags
  suppressWarnings({
    rez[1, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Ljung-Box")$statistic})
    rez[2, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Ljung-Box")$p.value})
    rez[3, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Box-Pierce")$statistic})
    rez[4, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Box-Pierce")$p.value})
  })
  # Fix for when lag - fitdf = 0, since Box.test(y, lag = 0) and Box.test(y, lag = 1, fitdf = 1) do not always return NA
  rez[ c(2, 4), 1:lags == df] <- NA
  print(rez)
  # Plot Ljung-Box and Box-Pierce statistic p-values:
  plot(1:lags, rez[2, ], pch = 19, col = "blue", main = paste0("Time Series, with model fitted df = ", df),
       ylim = c(0, max(rez[c(2, 4),], 0.05, na.rm = TRUE)))
  points(1:lags, rez[4, ], pch = 19, col = "darkgreen")
  abline(h = 0.05, col = "red")
  axis(1, at = 1:lags, labels = 1:lags)
  legend("topleft", legend = c("Ljung-Box", "Box-Pierce", "5% critical value"),
         lty = c(NA, NA, 1), pch = c(19, 19, NA), col = c("blue", "darkgreen", "red"))
  # Return nothing:
  #return(NULL)
}
def tsdiag(y, fig_size = (10, 8), title = "", lags = 30, df = 0):
    tmp_data = pd.Series(y.copy()).reset_index(drop = True)
    tmp_data.index = list(range(1, len(tmp_data) + 1))
    # Carry out the Ljung-Box and Box-Pierce tests:
    tmp_acor = list(sm_stat.diagnostic.acorr_ljungbox(tmp_data, lags = lags, model_df = df, boxpierce = True))
    #
    col_index = ["Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value"]
    rez = pd.DataFrame(tmp_acor, index = col_index, columns = range(1, len(tmp_acor[0]) + 1))
    print(rez)
    # Plot Ljung-Box and Box-Pierce statistic p-values:
    fig = plt.figure(figsize = fig_size)
    _ = plt.plot(range(1, len(tmp_acor[0]) + 1), tmp_acor[1], 'bo', label = "Ljung-Box values")
    _ = plt.plot(range(1, len(tmp_acor[0]) + 1), tmp_acor[3], 'go', label = "Box-Pierce values")
    _ = plt.xticks(np.arange(1,  len(tmp_acor[0]) + 1, 1.0))
    _ = plt.axhline(y = 0.05, color = "red", label = "5% critical value")
    _ = plt.title("$Time\ Series\ " + title + "$, with model fitted df = " + str(df))
    _ = plt.legend()
    plt.show()
    # Return nothing
    return None

where df is passed to model_df in the acorr_ljungbox() function. As per the documentation, it is the number of degrees of freedom consumed by the model. In an \(ARMA(p, q)\) model, this value is usually equal to \(p + q\) (unless we also estimate an intercept). This value is subtracted from the degrees of freedom (dof) used in the test, so that the adjusted dof for the statistics are lags - model_df.

Note: If lags - model_df <= 0, then NaN is returned.

IMPORTANT: this requires at least statsmodels 0.11.0! Older versions of the package do not have the option to specify model_df!
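
For reference, a minimal direct call of the underlying function (assuming the statsmodels 0.11 tuple-style output that tsdiag() above relies on):

# With model_df = 2 the first two p-values are NaN, since lags - model_df <= 0 there
lb_res = sm_stat.diagnostic.acorr_ljungbox(A_mdl_1.resid, lags = 5, model_df = 2, boxpierce = True)
print(lb_res[1])   # Ljung-Box p-values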

2.3.3.1 Task 8, Task 9 and Task 10

To make it easier to follow, we will carry out the residual diagnostic tasks for each model separately, instead of doing each task for all of the models at once.

  • \((A)\) \(Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}\)

The residuals of our manually specified model (with the true underlying lag order):

forecast::tsdisplay(A_mdl_1$residuals, main = as.expression(bquote("Time Series: Residuals of"~Y[t]==1+epsilon[t]+0.5*epsilon[t-1])))

tsdisplay(A_mdl_1.resid, title = ":\ Residuals\ Of\ Y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-1}")

We now want to test the null hypothesis of no autocorrelation: \[ \begin{aligned} H_0 &: \rho(1) = ... = \rho(k) = 0\\ H_1 &: \exists j: \rho(j) \neq 0 \end{aligned} \]

tsdiag2(A_mdl_1$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(A_mdl_1)))))
##                               1         2         3         4         5         6         7         8         9        10        11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared  0.6009447 1.3859901 2.4992287 4.8520838 5.0756260 5.2298353 5.5388280 6.2152166 6.4117377 7.1645985 7.4158427 10.1369025 10.6403396 10.8707988 13.6356228 15.5397188 15.7110603 16.0355597 17.1630659 20.1988345
## Ljung-Box: p-value           NA 0.2390834 0.2866153 0.1829543 0.2796255 0.3884796 0.4767763 0.5148566 0.6012135 0.6199869 0.6856895  0.5181164  0.5599668  0.6216404  0.4771950  0.4132833  0.4733022  0.5213155  0.5119218  0.3827038
## Box-Pierce: X-squared 0.5890840 1.3534703 2.4300892 4.6900684 4.9033159 5.0494089 5.3401060 5.9719953 6.1542945 6.8477190 7.0774752  9.5479111 10.0016668 10.2078672 12.6634674 14.3420784 14.4920022 14.7738043 15.7455366 18.3419177
## Box-Pierce: p-value          NA 0.2446724 0.2966968 0.1959504 0.2973625 0.4098801 0.5009887 0.5430231 0.6299533 0.6529705 0.7181111  0.5714513  0.6158144  0.6768544  0.5531768  0.4997593  0.5621108  0.6117704  0.6103011  0.4997182

tsdiag(A_mdl_1.resid, lags = 20, df = len(A_mdl_1.arparams) + len(A_mdl_1.maparams)) # len(A_mdl_1.params)
##                              1         2         3         4         5         6         7         8         9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   0.111312  0.124826  0.131003  4.539037  4.819075  6.472439  6.919087  7.798845  9.293600  13.483090  13.483101  13.483240  14.289143  17.494535  17.677863  18.312421  21.129353  23.182770  26.608928  28.872834
## Ljung-Box: p-value          NaN  0.723858  0.936598  0.208835  0.306370  0.262922  0.328395  0.350665  0.318137   0.141937   0.197901   0.262917   0.282627   0.177671   0.221849   0.246597   0.173592   0.143395   0.086628   0.068012
## Box-Pierce: X-squared  0.109115  0.122273  0.128247  4.362280  4.629421  6.195766  6.615968  7.437848  8.824429  12.683170  12.683180  12.683306  13.409679  16.277662  16.440486  16.999899  19.464714  21.247945  24.200752  26.136987
## Box-Pierce: p-value         NaN  0.726582  0.937889  0.224908  0.327476  0.287633  0.357826  0.384756  0.357323   0.177474   0.241929   0.314527   0.339980   0.234461   0.287221   0.318870   0.245302   0.215396   0.148560   0.126417

Considerations

Note that the stats::tsdiag() does not take into account the reduction in degrees of freedom, which comes from using the residuals of an estimated model:

stats::tsdiag(A_mdl_1, gof.lag = 20)

More similar issues can be found here. To account for this, as well as to keep the Ljung-Box test p-value plots, we can use the astsa package:

tmp <- astsa::sarima(A_data, p = forecast::arimaorder(A_mdl_1)["p"], 
              d = 0,  q = forecast::arimaorder(A_mdl_1)["q"], details = TRUE, no.constant = FALSE)
## initial  value -0.674107 
## iter   2 value -0.736743
## iter   3 value -0.746494
## iter   4 value -0.746563
## iter   5 value -0.746564
## iter   6 value -0.746564
## iter   7 value -0.746564
## iter   8 value -0.746564
## iter   8 value -0.746564
## iter   8 value -0.746564
## final  value -0.746564 
## converged
## initial  value -0.745731 
## iter   2 value -0.745741
## iter   3 value -0.745742
## iter   3 value -0.745742
## iter   3 value -0.745742
## final  value -0.745742 
## converged

Note, however, that this requires re-estimating the model.

Alternatively, we have specified our own version of tsdiag(), which we will be using.

In Python, the \(p\)-values are greater than 0.05 at all of the first 20 lags, which means we have no grounds to reject the null hypothesis of no autocorrelation. The \(p\)-values do decrease and get closer to \(0.05\) for larger lags, but only at lags 19 and 20.

Looking at the automatically-selected order model residuals we see similar results:

forecast::tsdisplay(A_est_auto$residuals, main = "Time Series: Residuals Of (A) automated procedure")

tsdisplay(A_mdl_auto.resid, title = ":\ Residuals\ Of\ (A)\ automated\ procedure")

tsdiag2(A_est_auto$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(A_est_auto)))))
##                                1         2         3         4         5         6         7         8         9        10        11        12        13        14         15         16         17         18         19         20
## Ljung-Box: X-squared  0.01572267 0.4110617 0.4820004 1.6843053 1.7666553 1.8495815 2.1803248 2.8017476 2.8269930 3.1224093 3.2555664 6.4365674 7.1547242 7.7270910 10.8373434 13.1606312 13.1936669 13.2041922 14.2118247 17.6055449
## Ljung-Box: p-value           NaN        NA 0.4875176 0.4307822 0.6222179 0.7633997 0.8236728 0.8332865 0.9005322 0.9264423 0.9532950 0.7773489 0.7864187 0.8060772  0.6244418  0.5139162  0.5873421  0.6577734  0.6520551  0.4819082
## Box-Pierce: X-squared 0.01541235 0.4003477 0.4689529 1.6237984 1.7023560 1.7809176 2.0920774 2.6726172 2.6960356 2.9681296 3.0898983 5.9779123 6.6251984 7.1373160  9.8997112 11.9478729 11.9767791 11.9859195 12.8543396 15.7568634
## Box-Pierce: p-value          NaN        NA 0.4934693 0.4440140 0.6364101 0.7759717 0.8362633 0.8486709 0.9116268 0.9363416 0.9606238 0.8171155 0.8285731 0.8484072  0.7021141  0.6104891  0.6807859  0.7449485  0.7458812  0.6095114

tsdiag(A_mdl_auto.resid, lags = 20, df = len(A_mdl_auto.arparams) + len(A_mdl_auto.maparams))
##                              1         2         3         4         5         6         7         8         9         10        11        12        13        14        15        16        17        18        19        20
## Ljung-Box: X-squared   0.002332  0.087243  0.245451  0.449782  1.170380  1.545317  1.711803  1.794964  1.880164  2.396379  2.525659  2.893903  3.110470  3.189565  3.518778  4.076212  4.891414  4.918303  5.338921  6.441716
## Ljung-Box: p-value          NaN       NaN       NaN       NaN       NaN       NaN  0.190752  0.407595  0.597648  0.663282  0.772626  0.822041  0.874598  0.921904  0.940145  0.943843  0.936311  0.960652  0.966955  0.954101
## Box-Pierce: X-squared  0.002286  0.084962  0.237966  0.434232  1.121644  1.476848  1.633476  1.711166  1.790200  2.265661  2.383884  2.718211  2.913406  2.984175  3.276568  3.767990  4.481292  4.504644  4.867150  5.810329
## Box-Pierce: p-value         NaN       NaN       NaN       NaN       NaN       NaN  0.201223  0.425035  0.617070  0.687028  0.793872  0.843292  0.892896  0.935347  0.952314  0.957207  0.953671  0.972517  0.978002  0.971048

Considerations

In comparison, the original data had significant autocorrelation from the very first lag (and further on):

tsdiag2(A_data, lags = 20, df = 0)
##                                  1            2            3            4            5           6           7           8           9          10          11          12          13         14         15         16         17         18         19          20
## Ljung-Box: X-squared  1.401034e+01 1.477750e+01 14.787220416 15.647933460 15.694417642 15.70020082 16.21794556 17.04776059 17.08994640 17.37515405 18.08502158 20.69821210 20.70501565 20.9475634 21.9249994 22.8750211 23.5331092 23.5335174 25.9231975 31.27670053
## Ljung-Box: p-value    1.818077e-04 6.181695e-04  0.002007815  0.003529867  0.007772904  0.01545684  0.02319808  0.02961558  0.04732592  0.06646412  0.07962866  0.05497819  0.07897171  0.1029838  0.1097856  0.1171166  0.1326861  0.1709121  0.1323433  0.05164192
## Box-Pierce: X-squared 1.373382e+01 1.448079e+01 14.490192375 15.316929904 15.361273367 15.36675217 15.85384096 16.62906290 16.66819580 16.93088705 17.58004223 19.95254415 19.95867629 20.1756927 21.0438102 21.8813293 22.4571564 22.4575109 24.5170378 29.09569174
## Box-Pierce: p-value   2.106269e-04 7.170294e-04  0.002308455  0.004087068  0.008925224  0.01758820  0.02648576  0.03421193  0.05417273  0.07590733  0.09185068  0.06798901  0.09624546  0.1246988  0.1354354  0.1470647  0.1677709  0.2123048  0.1770609  0.08589220

tsdiag(A_data, lags = 20)
##                               1          2          3          4          5          6          7          8          9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   23.728816  23.732151  24.554599  28.436812  29.235744  31.091103  31.769012  31.798655  31.798666  34.321337  35.070795  35.215160  35.244828  36.900743  37.345752  39.152065  41.268670  41.268753  41.696871  41.894049
## Ljung-Box: p-value      0.000001   0.000007   0.000019   0.000010   0.000021   0.000024   0.000045   0.000101   0.000216   0.000163   0.000241   0.000433   0.000776   0.000764   0.001126   0.001034   0.000854   0.001398   0.001947   0.002855
## Box-Pierce: X-squared  23.260484  23.263731  24.059125  27.788093  28.550232  30.307940  30.945710  30.973403  30.973413  33.296926  33.982285  34.113353  34.140094  35.621703  36.016941  37.609348  39.461378  39.461449  39.830419  39.999058
## Box-Pierce: p-value     0.000001   0.000009   0.000024   0.000014   0.000028   0.000034   0.000064   0.000142   0.000299   0.000243   0.000364   0.000647   0.001146   0.001188   0.001758   0.001720   0.001541   0.002470   0.003444   0.004997

  • \((B)\) \(Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}\)

The residuals of our manually specified model (with the true underlying lag order):

forecast::tsdisplay(B_mdl_1$residuals, main = as.expression(bquote("Time Series: Residuals of "~Y[t]==1+epsilon[t]+1.3*epsilon[t-1]-0.4*epsilon[t-2])))

tsdisplay(B_mdl_1.resid, title = ":\ Residuals\ Of\ Y_t = 1 + \epsilon_t + 1.3 \epsilon_{t-1} - 0.4 \epsilon_{t-2}")

tsdiag2(B_mdl_1$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(B_mdl_1)))))
##                               1         2         3         4         5         6         7         8         9        10        11        12        13        14         15         16         17         18         19         20
## Ljung-Box: X-squared  0.1123942 0.6441230 0.8498436 1.6158201 1.7093052 1.8330825 2.2325655 2.6810754 2.6948382 2.9021889 2.9542813 6.2092355 7.1384382 7.8094386 10.8527148 13.4033257 13.4856532 13.4865008 14.5921882 18.2662862
## Ljung-Box: p-value          NaN        NA 0.3565966 0.4457888 0.6348670 0.7664245 0.8161184 0.8476775 0.9117255 0.9403448 0.9660800 0.7973884 0.7877642 0.7998392  0.6231547  0.4950400  0.5648402  0.6369092  0.6248297  0.4382444
## Box-Pierce: X-squared 0.1101759 0.6279118 0.8268653 1.5626059 1.6517857 1.7690485 2.1448778 2.5638806 2.5766474 2.7676282 2.8152653 5.7704212 6.6079263 7.2082950  9.9112048 12.1597696 12.2318062 12.2325423 13.1854703 16.3277909
## Box-Pierce: p-value         NaN        NA 0.3631805 0.4578091 0.6477056 0.7781400 0.8287570 0.8612510 0.9212157 0.9480832 0.9711708 0.8341685 0.8298881 0.8435470  0.7011789  0.5934737  0.6614051  0.7278228  0.7236841  0.5696824

tsdiag(B_mdl_1.resid, lags = 20, df = len(B_mdl_1.arparams) + len(B_mdl_1.maparams))
##                              1         2         3         4         5         6         7         8         9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   0.047104  0.590007  0.622402  4.638975  4.662342  5.887556  6.633527  7.703376  9.089350  13.733231  13.753438  13.792274  14.380635  17.884187  17.982541  18.651011  21.447878  23.417510  27.109541  29.309129
## Ljung-Box: p-value          NaN       NaN  0.430156  0.098324  0.198259  0.207704  0.249353  0.260650  0.246305   0.088989   0.131370   0.182679   0.212642   0.119253   0.158184   0.178717   0.123119   0.103021   0.056481   0.044730
## Box-Pierce: X-squared  0.046174  0.574790  0.606119  4.464144  4.486435  5.647164  6.348966  7.348429  8.634103  12.911362  12.929840  12.965100  13.495398  16.630156  16.717509  17.306818  19.754077  21.464547  24.646495  26.527721
## Box-Pierce: p-value         NaN       NaN  0.436253  0.107306  0.213503  0.227093  0.273728  0.289823  0.280004   0.114936   0.165807   0.225630   0.262182   0.164047   0.212542   0.240196   0.181571   0.161337   0.102906   0.088290

The residuals from the automated procedure:

forecast::tsdisplay(B_est_auto$residuals, main = "Time Series: Residuals Of (B) automated procedure")

tsdisplay(B_mdl_auto.resid, title = ":\ Residuals\ Of\ (B)\ automated\ procedure")

tsdiag2(B_est_auto$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(B_est_auto)))))
##                                1         2         3         4         5         6         7         8         9        10        11        12        13        14        15        16         17         18         19         20
## Ljung-Box: X-squared  0.03282845 0.6722802 0.8143308 0.8278391 1.3408265 1.4466935 1.4749914 2.5459315 2.6915384 3.4039713 3.8030374 6.1417021 6.2763201 6.2880583 8.1509414 9.8296086 10.0302293 10.3042081 11.3544104 15.5378610
## Ljung-Box: p-value           NaN        NA 0.3668429 0.6610541 0.7194620 0.8360402 0.9159376 0.8632922 0.9119972 0.9065133 0.9238912 0.8032208 0.8543058 0.9008704 0.8336295 0.7745319  0.8178341  0.8502663  0.8375581  0.6247652
## Box-Pierce: X-squared 0.03218052 0.6548046 0.7921824 0.8051575 1.2945205 1.3948156 1.4214379 2.4219215 2.5569910 3.2131792 3.5781147 5.7013761 5.8227093 5.8332120 7.4877463 8.9676239  9.1431671  9.3810961 10.2862046 13.8641558
## Box-Pierce: p-value          NaN        NA 0.3734410 0.6685937 0.7304346 0.8450956 0.9219539 0.8771015 0.9227448 0.9202757 0.9369269 0.8396973 0.8849277 0.9242550 0.8752999 0.8331190  0.8699134  0.8968934  0.8911812  0.7379028

tsdiag(B_mdl_auto.resid, lags = 20, df = len(B_mdl_auto.arparams) + len(B_mdl_auto.maparams))
##                              1         2         3         4         5         6         7         8         9         10        11        12        13        14        15        16        17        18        19         20
## Ljung-Box: X-squared   0.013248  0.326687  1.008425  1.474426  3.290342  3.542407  3.574536  4.113290  4.913923  5.223662  5.248268  6.297730  6.864674  6.986136  7.172068  8.321590  8.460358  8.746002  9.028129  10.331547
## Ljung-Box: p-value          NaN       NaN       NaN       NaN       NaN       NaN  0.058672  0.127882  0.178210  0.265109  0.386339  0.390678  0.443105  0.538129  0.619210  0.597456  0.671568  0.724459  0.770816   0.737564
## Box-Pierce: X-squared  0.012986  0.318177  0.977489  1.425096  3.157384  3.396183  3.426409  3.929719  4.672411  4.957697  4.980198  5.932999  6.443995  6.552672  6.717809  7.731203  7.852625  8.100684  8.343833   9.458598
## Box-Pierce: p-value         NaN       NaN       NaN       NaN       NaN       NaN  0.064161  0.140176  0.197418  0.291666  0.418302  0.430737  0.488964  0.585571  0.666473  0.655074  0.726442  0.777214  0.820501   0.800599

Both models give similar results in terms of adequacy: in Python, the residuals do not exhibit significant autocorrelation for lags up to 20, except for a borderline Ljung-Box \(p\)-value (\(\approx 0.045\)) at lag 20 for the manually specified model.

  • \((C)\) \(Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t\)

The residuals of our manually specified model (with the true underlying lag order):

forecast::tsdisplay(C_mdl_1$residuals, main = as.expression(bquote("Time Series: Residuals Of "~Y[t]==1+1.1*Y[t-1]+epsilon[t])))

tsdisplay(C_mdl_1.resid, title = ":\ Residuals\ Of\ Y_t = 1 + 1.1 Y_{t-1} + \epsilon_t")

tsdiag2(C_mdl_1$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(C_mdl_1)))))
##                              1        2        3        4        5        6        7        8        9       10       11       12       13       14       15       16       17       18       19       20
## Ljung-Box: X-squared  124.4212 227.6557 313.2688 384.2277 443.0014 491.6448 531.8679 565.0938 592.5071 615.0934 633.6730 648.9287 661.4285 671.6452 679.9722 686.7367 692.2109 696.6211 700.1558 702.9713
## Ljung-Box: p-value          NA   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## Box-Pierce: X-squared 121.9656 222.4833 305.2802 373.4380 429.5051 475.5883 513.4298 544.4698 569.8992 590.7024 607.6929 621.5435 632.8098 641.9511 649.3468 655.3102 660.1001 663.9300 666.9763 669.3843
## Box-Pierce: p-value         NA   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000

tsdiag(C_mdl_1.resid, lags = 20, df = len(C_mdl_1.arparams) + len(C_mdl_1.maparams))
##                               1             2             3             4             5             6             7             8             9             10             11             12             13             14             15             16             17             18             19             20
## Ljung-Box: X-squared   97.187614  1.778538e+02  2.447787e+02  3.002756e+02  3.462693e+02  3.843617e+02  4.158857e+02  4.419505e+02  4.634793e+02  4.812403e+02   4.958727e+02   5.079086e+02   5.177905e+02   5.258869e+02   5.325043e+02   5.378975e+02   5.422788e+02   5.458245e+02   5.486813e+02   5.509712e+02
## Ljung-Box: p-value           NaN  1.425706e-40  7.030240e-54  8.671976e-65  1.120608e-73  6.953902e-81  1.073011e-86  2.376093e-91  4.777773e-95  5.961432e-98  3.363506e-100  6.596815e-102  3.614881e-103  4.616077e-104  1.181550e-104  5.349797e-105  3.862663e-105  4.080848e-105  5.874505e-105  1.086193e-104
## Box-Pierce: X-squared  95.269437  1.738129e+02  2.385363e+02  2.918425e+02  3.357181e+02  3.718056e+02  4.014630e+02  4.258131e+02  4.457839e+02  4.621427e+02   4.755236e+02   4.864509e+02   4.953577e+02   5.026018e+02   5.084791e+02   5.132337e+02   5.170673e+02   5.201464e+02   5.226085e+02   5.245670e+02
## Box-Pierce: p-value          NaN  1.087556e-39  1.594046e-52  5.796974e-63  2.124550e-71  3.525582e-78  1.354997e-83  6.916186e-88  2.959725e-91  7.262100e-94   7.465383e-96   2.481930e-97   2.155433e-98   4.100348e-99   1.477844e-99  8.954584e-100  8.266683e-100   1.072310e-99   1.828514e-99   3.880704e-99

The residuals clearly still exhibit autocorrelation, as indicated by the \(ACF\) and \(PACF\) plots and by the Ljung-Box tests (\(p\)-values < 0.05).

The automatically selected model exhibits the same problems of autocorrelated residuals:

forecast::tsdisplay(C_est_auto$residuals, main = "Time Series: Residuals Of (C) automated procedure")

tsdisplay(C_mdl_auto.resid, title = ":\ Residuals\ Of\ (C)\ automated\ procedure")

tsdiag2(C_est_auto$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(C_est_auto)))))
##                              1        2        3        4        5        6        7        8        9       10       11       12       13       14       15       16       17       18       19       20
## Ljung-Box: X-squared  126.1612 230.8302 317.6241 389.5528 449.1213 498.4139 539.1657 572.8204 600.5798 623.4438 642.2449 657.6755 670.3123 680.6347 689.0419 695.8659 701.3830 705.8228 709.3763 712.2023
## Ljung-Box: p-value      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## Box-Pierce: X-squared 123.6711 225.5857 309.5246 378.6140 435.4392 482.1375 520.4763 551.9169 577.6674 598.7264 615.9194 629.9288 641.3185 650.5544 658.0213 664.0372 668.8646 672.7202 675.7828 678.1998
## Box-Pierce: p-value     0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000

tsdiag(C_mdl_auto.resid, lags = 20, df = len(C_mdl_auto.arparams) + len(C_mdl_auto.maparams))
##                               1          2          3          4           5           6             7             8             9             10            11            12            13            14            15            16            17            18            19            20
## Ljung-Box: X-squared   29.404663  62.785773  79.709713  97.148617  108.045406  118.831454  1.261338e+02  1.332248e+02  1.382142e+02  1.430152e+02  1.464634e+02  1.497702e+02  1.521715e+02  1.544752e+02  1.561570e+02  1.577753e+02  1.589584e+02  1.601028e+02  1.609381e+02  1.617517e+02
## Ljung-Box: p-value           NaN        NaN        NaN        NaN         NaN         NaN  2.874316e-29  1.176553e-29  9.172601e-30  6.382885e-30  7.553504e-30  8.653132e-30  1.420255e-29  2.282704e-29  4.664508e-29  9.325086e-29  2.187137e-28  5.005214e-28  1.272958e-27  3.152625e-27
## Box-Pierce: X-squared  28.824307  61.326967  77.694199  94.444726  104.839688  115.058050  1.219280e+02  1.285525e+02  1.331808e+02  1.376029e+02  1.407561e+02  1.437583e+02  1.459227e+02  1.479838e+02  1.494776e+02  1.509042e+02  1.519395e+02  1.529332e+02  1.536531e+02  1.543490e+02
## Box-Pierce: p-value          NaN        NaN        NaN        NaN         NaN         NaN  2.393636e-28  1.216708e-28  1.115669e-28  9.199940e-29  1.235777e-28  1.612636e-28  2.913240e-28  5.162295e-28  1.131528e-27  2.428661e-27  5.983291e-27  1.439147e-26  3.780466e-26  9.676194e-26

  • \((D)\) \(Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t\)

The residuals of our manually specified model (with the true underlying lag order):

forecast::tsdisplay(D_mdl_1$residuals, main = as.expression(bquote("Time Series: Residuals Of "~Y[t]==1+1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t])))

tsdisplay(D_mdl_1.resid, title = ":\ Residuals\ Of\ Y_t = 1 + 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t")

tsdiag2(D_mdl_1$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(D_mdl_1)))))
##                                1         2         3         4         5         6         7         8         9        10        11        12        13        14         15         16         17         18         19         20
## Ljung-Box: X-squared  0.01139571 0.1081637 1.2117477 2.8405120 3.1638785 3.2318556 3.6763186 4.4847515 4.5588240 5.2242027 5.4799315 8.0742502 8.4542493 8.6588240 11.0516984 12.8440203 13.0673895 13.1704589 14.4733255 17.8223203
## Ljung-Box: p-value           NaN        NA 0.2709863 0.2416521 0.3670419 0.5198011 0.5968920 0.6113742 0.7136228 0.7333663 0.7906273 0.6215843 0.6721271 0.7317620  0.6064866  0.5388408  0.5970927  0.6602559  0.6333598  0.4674142
## Box-Pierce: X-squared 0.01117080 0.1053922 1.1726742 2.7371452 3.0456198 3.1100191 3.5281652 4.2834118 4.3521237 4.9649725 5.1988298 7.5541981 7.8966973 8.0797378 10.2049881 11.7850613 11.9805094 12.0700171 13.1928823 16.0571542
## Box-Pierce: p-value          NaN        NA 0.2788524 0.2544699 0.3846449 0.5395857 0.6191302 0.6383827 0.7384430 0.7613133 0.8166427 0.6722918 0.7225266 0.7788685  0.6770920  0.6235593  0.6805038  0.7391460  0.7231810  0.5885584

tsdiag(D_mdl_1.resid, lags = 20, df = len(D_mdl_1.arparams) + len(D_mdl_1.maparams))
##                              1         2         3         4         5         6         7         8         9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   0.156269  0.643704  0.659390  3.471816  4.540305  5.178952  5.180896  7.187222  7.908971  11.238173  11.325804  11.331107  11.685370  13.430618  13.801404  14.378648  17.261784  18.323109  21.183329  23.395932
## Ljung-Box: p-value          NaN       NaN  0.416775  0.176240  0.208723  0.269424  0.394205  0.303880  0.340689   0.188566   0.254036   0.332310   0.387752   0.338533   0.387977   0.421899   0.303457   0.305319   0.218197   0.175830
## Box-Pierce: X-squared  0.153185  0.627793  0.642963  3.344372  4.363654  4.968688  4.970517  6.844847  7.514365  10.580735  10.660872  10.665686  10.984989  12.546527  12.875843  13.384729  15.907473  16.829151  19.294208  21.186566
## Box-Pierce: p-value         NaN       NaN  0.422640  0.187836  0.224779  0.290525  0.419489  0.335434  0.377354   0.226605   0.299672   0.384150   0.444522   0.402843   0.457447   0.496478   0.388228   0.396735   0.311909   0.270105

The residuals from the automated procedure:

forecast::tsdisplay(D_est_auto$residuals, main = "Time Series: Residuals Of (D) automated procedure")

tsdisplay(D_mdl_auto.resid, title = ":\ Residuals\ Of\ (D)\ automated\ procedure")

tsdiag2(D_est_auto$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(D_est_auto)))))
##                               1         2         3         4         5         6         7         8         9        10        11        12        13        14        15        16        17        18         19         20
## Ljung-Box: X-squared  0.1658588 0.5270103 0.5397261 0.9390750 1.0245543 1.0641166 1.0970349 2.3821178 2.5312818 3.1678999 3.6720324 5.8889019 6.0394658 6.1248719 8.0431473 9.5391665 9.7066792 9.7068043 10.9558239 14.3053791
## Ljung-Box: p-value          NaN       NaN        NA 0.3325158 0.5991297 0.7857428 0.8947424 0.7941341 0.8649504 0.8690478 0.8854360 0.7509839 0.8119367 0.8649038 0.7817495 0.7311139 0.7832988 0.8377662  0.8122038  0.6453798
## Box-Pierce: X-squared 0.1625852 0.5142328 0.5265303 0.9101154 0.9916582 1.0291383 1.0601075 2.2606454 2.3990147 2.9853735 3.4463893 5.4590735 5.5947791 5.6711951 7.3749265 8.6937856 8.8403592 8.8404678  9.9169255 12.7816766
## Box-Pierce: p-value         NaN       NaN        NA 0.3400838 0.6090657 0.7942018 0.9005488 0.8120293 0.8795939 0.8863520 0.9033103 0.7925958 0.8480824 0.8943779 0.8318754 0.7956572 0.8411391 0.8857100  0.8709305  0.7506743

tsdiag(D_mdl_auto.resid, lags = 20, df = len(D_mdl_auto.arparams) + len(D_mdl_auto.maparams))
##                              1         2         3         4         5         6         7         8         9         10        11        12        13        14        15        16        17        18        19        20
## Ljung-Box: X-squared   0.108974  0.504135  1.027775  1.028427  1.054902  1.099799  1.116354  1.395432  1.691489  1.994225  2.707352  2.967250  3.321214  3.364098  3.528036  3.658165  5.191096  5.192165  5.548731  6.473726
## Ljung-Box: p-value          NaN       NaN       NaN       NaN       NaN       NaN  0.290705  0.497721  0.638828  0.736821  0.744994  0.812946  0.853784  0.909477  0.939649  0.961453  0.921560  0.951250  0.961048  0.953104
## Box-Pierce: X-squared  0.106823  0.491585  0.998001  0.998626  1.023883  1.066417  1.081991  1.342709  1.617341  1.896176  2.548312  2.784272  3.103306  3.141676  3.287279  3.401997  4.743312  4.744240  5.051544  5.842658
## Box-Pierce: p-value         NaN       NaN       NaN       NaN       NaN       NaN  0.298253  0.511016  0.655464  0.754847  0.769202  0.835395  0.875284  0.925154  0.951809  0.970322  0.942958  0.965980  0.974041  0.970303

We see that the \(p\)-value of the Ljung-Box statistic is greater than 0.05 for the first 20 lags - we have no grounds to reject the null hypothesis of no autocorrelation in the residuals. So, both the manually specified (true underlying) model and the automatically selected model are adequate.

Considerations

Note the histogram and Q-Q plot: since the first value of the series is quite different from the rest, one observation (and hence, one residual) appears to be an outlier:

qqnorm(D_data_full[1:N], pch = 1, frame = FALSE)
qqline(D_data_full[1:N], col = "steelblue", lwd = 2)

fig =  sm.qqplot(D_data_full[:N], line = 's')
plt.show()

As mentioned at the beginning, generally, when simulating data, a number of the first values generated in the series are treated as burn-in observations - they are usually dropped from the beginning of the sample. For example, if we need to generate \(N\) observations, we generate \(N + m\) and drop the first \(m\) observations.

The more complex our model, the more burn-in observations we may need.
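
A minimal sketch of the burn-in idea, using the same simulation interface as above (the seed, lag order and coefficients below are illustrative only, not part of the exercise):

np.random.seed(123)
N, m = 150, 50
# Hypothetical stationary AR(2) specification: (1 - 1.1L + 0.2L^2) Y_t = epsilon_t
burn_spec = smt.arima_process.ArmaProcess(ar = np.array([1, -1.1, 0.2]), ma = [1])
# Generate N + m observations and discard the first m as burn-in
burn_sample = burn_spec.generate_sample(N + m, scale = 0.5)[m:]

Recent statsmodels versions also provide a burnin argument in generate_sample(...), which performs the same trimming internally.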

  • \((E)\) \(Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}\)

The residuals of our manually specified model (with the true underlying lag order):

forecast::tsdisplay(E_mdl_1$residuals, main = as.expression(bquote("Time Series: Residuals Of "~Y[t]==1.1*Y[t-1]-0.2*Y[t-2]+epsilon[t]+1.3*epsilon[t-1])))

tsdisplay(E_mdl_1.resid, title = ":\ Residuals\ Of\ Y_t = 1.1 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t + 1.3 \epsilon_{t-1}")

tsdiag2(E_mdl_1$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(E_mdl_1)))))
##                                1         2       3         4         5         6         7         8         9        10        11        12        13        14        15         16         17         18         19         20
## Ljung-Box: X-squared  0.02072419 0.9847243 1.18818 1.2516771 1.4897355 1.5725842 1.5853641 2.8771704 2.9985536 3.5226249 3.9087507 6.1812945 6.4085660 6.5279449 8.4734930 10.1097035 10.2524077 10.2551564 11.3053080 15.0080867
## Ljung-Box: p-value           NaN       NaN      NA 0.2632324 0.4747971 0.6656220 0.8114202 0.7189156 0.8090283 0.8328233 0.8652610 0.7216454 0.7798493 0.8359231 0.7471209  0.6849399  0.7435065  0.8033903  0.7902597  0.5948991
## Box-Pierce: X-squared 0.02031516 0.9589469 1.15571 1.2167006 1.4437957 1.5222840 1.5343071 2.7411262 2.8537251 3.3364224 3.6895243 5.7527548 5.9575982 6.0644109 7.7923648  9.2348135  9.3596797  9.3620667 10.2671316 13.4339818
## Box-Pierce: p-value          NaN       NaN      NA 0.2700091 0.4858293 0.6771368 0.8205475 0.7398215 0.8269693 0.8522356 0.8839935 0.7643859 0.8188130 0.8690321 0.8011386  0.7549869  0.8073425  0.8578382  0.8523106  0.7066839

tsdiag(E_mdl_1.resid, lags = 20, df = len(E_mdl_1.arparams) + len(E_mdl_1.maparams))
##                              1         2         3         4         5         6         7         8         9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   1.422860  2.118465  2.223336  5.791265  6.984317  7.318775  7.382975  9.448726  9.555712  11.510251  11.517790  11.785960  11.786103  12.706695  13.392113  13.801157  15.281182  15.667069  17.945506  20.219814
## Ljung-Box: p-value          NaN       NaN       NaN  0.016106  0.030435  0.062402  0.116982  0.092450  0.144652   0.117859   0.174050   0.225647   0.299627   0.312927   0.341196   0.387995   0.359208   0.404517   0.327098   0.263141
## Box-Pierce: X-squared  1.394777  2.072077  2.173498  5.600588  6.738697  7.055551  7.115950  9.045797  9.145040  10.945273  10.952168  11.195638  11.195766  12.019455  12.628214  12.988818  14.283840  14.618952  16.582605  18.527737
## Box-Pierce: p-value         NaN       NaN       NaN  0.017954  0.034412  0.070148  0.129886  0.107251  0.165584   0.141024   0.204424   0.262537   0.342471   0.362183   0.396632   0.448676   0.428786   0.479197   0.413090   0.356303

The residuals from the automated procedure:

forecast::tsdisplay(E_est_auto$residuals, main = "Time Series: Residuals Of (E) automated procedure")

tsdisplay(E_mdl_auto.resid, title = ":\ Residuals\ Of\ (E)\ automated\ procedure")

tsdiag2(E_est_auto$residuals, lags = 20, df = sum(grepl("^ar|^ma", names(coef(E_est_auto)))))
##                                 1        2         3         4         5         6         7         8         9        10        11        12        13        14        15         16         17         18         19         20
## Ljung-Box: X-squared  0.002864305 1.029821 1.1810809 1.2313281 1.4151872 1.5059632 1.5232033 2.8139283 2.9374130 3.4749329 3.8089809 6.1134070 6.3969261 6.5615942 8.6324907 10.3756389 10.4939449 10.4963693 11.4680730 15.1584310
## Ljung-Box: p-value            NaN       NA 0.2771361 0.5402820 0.7019787 0.8255846 0.9103767 0.8318122 0.8907310 0.9011264 0.9235290 0.8056469 0.8456120 0.8851752 0.8000989  0.7342367  0.7875740  0.8394545  0.8312233  0.6510614
## Box-Pierce: X-squared 0.002807773 1.002739 1.1490236 1.1972873 1.3726792 1.4586775 1.4748968 2.6807057 2.7952540 3.2903381 3.5958162 5.6879926 5.9435328 6.0908674 7.9301505  9.4668732  9.5703910  9.5724964 10.4099515 13.5661788
## Box-Pierce: p-value           NaN       NA 0.2837536 0.5495565 0.7119512 0.8339352 0.9159484 0.8477209 0.9032749 0.9148388 0.9359488 0.8407603 0.8771052 0.9114323 0.8481209  0.8000312  0.8458587  0.8879803  0.8855452  0.7569020

tsdiag(E_mdl_auto.resid, lags = 20, df = len(E_mdl_auto.arparams) + len(E_mdl_auto.maparams))
##                              1         2         3         4         5         6         7         8         9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   1.422860  2.118465  2.223336  5.791265  6.984317  7.318775  7.382975  9.448726  9.555712  11.510251  11.517790  11.785960  11.786103  12.706695  13.392113  13.801157  15.281182  15.667069  17.945506  20.219814
## Ljung-Box: p-value          NaN       NaN       NaN  0.016106  0.030435  0.062402  0.116982  0.092450  0.144652   0.117859   0.174050   0.225647   0.299627   0.312927   0.341196   0.387995   0.359208   0.404517   0.327098   0.263141
## Box-Pierce: X-squared  1.394777  2.072077  2.173498  5.600588  6.738697  7.055551  7.115950  9.045797  9.145040  10.945273  10.952168  11.195638  11.195766  12.019455  12.628214  12.988818  14.283840  14.618952  16.582605  18.527737
## Box-Pierce: p-value         NaN       NaN       NaN  0.017954  0.034412  0.070148  0.129886  0.107251  0.165584   0.141024   0.204424   0.262537   0.342471   0.362183   0.396632   0.448676   0.428786   0.479197   0.413090   0.356303

For the automatically selected model in R, the Box-Pierce (and Ljung-Box) \(p\)-values for the first 20 lags are greater than 0.05, indicating no autocorrelation in the residuals. On the other hand, in Python we see significant residual autocorrelation at the first few lags, both for the manually specified (true underlying) model and for the automatically selected model. This highlights that even knowing the true model does not guarantee capturing all of the autocorrelation in a given data sample.

Considerations

What would our Ljung-Box and Box-Pierce statistics look like if we were to ignore the reduction in degrees of freedom from estimating the parameters?

tsdiag2(E_est_auto$residuals, lags = 20, df = 0)
##                                 1         2         3         4         5         6         7         8         9        10        11        12        13        14        15         16         17         18         19         20
## Ljung-Box: X-squared  0.002864305 1.0298206 1.1810809 1.2313281 1.4151872 1.5059632 1.5232033 2.8139283 2.9374130 3.4749329 3.8089809 6.1134070 6.3969261 6.5615942 8.6324907 10.3756389 10.4939449 10.4963693 11.4680730 15.1584310
## Ljung-Box: p-value    0.957318232 0.5975542 0.7575454 0.8729142 0.9226448 0.9590975 0.9814955 0.9454862 0.9667246 0.9679427 0.9751752 0.9102576 0.9305072 0.9502948 0.8959364  0.8462894  0.8816309  0.9144949  0.9070881  0.7672715
## Box-Pierce: X-squared 0.002807773 1.0027389 1.1490236 1.1972873 1.3726792 1.4586775 1.4748968 2.6807057 2.7952540 3.2903381 3.5958162 5.6879926 5.9435328 6.0908674 7.9301505  9.4668732  9.5703910  9.5724964 10.4099515 13.5661788
## Box-Pierce: p-value   0.957741134 0.6057006 0.7652547 0.8785450 0.9272784 0.9621921 0.9831660 0.9527715 0.9718622 0.9737426 0.9802867 0.9309863 0.9481839 0.9641489 0.9265525  0.8929493  0.9206532  0.9449705  0.9420980  0.8517933

tsdiag(E_mdl_auto.resid, lags = 20, df = 0)
##                              1         2         3         4         5         6         7         8         9          10         11         12         13         14         15         16         17         18         19         20
## Ljung-Box: X-squared   1.422860  2.118465  2.223336  5.791265  6.984317  7.318775  7.382975  9.448726  9.555712  11.510251  11.517790  11.785960  11.786103  12.706695  13.392113  13.801157  15.281182  15.667069  17.945506  20.219814
## Ljung-Box: p-value     0.232933  0.346722  0.527367  0.215288  0.221809  0.292370  0.390122  0.305866  0.387634   0.319169   0.400957   0.463019   0.545257   0.549738   0.572039   0.613524   0.575236   0.615770   0.526081   0.444257
## Box-Pierce: X-squared  1.394777  2.072077  2.173498  5.600588  6.738697  7.055551  7.115950  9.045797  9.145040  10.945273  10.952168  11.195638  11.195766  12.019455  12.628214  12.988818  14.283840  14.618952  16.582605  18.527737
## Box-Pierce: p-value    0.237600  0.354858  0.537187  0.231028  0.240810  0.315740  0.416908  0.338447  0.423995   0.361798   0.447279   0.512231   0.594422   0.604740   0.630990   0.673575   0.646918   0.687943   0.618122   0.552686

We see that the \(p\)-values are higher. In the cases where the corrected \(p\)-values were close to, or below, 0.05, ignoring the fact that we are examining residuals from an estimated model would, in Python's case, leave us with no grounds to reject the null hypothesis of no autocorrelation!

In practice, this may also happen at larger lags. Consequently, we should always be mindful of how the estimation of unknown parameters affects our test statistics (e.g. is it only the degrees of freedom, or perhaps even the asymptotic distribution of the test statistic?).
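
To illustrate how this degrees-of-freedom correction enters the calculation, here is a minimal sketch (not the tsdiag/tsdiag2 implementations used above; the helper name lb_bp_pvalues is ours): the statistics are computed from the residual autocorrelations and compared against a \(\chi^2\) distribution with \(h - df\) degrees of freedom.

from scipy.stats import chi2
from statsmodels.tsa.stattools import acf

def lb_bp_pvalues(resid, h = 20, df = 0):
    # Sample autocorrelations of the residuals at lags 1, ..., h
    resid = np.asarray(resid)
    n = len(resid)
    rho = acf(resid, nlags = h, fft = True)[1:]
    ks = np.arange(1, h + 1)
    # Ljung-Box: Q = n(n + 2) * sum(rho_k^2 / (n - k));  Box-Pierce: Q = n * sum(rho_k^2)
    q_lb = n * (n + 2) * np.cumsum(rho ** 2 / (n - ks))
    q_bp = n * np.cumsum(rho ** 2)
    # p-values use a chi-squared distribution with (lag - df) degrees of freedom;
    # they are undefined (NaN) while lag <= df, which explains the NaN entries above
    dof = ks - df
    p_lb = np.where(dof > 0, chi2.sf(q_lb, np.maximum(dof, 1)), np.nan)
    p_bp = np.where(dof > 0, chi2.sf(q_bp, np.maximum(dof, 1)), np.nan)
    return q_lb, p_lb, q_bp, p_bp

For example, lb_bp_pvalues(E_mdl_auto.resid, h = 20, df = len(E_mdl_auto.arparams) + len(E_mdl_auto.maparams)) should reproduce the corrected \(p\)-values, while df = 0 reproduces the uncorrected ones.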

2.3.4 Exercise Set 4: Model Forecasts

2.3.4.1 Task 11

We will compare the AIC and BIC values to select the best model.

The model with the lower AIC/BIC value is the preferred one. BIC penalizes model complexity more heavily.
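
To recap why (with \(\ell\) the maximized log-likelihood, \(k\) the number of estimated parameters and \(T\) the sample size):

\[
AIC = -2\ell + 2k, \qquad BIC = -2\ell + k\ln(T)
\]

Since \(\ln(T) > 2\) once \(T > e^2 \approx 7.39\), the per-parameter penalty of the \(BIC\) exceeds that of the \(AIC\) for any realistic sample size, which is why the \(BIC\) tends to favour more parsimonious models.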

  • \((A)\) \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)
t(data.frame(AIC = c(AIC(A_mdl_1), AIC(A_est_auto)), BIC= c(BIC(A_mdl_1), BIC(A_est_auto)),
           row.names = c("Manual", "Auto")))
##       Manual     Auto
## AIC 207.9590 206.5234
## BIC 216.9909 218.5659
pd.DataFrame([[A_mdl_1.aic, A_mdl_auto.aic], [A_mdl_1.bic, A_mdl_auto.bic]], 
              index = ["AIC", "BIC"], columns = ["Manual", "Auto"])
##          Manual        Auto
## AIC  225.480802  218.557543
## BIC  234.512708  242.642625

The automatically specified model is better in terms of its lower \(AIC\). On the other hand, the \(BIC\) is lower for our manually specified model.

In this case, we should examine the residuals, and maybe even the forecasts, in order to determine which model is more appropriate.

However, since we selected the model order automatically by minimizing the \(AIC\), and since the task requires us to compare models in terms of \(AIC\), we will opt to use the automatically specified model.

  • \((B)\) \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)
t(data.frame(AIC = c(AIC(B_mdl_1), AIC(B_est_auto)), BIC= c(BIC(B_mdl_1), BIC(B_est_auto)),
           row.names = c("Manual", "Auto")))
##       Manual     Auto
## AIC 338.6367 334.8398
## BIC 350.6793 346.8824
pd.DataFrame([[B_mdl_1.aic, B_mdl_auto.aic], [B_mdl_1.bic, B_mdl_auto.bic]], 
              index = ["AIC", "BIC"], columns = ["Manual", "Auto"])
##          Manual        Auto
## AIC  361.861215  356.923319
## BIC  373.903756  381.008402

Both \(AIC\) and \(BIC\) indicate that the automatically specified model is better in R, while in Python the automatically specified model is better only in terms of \(AIC\) (its \(BIC\) is higher than that of the manually specified model).

  • \((C)\) \((1 - 1.1L) Y_t = 1 + \epsilon_t\)
t(data.frame(AIC = c(AIC(C_mdl_1), AIC(C_est_auto)), BIC= c(BIC(C_mdl_1), BIC(C_est_auto)),
           row.names = c("Manual", "Auto")))
##       Manual     Auto
## AIC 5657.001 6344.111
## BIC 5666.033 6350.132
pd.DataFrame([[C_mdl_1.aic, C_mdl_auto.aic], [C_mdl_1.bic, C_mdl_auto.bic]], 
              index = ["AIC", "BIC"], columns = ["Manual", "Auto"])
##           Manual         Auto
## AIC  5626.358523  5283.491273
## BIC  5635.390429  5307.576355

The manually specified model is better in R since its \(AIC\) and \(BIC\) are lower. On the other hand, the automatically specified model is better in Python.

  • \((D)\) \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
t(data.frame(AIC = c(AIC(D_mdl_1), AIC(D_est_auto)), BIC= c(BIC(D_mdl_1), BIC(D_est_auto)),
           row.names = c("Manual", "Auto")))
##       Manual     Auto
## AIC 209.7763 205.3518
## BIC 221.8188 220.4049
pd.DataFrame([[D_mdl_1.aic, D_mdl_auto.aic], [D_mdl_1.bic, D_mdl_auto.bic]], 
              index = ["AIC", "BIC"], columns = ["Manual", "Auto"])
##          Manual        Auto
## AIC  228.094785  219.919287
## BIC  240.137326  244.004369

The automatically specified model is better in R because its \(AIC\) and \(BIC\) are lower. In Python, the automatically specified model is better only in terms of its lower \(AIC\) value (its \(BIC\) is higher).

  • \((E)\) \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)
t(data.frame(AIC = c(AIC(E_mdl_1), AIC(E_est_auto)), BIC= c(BIC(E_mdl_1), BIC(E_est_auto)),
           row.names = c("Manual", "Auto")))
##       Manual     Auto
## AIC 284.9765 283.0246
## BIC 297.0190 292.0565
pd.DataFrame([[E_mdl_1.aic, E_mdl_auto.aic], [E_mdl_1.bic, E_mdl_auto.bic]], 
              index = ["AIC", "BIC"], columns = ["Manual", "Auto"])
##          Manual        Auto
## AIC  309.910649  309.910649
## BIC  321.953190  321.953190

In R, the automatically specified model is better in terms of \(AIC\) and \(BIC\). In Python, the manually and automatically specified models are the same, so the results are equal.

In most cases, both the \(AIC\) and the \(BIC\) suggest that the automatically specified model is better; the exception is \((A)\), where the \(BIC\) was lower for the manually specified model, whereas the \(AIC\) was lower for the automatically specified one.

As we can see, we do not always get a clear-cut answer as to which model is better, especially if we use more than one criterion for model selection.

Since in practice we would not know the true lag order, the automatic model selection produced acceptable models.

Hence, we will analyse forecasts of the automatically specified models.

2.3.4.2 Task 12

From the results in the previous task, the automatically specified models have lower \(AIC\) and \(BIC\) values in most cases. Furthermore, from the residual diagnostics in the previous tasks, the residuals of the automatically specified models are just as good (in terms of the significance of autocorrelation) as the residuals of the models with the true lag order specification.

  • \((A)\) \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)

We will show how to extract the forecasts as numerical values for this process. Since we will generally focus on the plots, this is provided as an example for this one instance only.

The last fitted historical data point:

tail(A_est_auto$fitted, 1)
## Time Series:
## Start = 150 
## End = 150 
## Frequency = 1 
## [1] 0.9071897
A_mdl_auto.fittedvalues[-1]
## 1.1588294419910945

The forecasted values:

forecast::forecast(A_est_auto, h = 20)
##     Point Forecast     Lo 80    Hi 80       Lo 95    Hi 95
## 151      0.7609154 0.1545052 1.367326 -0.16650894 1.688340
## 152      1.0884837 0.4290083 1.747959  0.07990305 2.097064
## 153      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 154      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 155      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 156      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 157      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 158      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 159      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 160      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 161      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 162      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 163      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 164      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 165      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 166      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 167      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 168      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 169      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
## 170      0.9832625 0.3165811 1.649944 -0.03633867 2.002864
A_mdl_auto.predict(start = N - 1, end = N + 19)
## array([1.15882944, 0.76989887, 0.96162531, 1.15440624, 0.88305414,
##        1.12529146, 0.96205815, 0.94601715, 1.12928241, 0.86298451,
##        1.10334808, 0.99156129, 0.92755158, 1.13874023, 0.87045166,
##        1.08193056, 1.01661081, 0.90891344, 1.14307541, 0.88231217,
##        1.05906824])

Note that if we start the forecast from a historical datapoint (remember that our sample size is \(N\) and indexes in Python start from 0, so effectively, our last observation is at index N-1), we actually get the fitted value.

To get only the forecasts, we can use the .forecast(...) method:

A_mdl_auto.forecast(steps = 20)[0]
## array([0.76989887, 0.96162531, 1.15440624, 0.88305414, 1.12529146,
##        0.96205815, 0.94601715, 1.12928241, 0.86298451, 1.10334808,
##        0.99156129, 0.92755158, 1.13874023, 0.87045166, 1.08193056,
##        1.01661081, 0.90891344, 1.14307541, 0.88231217, 1.05906824])
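
The [0] index above is needed because, in this version of statsmodels, the .forecast(...) method of an estimated ARMA model returns a tuple of point forecasts, their standard errors and their confidence intervals. A minimal sketch of unpacking all three (the interval is 95% by default):

fc_mean, fc_se, fc_ci = A_mdl_auto.forecast(steps = 20)
# fc_mean - point forecasts for T+1, ..., T+20
# fc_se   - the associated forecast standard errors
# fc_ci   - the lower and upper confidence bounds, one row per forecast horizon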

A reminder of the model coefficients for this model:

print(lmtest::coeftest(A_est_auto))
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ma1        0.427401   0.080232  5.3271  9.98e-08 ***
## ma2       -0.161204   0.086558 -1.8624   0.06255 .  
## intercept  0.983262   0.048406 20.3127 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(A_mdl_auto.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          1.0046      0.055     18.121      0.000       0.896       1.113
## ar.L1.y       -1.0083      0.080    -12.614      0.000      -1.165      -0.852
## ar.L2.y       -0.3666      0.113     -3.242      0.001      -0.588      -0.145
## ar.L3.y        0.1575      0.113      1.390      0.165      -0.065       0.379
## ar.L4.y       -0.2547      0.080     -3.176      0.001      -0.412      -0.097
## ma.L1.y        1.6032      0.035     45.238      0.000       1.534       1.673
## ma.L2.y        0.9999      0.042     24.066      0.000       0.918       1.081
## ==============================================================================

We can also plot the fitted values and the forecasts along with their standard errors:

par(mfrow = c(2, 1))
plot(forecast::forecast(A_est_auto, h = 20), col = "orange", lwd = 2)
lines(A_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(A_est_auto, h = 20), col = "orange", lwd = 2,
     xlim = c(tail(time(A_est_auto$fitted), 1)) + c(-20, 20))
lines(A_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")

fig = plt.figure(figsize = (10, 8))
#
_ = A_mdl_auto.plot_predict(end = A_data.size + 20, ax = fig.add_subplot(211))
_ = plt.legend().remove()
_ = A_mdl_auto.plot_predict(start = A_data.size - 10, end = A_data.size + 20, ax = fig.add_subplot(212))
_ = plt.tight_layout()
plt.show()

We see that:

  • in R the initial forecast has some variation at \(T + 1\) and \(T + 2\), but for \(T + 3, ..., T+20\) the forecasts are equal to the process mean (see the expression after this list);

  • in Python, there is quite a bit of variation in our forecasts, since our model includes multiple autoregressive and moving-average lag coefficients.
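
This is exactly what the theory implies for the \(\rm MA(2)\) specification selected in R: for forecast horizons beyond the \(MA\) order, every shock entering the forecast is an unknown future shock with zero conditional expectation, so

\[
\widehat{Y}_{T+h|T} = \mathbb{E}\left(\mu + \epsilon_{T+h} + \theta_1 \epsilon_{T+h-1} + \theta_2 \epsilon_{T+h-2} \mid \Omega_T \right) = \mu, \quad h > 2,
\]

where \(\Omega_T\) denotes the information available at time \(T\).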

  • \((B)\) \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)

par(mfrow = c(2, 1))
plot(forecast::forecast(B_est_auto, h = 20), col = "orange", lwd = 2)
lines(B_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(B_est_auto, h = 20), col = "orange", lwd = 2,
     xlim = c(tail(time(B_est_auto$fitted), 1)) + c(-20, 20))
lines(B_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")

fig = plt.figure(figsize = (10, 8))
#
_ = B_mdl_auto.plot_predict(end = B_data.size + 20, ax = fig.add_subplot(211))
_ = plt.legend().remove()
_ = B_mdl_auto.plot_predict(start = B_data.size - 10, end = B_data.size + 20, ax = fig.add_subplot(212))
_ = plt.tight_layout()
plt.show()

  • In R, there is some variation in the first couple of forecasts, while the later ones are close to (but not exactly equal to) the process mean, because the automatically specified model is not a pure \(\rm MA\) model.

  • in Python, the results are similar as in \((A)\).

  • \((C)\) \((1 - 1.1L) Y_t = 1 + \epsilon_t\)

par(mfrow = c(2, 1))
plot(forecast::forecast(C_est_auto, h = 20), col = "orange", lwd = 2)
lines(C_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(C_est_auto, h = 20), col = "orange", lwd = 2,
     xlim = c(tail(time(C_est_auto$fitted), 1)) + c(-20, 20))
lines(C_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")

fig = plt.figure(figsize = (10, 8))
#
_ = C_mdl_auto.plot_predict(end = C_data.size + 20, ax = fig.add_subplot(211))
_ = plt.legend().remove()
_ = C_mdl_auto.plot_predict(start = C_data.size - 10, end = C_data.size + 20, ax = fig.add_subplot(212))
_ = plt.tight_layout()
plt.show()

Graphically, in Python the fitted values appear close to the true values of this non-stationary process. However, the forecast is declining, whereas the fitted values and the true values appear to grow exponentially; intuitively, this does not make sense. Remember that we attempt to estimate a stationary model for the data, regardless of whether the data are actually stationary, so the forecast is quite poor. In R, on the other hand, the best model was an \({\rm ARMA}(0, 0)\).

Considerations

Overall, these kinds of non-stationary processes need separate methods for dealing with non-stationarity and cannot simply be modelled by specifying an approximate stationary model on the initial data.
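
For the common case of a unit root (as opposed to the explosive root in \((C)\), which a single difference does not remove), the usual remedy is to difference the series and fit a stationary ARMA model to the differences. A minimal sketch using the same statsmodels interface as above (the simulated series and the chosen order are hypothetical, not part of the exercise):

np.random.seed(10)
# Hypothetical ARIMA(1, 1, 0)-type example: a stationary AR(1) with an added unit root
ar1_part = smt.arima_process.ArmaProcess(ar = np.array([1, -0.5]), ma = [1]).generate_sample(150, scale = 0.5)
unit_root_series = np.cumsum(ar1_part)
# Difference once to remove the unit root, then estimate an ARMA model on the differences
diff_mdl = smt.arima_model.ARMA(np.diff(unit_root_series), order = (1, 0)).fit(trend = 'c')
print(diff_mdl.summary().tables[1])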

  • \((D)\) \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
par(mfrow = c(2, 1))
plot(forecast::forecast(D_est_auto, h = 20), col = "orange", lwd = 2)
lines(D_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(D_est_auto, h = 20), col = "orange", lwd = 2,
     xlim = c(tail(time(D_est_auto$fitted), 1)) + c(-20, 20))
lines(D_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")

fig = plt.figure(figsize = (10, 8))
#
_ = D_mdl_auto.plot_predict(end = D_data.size + 20, ax = fig.add_subplot(211))
_ = plt.legend().remove()
_ = D_mdl_auto.plot_predict(start = D_data.size - 10, end = D_data.size + 20, ax = fig.add_subplot(212))
_ = plt.tight_layout()
plt.show()

The fitted values track the data quite well, and the forecasts appear to be approaching the mean of the process, although convergence may take longer than the 20 periods shown.

  • \((E)\) \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)
par(mfrow = c(2, 1))
plot(forecast::forecast(E_est_auto, h = 20), col = "orange", lwd = 2)
lines(E_est_auto$fitted, col = "blue", lwd = 2)
plot(forecast::forecast(E_est_auto, h = 20), col = "orange", lwd = 2,
     xlim = c(tail(time(E_est_auto$fitted), 1)) + c(-20, 20))
lines(E_est_auto$fitted, col = "blue", lwd = 2)
legend("topright", c("Forecast", "Y"), lwd = 2, col = c("blue", "orange"), bg = "transparent")

fig = plt.figure(figsize = (10, 8))
#
_ = E_mdl_auto.plot_predict(end = E_data.size + 20, ax = fig.add_subplot(211))
_ = plt.legend().remove()
_ = E_mdl_auto.plot_predict(start = E_data.size - 10, end = E_data.size + 20, ax = fig.add_subplot(212))
_ = plt.tight_layout()
plt.show()

The model appears to fit the data quite nicely. The forecasts are similar to the ones in \((D)\). For Python, we see that the forecasts are less volatile - this is because the automatically selected model has fewer lags, compared to \((D)\) (see TASK 6).

To sum up:

  • \((A)\) and \((B)\) are moving-average processes, which are governed by random shocks, hence the fitted values vary quite a bit, even though the automatic procedure selected \(ARMA\) models, which have both \(AR\) and \(MA\) components.

  • If the true underlying process has a significant autoregressive part, the automatically selected models appear to fit the data quite well, even if the process is not stationary.

  • On the other hand, the forecast of a non-stationary process does not appear to make sense, indicating overall a poor model for \((C)\).

Considerations

Stationarity and invertibility problems when estimating simulated or empirical data

As we have seen, in the case of non-stationary or non-invertible processes, we may still obtain an estimated model with stationary and invertible coefficients. The model may not fit the data well, but it is an attempt at fitting a good model (in terms of stationarity and invertibility). However, estimation may sometimes fail with an error stating that non-stationarity and/or non-invertibility prevents a model from being estimated.

To make matters worse, sometimes even if we know that a process is stationary and/or invertible, we may still get an error stating otherwise.

To give an example of when an error may occur, generate the following \(\rm AR(1)\) process:

np.random.seed(1)
N = 100
AR_spec_test = smt.arima_process.ArmaProcess(ar = np.array([1, 0.99]), ma = [1])
AR_bad_example =  AR_spec_test.generate_sample(N, scale = 0.5)

We can verify that this process is both stationary and invertible:

AR_spec_test.isstationary
## True
AR_spec_test.isinvertible
## True

However, if we try to estimate this model:

spec_1 = smt.arima_model.ARMA(AR_bad_example, order = (1, 0))

we would get an error:

try:
    mdl_1 = spec_1.fit(trend = 'nc')
except ValueError as err:
        print('Handling run-time error:', err)
## Handling run-time error: The computed initial AR coefficients are not stationary
## You should induce stationarity, choose a different model order, or you can
## pass your own start_params.

Sometimes we may even get this kind of error when our coefficients are relatively small - for example, with the following \(\rm ARMA\) model:

np.random.seed(4)
N = 150
ARMA_spec_test = smt.arima_process.ArmaProcess(ar = np.array([1, 0.5]), ma = [1, 0.5])
ARMA_bad_example = ARMA_spec_test.generate_sample(N, scale = 0.5)

We can verify that this process is both stationary and invertible:

ARMA_spec_test.isstationary
## True
ARMA_spec_test.isinvertible
## True

Then if we specify:

spec_2 = smt.arima_model.ARMA(ARMA_bad_example, order = (1, 1))

and attempt to estimate the model:

try:
    mdl_2 = spec_2.fit(trend = 'nc')
except np.linalg.LinAlgError as err:
    print("Error: ", err)
## Error:  SVD did not converge

which is confusing.

We could also get a different kind of error - that the \(AR\) coefficients are not stationary:

np.random.seed(751)
N = 150
ARMA_spec_test = smt.arima_process.ArmaProcess(ar = np.array([1, 0.5]), ma = [1, 0.5])
ARMA_bad_example = ARMA_spec_test.generate_sample(N, scale = 0.5)
spec_2 = smt.arima_model.ARMA(ARMA_bad_example, order = (1, 1))
try:
    mdl_2 = spec_2.fit(trend = 'nc')
except  ValueError as err:
    print('Handling run-time error:', err)
## Handling run-time error: The computed initial AR coefficients are not stationary
## You should induce stationarity, choose a different model order, or you can
## pass your own start_params.

which makes no sense.

To bypass this, we can specify the starting parameters, start_params, when fitting:

mdl_2 = spec_2.fit(trend = 'nc', start_params = [0, 0])
print(mdl_2.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## ar.L1.y       -0.6235      0.165     -3.770      0.000      -0.948      -0.299
## ma.L1.y        0.4341      0.181      2.398      0.016       0.079       0.789
## ==============================================================================

If our specified model has a constant:

np.random.seed(751)
N = 150
ARMA_spec_test = smt.arima_process.ArmaProcess(ar = np.array([1, 0.5]), ma = [1, 0.5])
ARMA_bad_example =  3 + ARMA_spec_test.generate_sample(N, scale = 0.5)

Then the starting parameters have to include the constant term as well:

spec_3 = smt.arima_model.ARMA(ARMA_bad_example, order = (1, 1))
mdl_3 = spec_3.fit(trend = 'c', start_params = [0, 0, 0])
print(mdl_3.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## const          3.0258      0.035     86.311      0.000       2.957       3.094
## ar.L1.y       -0.6221      0.166     -3.757      0.000      -0.947      -0.298
## ma.L1.y        0.4311      0.181      2.377      0.017       0.076       0.787
## ==============================================================================

We note that such non-stationarity errors, even when the true underlying process is stationary, can be caused by:

  • a small sample size;
  • random chance;
  • a combination of coefficients which leads to lag polynomial roots that are close to the unit circle;
  • a specific lag order selection for the \(AR\) or \(MA\) part, which may result in a non-stationary and/or non-invertible specification; this may be remedied by attempting to specify a different model order (see the sketch after this list);
  • bugs in the econometric software (not as frequent).
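
A minimal sketch of the retry logic suggested by the last two points (illustration only; the helper name fit_with_fallback and the zero starting values are our own choices, not part of the exercise):

def fit_with_fallback(data, order, trend = 'c'):
    spec = smt.arima_model.ARMA(data, order = order)
    try:
        # First attempt: let statsmodels pick the starting values
        return spec.fit(trend = trend)
    except (ValueError, np.linalg.LinAlgError):
        # Fall back to zero starting values: one per trend term, AR and MA coefficient
        k_trend = 1 if trend == 'c' else 0
        return spec.fit(trend = trend, start_params = np.zeros(k_trend + sum(order)))

mdl_fallback = fit_with_fallback(ARMA_bad_example, order = (1, 1))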

Similarly, if the process is non-invertible, we may not always obtain the estimates. Unfortunately, this may also produce false errors for invertible processes, for reasons similar to the non-stationary case:

np.random.seed(132)
N = 100
MA_spec_test = smt.arima_process.ArmaProcess(ar = np.array([1]), ma = [1, -1.3, 0.5])
MA_bad_example =  MA_spec_test.generate_sample(N, scale = 1)

We can verify the stationarity and invertibility as before:

MA_spec_test.isstationary
## True
MA_spec_test.isinvertible
## True

We can calculate the roots:

MA_spec_test.maroots
## array([1.3-0.55677644j, 1.3+0.55677644j])

And their absolute values:

np.abs(MA_spec_test.maroots)
## array([1.41421356, 1.41421356])

This verifies that they are outside the unit circle (i.e. their absolute values are > 1). We can also visualize them:

# plot(MA_spec_test)
# forecast::autoplot(MA_spec_test)
fig = plt.figure(figsize = (10, 8))
ax = fig.add_subplot(1, 1, 1)
circ = plt.Circle((0, 0), radius = 1, edgecolor = 'black', facecolor = 'None')
_ = ax.add_patch(circ)
_ = plt.scatter(x = MA_spec_test.maroots.real, y = MA_spec_test.maroots.imag, color = "r")
_ = plt.xlim(-max(np.abs(MA_spec_test.maroots)), max(np.abs(MA_spec_test.maroots)))
_ = plt.ylim(-max(np.abs(MA_spec_test.maroots)), max(np.abs(MA_spec_test.maroots)))
_ = plt.axhline(y = 0, linestyle = "--", color = "b")
_ = plt.axvline(x = 0, linestyle = "--", color = "b")
_ = plt.title("MA(2) roots")
plt.show()

However, if we try to estimate this model:

spec_4 = smt.arima_model.ARMA(MA_bad_example, order = (0, 2))

we would get an error:

try:
    mdl_4 = spec_4.fit(trend = 'nc')
except ValueError as err:
        print('Handling run-time error:', err)
## Handling run-time error: The computed initial MA coefficients are not invertible
## You should induce invertibility, choose a different model order, or you can
## pass your own start_params.

Note that this time the error states that the computed initial \(MA\) coefficients are not invertible, even though the model specification and MA_spec_test.isinvertible indicated that this process is invertible.

Similarly to before, we can specify the starting parameter values:

mdl_4 = spec_4.fit(trend = 'nc', start_params = [0, 0])
print(mdl_4.summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## ma.L1.y       -1.5021      0.071    -21.023      0.000      -1.642      -1.362
## ma.L2.y        0.7023      0.063     11.119      0.000       0.579       0.826
## ==============================================================================

Careful!

If we want to avoid checking for stationarity and invertibility entirely, we might try to fit the model with transparams = False:

print(spec_4.fit(trend = 'nc', transparams = False, start_params = [0, 0]).summary().tables[1])
## ==============================================================================
##                  coef    std err          z      P>|z|      [0.025      0.975]
## ------------------------------------------------------------------------------
## ma.L1.y       -1.5021      0.071    -21.023      0.000      -1.642      -1.362
## ma.L2.y        0.7023      0.063     11.119      0.000       0.579       0.826
## ==============================================================================

NOTE: with transparams = False the parameters are no longer transformed to enforce stationarity and invertibility, so the results can be entirely different from the constrained fit (in this particular case they happen to coincide). Furthermore, it is possible for the final parameters to be exactly the same as the ones passed in start_params, i.e. the optimization may silently fail. Therefore, using this option is not advised.
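
One simple check (a sketch reusing the objects defined above) is to compare the reported parameters with the supplied starting values; if they coincide, the optimization most likely never moved away from start_params:

start = [0, 0]
mdl_check = spec_4.fit(trend = 'nc', transparams = False, start_params = start)
# True here would indicate that the "estimates" are simply the starting values
print(np.allclose(mdl_check.params, start))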