2.2 Chapter Exercises

You are encouraged to find any other data that may be interesting to you. The data used in this section is simulated, because simulated data makes it much easier to explore specific model properties, as well as the capabilities of R and Python libraries. After more advanced time series topics are covered, it will become easier to analyse real-world datasets.

2.2.1 Time Series Processes

Below we list a selection of time series processes for you to examine.

  • \(Y_t = 1 + (1 + 0.5L)\epsilon_t\)
  • \(Y_t = 2 + (1 - 1.3L) \epsilon_t\)
  • \(Y_t = 1 + (1 + 1.3L - 0.4 L^2)\epsilon_t\)
  • \(Y_t = (1 + 0.4 L^2)\epsilon_t\)
  • \((1 - 1.1L) Y_t = 1 + \epsilon_t\)
  • \((1 + 0.5L) Y_t = 2 + \epsilon_t\)
  • \((1 - 1.1L + 0.2L^2) Y_t = 1 + \epsilon_t\)
  • \((1 - 0.2L^2) Y_t = 2 + \epsilon_t\)
  • \((1 - 1.1L + 0.2L^2) Y_t = (1 + 1.3L)\epsilon_t\)
  • \((1 - 0.5L) Y_t = 2 + (1 - 0.5L)\epsilon_t\)
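Whether a given lag polynomial yields a stationary (for the AR side) or invertible (for the MA side) process can be checked numerically: all roots of the polynomial must lie outside the unit circle. A minimal sketch in Python, assuming `numpy`; the helper name is our own, not from any library:

```python
import numpy as np

def roots_outside_unit_circle(coefs):
    """Check whether all roots of 1 + c1*z + c2*z^2 + ... lie outside
    the unit circle.  `coefs` are the lag-polynomial coefficients
    c1, c2, ... (the leading 1 is implicit)."""
    # np.roots expects coefficients ordered from the highest power down
    roots = np.roots(np.r_[coefs[::-1], 1.0])
    return bool(np.all(np.abs(roots) > 1))

# (1 - 1.1L) Y_t = 1 + e_t: AR root 1/1.1 is inside the unit circle
print(roots_outside_unit_circle([-1.1]))        # False -> not stationary
# (1 - 1.1L + 0.2L^2) Y_t: both AR roots are outside the unit circle
print(roots_outside_unit_circle([-1.1, 0.2]))   # True  -> stationary
# Y_t = 1 + (1 + 0.5L) e_t: MA root -2 is outside the unit circle
print(roots_outside_unit_circle([0.5]))         # True  -> invertible
```

The same check applied to each equation above tells you which processes are stationary, invertible, both, or neither.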

2.2.2 Tasks

The following tasks are the same for all of the processes. This is to highlight that in time series analysis, regardless of what the true underlying process is, we follow the same steps to carry out our analysis.

2.2.2.1 Exercise Set 1: Data Simulation and Exploratory Data Analysis (EDA)

  1. What kind of models are specified in the equations - \(AR(p)\), \(MA(q)\) or \(ARMA(p,q)\) (do not forget to specify p and q)? Are the processes stationary and/or invertible? Explain your answers.

  2. Simulate the data for each process with sample size \(T = 150\) and a \(WN\) component \(\epsilon_t \sim \mathcal{N}(0, 0.5^2)\). Assume that if \(t \leq 0\), then \(Y_t = \epsilon_t = 0\).

  3. What is the theoretical mean of each process and is it close to the sample mean?

  4. Plot the sample ACF and PACF - what can you say about the processes using only these plots?
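The simulation with zero initial conditions can be done with a plain loop. A minimal sketch in Python, assuming `numpy` and taking the first process, \(Y_t = 1 + (1 + 0.5L)\epsilon_t\), as the example (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 150
eps = rng.normal(0.0, 0.5, size=T)   # WN(0, 0.5^2)

# Y_t = 1 + eps_t + 0.5 * eps_{t-1}, with eps_t = 0 for t <= 0
y = np.empty(T)
for t in range(T):
    e_lag = eps[t - 1] if t >= 1 else 0.0   # zero initial condition
    y[t] = 1.0 + eps[t] + 0.5 * e_lag

# Theoretical mean of this MA(1) process is the intercept, 1
print(round(y.mean(), 3))   # sample mean, should be close to 1

# Sample ACF at lag 1; the theoretical value is 0.5 / (1 + 0.5^2) = 0.4
yc = y - y.mean()
acf1 = (yc[1:] @ yc[:-1]) / (yc @ yc)
print(round(acf1, 3))
```

For the plots themselves, `statsmodels.graphics.tsaplots.plot_acf` and `plot_pacf` (or `acf()`/`pacf()` in R) produce the sample ACF and PACF directly.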

2.2.2.2 Exercise Set 2: Model Estimation

  1. Assume that we are somewhat omniscient and know the true lag order of our series. Estimate the models from the generated data (note: use built-in functions for model estimation in R and Python and manually specify the lag order from part 1.1). What are the coefficient values of your estimated models? Are they close to the actual values?

  2. Assume that this is a data sample from real life, where we do not know the true underlying process. Use an automated \(ARMA\) model order selection criterion, via the built-in functions in R and Python, to fit the best model for each series. Are the model and the coefficients suggested by the procedure the same as the ones used to generate the data?

  3. If your models are stationary and/or invertible, re-estimate them as either pure \(MA\) or pure \(AR\) models, either by restricting the automated model order options in 2.2 (for example, setting \(p_{\max} = 0\) to find the best \(MA\) model), or by selecting an arbitrarily high lag order for the model.

2.2.2.3 Exercise Set 3: Residual Diagnostics

Remember that the previous models are generated with shocks \(\epsilon_t \sim WN(0, \sigma^2)\).

  1. Plot the residuals of your estimated models from 2.2. Does the time series plot look like \(WN\)?

  2. Plot the sample ACF and PACF of your model residuals - do they look like \(WN\)?

  3. Perform the Ljung-Box Test on the residuals of your models. Are the residuals \(WN\)?

2.2.2.4 Exercise Set 4: Model Forecasts

  1. Which model is better in terms of \(AIC\): the ones from 2.1, or 2.2? What about using \(BIC\)?

  2. Using the results from 4.1, along with the results from the residual tests, select the best model and forecast 20 periods ahead. What can you say about the forecasts, i.e. how do the forecast values change as the forecast period increases?