3.1 Chapter Exercises

You are encouraged to find any other data which may be interesting to you. The data used in this section comes from well-known example datasets, which make it easier to explore specific model properties as well as R and Python library capabilities. After more advanced time series topics are covered, it will become easier to analyse more complex real-world datasets.

3.1.1 Time Series Datasets

Below we list a selection of time series datasets, which you can examine.

There are various time series datasets available; we will mainly use datasets from the fma, fpp2 and datasets R packages, as well as publicly available NASA GISS temperature data.

In addition, some packages are needed in order to load and prepare the data:

# Import the required modules for vector and matrix operations, data generation:
import numpy as np
# Import the required modules for plot creation:
import matplotlib.pyplot as plt
# Import the required modules for time series data generation:
import statsmodels.api as sm
# Import the required modules for test statistic calculation:
import statsmodels.stats.api as sm_stat
# Import the required modules for model estimation:
import statsmodels.tsa.api as smt
# Import pandas for data frame and time series manipulation:
import pandas as pd

3.1.1.1 Dataset 1

Monthly Airline Passenger Numbers 1949-1960, in thousands.

airpass <- fma::airpass
airpass = sm.datasets.get_rdataset("AirPassengers", "datasets")

You can look at the documentation of this dataset:

print(airpass.__doc__)

and extract the values:

airpass = pd.Series(airpass.data["value"])
airpass.index = pd.date_range(start = "1949-01", periods = len(airpass.index), freq = "M").to_period()
airpass.index = airpass.index.to_timestamp()
airpass.head()

3.1.1.2 Dataset 2

Global temperature anomalies for the years 1880 to 2020. These are the GISS (Goddard Institute for Space Studies) Land-Ocean Temperature Index (LOTI) data.

loti <- read.csv("https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv", skip = 1)

Note that you will need to fix the format of the data. Enter the following code and examine the output:

loti <- reshape2::melt(loti[, c(1:13)], id.vars = "Year")
loti <- loti[loti$Year <= 2020, ]
loti$value <- as.numeric(gsub("^\\.", "0.", gsub("^-\\.", "-0.", loti$value)))
loti <- loti[order(loti$Year), ]
loti <- ts(loti$value * 100, start = c(1880, 1), frequency = 12)
head(loti)
##      Jan Feb Mar Apr May Jun
## 1880 -17 -23  -8 -15  -8 -19
# loti = sm.datasets.get_rdataset("loti", "gamclass")
loti = pd.read_csv("https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv", skiprows = 1)

The same reformatting is needed in Python. Enter the following code and examine the output at each step:

loti.head()
#
##    Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec   J-D   D-N   DJF   MAM   JJA   SON
## 0  1880 -0.17 -0.23  -.08  -.15  -.08  -.19  -.17  -.09  -.13  -.22  -.20  -.16  -.16   ***   ***  -.10  -.15  -.18
## 1  1881 -0.18 -0.13   .04   .06   .07  -.17   .01  -.02  -.14  -.20  -.17  -.05  -.07  -.08  -.16   .06  -.06  -.17
## 2  1882  0.18  0.15   .05  -.15  -.13  -.22  -.15  -.06  -.13  -.23  -.15  -.35  -.10  -.07   .09  -.08  -.14  -.17
## 3  1883 -0.28 -0.35  -.11  -.17  -.16  -.07  -.05  -.12  -.20  -.10  -.22  -.10  -.16  -.18  -.33  -.15  -.08  -.17
## 4  1884 -0.12 -0.06  -.35  -.39  -.33  -.35  -.31  -.26  -.26  -.24  -.32  -.30  -.27  -.26  -.09  -.36  -.31  -.27
loti = pd.melt(loti.iloc[: , list(range(0, 13))], id_vars = ['Year'])
#
loti.head()
#
##    Year variable value
## 0  1880      Jan -0.17
## 1  1881      Jan -0.18
## 2  1882      Jan  0.18
## 3  1883      Jan -0.28
## 4  1884      Jan -0.12
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
loti.variable = pd.Categorical(loti.variable, categories = cats)
loti = loti.sort_values(by = ['Year', 'variable'])
loti = loti[loti["Year"] <= 2020]
loti.head()
##      Year variable value
## 0    1880      Jan -0.17
## 142  1880      Feb -0.23
## 284  1880      Mar  -.08
## 426  1880      Apr  -.15
## 568  1880      May  -.08
loti["value"] = loti["value"].replace(to_replace = "^-\\.", value = "-0.", regex = True)
loti["value"] = loti["value"].replace(to_replace = "^\\.", value = "0.", regex = True)
loti["value"] = loti["value"].astype(float)
#
loti = pd.Series(loti["value"] * 100)
loti.index = pd.date_range(start = "1880-01", periods = len(loti.index), freq = "M").to_period()
loti.index = loti.index.to_timestamp()
loti.head()
## 1880-01-01   -17.0
## 1880-02-01   -23.0
## 1880-03-01    -8.0
## 1880-04-01   -15.0
## 1880-05-01    -8.0
## Freq: MS, Name: value, dtype: float64

3.1.1.3 Dataset 3

Total international visitors to Australia (in millions). 1980-2015.

austa <- fpp2::austa
austa = sm.datasets.get_rdataset("austa", "fpp2")
#
austa = pd.Series(austa.data["value"])
austa.index = pd.date_range(start = "1980-01", periods = len(austa.index), freq = "Y").to_period()
austa.index = austa.index.to_timestamp() 

3.1.1.4 Dataset 4

Quarterly visitor nights (in millions) spent by international tourists to Australia. 1999-2015.

austourists <- fpp2::austourists
austourists = sm.datasets.get_rdataset("austourists", "fpp2")
#
austourists = pd.Series(austourists.data["value"])
austourists.index = pd.date_range(start = "1999-01", periods = len(austourists.index), freq = "Q").to_period()
austourists.index = austourists.index.to_timestamp() 

3.1.1.5 Dataset 5

Monthly Australian beer production: Jan 1991 – Aug 1995.

beer <- fma::beer
beer = pd.Series([164, 148, 152, 144, 155, 125, 153, 146, 138, 190,
                  192, 192, 147, 133, 163, 150, 129, 131, 145, 137, 138, 168, 176,
                  188, 139, 143, 150, 154, 137, 129, 128, 140, 143, 151, 177, 184,
                  151, 134, 164, 126, 131, 125, 127, 143, 143, 160, 190, 182, 138,
                  136, 152, 127, 151, 130, 119, 153])
beer.index = pd.date_range(start = "1991-01", periods = len(beer.index), freq = "M").to_period()
beer.index = beer.index.to_timestamp() 

3.1.1.6 Dataset 6

Monthly sales of shampoo over a three-year period.

shampoo <- fma::shampoo
shampoo = pd.Series([266, 145.9, 183.1, 119.3, 180.3, 168.5, 231.8,
                     224.5, 192.8, 122.9, 336.5, 185.9, 194.3, 149.5, 210.1, 273.3,
                     191.4, 287, 226, 303.6, 289.9, 421.6, 264.5, 342.3, 339.7, 440.4,
                     315.9, 439.3, 401.3, 437.4, 575.5, 407.6, 682, 475.3, 581.3, 646.9])
# The data is monthly; the original series has no calendar dates, so the start date below is arbitrary:
shampoo.index = pd.date_range(start = "1999-01", periods = len(shampoo.index), freq = "M").to_period()
shampoo.index = shampoo.index.to_timestamp()  

3.1.1.7 Dataset 7

Monthly sales of new one-family houses sold in the USA since 1973.

hsales <- fma::hsales
hsales = pd.Series([55, 60, 68, 63, 65, 61, 54, 52, 46, 42, 37, 30, 37,
                    44, 55, 53, 58, 50, 48, 45, 41, 34, 30, 24, 29, 34, 44, 54, 57,
                    51, 51, 53, 46, 46, 46, 39, 41, 53, 55, 62, 55, 56, 57, 59, 58,
                    55, 49, 47, 57, 68, 84, 81, 78, 74, 64, 74, 71, 63, 55, 51, 57,
                    63, 75, 85, 80, 77, 68, 72, 68, 70, 53, 50, 53, 58, 73, 72, 68,
                    63, 64, 68, 60, 54, 41, 35, 43, 44, 44, 36, 44, 50, 55, 61, 50,
                    46, 39, 33, 37, 40, 49, 44, 45, 38, 36, 34, 28, 29, 27, 29, 28,
                    29, 36, 32, 36, 34, 31, 36, 39, 40, 39, 33, 44, 46, 57, 59, 64,
                    59, 51, 50, 48, 51, 45, 48, 52, 58, 63, 61, 59, 58, 52, 48, 53,
                    55, 42, 38, 48, 55, 67, 60, 65, 65, 63, 61, 54, 52, 51, 47, 55,
                    59, 89, 84, 75, 66, 57, 52, 60, 54, 48, 49, 53, 59, 73, 72, 62,
                    58, 55, 56, 52, 52, 43, 37, 43, 55, 68, 68, 64, 65, 57, 59, 54,
                    57, 43, 42, 52, 51, 58, 60, 61, 58, 62, 61, 49, 51, 47, 40, 45,
                    50, 58, 52, 50, 50, 46, 46, 38, 37, 34, 29, 30, 40, 46, 46, 47,
                    47, 43, 46, 37, 41, 39, 36, 48, 55, 56, 53, 52, 53, 52, 56, 51,
                    48, 42, 42, 44, 50, 60, 66, 58, 59, 55, 57, 57, 56, 53, 51, 45,
                    58, 74, 65, 65, 55, 52, 59, 54, 57, 45, 40, 47, 47, 60, 58, 63,
                    64, 64, 63, 55, 54, 44])
hsales.index = pd.date_range(start = "1973-01", periods = len(hsales.index), freq = "M").to_period()
hsales.index = hsales.index.to_timestamp()

3.1.1.8 Dataset 8

US treasury bill contracts on the Chicago market for 100 consecutive trading days in 1981.

ustreas <- fma::ustreas
ustreas = pd.Series([91.57, 91.56, 91.40, 91.39, 91.20, 90.90, 90.99,
                     91.17, 90.98, 91.11, 91.20, 90.92, 90.73, 90.79, 90.86, 90.83,
                     90.80, 90.21, 90.10, 90.10, 89.66, 89.81, 89.40, 89.34, 89.16,
                     88.71, 89.28, 89.36, 89.64, 89.53, 89.32, 89.14, 89.41, 89.53,
                     89.16, 88.98, 88.50, 88.63, 88.62, 88.76, 89.07, 88.47, 88.41,
                     88.57, 88.23, 87.93, 87.88, 88.18, 88.04, 88.18, 88.78, 89.29,
                     89.14, 89.14, 89.42, 89.26, 89.37, 89.51, 89.66, 89.39, 89.02,
                     89.05, 88.97, 88.57, 88.44, 88.52, 87.92, 87.71, 87.52, 87.37,
                     87.27, 87.07, 86.83, 86.88, 87.48, 87.30, 87.87, 87.85, 87.83,
                     87.40, 86.89, 87.38, 87.48, 87.21, 87.02, 87.10, 87.02, 87.07,
                     86.74, 86.35, 86.32, 86.77, 86.77, 86.49, 86.02, 85.52, 85.02,
                     84.42, 85.29, 85.32])
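# Note: unlike the other series, ustreas has no calendar dates;
# the default integer index (0, ..., 99) numbers the consecutive trading days.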

3.1.1.9 Dataset 9

Australian monthly electricity production: Jan 1956 – Aug 1995.

elec <- fma::elec
elec  = pd.Series([1254, 1290, 1379, 1346, 1535, 1555, 1655, 1651, 1500,
                   1538, 1486, 1394, 1409, 1387, 1543, 1502, 1693, 1616, 1841,
                   1787, 1631, 1649, 1586, 1500, 1497, 1463, 1648, 1595, 1777,
                   1824, 1994, 1835, 1787, 1699, 1633, 1645, 1597, 1577, 1709,
                   1756, 1936, 2052, 2105, 2016, 1914, 1925, 1824, 1765, 1721,
                   1752, 1914, 1857, 2159, 2195, 2287, 2276, 2096, 2055, 2004,
                   1924, 1851, 1839, 2019, 1937, 2270, 2251, 2382, 2364, 2129,
                   2110, 2072, 1980, 1995, 1932, 2171, 2162, 2489, 2424, 2641,             
                   2630, 2324, 2412, 2284, 2186, 2184, 2144, 2379, 2383, 2717,
                   2774, 3051, 2891, 2613, 2600, 2493, 2410, 2390, 2463, 2616,
                   2734, 2970, 3125, 3342, 3207, 2964, 2919, 2764, 2732, 2622,
                   2698, 2950, 2895, 3200, 3408, 3679, 3473, 3154, 3107, 3052,
                   2918, 2786, 2739, 3125, 3033, 3486, 3661, 3927, 3851, 3456,
                   3390, 3280, 3166, 3080, 3069, 3340, 3310, 3798, 3883, 4191,
                   4213, 3766, 3628, 3520, 3322, 3250, 3287, 3552, 3440, 4153,
                   4265, 4655, 4492, 4051, 3967, 3807, 3639, 3647, 3560, 3929,
                   3858, 4485, 4697, 4977, 4675, 4596, 4491, 4127, 4144, 4014,
                   3994, 4320, 4400, 5002, 5091, 5471, 5193, 4997, 4737, 4546,
                   4498, 4350, 4206, 4743, 4582, 5191, 5457, 5891, 5618, 5158,
                   5030, 4800, 4654, 4453, 4440, 4945, 4788, 5425, 5706, 6061,
                   5846, 5242, 5408, 5114, 5042, 5008, 4657, 5359, 5193, 5891,
                   5980, 6390, 6366, 5756, 5640, 5429, 5398, 5413, 5141, 5695,
                   5554, 6369, 6592, 7107, 6917, 6353, 6205, 5830, 5646, 5379,
                   5489, 5824, 5907, 6482, 6795, 7028, 6776, 6274, 6362, 5940,
                   5958, 5769, 5887, 6367, 6165, 6868, 7201, 7601, 7581, 7090,
                   6841, 6408, 6435, 6176, 6138, 6717, 6470, 7312, 7763, 8171,
                   7788, 7311, 6679, 6704, 6724, 6552, 6427, 7105, 6869, 7683,
                   8082, 8555, 8386, 7553, 7398, 7112, 6886, 7077, 6820, 7426,
                   7143, 8261, 8240, 8977, 8991, 8026, 7911, 7510, 7381, 7366,
                   7414, 7824, 7524, 8279, 8707, 9486, 8973, 8231, 8206, 7927,
                   7999, 7834, 7521, 8284, 7999, 8940, 9381, 10078, 9796, 8471,
                   8572, 8150, 8168, 8166, 7903, 8606, 8071, 9178, 9873, 10476,
                   9296, 8818, 8697, 8381, 8293, 7942, 8001, 8744, 8397, 9115,
                   9773, 10358, 9849, 9083, 9143, 8800, 8741, 8492, 8795, 9354,
                   8796, 10072, 10174, 11326, 10744, 9806, 9740, 9373, 9244, 9407,
                   8827, 9880, 9364, 10580, 10899, 11687, 11280, 10208, 10212,
                   9725, 9721, 9846, 9407, 10265, 9970, 10801, 11246, 12167,
                   11578, 10645, 10613, 10104, 10348, 10263, 9973, 10803, 10409,
                   11458, 11845, 12559, 12070, 11221, 11338, 10761, 11012, 10923,
                   10790, 11427, 10788, 11772, 12104, 12634, 12772, 11764, 11956,
                   11646, 11750, 11485, 11198, 12265, 11704, 12419, 13259, 13945,
                   13839, 12387, 12546, 12038, 11977, 12336, 11793, 12877, 11923,
                   13306, 13988, 14002, 14336, 12867, 12721, 12449, 12686, 12810,
                   12015, 12888, 12431, 13499, 13014, 14296, 14125, 12817, 12862,
                   12449, 12489, 12621, 12380, 13023, 12302, 13339, 13825, 14428,
                   14151, 13355, 13094, 12656, 12435, 13287, 12434, 13209, 12817,
                   13746, 14259, 14590, 14354, 13254, 13464, 13302, 13456, 13171,
                   12517, 13489, 12509, 13785, 13921, 14603, 14749, 13540, 13457,
                   13243, 13590, 13487, 12776, 13812, 13032, 14268, 14473, 15359,
                   14457])
elec.index = pd.date_range(start = "1956-01", periods = len(elec.index), freq = "M").to_period()
elec.index = elec.index.to_timestamp()

3.1.2 Tasks

The following tasks are the same for all datasets. This is to highlight that in time series analysis, regardless of what the true underlying process is, we still follow the same steps to carry out our analysis.

3.1.2.1 Exercise Set 1: Exploratory Data Analysis (EDA)

  1. Plot your time series. Does the data look stationary? Why?

  2. Does the data exhibit autocorrelation? What could be the cause of this type of autocorrelation? (Hint: look at the ACF and PACF - see the plotting sketch after this list - and think about what happens to them if a deterministic component is introduced).

  3. As you know, there are two common decomposition models - additive and multiplicative. Which one would be appropriate for this data and why? (Hint: look at the data variation). If needed, transform your data.
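
As a starting point, here is a minimal EDA sketch, assuming the airpass series from Dataset 1 (any of the series above can be substituted):

# Plot the series together with its ACF and PACF:
fig = plt.figure(figsize = (10, 6))
airpass.plot(ax = fig.add_subplot(2, 1, 1), title = "Monthly airline passengers")
sm.graphics.tsa.plot_acf(airpass, lags = 40, ax = fig.add_subplot(2, 2, 3))
sm.graphics.tsa.plot_pacf(airpass, lags = 40, ax = fig.add_subplot(2, 2, 4))
plt.tight_layout()
plt.show()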

3.1.2.2 Exercise Set 2: Time series decomposition without forecasts

After having an initial look at the data and carrying out any needed data transformations, move on to decompose your time series into its deterministic components. Remember that one requirement for the seasonal component with period \(d\) is that \(S_t = S_{t+d}\), i.e. the seasonal component itself must not change as time increases.

  1. Decompose the data via a moving-average method (see the sketch after this list). Plot the trend and the trend + seasonal components alongside the actual data. Are the decomposed trend and seasonality captured correctly? Explain your answers. (Hint: look at the residual plots as well as their ACF and PACF)
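
A minimal sketch of such a decomposition, again assuming the airpass series (in recent statsmodels versions, seasonal_decompose estimates the trend with a centered moving average):

# Moving-average decomposition:
decomposition = sm.tsa.seasonal_decompose(airpass, model = "multiplicative", period = 12)
decomposition.plot()
plt.show()
# Check the remainder for leftover autocorrelation:
residuals = decomposition.resid.dropna()
sm.graphics.tsa.plot_acf(residuals, lags = 24)
sm.graphics.tsa.plot_pacf(residuals, lags = 24)
plt.show()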

3.1.2.3 Exercise Set 3: Time series decomposition with forecasts

Having an initial idea of the decomposition form and the deterministic components, use a couple of TS decomposition methods that allow forecasting. If needed, generate the trend and seasonal variables from your data.

  1. Use the Global Method of OLS to estimate the trend and seasonal components of your data and plot the estimated values from your model (see the combined sketch after this list). Does the model capture your trend and seasonality components well? Explain why.

  2. Use the Local Method of Exponential Smoothing to estimate the deterministic components. Does the model capture the trend and seasonality components well? Explain why.

  3. Compare the different decomposition methods - which one is better for your data? Explain your answer.

  4. Select the best method and forecast your data 12 steps ahead.
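
For orientation, a minimal combined sketch for tasks 1, 2 and 4, assuming the monthly airpass series; the log transform and the Holt-Winters settings are illustrative assumptions, not the definitive specification:

# Global method: OLS on a linear trend and monthly dummy variables:
import statsmodels.formula.api as smf
dt = pd.DataFrame({"y": np.log(airpass)})
dt["trend"] = np.arange(1, len(dt) + 1)
dt["month"] = dt.index.month
ols_fit = smf.ols("y ~ trend + C(month)", data = dt).fit()
print(ols_fit.summary())
# Local method: Holt-Winters exponential smoothing:
hw_fit = sm.tsa.ExponentialSmoothing(airpass, trend = "add", seasonal = "mul", seasonal_periods = 12).fit()
# Forecast 12 steps ahead with the preferred model:
print(hw_fit.forecast(12))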

3.1.2.4 Exercise Set 4: Additional Decomposition Methods (Global NLS, SARMA, Kernel Smoothing, X13, etc.)

  1. If your data has a seasonality component - remove the trend (if a trend exists in your series) and estimate an \(SARIMA(p, 0, q)\times (P, 0, Q)_S\) model, where \(S\) is the frequency of your data (as long as its frequency \(S > 1\)); see the sketch after this list. If there is no seasonality - examine the residuals and check whether an \(ARMA\) model is necessary.

  2. Forecast the \(SARIMA\) (or the \(ARMA\), if there is no seasonality) process \(h = 20\) steps ahead. If necessary, combine with the trend forecast. Plot the fitted and the forecast values.

  3. Select any other smoothing method (Kernel smoothing, LOWESS or Cubic splines) and estimate the trend in your data. Is the trend increasing, decreasing, or constant in your series?
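
A minimal sketch for this exercise set, again assuming the airpass series; the SARIMA orders and the LOWESS bandwidth are illustrative placeholders - select your own from the ACF/PACF and by visual inspection:

# SARIMA around a deterministic trend, with a 20-step-ahead forecast:
log_air = np.log(airpass)
sarima_fit = sm.tsa.SARIMAX(log_air, order = (1, 0, 1), seasonal_order = (1, 0, 1, 12), trend = "ct").fit(disp = False)
ax = log_air.plot(label = "actual")
sarima_fit.fittedvalues.plot(ax = ax, label = "fitted")
sarima_fit.get_forecast(steps = 20).predicted_mean.plot(ax = ax, label = "forecast")
ax.legend()
plt.show()
# LOWESS estimate of the trend:
lowess_trend = sm.nonparametric.lowess(log_air.values, np.arange(len(log_air)), frac = 0.25)
plt.plot(log_air.values, label = "actual")
plt.plot(lowess_trend[:, 1], label = "LOWESS trend")
plt.legend()
plt.show()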