Time Series With Trend and Seasonality Components: Tasks

There are various time-series data available, we will mainly use the datasets from:

statsmodels dataset collection;
github, which are available in R’s fma package.

Main packages required:

#Import the required modules for vectors and matrix operations, data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
# Import pandas dataset
import pandas as pd

The datasets that we will be analysing are:

- Monthly Airline Passenger Numbers 1949-1960, in thousands

airpass = sm.datasets.get_rdataset("AirPassengers", "datasets")

Look at its documentation

print(airpass.__doc__)

and extract the values

airpass = pd.Series(airpass.data["value"])
airpass.index = pd.date_range(start = "1949-01", periods = len(airpass.index), freq = "M").to_period()
airpass.index = airpass.index.to_timestamp()
airpass.head()

- Global temperature anomalies for the years 1880 to 2010. These are the GISS (Goddard Institute for Space Studies) Land-Ocean Temperature Index (LOTI) data.

loti = sm.datasets.get_rdataset("loti", "gamclass")

Note that you will need to fix the format of the data, input the following code and examine the output:

loti.data.head()
#
loti = pd.melt(loti.data.iloc[: , list(range(0,12)) + [len(loti.data.columns)-1]], id_vars = ['Year'])
#
loti.head()
#
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
loti.variable = pd.Categorical(loti.variable, categories = cats)
loti = loti.sort_values(by = ['Year', 'variable'])
loti.head()
#
loti = pd.Series(loti["value"])
loti.index = pd.date_range(start = "1880-01", periods = len(loti.index), freq = "M").to_period()
loti.index = loti.index.to_timestamp()
loti.head()

- Total international visitors to Australia (in millions). 1980-2015.

austa = sm.datasets.get_rdataset("austa", "fpp2")
#
austa = pd.Series(austa.data["value"])
austa.index = pd.date_range(start = "1990-01", periods = len(austa.index), freq = "Y").to_period()
austa.index = austa.index.to_timestamp()

- Quarterly visitor nights (in millions) spent by international tourists to Australia. 1999-2015.

austourists = sm.datasets.get_rdataset("austourists", "fpp2")
#
austourists = pd.Series(austourists.data["value"])
austourists.index = pd.date_range(start = "1999-01", periods = len(austourists.index), freq = "Q").to_period()
austourists.index = austourists.index.to_timestamp()

- Monthly Australian beer production: Jan 1991 – Aug 1995.

beer = pd.Series([164, 148, 152, 144, 155, 125, 153, 146, 138, 190,
                  192, 192, 147, 133, 163, 150, 129, 131, 145, 137, 138, 168, 176,
                  188, 139, 143, 150, 154, 137, 129, 128, 140, 143, 151, 177, 184,
                  151, 134, 164, 126, 131, 125, 127, 143, 143, 160, 190, 182, 138,
                  136, 152, 127, 151, 130, 119, 153])
beer.index = pd.date_range(start = "1991-01", periods = len(beer.index), freq = "M").to_period()
beer.index = beer.index.to_timestamp()

- Sales of shampoo over a three year period.

shampoo = pd.Series([266, 145.9, 183.1, 119.3, 180.3, 168.5, 231.8,
                     224.5, 192.8, 122.9, 336.5, 185.9, 194.3, 149.5, 210.1, 273.3,
                     191.4, 287, 226, 303.6, 289.9, 421.6, 264.5, 342.3, 339.7, 440.4,
                     315.9, 439.3, 401.3, 437.4, 575.5, 407.6, 682, 475.3, 581.3, 646.9])
shampoo.index = pd.date_range(start = "1999-01", periods = len(shampoo.index), freq = "Q").to_period()
shampoo.index = shampoo.index.to_timestamp()

- Monthly sales of new one-family houses sold in the USA since 1973.

hsales = pd.Series([55, 60, 68, 63, 65, 61, 54, 52, 46, 42, 37, 30, 37,
                    44, 55, 53, 58, 50, 48, 45, 41, 34, 30, 24, 29, 34, 44, 54, 57,
                    51, 51, 53, 46, 46, 46, 39, 41, 53, 55, 62, 55, 56, 57, 59, 58,
                    55, 49, 47, 57, 68, 84, 81, 78, 74, 64, 74, 71, 63, 55, 51, 57,
                    63, 75, 85, 80, 77, 68, 72, 68, 70, 53, 50, 53, 58, 73, 72, 68,
                    63, 64, 68, 60, 54, 41, 35, 43, 44, 44, 36, 44, 50, 55, 61, 50,
                    46, 39, 33, 37, 40, 49, 44, 45, 38, 36, 34, 28, 29, 27, 29, 28,
                    29, 36, 32, 36, 34, 31, 36, 39, 40, 39, 33, 44, 46, 57, 59, 64,
                    59, 51, 50, 48, 51, 45, 48, 52, 58, 63, 61, 59, 58, 52, 48, 53,
                    55, 42, 38, 48, 55, 67, 60, 65, 65, 63, 61, 54, 52, 51, 47, 55,
                    59, 89, 84, 75, 66, 57, 52, 60, 54, 48, 49, 53, 59, 73, 72, 62,
                    58, 55, 56, 52, 52, 43, 37, 43, 55, 68, 68, 64, 65, 57, 59, 54,
                    57, 43, 42, 52, 51, 58, 60, 61, 58, 62, 61, 49, 51, 47, 40, 45,
                    50, 58, 52, 50, 50, 46, 46, 38, 37, 34, 29, 30, 40, 46, 46, 47,
                    47, 43, 46, 37, 41, 39, 36, 48, 55, 56, 53, 52, 53, 52, 56, 51,
                    48, 42, 42, 44, 50, 60, 66, 58, 59, 55, 57, 57, 56, 53, 51, 45,
                    58, 74, 65, 65, 55, 52, 59, 54, 57, 45, 40, 47, 47, 60, 58, 63,
                    64, 64, 63, 55, 54, 44])
hsales.index = pd.date_range(start = "1973-01", periods = len(hsales.index), freq = "M").to_period()
hsales.index = hsales.index.to_timestamp()

- US treasury bill contracts on the Chicago market for 100 consecutive trading days in 1981

ustreas = pd.Series([91.57, 91.56, 91.40, 91.39, 91.20, 90.90, 90.99,
                     91.17, 90.98, 91.11, 91.20, 90.92, 90.73, 90.79, 90.86, 90.83,
                     90.80, 90.21, 90.10, 90.10, 89.66, 89.81, 89.40, 89.34, 89.16,
                     88.71, 89.28, 89.36, 89.64, 89.53, 89.32, 89.14, 89.41, 89.53,
                     89.16, 88.98, 88.50, 88.63, 88.62, 88.76, 89.07, 88.47, 88.41,
                     88.57, 88.23, 87.93, 87.88, 88.18, 88.04, 88.18, 88.78, 89.29,
                     89.14, 89.14, 89.42, 89.26, 89.37, 89.51, 89.66, 89.39, 89.02,
                     89.05, 88.97, 88.57, 88.44, 88.52, 87.92, 87.71, 87.52, 87.37,
                     87.27, 87.07, 86.83, 86.88, 87.48, 87.30, 87.87, 87.85, 87.83,
                     87.40, 86.89, 87.38, 87.48, 87.21, 87.02, 87.10, 87.02, 87.07,
                     86.74, 86.35, 86.32, 86.77, 86.77, 86.49, 86.02, 85.52, 85.02,
                     84.42, 85.29, 85.32])

- Australian monthly electricity production: Jan 1956 – Aug 1995

elec  = pd.Series([1254, 1290, 1379, 1346, 1535, 1555, 1655, 1651, 1500,
                   1538, 1486, 1394, 1409, 1387, 1543, 1502, 1693, 1616, 1841,
                   1787, 1631, 1649, 1586, 1500, 1497, 1463, 1648, 1595, 1777,
                   1824, 1994, 1835, 1787, 1699, 1633, 1645, 1597, 1577, 1709,
                   1756, 1936, 2052, 2105, 2016, 1914, 1925, 1824, 1765, 1721,
                   1752, 1914, 1857, 2159, 2195, 2287, 2276, 2096, 2055, 2004,
                   1924, 1851, 1839, 2019, 1937, 2270, 2251, 2382, 2364, 2129,
                   2110, 2072, 1980, 1995, 1932, 2171, 2162, 2489, 2424, 2641,             
                   2630, 2324, 2412, 2284, 2186, 2184, 2144, 2379, 2383, 2717,
                   2774, 3051, 2891, 2613, 2600, 2493, 2410, 2390, 2463, 2616,
                   2734, 2970, 3125, 3342, 3207, 2964, 2919, 2764, 2732, 2622,
                   2698, 2950, 2895, 3200, 3408, 3679, 3473, 3154, 3107, 3052,
                   2918, 2786, 2739, 3125, 3033, 3486, 3661, 3927, 3851, 3456,
                   3390, 3280, 3166, 3080, 3069, 3340, 3310, 3798, 3883, 4191,
                   4213, 3766, 3628, 3520, 3322, 3250, 3287, 3552, 3440, 4153,
                   4265, 4655, 4492, 4051, 3967, 3807, 3639, 3647, 3560, 3929,
                   3858, 4485, 4697, 4977, 4675, 4596, 4491, 4127, 4144, 4014,
                   3994, 4320, 4400, 5002, 5091, 5471, 5193, 4997, 4737, 4546,
                   4498, 4350, 4206, 4743, 4582, 5191, 5457, 5891, 5618, 5158,
                   5030, 4800, 4654, 4453, 4440, 4945, 4788, 5425, 5706, 6061,
                   5846, 5242, 5408, 5114, 5042, 5008, 4657, 5359, 5193, 5891,
                   5980, 6390, 6366, 5756, 5640, 5429, 5398, 5413, 5141, 5695,
                   5554, 6369, 6592, 7107, 6917, 6353, 6205, 5830, 5646, 5379,
                   5489, 5824, 5907, 6482, 6795, 7028, 6776, 6274, 6362, 5940,
                   5958, 5769, 5887, 6367, 6165, 6868, 7201, 7601, 7581, 7090,
                   6841, 6408, 6435, 6176, 6138, 6717, 6470, 7312, 7763, 8171,
                   7788, 7311, 6679, 6704, 6724, 6552, 6427, 7105, 6869, 7683,
                   8082, 8555, 8386, 7553, 7398, 7112, 6886, 7077, 6820, 7426,
                   7143, 8261, 8240, 8977, 8991, 8026, 7911, 7510, 7381, 7366,
                   7414, 7824, 7524, 8279, 8707, 9486, 8973, 8231, 8206, 7927,
                   7999, 7834, 7521, 8284, 7999, 8940, 9381, 10078, 9796, 8471,
                   8572, 8150, 8168, 8166, 7903, 8606, 8071, 9178, 9873, 10476,
                   9296, 8818, 8697, 8381, 8293, 7942, 8001, 8744, 8397, 9115,
                   9773, 10358, 9849, 9083, 9143, 8800, 8741, 8492, 8795, 9354,
                   8796, 10072, 10174, 11326, 10744, 9806, 9740, 9373, 9244, 9407,
                   8827, 9880, 9364, 10580, 10899, 11687, 11280, 10208, 10212,
                   9725, 9721, 9846, 9407, 10265, 9970, 10801, 11246, 12167,
                   11578, 10645, 10613, 10104, 10348, 10263, 9973, 10803, 10409,
                   11458, 11845, 12559, 12070, 11221, 11338, 10761, 11012, 10923,
                   10790, 11427, 10788, 11772, 12104, 12634, 12772, 11764, 11956,
                   11646, 11750, 11485, 11198, 12265, 11704, 12419, 13259, 13945,
                   13839, 12387, 12546, 12038, 11977, 12336, 11793, 12877, 11923,
                   13306, 13988, 14002, 14336, 12867, 12721, 12449, 12686, 12810,
                   12015, 12888, 12431, 13499, 13014, 14296, 14125, 12817, 12862,
                   12449, 12489, 12621, 12380, 13023, 12302, 13339, 13825, 14428,
                   14151, 13355, 13094, 12656, 12435, 13287, 12434, 13209, 12817,
                   13746, 14259, 14590, 14354, 13254, 13464, 13302, 13456, 13171,
                   12517, 13489, 12509, 13785, 13921, 14603, 14749, 13540, 13457,
                   13243, 13590, 13487, 12776, 13812, 13032, 14268, 14473, 15359,
                   14457])
elec.index = pd.date_range(start = "1956-01", periods = len(elec.index), freq = "M").to_period()
elec.index = elec.index.to_timestamp()

Select a few datasets which appear interesting to you - either visually or in general and complete the following tasks.

Exploratory Data Analysis

1.1 Plot your time series. Does the data look stationary? Why?

1.2 Does the data exhibit autocorrelation? What could be the cause of this type of autocorrelation? (Hint:look at the ACF and PACF - think about what happens to them if a deterministic component is introduced).

1.3 As you know, there are two common decomposition models - additive and multiplicative. Which one would be appropriate for this data and why? (Hint: look at the data variation). If needed, transform your data.

Time series decomposition without forecasts

After having an initial look at the data and carrying out any needed data transformations, move on to decompose the deterministic components of your time series. Remember that one requirement for the seasonal component with period \(d\) is that \(S_t = S_{t+d}\), i.e. the seasonal component itself must not increase pastime increases.

2.1 Decompose the data via a moving-average method. Plot the trend and trend + seasonal components alongside the actual data. Is the decomposed trend and seasonality captured correctly? Explain your answers. (Hint: look at the residual plots as well as their ACF and PACF)

Time series decomposition with forecasts

Having an initial idea on the decomposition form and the deterministic components, use a couple of methods for TS decomposition, that allow forecasting. If needed, generate the trend and seasonal variables from your data.

3.1 Use the Global Method of OLS to estimate the trend and seasonal components of your data and plot the estimated values from your model. Does the model capture your trend and seasonality components well? Explain why.

3.2 Use the Local Method of Exponential Smoothing to estimate the deterministic components. Does the model capture the trend and seasonality components well? Explain why.

3.3 Compare the different decomposition methods - which one is better for your data? Explain your answer.

3.4 Select the best method and forecast your data 12 steps ahead.

Additional Decomposition Methods (Global NLS, SARMA, Kernel Smoothing, X13, etc.)

4.1 If your data has a seasonality component - remove the trend (if a trend exists in your series) and estimate an \(SARIMA(p, 0, q)\times (P, 0, Q)_d\) model, where \(d\) is the frequency of your data (as long as its frequency \(d > 1\)). If there is no seasonality - examine the residuals and check whether an \(ARMA\) model is necessary.

4.2 Forecast the \(SARIMA\) (or the \(ARMA\), if there is no seasonality) process \(h = 20\) steps ahead. If necessary, combine with the trend forecast. Plot the fitted and the forecast values.

4.3 Select any other smoothing method (Kernel smoothing, LOWESS or Cubic splines) and estimate the trend in your data. Is the trend increasing, decreasing, or constant in your series?