Time Series With Trend and Seasonality Components

The following code provides examples of how to decompose a time series composed of a deterministic trend component, a deterministic seasonal component and a (non-deterministic) stationary component, which could be a $WN$, an $ARMA$ or a similar kind of process.

In [1]:
#Import the required modules for data generation
import numpy as np

#Import the required modules for plot creation:
import matplotlib.pyplot as plt

#Import the required modules for DataFrame creation:
import pandas as pd

#Import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
#Import the required modules for optimization:
import scipy.optimize as optimize

#We also need additional data:
import statsmodels.formula.api as smf

The formula.api module hosts many of the same functions found in api (e.g. OLS, GLM), but it also provides lower case counterparts for most of these models. In general, lower case models accept formula and df arguments, whereas upper case ones take endog and exog design matrices. formula accepts a string which describes the model in terms of a patsy formula, while df takes a pandas data frame.
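For illustration, here is a small sketch contrasting the two interfaces on a hypothetical toy data frame (the names toy, fit_formula and fit_matrix are illustrative and not used elsewhere in this notebook):

#A sketch: the formula interface versus the design-matrix interface
toy = pd.DataFrame({"y": [1.0, 2.1, 2.9, 4.2], "x": [1.0, 2.0, 3.0, 4.0]})
#Lower case: patsy formula + data frame (the intercept is added automatically):
fit_formula = smf.ols(formula = "y ~ x", data = toy).fit()
#Upper case: endog and exog design matrices (the constant must be added manually):
fit_matrix = sm.OLS(toy["y"], sm.add_constant(toy[["x"]])).fit()
print(fit_formula.params)
print(fit_matrix.params)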

dir(smf) will print a list of available models.

In [2]:
dir(smf)
Out[2]:
['GEE',
 'GLM',
 'GLS',
 'GLSAR',
 'Logit',
 'MNLogit',
 'MixedLM',
 'NegativeBinomial',
 'NominalGEE',
 'OLS',
 'OrdinalGEE',
 'PHReg',
 'Poisson',
 'Probit',
 'QuantReg',
 'RLM',
 'WLS',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'gee',
 'glm',
 'gls',
 'glsar',
 'logit',
 'mixedlm',
 'mnlogit',
 'negativebinomial',
 'nominal_gee',
 'ols',
 'ordinal_gee',
 'phreg',
 'poisson',
 'probit',
 'quantreg',
 'rlm',
 'wls']

Looking back on the White Noise Process

Often it is important to establish whether a stationary process under consideration is $WN$. To achieve this, we can use the graphs of the $ACF$ and $PACF$ functions.

We begin by redefining R's tsdisplay() and tsdiag() functions:

In [3]:
#Define our tsdisplay function (equivalent to R):
def tsdisplay(y, figsize = (14,8), title = "", max_lag = 20):
    tmp_data = pd.Series(y)
    fig = plt.figure(figsize = figsize)
    #Plot the time series
    tmp_data.plot(ax = fig.add_subplot(211), title = "$Time\ Series\ " + title + "$", legend = False)
    #Plot the ACF:
    sm.graphics.tsa.plot_acf(tmp_data, lags = max_lag, zero = False, ax = fig.add_subplot(223))
    #Plot the PACF:
    sm.graphics.tsa.plot_pacf(tmp_data, lags = max_lag, zero = False, ax = fig.add_subplot(224))
    #Fix the layout of the plots:
    plt.tight_layout()
    
#Define our tsdiag function (equivalent to R):
def tsdiag(y, figsize = (14,8), title = ""):
    #The data:
    tmp_data = pd.Series(y)
    #The standardized data:
    tmp_stand_data = pd.Series(y / np.sqrt(y.var()))
    #The Ljung-Box test results for the first 10 lags:
    p_vals = pd.Series(sm_stat.diagnostic.acorr_ljungbox(tmp_data, lags = 10)[1])
    #Start the index from 1 instead of 0 (because Ljung-Box test is for lag values from 1 to k)
    p_vals.index += 1
    
    fig = plt.figure(figsize = figsize)
    #Plot the time series
    tmp_stand_data.plot(ax = fig.add_subplot(311), title = "$Standardized\ Time\ Series\ " + title + "$", legend = False)
    #Plot the ACF:
    sm.graphics.tsa.plot_acf(tmp_data, lags = 20, zero = False, ax = fig.add_subplot(312))
    #Plot the PACF:
    p_vals.plot(ax = fig.add_subplot(313), linestyle='', marker='o', title = "p-values for Ljung-Box statistic", legend = False)
    #Add the horizontal 0.05 critical value line
    plt.axhline(y = 0.05, color = 'blue', linestyle='--')
    
    # Annotate the p-value points above and to the left of the vertex
    x = np.arange(p_vals.size) + 1
    for X, Y, Z in zip(x, p_vals, p_vals):
        plt.annotate(round(Z, 4), xy=(X,Y), xytext=(-5, 5), ha = 'left', textcoords='offset points')
    
    #Change the range of the axis:
    plt.xlim(xmin = 0)
    plt.xlim(xmax = p_vals.size + 1)
    #Change the X axis tick marks - make them from 1 to the max. number of lags in the Ljung-Box test
    plt.xticks(np.arange(1, p_vals.size + 1, 1.0))
    
    #Fix the layout of the plots:
    plt.tight_layout()

These functions make it easier to analyse the series without having to repeat the same plotting code each time.

In [4]:
#Set the seed:
np.random.seed(1)
nsample = 100
x1 = np.random.normal(loc = 0.0, scale = 1.0, size = nsample) 
tsdisplay(x1)

Although the process was generated from a normal distribution, the $ACF$ and $PACF$ plots show that the third lag is significantly different from zero, which makes it difficult to say whether the series is uncorrelated if we do not know the underlying data generating process.

In [5]:
tsdiag(x1)

All the bubbles (p-values) in the bottom graph are above the blue 0.05 line, so we do not reject the null hypothesis of no autocorrelation - the time series is consistent with White Noise.
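We can also call the Ljung-Box test directly instead of going through tsdiag. A small sketch (older statsmodels versions, like the one used in this notebook, return a tuple of test statistics and p-values, while newer versions return a DataFrame):

#A sketch of calling the Ljung-Box test directly:
lb_stat, lb_pval = sm_stat.diagnostic.acorr_ljungbox(x1, lags = 10)
#In newer statsmodels versions use instead:
#lb_pval = sm_stat.diagnostic.acorr_ljungbox(x1, lags = 10)["lb_pvalue"]
print(np.round(lb_pval, 4))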




Datasets

Some R datasets are available online or via sm.datasets.get_rdataset("dataset", "package"):
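For example, the AirPassengers dataset used below could also be loaded directly through statsmodels (a sketch; get_rdataset downloads the data from the online Rdatasets repository, so an internet connection is required):

#A sketch of loading the same dataset via statsmodels instead of the raw CSV link:
airpass = sm.datasets.get_rdataset("AirPassengers", "datasets")
print(airpass.data.head())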

In [6]:
data = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/AirPassengers.csv")
print(data.head())
   Unnamed: 0         time  value
0           1  1949.000000    112
1           2  1949.083333    118
2           3  1949.166667    132
3           4  1949.250000    129
4           5  1949.333333    121

Tidy up the dataset:

In [7]:
data = pd.Series(data["value"])
data.index = pd.date_range(start = "1949-01", periods = len(data.index), freq = "M").to_period()
data.index = data.index.to_timestamp() #to_period() causes errors with decomposition methods
data.head()
Out[7]:
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
1949-05-01    121
Freq: MS, Name: value, dtype: int64
In [8]:
data.plot()
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f5d0dcc358>

It appears that it may be a multiplicative series, $Y_t = T_t \cdot S_t \cdot E_t$, so we will take logarithms:

In [9]:
log_passengers = np.log(data)
log_passengers.head()
Out[9]:
1949-01-01    4.718499
1949-02-01    4.770685
1949-03-01    4.882802
1949-04-01    4.859812
1949-05-01    4.795791
Freq: MS, Name: value, dtype: float64
In [10]:
print(log_passengers.plot())
AxesSubplot(0.125,0.125;0.775x0.755)

Now the time series looks like an additive model, $Y_t = T_t + S_t + E_t$.

In [11]:
tsdisplay(log_passengers, max_lag = 40)

We see that the ACF decays very slowly, which indicates a trend effect.

We can see that the ACF plot exhibits a larger correlation at around every $12^{th}$ lag (at lags 12, 24, 36, etc.). This is an indication that there is a seasonal (or cyclic) component in our data.
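A quick, informal way to see this monthly pattern is to average the (log) series by calendar month (a sketch; monthly_means is just an illustrative name):

#Average the log series by calendar month to eyeball the seasonal pattern:
monthly_means = log_passengers.groupby(log_passengers.index.month).mean()
print(monthly_means.round(4))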


Trend Decomposition via Moving Average

We will provide an example of the moving-average method for decomposition.

If we select the length of the moving average as an odd number, for example $l = 3$, then:

  • The two-sided moving-average:

$$\widehat{T}_t = \dfrac{Y_{t-1} + Y_t + Y_{t+1}}{3}$$

  • The one-sided moving-average:

$$\widehat{T}_t = \dfrac{Y_{t-2} + Y_{t-1} + Y_t}{3}$$

are calculated for integer times $t$.

If the time series contains a seasonal component and we want to average it out, the length of the moving average must be equal to the seasonal frequency (for a monthly series, the length is $l = 12$). However, there is a slight hurdle - suppose our time series begins in January ($t = 1$) and we average up to December ($t = 12$). This average corresponds to a time $t = 6.5$ - a point between June and July.

When we come to estimate seasonal effects, we need a moving-average at integer times. This can be achieved by averaging the average of January to December and the average of February ($t = 2$) to January ($t = 13$). This average of the two moving averages corresponds to $t = 7$ and the process is called centering.

Thus, the trend at time $t$ can be estimated by the centered moving average (a short numerical sketch follows the formulas below):

  • Two-sided averaging: $$ \begin{aligned} \widehat{T}_t &= \dfrac{(Y_{t-6} + ... + Y_{t+5})/12 + (Y_{t-5} + ... + Y_{t+6})/12}{2} = \dfrac{(1/2)Y_{t-6} + Y_{t-5} + ... + Y_{t+5} + (1/2)Y_{t+6}}{12} \end{aligned} $$

  • One-sided averaging: $$ \begin{aligned} \widehat{T}_t &= \dfrac{(Y_{t-12} + ... + Y_{t-1})/12 + (Y_{t-11} + ... + Y_{t})/12}{2} = \dfrac{(1/2)Y_{t-12} + Y_{t-11} + ... + Y_{t-1} + (1/2)Y_{t}}{12} \end{aligned} $$
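To make the weights concrete, here is a minimal numpy sketch that applies the centred two-sided weights (for $l = 12$) to the log_passengers series by hand; the result should coincide with the two-sided trend produced by seasonal_decompose below:

#A sketch: apply the centred moving-average weights with a convolution
w = np.concatenate([np.array([1 / 24]), np.repeat(1 / 12, 11), np.array([1 / 24])])
t_centered = np.convolve(log_passengers.values, w, mode = "valid")
#Pad the 6 undefined values at each end with NaN to keep the original length:
t_centered = pd.Series(np.concatenate([np.repeat(np.nan, 6), t_centered, np.repeat(np.nan, 6)]),
                       index = log_passengers.index, name = "Centred MA, l = 12")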

Note that the weights are different for the first and last elements used in the averaging. We can use statsmodels.tsa.seasonal.seasonal_decompose to automatically decompose our series:

In [12]:
decomposition_1  = smt.seasonal.seasonal_decompose(log_passengers, model = "additive", freq = 12, two_sided = True)
trend_1 = decomposition_1.trend

plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_1.plot()
plt.show()

We can see that, because we used a two-sided decomposition, we cannot calculate the trend for the first and last 6 months of our monthly time series.

If we specify a one-sided decomposition, then we will not be able to calculate the trend for the first 12 months:

In [13]:
decomposition_2  = smt.seasonal.seasonal_decompose(log_passengers, model = "additive", freq = 12, two_sided = False)
trend_2 = decomposition_2.trend

plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_2.plot()
plt.show()

If we reduce the frequency, then we will begin to capture the seasonal part with our trend filter. For example, selecting the moving average length $l = 3$:

In [14]:
decomposition_3  = smt.seasonal.seasonal_decompose(log_passengers, model = "additive", freq = 3, two_sided = False)
trend_3 = decomposition_3.trend

plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_3.plot()
plt.show()

From the formulas, this is because we are using fewer observations around time $t$, so our estimated trend values are closer to the overall series values. However, in doing so, we are no longer estimating the trend component correctly. As such, it is important to correctly determine whether your data exhibits a seasonal component and, if so, to correctly determine the seasonal frequency.

We could also try to specify our own function to decompose the trend only:

In [15]:
import math as math

def moving_average_decompose(y, l = 12):
    tmp_index = y.index
    #Time Series:
    y = np.array(y)
    #Trend Component:
    t_two_side = np.zeros(y.size)
    t_one_side = np.zeros(y.size)
    tmp_val = 0
    #Loop through each time point:
    for j in range(0, y.size):
        ###########################################################
        # The length is even - use centered averaging:
        if l % 2 == 0:
            # The centered averaging method weights - see the formula example with l = 12.
            tmp_weights = np.concatenate([np.array([1 / (l * 2)]),
                                          np.repeat(1 / l, l - 1),
                                          np.array([1 / (l * 2)])])
            #Two-sided
            if j + 1 <= l / 2 or j + 1 > y.size - l / 2: 
                # We do not have enough data to the left or right of the series to calculate trend at time t = j:
                t_two_side[j] = np.nan
            else:
                tmp_val = y[int(j - (l / 2)):int(j + (l / 2 + 1))] * tmp_weights
                t_two_side[j] = tmp_val.sum()
            #One-sided
            if j + 1 <= l:
                t_one_side[j] = np.nan
            else:
                tmp_val = y[int(j - l):int(j + 1)] * tmp_weights
                t_one_side[j] = tmp_val.sum()
        ###########################################################
        # The length is odd:
        else:
            #For the non-centered averaging the weights are simply the mean, i.e. the length l.
            #tmp_weights = np.repeat(1 / l, l)
            #Two-sided
            if j + 1 <= math.floor(l / 2) or j + 1 > y.size - math.floor(l / 2):
                # We do not have enough data to the left or right of the series to calculate trend at time t = j:
                t_two_side[j] = np.nan
            else:
                tmp_val = y[int(j - math.floor(l / 2)):int(j + (math.floor(l / 2) + 1))]
                t_two_side[j] = tmp_val.mean()
            
            #One-sided
            if j + 1 < l: # not enough past observations for a full window of length l
                t_one_side[j] = np.nan
            else:
                tmp_val = y[int(j - l + 1):int(j + 1)]
                t_one_side[j] = tmp_val.mean()
    #Return the same time series:
    t_one_side = pd.Series(t_one_side, name = "One-Sided MA with length = " + str(l))
    t_one_side.index = tmp_index
    t_two_side = pd.Series(t_two_side, name = "Two-Sided MA with length = " + str(l))
    t_two_side.index = tmp_index
    
    return {"Two-sided": t_two_side, "One-sided": t_one_side}
In [16]:
trend_rezults = moving_average_decompose(log_passengers)

plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_rezults["One-sided"].plot()
trend_rezults["Two-sided"].plot()
plt.legend()
plt.show()
In [17]:
trend_low_frq = moving_average_decompose(log_passengers, l = 3)

plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_low_frq["One-sided"].plot()
plt.legend()
plt.show()

We can compare the two functions and we see that they produce the same results:

In [18]:
comparison = trend_rezults["Two-sided"] - trend_1
print("Two-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
comparison = trend_rezults["One-sided"] - trend_2
print("One-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
comparison = trend_low_frq["One-sided"] - trend_3
print("One-Sided MA, l =  3: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
Two-Sided MA, l = 12: max. difference =  0.0 ; min. difference =  -0.0
One-Sided MA, l = 12: max. difference =  0.0 ; min. difference =  -0.0
One-Sided MA, l =  3: max. difference =  0.0 ; min. difference =  -0.0

Having estimated the trend, we can move on to estimating the seasonal effect.


Seasonal Effect Decomposition

We will decompose the seasonal effect using both trend averaging methods and examine how the decomposition fits the data:

  1. Estimate the trend component $T_t$ using the moving average method.
  2. Estimate $\widehat{S}_t = Y_t - \widehat{T_t}$, $t = 1,...,T$;
  3. Decide on the seasonal period $d$: $S_t = S_{t+d}$ and $\sum_{t = 1}^d S_t = 0$. Then, calculate $\overline{S}_k = \text{mean}\left( \widehat{S}_k, \widehat{S}_{k+d}, \widehat{S}_{k+2d}, ... \right)$, $k = 1,...,d$;
  4. Calculate the total sum: $S_{tot} = \sum_{k = 1}^d \overline{S}_k$;
  5. Calculate the adjustment factor: $w = \dfrac{1}{d} S_{tot}$;
  6. Adjust each seasonal factor: $\tilde{S}_k = \overline{S}_k - w$, $k = 1,...,d$;
  7. We now have an estimated seasonal component $\tilde{S}_t$ such that $\tilde{S}_t = \tilde{S}_{t+d}$ and $\sum_{t= 1}^d \tilde{S}_t = 0$.

Now, let's begin by defining the above procedure (note - we created this procedure with the assumption that the time series starts from the first season, i.e. January):

In [19]:
def decomp_seas(y, trend, period = 12, title = ""):
    #Initial seasonal component estimation:
    seas_0 = y - trend
    seas_1 = np.zeros(period)
    #Seasonal component averaging:
    for j in range(0, period):
        #Select every j-th month from each year in the dataset:
        seas_1[j] = seas_0[j::period].mean()
    #Adjustment factor:
    w = seas_1.mean()
    seas_1 = seas_1 - w
    
    #Save the seasonal component in a time series format:
    seasonal_results = np.zeros(y.size)
    for j in range(0, period):
        #Select every j-th month from each year in the dataset:
        seasonal_results[j::period] = seas_1[j]
    
    seasonal_results = pd.Series(seasonal_results, name = "Seasonal " + title)
    seasonal_results.index = trend.index
    return(seasonal_results)

The estimated seasonal component:

In [20]:
seas_results = {"One-sided": decomp_seas(y = log_passengers, trend = trend_rezults["One-sided"], period = 12, title = "(One-sided)"),
                "Two-sided": decomp_seas(y = log_passengers, trend = trend_rezults["Two-sided"], period = 12, title = "(Two-sided)")}

We can compare how the seasonal components look on their own:

In [21]:
plt.figure(figsize = (15, 6))
seas_results["One-sided"].plot()
seas_results["Two-sided"].plot()
plt.legend()
plt.tight_layout()
plt.show()

As well as add them to the trend component $\widehat{T}_t + \tilde{S}_t$:

In [22]:
T_S = {"One-sided": trend_rezults["One-sided"] + seas_results["One-sided"],
      "Two-sided": trend_rezults["Two-sided"] + seas_results["Two-sided"]}
T_S["One-sided"].name = "Trend + Season"
T_S["Two-sided"].name = "Trend + Season"
In [23]:
fig = plt.figure(figsize = (15, 8))
log_passengers.plot(ax = fig.add_subplot(211), title = "One-sided decomposition")
trend_rezults["One-sided"].plot()
T_S["One-sided"].plot()
plt.legend()

log_passengers.plot(ax = fig.add_subplot(212), title = "Two-sided decomposition")
trend_rezults["Two-sided"].plot()
T_S["Two-sided"].plot()
plt.legend()

plt.tight_layout()
plt.show()

The decomposed trend plus seasonality fits the data much better than the trend component alone.

We can do the same with the results from smt.seasonal.seasonal_decompose. First, we note that the estimated seasonal component does not differ from the one we estimated manually:

In [24]:
comparison = seas_results["Two-sided"] - decomposition_1.seasonal
print("Two-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
comparison = seas_results["One-sided"] - decomposition_2.seasonal
print("One-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
Two-Sided MA, l = 12: max. difference =  0.0 ; min. difference =  -0.0
One-Sided MA, l = 12: max. difference =  0.0 ; min. difference =  -0.0

We can also print the output from smt.seasonal.seasonal_decompose:

In [25]:
print(decomposition_1.plot())
Figure(432x288)
In [26]:
fig = plt.figure(figsize = (15, 6))

log_passengers.plot(ax = fig.add_subplot(212), title = "One-sided decomposition")
decomposition_2.trend.plot()
(decomposition_2.seasonal + decomposition_2.trend).plot()

log_passengers.plot(ax = fig.add_subplot(211), title = "Two-sided decomposition")
decomposition_1.trend.plot()
(decomposition_1.seasonal + decomposition_1.trend).plot()

plt.tight_layout()
plt.show()

Finally, we can check whether the remainder is stationary. We can either calculate it manually:

In [27]:
fig = plt.figure(figsize = (10, 4))
(log_passengers - T_S["One-sided"]).plot(ax = fig.add_subplot(121), title = "One-sided decomposition Residuals")
(log_passengers - T_S["Two-sided"]).plot(ax = fig.add_subplot(122), title = "Two-sided decomposition Residuals")
plt.tight_layout()
plt.show()

Or extract it from the smt.seasonal.seasonal_decompose output:

In [28]:
decomposition_1.resid
fig = plt.figure(figsize = (10, 4))
decomposition_2.resid.plot(ax = fig.add_subplot(121), title = "One-sided decomposition Residuals")
decomposition_1.resid.plot(ax = fig.add_subplot(122), title = "Two-sided decomposition Residuals")
plt.tight_layout()
plt.show()

We note that the residuals might not be stationary - compare the variances from different time periods:

In [29]:
print("Two-sided decomp. Variance for [1949-01-01 - 1954-01-01]: ", round(decomposition_1.resid.loc['19490101':'19540101'].var(), 6))
print("Two-sided decomp. Variance for [1954-02-01 - 1960-12-01]: ", round(decomposition_1.resid.loc['19540201':'19600101'].var(), 6))
print("-----------------------------------------------------------")
print("One-sided decomp. Variance for [1949-01-01 - 1954-01-01]: ", round(decomposition_2.resid.loc['19490101':'19540101'].var(), 6))
print("One-sided decomp. Variance for [1954-02-01 - 1960-12-01]: ", round(decomposition_2.resid.loc['19540201':'19600101'].var(), 6))
Two-sided decomp. Variance for [1949-01-01 - 1954-01-01]:  0.001534
Two-sided decomp. Variance for [1954-02-01 - 1960-12-01]:  0.000709
-----------------------------------------------------------
One-sided decomp. Variance for [1949-01-01 - 1954-01-01]:  0.002611
One-sided decomp. Variance for [1954-02-01 - 1960-12-01]:  0.001283

The residuals in the second half of the time series appear to have a smaller variance.
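If we want a more formal check than simply comparing the sample variances, one option (our choice for this sketch, not part of the original analysis) is Levene's test for equal variances from scipy:

#A sketch of formally comparing the two sub-period variances (Levene's test is
#reasonably robust to non-normality); a small p-value suggests unequal variances:
from scipy import stats
resid_clean = decomposition_1.resid.dropna()
print(stats.levene(resid_clean.loc['19490101':'19540101'], resid_clean.loc['19540201':'19600101']))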

Let's also look at the residual correlation plots:

In [30]:
fig = plt.figure(figsize = (14, 5))
fig = sm.graphics.tsa.plot_acf(decomposition_1.resid.dropna(), lags=40, ax = fig.add_subplot(221), title = "ACF of Two-sided decomposition Residuals")
fig = sm.graphics.tsa.plot_acf(decomposition_2.resid.dropna(), lags=40, ax = fig.add_subplot(222), title = "ACF of One-sided decomposition Residuals")

fig = sm.graphics.tsa.plot_pacf(decomposition_1.resid.dropna(), lags=40, ax = fig.add_subplot(223), title = "PACF of Two-sided decomposition Residuals")
fig = sm.graphics.tsa.plot_pacf(decomposition_2.resid.dropna(), lags=40, ax = fig.add_subplot(224), title = "PACF of One-sided decomposition Residuals")

plt.tight_layout()
plt.show()

The series is not White Noise, so we would try to estimate an $ARMA$ model for it. However, it seems that the two-sided decomposition still leaves a seasonality effect as shown by the oscillating residual ACF plot.

Looking at the residual plots from this decomposition, we note that they do not appear to be stationary for the two-sided decomposition method on this data - the variance is smaller in the middle of the sample than at the beginning of the time series. The same appears to hold for the one-sided decomposition residuals.

We can see that this decomposition method does not fully eliminate the deterministic components from this specific time series data sample.

Note that this method does not allow us to produce forecasts.




The Global Method of OLS

The OLS method estimates coefficients of an additive model. If needed, we can also specify not only a linear but also a polynomial trend. We will estimate three different models via OLS:

  • Model 1: $\text{AP}_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \epsilon_t$
  • Model 2: $\text{AP}_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \gamma_1 \text{dm}_{1,t} + ... + \gamma_{11} \text{dm}_{11,t} + \epsilon_t$
  • Model 3: $\log(\text{AP}_t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \gamma_1 \text{dm}_{1,t} + ... + \gamma_{11} \text{dm}_{11,t} + \epsilon_t$

where:

  • $AP_t$ is the monthly airline passenger numbers 1949-1960;
  • $t$ is the trend, $t =$ 1949-01-01, 1949-02-01, ..., 1960-12-01. Equivalently, we can specify $t = 1,..., 144$. The only difference would be the magnitudes of the coefficients $\beta_1, \beta_2$.
  • $\text{dm}_{k,t}$, $k = 1,2,...,11$ - dummy variables indicating the month of time $t$;

Recall that, in order to avoid the dummy variable trap (a scenario in which the independent variables are multicollinear, i.e. highly correlated), we have to exclude one dummy variable - in this case $\text{dm}_{12,t}$ - from our regression models.

In [31]:
tsdisplay(data)

The slow, wavy decay of the ACF suggests that we have both a trend and a seasonal component. From the time series plot it seems that the magnitude of the seasonal variation increases with time, so the series appears to grow either exponentially or polynomially.

We need to create the additional variables:

In [32]:
AP = data
#Create trend
time = np.array(range(1, len(data.index) + 1))
time_sq = time ** 2
#Create a dummy variable matrix of zeroes:
dm = np.column_stack(np.zeros((11, AP.size)))
#Create a DataFrame
dm = pd.DataFrame(dm)
#Add values to dummy variables:
for j in range(0, len(dm.columns)):
    #Select every j-th month from each year in the dataset:
    dm.iloc[j::12, j] = 1

#x_matrix = np.column_stack((a,b))
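As a side note, the same kind of dummy matrix can be built in a single step with pandas (a sketch; dm_alt is an illustrative name, and the December column is dropped to avoid the dummy variable trap):

#A sketch of building the monthly dummies with pandas instead of the manual loop:
dm_alt = pd.get_dummies(data.index.month, prefix = "dm")
dm_alt = dm_alt.drop("dm_12", axis = 1)
dm_alt.index = data.index
print(dm_alt.head())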
In [33]:
dtf = np.column_stack((AP, time, time_sq, dm))
#Create a DataFrame
dtf = pd.DataFrame(dtf)
#Rename the columns:
dtf.columns = ["AP"] + ["time"] + ["time_sq"] + ["dm_" + str(number) for number in range(1, 12)]
dtf.index = [str(number) for number in range(1, len(dtf.index) + 1)]
dtf.head(13)
Out[33]:
AP time time_sq dm_1 dm_2 dm_3 dm_4 dm_5 dm_6 dm_7 dm_8 dm_9 dm_10 dm_11
1 112.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 118.0 2.0 4.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 132.0 3.0 9.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 129.0 4.0 16.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 121.0 5.0 25.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
6 135.0 6.0 36.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
7 148.0 7.0 49.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
8 148.0 8.0 64.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
9 136.0 9.0 81.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
10 119.0 10.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
11 104.0 11.0 121.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
12 118.0 12.0 144.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13 115.0 13.0 169.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Now we can estimate the models:

In [34]:
def get_coef(my_fitted_model):
    #Get the parameters and their pvalues
    my_index = my_fitted_model.pvalues.index.format()
    rez = np.array([my_fitted_model.params, my_fitted_model.pvalues]).T
    rez = pd.DataFrame(rez)
    rez.index = my_index
    rez.columns = ["coef", "p_value"]
    return rez

AP_OLS_1 = smf.ols(formula = 'AP ~ time + time_sq', data = dtf)
AP_OLS_2 = smf.ols(formula = 'AP ~ time + time_sq + dm_1 + dm_2 + dm_3 + dm_4 + dm_5 + dm_6 + dm_7 + dm_8 + dm_9 + dm_10 + dm_11', data = dtf)
AP_OLS_3 = smf.ols(formula = 'np.log(AP) ~ time + time_sq + dm_1 + dm_2 + dm_3 + dm_4 + dm_5 + dm_6 + dm_7 + dm_8 + dm_9 + dm_10 + dm_11', data = dtf)

The first, quadratic model has all significant coefficients:

In [35]:
fit_1 = AP_OLS_1.fit()
print(round(get_coef(fit_1), 5))
                coef  p_value
Intercept  112.38004  0.00000
time         1.64100  0.00001
time_sq      0.00701  0.00441

Similarly, we can say the same about the second model:

In [36]:
fit_2 = AP_OLS_2.fit()
print(round(get_coef(fit_2), 5))
                coef  p_value
Intercept   79.37755  0.00000
time         1.62550  0.00000
time_sq      0.00714  0.00000
dm_1         9.18029  0.34612
dm_2        -0.15867  0.98698
dm_3        32.40476  0.00110
dm_4        26.70392  0.00676
dm_5        28.82213  0.00353
dm_6        66.00941  0.00000
dm_7       103.01575  0.00000
dm_8       100.09115  0.00000
dm_9        48.73560  0.00000
dm_10       10.19912  0.29475
dm_11      -26.26830  0.00764

While the January, February and October dummy variables are not statistically significantly different from zero (and as such, these months are not significantly different from the baseline case of December), most of the remaining dummy variables are statistically significant, so we can conclude that the seasonal effect is significant in our data.

In the final model we "linearized" the time series by taking the logarithm. We see that we come to the same conclusions regarding coefficient significance as in the second model.

In [37]:
fit_3 = AP_OLS_3.fit()
print(round(get_coef(fit_3), 5))
              coef  p_value
Intercept  4.63006  0.00000
time       0.01318  0.00000
time_sq   -0.00002  0.00000
dm_1       0.02132  0.28129
dm_2      -0.00095  0.96167
dm_3       0.12911  0.00000
dm_4       0.09771  0.00000
dm_5       0.09525  0.00000
dm_6       0.21735  0.00000
dm_7       0.32130  0.00000
dm_8       0.31204  0.00000
dm_9       0.16750  0.00000
dm_10      0.02947  0.13672
dm_11     -0.11408  0.00000

Forecasting

Finally, it would be interesting to predict the series with each model. For that, we need to create a new sample of explanatory variables to predict and plot.

In [38]:
dtf_forc = dtf.iloc[0:36, 1:]
dtf_forc.index = [str(number) for number in range(len(dtf.index) + 1, len(dtf.index) + len(dtf_forc) + 1)]
dtf_forc["time"] = np.array(range(int(dtf_forc.index[0]), int(dtf_forc.index[-1]) + 1))
dtf_forc["time_sq"] = dtf_forc["time"] ** 2
dtf_forc.head()
Out[38]:
time time_sq dm_1 dm_2 dm_3 dm_4 dm_5 dm_6 dm_7 dm_8 dm_9 dm_10 dm_11
145 145 21025 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
146 146 21316 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
147 147 21609 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
148 148 21904 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
149 149 22201 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0

Forecast the data:

In [39]:
#Calculate forecasts
AP_forc_1 =  fit_1.predict(dtf_forc)
AP_forc_2 =  fit_2.predict(dtf_forc)
AP_forc_3 =  fit_3.predict(dtf_forc)

Combine into a single Time Series with fitted and forecast values:

In [40]:
#Combine into a single time series with FITTED and FORECAST values:
AP_fited_1 = np.concatenate((fit_1.predict(), np.array(AP_forc_1)))
AP_fited_1 = pd.Series(AP_fited_1)
#Add the correct time index:
AP_fited_1.index = pd.date_range(start = "1949-01", periods = len(AP_fited_1.index), freq = "M").to_period()
AP_fited_1.index = AP_fited_1.index.to_timestamp()
AP_fited_1.name = "Fitted"

AP_fited_2 = np.concatenate((fit_2.predict(), np.array(AP_forc_2)))
AP_fited_2 = pd.Series(AP_fited_2)
AP_fited_2.index = AP_fited_1.index
AP_fited_2.name = AP_fited_1.name

AP_fited_3 = np.concatenate((fit_3.predict(), np.array(AP_forc_3)))
AP_fited_3 = pd.Series(AP_fited_3)
AP_fited_3.index = AP_fited_1.index
AP_fited_3.name = AP_fited_1.name

Plot the predicted time series alongside the actual data:

In [41]:
fig = plt.figure(figsize = (15, 5))
AP_fited_1.plot(ax = fig.add_subplot(131), linestyle = "--", title = "Quadratic")
AP.plot()
plt.legend()

AP_fited_2.plot(ax = fig.add_subplot(132), linestyle = "--", title = "Quadratic + Seasonal")
AP.plot()
plt.legend()

AP_fited_3.plot(ax = fig.add_subplot(133), linestyle = "--", title = "Quadratic + Seasonal for log(Airpass)")
log_passengers.plot()
plt.legend()

plt.tight_layout()
plt.show()

While the central graph attempts to capture the seasonal effect, and does so better than the left-side graph, we can see that the actual fluctuations in the data increase in size together with the level of the series, so an additive model is unable to capture this effect. The common approach to this problem is to take logarithms and estimate the additive model on the transformed data (right-side graph).
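Since the third model is specified for $\log(\text{AP}_t)$, its fitted and forecast values can be mapped back to the original scale by exponentiating (a sketch; we ignore the usual $\exp(\widehat{\sigma}^2/2)$ bias correction here):

#A sketch: back-transform the log-model fit to the original scale for comparison
plt.figure(figsize = (10, 4))
AP.plot()
np.exp(AP_fited_3).plot(linestyle = "--", label = "exp(Fitted log-model)")
plt.legend()
plt.show()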




The Global Method of Nonlinear Least Squares (NLS)

We will begin by examining the Shampoo sales data (available in the fma package in R):

In [42]:
shampoo = pd.read_csv("https://raw.githubusercontent.com/blue-yonder/pydse/master/pydse/data/sales-of-shampoo-over-a-three-ye.csv", sep  =";")
shampoo = pd.Series(shampoo.iloc[:, 1])

shampoo.plot()
plt.show()

If an economic indicator increases every year by the same rate, then it exhibits exponential growth. To model this kind of time series, an exponential model is needed.

However, modelling becomes rather complicated because to extract the trend we need to estimate a non-linear regression: $$Y_t = \alpha + \beta \cdot \exp \left( \gamma \cdot t \right) + \epsilon_t$$ Let's begin by defining the function which we will be minimizing: $$ \begin{aligned} S &= \sum_{t = 1}^T r_t^2 \left(\alpha, \beta, \gamma \right) \rightarrow \min_{\alpha, \beta, \gamma} \\ \text{where }& r_t\left(\alpha, \beta, \gamma \right) = Y_t - \left( \alpha + \beta \cdot \exp \left( \gamma \cdot t \right) \right) \end{aligned} $$

In [43]:
def func(pars, y_t, time):
    a = pars[0]
    b = pars[1]
    g = pars[2]
    r_t = a + b * np.exp(g * time)
    residuals = y_t - r_t
    return residuals

Now, we can optimize this function with the following method (note - we need to specify some initial parameter guess):

In [44]:
pars = [1.0, 1.0, 1.0]
time = np.array(range(1, len(shampoo.index) + 1))
final_pars = optimize.leastsq(func, # specify the function to optimize
                              pars, # specify the initial parameters
                              args = (shampoo, time) #specify the additional fixed parameters to pass to the function which we are optimizing
                             )
print(final_pars[0])
[1.35809103e+02 3.14082545e+01 7.70691146e-02]

This method minimizes the sum of squares of a set of equations.

This approximation assumes that the objective function is based on the difference between some observed target data (ydata) and a (non-linear) function of the parameters func(xdata, params):

func(params) = ydata - f(xdata, params).
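For reference, the same fit can be obtained with scipy.optimize.curve_fit, which takes the model function itself rather than the residual function (a sketch using the same starting values as above; exp_trend is an illustrative helper name):

#A sketch of the same non-linear fit via curve_fit:
def exp_trend(t, a, b, g):
    return a + b * np.exp(g * t)

popt, pcov = optimize.curve_fit(exp_trend, time, shampoo, p0 = [1.0, 1.0, 1.0])
print(popt)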

Now we can fit the data:

In [45]:
fitted_vals = final_pars[0][0] + final_pars[0][1] * np.exp(final_pars[0][2] * time)
fitted_vals = pd.Series(fitted_vals)
fitted_vals.index = shampoo.index
fitted_vals.name = "Fitted values via NLS"

shampoo.plot()
fitted_vals.plot()
plt.legend()
plt.show()

Let us also check the residuals (remember that our defined function already calculates them):

In [46]:
resid = func(final_pars[0], shampoo, time)

#Plot residuals:
fig = plt.figure(figsize = (15, 5))
fig.add_subplot(121)
resid.plot(linestyle = '--', marker='o')
plt.axhline(y = 0, color='r', linestyle='-')
#Plot residual histogram
fig.add_subplot(122)
plt.hist(resid, histtype='bar', rwidth = 0.9, edgecolor = "black")
plt.show()

The spread of the residuals is more or less constant; however, they do not appear to be normally distributed.

One of the drawbacks of OLS is its global nature - if the time series departs for some reason from the trend at the final moment $T$, this will change all the coefficients of the model and, therefore, also the predicted value at moment $t = 1$.

The local methods are free of this defect - their forecast at moment t is (mostly) influenced by the values at close time moments.




The Local Method of Exponential Smoothing

Exponential smoothing is a technique that can be applied to times series data, either to produce smoothed data for presentation or to make forecasts.

Whereas in the moving average decomposition the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights to observations further in the past. It is an easily applied procedure in which the user can incorporate prior assumptions, such as seasonality, and it is widely used for the analysis of time series data.
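To see what "exponentially decreasing weights" means: with smoothing parameter $\alpha$, the weight placed on the observation $k$ periods in the past is $\alpha (1-\alpha)^k$. A small illustration:

#The implied weights on the current and past observations for a chosen alpha:
alpha = 0.3
weights = alpha * (1 - alpha) ** np.arange(6)
print(np.round(weights, 4))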

Simple Exponential Smoothing

We state the simple exponential smoothing procedure as an algorithm for converting the observed series $Y_t$ into a smoothed series $\widehat{Y}_t$, $t = 1,...,T$ and forecasts $\widehat{Y}_{T+h,T}$:

  1. Initialize at $t = 1$: $\widehat{Y}_1 = Y_1$;
  2. Update: $\widehat{Y}_t = \alpha Y_t + (1-\alpha) \widehat{Y}_{t-1}$, where $t=2,...,T$ and $\alpha \in (0,1)$;
  3. Forecast: $\widehat{Y}_{T+h,T} = \widehat{Y}_T$, $h = 1,2,...$.
In [47]:
def exp_smooth_simple(pars, y_t, return_type = "simple", forc_period = 12):
    a = pars[0]
    T = len(y_t)
    #print(T)
    new_y = np.zeros(T)
    for i in range(0, T): #note - the first element (when i = 0) is unchanged
        if i == 0:
            new_y[i] = y_t[i]
        else:
            new_y[i] = a * y_t[i] + (1.0 - a) * new_y[i-1]  
    
    #Forecast ahead:
    y_forc = np.zeros(forc_period)
    for j in range(0, forc_period):
        y_forc[j] = new_y[T - 1]
        
    if return_type == "simple":
        return new_y
    else:
        return {"Y": new_y, "Forecast": y_forc}

The smaller $\alpha$ is, the smoother the estimated level. As $\alpha$ approaches 0, the smoothed series approaches a constant. As $\alpha$ approaches 1, the smoothed series approaches point-by-point interpolation.

In [48]:
smoothed_data_1 = exp_smooth_simple([0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast")
smoothed_data_2 = exp_smooth_simple([0.5], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast")
smoothed_data_3 = exp_smooth_simple([0.8], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast")

smoothed_data_1 = pd.Series(np.append(smoothed_data_1["Y"], smoothed_data_1["Forecast"]))
smoothed_data_2 = pd.Series(np.append(smoothed_data_2["Y"], smoothed_data_2["Forecast"]))
smoothed_data_3 = pd.Series(np.append(smoothed_data_3["Y"], smoothed_data_3["Forecast"]))

smoothed_data_1.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_1.index), freq = "M").to_period()
smoothed_data_1.index = smoothed_data_1.index.to_timestamp()

smoothed_data_2.index = smoothed_data_3.index = smoothed_data_1.index
smoothed_data_1.name = "alpha = 0.1"
smoothed_data_2.name = "alpha = 0.5"
smoothed_data_3.name = "alpha = 0.8"

#pd.date_range(start = "1949-01", periods = len(data.index), freq = "M").to_period()

plt.figure(figsize = (15, 10))
plt.plot(log_passengers)
plt.plot(smoothed_data_1, linestyle = "--")
plt.plot(smoothed_data_2, linestyle = "--")
plt.plot(smoothed_data_3, linestyle = "--", color = "red")
plt.legend()
plt.show()
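For comparison, newer statsmodels versions (0.9 and later) also provide a SimpleExpSmoothing class. A sketch, assuming such a version is installed (note that the library defines fitted values as one-step-ahead forecasts, so they will not coincide exactly with the smoothed series defined above):

#A sketch using the library implementation (requires statsmodels >= 0.9):
ses_fit = sm.tsa.SimpleExpSmoothing(log_passengers).fit(smoothing_level = 0.5, optimized = False)
print(ses_fit.fittedvalues.head())
print(ses_fit.forecast(12).head())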

Double Exponential Smoothing

Now imagine that we have not only a slowly evolving local level, but also a trend with a slowly evolving local slope. In such cases simple exponential smoothing does not do well. Then the optimal smoothing algorithm (Holt’s Linear Method) is as follows:

  1. Initialize at $t = 2$:

    1.1 $\widehat{Y_2} = Y_2$;

    1.2 $F_2 = Y_2 - Y_1$;

  2. Update for $t > 2$:

    2.1 $\widehat{Y_t} = \alpha Y_t + (1- \alpha)\cdot (\widehat{Y}_{t-1} + F_{t-1})$, where $0 < \alpha < 1$;

    2.2 $F_t = \beta (\widehat{Y}_t - \widehat{Y}_{t-1}) + (1 - \beta) F_{t-1}$, where $0 < \beta < 1$;

  3. Forecast: $\widehat{Y}_{T+h,T} = \widehat{Y}_T + h \cdot F_T$

Here $F_t$ is the best estimate of the trend and $\widehat{Y}_t$ is the smoothed series.

In [49]:
def exp_smooth_double(pars, y_t, return_type = "simple", forc_period = 12):
    a = pars[0]
    b = pars[1]
    T = len(y_t)
    new_y = np.zeros(T)
    new_f = np.zeros(T)
    
    for i in range(0, T):
        if i == 0:
            new_y[i] = y_t[i]
            new_f[i] = y_t[i]
        elif i == 1:
            new_y[i] = y_t[i]
            new_f[i] = y_t[i] - y_t[i-1]
        else:
            new_y[i] = a * y_t[i] + (1.0 - a) * (new_y[i-1] + new_f[i-1])
            new_f[i] = b * (new_y[i] - new_y[i-1]) + (1.0 -  b) * new_f[i-1]

    #Forecast ahead:
    y_forc = np.zeros(forc_period)
    for j in range(0, forc_period):
        y_forc[j] = new_y[T - 1] + (j + 1) * new_f[T - 1]
    
    if return_type == "simple":
        return new_y
    else:
        return {"Y": new_y, "F": new_f, "Forecast": y_forc}

The parameter $\alpha$ controls smoothing of the level and $\beta$ controls smoothing of the slope.

In [50]:
#Example of double exponential smoothing with different alpha parameters:
smoothed_data_1 = pd.Series(exp_smooth_double([0.1, 0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_2 = pd.Series(exp_smooth_double([0.3, 0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_3 = pd.Series(exp_smooth_double([0.6, 0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))

smoothed_data_1 = pd.Series(np.append(smoothed_data_1["Y"], smoothed_data_1["Forecast"]))
smoothed_data_2 = pd.Series(np.append(smoothed_data_2["Y"], smoothed_data_2["Forecast"]))
smoothed_data_3 = pd.Series(np.append(smoothed_data_3["Y"], smoothed_data_3["Forecast"]))

smoothed_data_1.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_1.index), freq = "M").to_period()
smoothed_data_1.index = smoothed_data_1.index.to_timestamp()
smoothed_data_2.index = smoothed_data_3.index = smoothed_data_1.index

smoothed_data_1.name = "alpha = 0.1, beta = 0.1"
smoothed_data_2.name = "alpha = 0.3, beta = 0.1"
smoothed_data_3.name = "alpha = 0.6, beta = 0.1"

fig = plt.figure(figsize = (20, 10))
fig.add_subplot(121)
plt.plot(log_passengers)
plt.plot(smoothed_data_1, linestyle = "--")
plt.plot(smoothed_data_2, linestyle = "--")
plt.plot(smoothed_data_3, linestyle = "--", color = "red")
plt.legend()

#Example of double exponential smoothing with different beta parameters:
smoothed_data_1 = pd.Series(exp_smooth_double([0.1, 0.01], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_2 = pd.Series(exp_smooth_double([0.1, 0.3], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_3 = pd.Series(exp_smooth_double([0.1, 0.9], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))

smoothed_data_1 = pd.Series(np.append(smoothed_data_1["Y"], smoothed_data_1["Forecast"]))
smoothed_data_2 = pd.Series(np.append(smoothed_data_2["Y"], smoothed_data_2["Forecast"]))
smoothed_data_3 = pd.Series(np.append(smoothed_data_3["Y"], smoothed_data_3["Forecast"]))

smoothed_data_1.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_1.index), freq = "M").to_period()
smoothed_data_1.index = smoothed_data_1.index.to_timestamp()
smoothed_data_2.index = smoothed_data_3.index = smoothed_data_1.index

smoothed_data_1.name = "alpha = 0.1, beta = 0.01"
smoothed_data_2.name = "alpha = 0.1, beta = 0.3"
smoothed_data_3.name = "alpha = 0.1, beta = 0.9"

fig.add_subplot(122)
plt.plot(log_passengers)
plt.plot(smoothed_data_1, linestyle = "--")
plt.plot(smoothed_data_2, linestyle = "--")
plt.plot(smoothed_data_3, linestyle = "--", color = "red")
plt.legend()

plt.show()

Triple Exponential Smoothing

The Holt-Winters method is based on three smoothing equations - one for the level, one for the trend and one for seasonality. It is similar to Holt’s linear method, with one additional equation dealing with seasonality.

Triple exponential smoothing applies exponential smoothing three times, which is commonly used when there are three high frequency signals to be removed from a time series under study. The use of a triple application is considered a rule of thumb technique, rather than one based on theoretical foundations and has often been over-emphasized by practitioners.

The smoothing algorithm (where the seasonal effect is additive) is as follows:

  1. Initialize:

    1.1 Set $S_L = \dfrac{Y_1 + ... + Y_L}{L}$

    1.2 The general formula for the initial trend: $F_L = \dfrac{1}{L} \left(\dfrac{Y_{L+1} - Y_1}{L} + \dfrac{Y_{L+2} - Y_2}{L} + ... \dfrac{Y_{L+L} - Y_L}{L} \right)$

    1.3 Suppose a cycle of seasonal change is of length L so that $G_t = G_{t+L}$ and $N$ - the number of complete cycles in the data. As a rule of thumb, a minimum of two full seasons (or 2L periods) of historical data is needed to initialize a set of seasonal factors. The initial estimates for the seasonal indices: $$ \begin{aligned} G_i &= \dfrac{1}{N}\sum_{j = 1}^N{\dfrac{Y_{L(j-1)+i}}{A_j}}, \quad i = 1,...,L\\ \text{where: } A_j &= \dfrac{1}{L} \sum_{k = 1}^L Y_{L(j-1)+k}, \quad j = 1,...,N \end{aligned} $$

    i.e. $A_j$ is the average value of $Y_t$ in the $j$-th cycle of the data and $G_i$ is the average of the ratios $Y_{L(j-1)+i} / A_j$ for the $i$-th seasonal index across all cycles.

    For example, if we have monthly data covering $N$ years, then $A_1$ would be the average value of the first 12 months $Y_1,...,Y_{12}$; $A_2$ - the average of $Y_{13},...,Y_{24}$, etc.

    Then $G_1$ would be the average of the ratios $\dfrac{Y_1}{A_1}, \dfrac{Y_{12 + 1}}{A_2},..., \dfrac{Y_{12 (N-1) + 1}}{A_N}$, $G_2$ - the average of $\dfrac{Y_2}{A_1}, \dfrac{Y_{12 + 2}}{A_2},..., \dfrac{Y_{12 (N-1) + 2}}{A_N}$, etc.

  2. Two possible scenarios:

    • Update for $t > L$ of an additive model:

    2.1 $S_t = \alpha \left( Y_t - G_{t-L}\right) + (1- \alpha)\cdot (S_{t-1} + F_{t-1})$, where $0 < \alpha < 1$;

    2.2 $F_t = \beta (S_t - S_{t-1}) + (1 - \beta) F_{t-1}$, where $0 < \beta < 1$;

    2.3 $G_t = \gamma \left( Y_t-S_{t} \right) + (1 - \gamma) G_{t-L}$

    2.4 Forecast: $\widehat{Y}_{t+h} = S_t + h \cdot F_t + G_{t + h - L}$

    • Update for $t > L$ of a multiplicative model:

    2.1 $S_t = \alpha \left( \dfrac{Y_t}{G_{t-L}}\right) + (1- \alpha)\cdot (S_{t-1} + F_{t-1})$, where $0 < \alpha < 1$;

    2.2 $F_t = \beta (S_t - S_{t-1}) + (1 - \beta) F_{t-1}$, where $0 < \beta < 1$;

    2.3 $G_t = \gamma \left( \dfrac{Y_t}{S_t} \right) + (1 - \gamma) G_{t-L}$

    2.4 Forecast: $\widehat{Y}_{t+h} = (S_t + h \cdot F_t) \cdot G_{t + h - L}$

Here $F_t$ is the best estimate of the trend, $G_t$ is the best estimate of the seasonal factors and $S_t$ is the smoothed series. We will show an example of a multiplicative model implementation:

In [51]:
def exp_smooth_triple(pars, y_t, return_type = "simple", L = 12):
    a = pars[0]
    b = pars[1]
    g = pars[2]

    T = len(y_t)
    cycles = np.floor(T / L)
    A = np.zeros(int(cycles))
    
    new_y = np.zeros(T)
    new_f = np.zeros(T)
    new_g = np.zeros(T)
    
    #Initialize values:
    new_y[0] = y_t[0] #np.mean(y_t[0:L])
    new_f[0] = np.mean(y_t[L:(2*L)] / L - y_t[0:L] / L)    
    
    #Calculate the seasonal average::
    for j in range(0, int(cycles)):
        A[j] = np.mean(y_t[(j*L):(j*L+L)])
    #Calculate the initial seasonal factors:
    for j in range(0, L):
        new_g[j] = np.mean(y_t[j::L] / A)
    
    for i in range(1, T):
        if i < L:
            #Double exponential smoothing for initial values:
            new_y[i] = a * y_t[i] + (1.0 - a) * (new_y[i-1] + new_f[i-1])
            new_f[i] = b * (new_y[i] - new_y[i-1]) + (1.0 -  b) * new_f[i-1]
        elif i >= L:
            new_y[i] = a * (y_t[i] / new_g[i - L]) + (1.0 - a) * (new_y[i-1] + new_f[i-1])
            new_f[i] = b * (new_y[i] - new_y[i-1]) + (1.0 -  b) * new_f[i-1]
            new_g[i] = g * (y_t[i] / new_y[i]) + (1.0 - g) * new_g[i - L]
    
    #Forecast ahead:
    forc_period = L
    h = list(range(1, int(forc_period + 1)))
    h = [float(i) for i in h]
    y_forc = (new_y[T - 1] + np.multiply(h, new_f[T - 1])) * new_g[-L:]
    
    #Fitted values for each cycle:
    y_fitted = np.zeros(T)
    #Set first period values to actual values to avoid any errors
    y_fitted[:L] = y_t[:L]
    for j in range(1, int(cycles)):
        y_fitted[(j*L):(j*L + L)] = (new_y[(j*L + L) - 1] + np.multiply(h, new_f[(j*L + L) - 1])) * new_g[((j - 1)*L):((j - 1)*L + L)]

    #Return the results
    if return_type == "simple":
        return y_fitted
    else:
        return {"Y": new_y, "F": new_f, "G": new_g, "Forecast": y_forc, "Fitted": y_fitted}

Optimization

For the exponential smoothing methods we also need to choose values for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter, $\alpha$. A robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.

The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE:

$$ \begin{aligned} SSE &= \sum_{t=1}^T e_t^2\\ \text{where }& e_t = Y_t - \widehat{Y}_t \end{aligned} $$

Unlike the ordinary regression case (where we have formulas that return the values of the regression coefficients which minimize the SSE) this involves a non-linear minimization problem.

In [52]:
def exp_smooth_resid(pars, *args): # *args - lets us pass any number of additional arguments, where we only care about their number and not their name
    y_t = args[0]
    exp_type = args[1]
    
    if exp_type == "double":
        new_y = exp_smooth_double(pars, y_t, return_type = "simple")
    elif exp_type == "triple":
        L = args[2]
        new_y = exp_smooth_triple(pars, y_t, return_type = "simple", L = L)
    else:
        new_y = exp_smooth_simple(pars, y_t, return_type = "simple")
    
    residuals = y_t - new_y
    residuals_sq = sum([i ** 2 for i in residuals])
    return residuals_sq

We can optimize the functions via the L-BFGS-B algorithm:

In [53]:
pars = [0.5]
final_pars_1 = optimize.fmin_l_bfgs_b(func = exp_smooth_resid, # specify the function to optimize
                              x0 = pars, # specify the initial parameters
                              args = (np.array(pd.Series.tolist(log_passengers)), "simple"), #specify the additional fixed parameters to pass to the function which we are optimizing
                              bounds = [(0.01, 0.99)], approx_grad=True) #unless we can provide the analytical gradient expression
print("Simple Exponential smoothing alpha = ", round(final_pars_1[0][0], 4))

pars = [0.5, 0.5]
final_pars_2 = optimize.fmin_l_bfgs_b(func = exp_smooth_resid,
                              x0 = pars,
                              args = (np.array(pd.Series.tolist(log_passengers)), "double"),
                              bounds = [(0.01, 0.99), (0.01, 0.99)], approx_grad=True)
print("Double Exponential smoothing alpha = ", round(final_pars_2[0][0], 4), " beta = ", round(final_pars_2[0][1], 4))

pars = [0.5, 0.5, 0.5]
final_pars_3 = optimize.fmin_l_bfgs_b(func = exp_smooth_resid,
                              x0 = pars,
                              args = (np.array(pd.Series.tolist(log_passengers)), "triple", 12),
                              bounds = [(0.0, 1), (0.0, 1.0), (0.0, 1.0)], approx_grad=True)
print("Triple Exponential smoothing alpha = ", round(final_pars_3[0][0], 2), " beta = ", round(final_pars_3[0][1], 4), " gamma = ", round(final_pars_3[0][2], 4))
Simple Exponential smoothing alpha =  0.99
Double Exponential smoothing alpha =  0.99  beta =  0.0241
Triple Exponential smoothing alpha =  0.35  beta =  0.2275  gamma =  0.0
In [54]:
smoothed_data = pd.Series(exp_smooth_triple(final_pars_3[0], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast", L = 12))
forecast_data = pd.Series(np.append(smoothed_data["Y"], smoothed_data["Forecast"]))
forecast_data[:len(smoothed_data["Y"])] = None
fitted_data = pd.Series(smoothed_data["Fitted"])
smoothed_data = pd.Series(smoothed_data["Y"])

forecast_data.index = pd.date_range(start = "1949-01", periods = len(forecast_data.index), freq = "M").to_period()
forecast_data.index = forecast_data.index.to_timestamp()
smoothed_data.index = pd.date_range(start = "1949-01", periods = len(smoothed_data.index), freq = "M").to_period()
smoothed_data.index = smoothed_data.index.to_timestamp()
fitted_data.index = smoothed_data.index

smoothed_data.name = "Triple Exponential Smoothing: Overall Smoothing"
forecast_data.name = "Triple Exponential Smoothing: Forecasts"
fitted_data.name = "Triple Exponential Smoothing: Fitted values"

plt.figure(figsize = (15, 10))
plt.plot(log_passengers)
plt.plot(fitted_data, linestyle = "--", color = "red")
plt.plot(smoothed_data, linestyle = "--", color = "orange")
plt.plot(forecast_data, linestyle = "--", color = "green")

plt.legend()
plt.show()
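For comparison, newer statsmodels versions (0.9 and later) also ship a Holt-Winters implementation, ExponentialSmoothing. A sketch, assuming such a version is installed (its initialization and optimization differ from our manual code, so the estimated parameters and forecasts need not match):

#A sketch using the library implementation of Holt-Winters smoothing:
hw_fit = sm.tsa.ExponentialSmoothing(log_passengers, trend = "add",
                                     seasonal = "mul", seasonal_periods = 12).fit()
print(hw_fit.params)
print(hw_fit.forecast(12))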

We can also look at the double exponential smoothing with the optimal parameters:

In [55]:
smoothed_data_double_result = pd.Series(exp_smooth_double(final_pars_2[0], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
In [56]:
smoothed_data_double = pd.Series(np.append(smoothed_data_double_result["Y"], smoothed_data_double_result["Forecast"]))

smoothed_data_double.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_double.index), freq = "M").to_period()
smoothed_data_double.index = smoothed_data_double.index.to_timestamp()

smoothed_data_double.name = "Double Exponential Smoothing"
In [57]:
plt.figure(figsize = (15, 10))
plt.plot(log_passengers)
plt.plot(smoothed_data_double, linestyle = "--", color = "orange")

plt.legend()
plt.show()

Similar example with data.




Note: we have omitted checking the residual plots. Look at the residual series and ACF plots - are the residuals stationary? Are they WN?

Some examples are provided below.

In [58]:
resid_OLS = fit_3.resid
tsdisplay(resid_OLS, max_lag = 40)

Note that only the first couple of lags are significant, while later lags are much less so.

In [59]:
resid_ets = log_passengers - smoothed_data_double_result["Y"]
tsdisplay(resid_ets, max_lag = 40)

Note that, in the case of double exponential smoothing, there appears to be a remaining seasonal or cyclic effect, as we see a spike in the ACF every few lags.