The following code provides examples of how to decompose a time series composed of a deterministic trend component, a deterministic seasonal component and a (non-deterministic) stationary component, which could be a $WN$, an $ARMA$ or a similar kind of process.
#Import the required modules for data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#Import the required modules for DataFrame creation:
import pandas as pd
#Import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
#Import the required modules for optimization:
import scipy.optimize as optimize
#We also need the formula interface for specifying models via R-style formulas:
import statsmodels.formula.api as smf
The formula.api hosts many of the same functions found in api (e.g. OLS, GLM), but it also holds lower-case counterparts for most of these models. In general, lower-case models accept formula and df arguments, whereas upper-case ones take endog and exog design matrices. formula accepts a string which describes the model in terms of a patsy formula, while df takes a pandas data frame.
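As a quick illustration of the difference between the two interfaces, here is a minimal sketch using toy data (an illustrative addition; the variables toy_df, fit_formula and fit_matrix are not used anywhere else):
#Lower case interface: formula string + DataFrame; upper case interface: endog + exog matrices
toy_df = pd.DataFrame({"y": np.random.normal(size = 50),
                       "x": np.random.normal(size = 50)})
fit_formula = smf.ols(formula = "y ~ x", data = toy_df).fit()
fit_matrix = sm.OLS(toy_df["y"], sm.add_constant(toy_df["x"])).fit()
#Both interfaces produce equivalent estimates:
print(fit_formula.params)
print(fit_matrix.params)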
dir(smf) will print a list of available models.
dir(smf)
Often it is important to establish whether a stationary process under consideration is $WN$. To achieve this, we can use the graphs of the $ACF$ and $PACF$ functions.
We begin by redefining R's tsdisplay() and tsdiag() functions:
#Define our tsdisplay function (equivalent to R):
def tsdisplay(y, figsize = (14,8), title = "", max_lag = 20):
tmp_data = pd.Series(y)
fig = plt.figure(figsize = figsize)
#Plot the time series
tmp_data.plot(ax = fig.add_subplot(211), title = "$Time\ Series\ " + title + "$", legend = False)
#Plot the ACF:
sm.graphics.tsa.plot_acf(tmp_data, lags = max_lag, zero = False, ax = fig.add_subplot(223))
#Plot the PACF:
sm.graphics.tsa.plot_pacf(tmp_data, lags = max_lag, zero = False, ax = fig.add_subplot(224))
#Fix the layout of the plots:
plt.tight_layout()
#Define our tsdiag function (equivalent to R):
def tsdiag(y, figsize = (14,8), title = ""):
#The data:
tmp_data = pd.Series(y)
#The standardized data:
tmp_stand_data = pd.Series(y / np.sqrt(y.var()))
#The Ljung-Box test results for the first 10 lags:
p_vals = pd.Series(sm_stat.diagnostic.acorr_ljungbox(tmp_data, lags = 10)[1])
#Start the index from 1 instead of 0 (because Ljung-Box test is for lag values from 1 to k)
p_vals.index += 1
fig = plt.figure(figsize = figsize)
#Plot the time series
tmp_stand_data.plot(ax = fig.add_subplot(311), title = "$Standardized\ Time\ Series\ " + title + "$", legend = False)
#Plot the ACF:
sm.graphics.tsa.plot_acf(tmp_data, lags = 20, zero = False, ax = fig.add_subplot(312))
#Plot the PACF:
p_vals.plot(ax = fig.add_subplot(313), linestyle='', marker='o', title = "p-values for Ljung-Box statistic", legend = False)
#Add the horizontal 0.05 critical value line
plt.axhline(y = 0.05, color = 'blue', linestyle='--')
# Annotate the p-value points above and to the left of the vertex
x = np.arange(p_vals.size) + 1
for X, Y, Z in zip(x, p_vals, p_vals):
plt.annotate(round(Z, 4), xy=(X,Y), xytext=(-5, 5), ha = 'left', textcoords='offset points')
#Change the range of the axis:
plt.xlim(xmin = 0)
plt.xlim(xmax = p_vals.size + 1)
    #Change the X axis tick marks - make them from 1 to the max. number of lags in the Ljung-Box test
plt.xticks(np.arange(1, p_vals.size + 1, 1.0))
#Fix the layout of the plots:
plt.tight_layout()
These functions make it easier to analyse the series, as we do not need to repeat the same plotting code each time.
#Set the seed:
np.random.seed(1)
nsample = 100
x1 = np.random.normal(loc = 0.0, scale = 1.0, size = nsample)
tsdisplay(x1)
While the process was generated from a normal distribution, the $ACF$ and $PACF$ plots show that the third lag appears significantly different from zero, which would make it difficult to say whether the series is uncorrelated if we did not know the underlying data generating process.
tsdiag(x1)
All the bubbles (p-values) in the bottom graph are above the blue 0.05 line, so we do not reject the null hypothesis of no autocorrelation - the time series is consistent with White Noise.
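For contrast, we can apply the same diagnostics to a series which is not White Noise. The following sketch is an illustrative addition (the variable x2 is hypothetical and not used later): it generates an $AR(1)$ process with sm.tsa.arma_generate_sample, for which the Ljung-Box p-values should fall below the 0.05 line.
#Generate an AR(1) process Y_t = 0.7 Y_{t-1} + e_t (the AR polynomial includes the zero-lag term):
np.random.seed(2)
x2 = sm.tsa.arma_generate_sample(ar = [1, -0.7], ma = [1], nsample = 100)
tsdisplay(x2)
tsdiag(x2)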
Some R datasets are available online or via sm.datasets.get_rdataset("dataset", "package"):
data = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/AirPassengers.csv")
print(data.head())
Tidy up the dataset:
data = pd.Series(data["value"])
data.index = pd.date_range(start = "1949-01", periods = len(data.index), freq = "M").to_period()
data.index = data.index.to_timestamp() #to_period() causes errors with decomposition methods
data.head()
data.plot()
It appears that it may be a multiplicative series, $Y_t = T_t \cdot S_t \cdot E_t$, so we will take logarithms:
log_passengers = np.log(data)
log_passengers.head()
print(log_passengers.plot())
Now the time series looks like an additive model, $Y_t = T_t + S_t + E_t$.
tsdisplay(log_passengers, max_lag = 40)
We see that the ACF decays very slowly, which indicates a trend effect.
We can see that the ACF plot exhibits a larger correlation at around every $12^{th}$ lag (at lags 12, 24, 36, etc.). This is an indication that there is a seasonal (or cyclic) component in our data.
We will provide an example of the moving-average method for decomposition.
If we select the length of the moving average as an odd number, for example $l = 3$, then either the two-sided average
$$\widehat{T}_t = \dfrac{Y_{t-1} + Y_t + Y_{t+1}}{3}$$
or the one-sided average
$$\widehat{T}_t = \dfrac{Y_{t-2} + Y_{t-1} + Y_t}{3}$$
can be calculated for integer times $t$.
If the time series contains a seasonal component and we want to average it out, the length of the moving average must be equal to the seasonal frequency (for a monthly series, the length is $l = 12$). However, there is a slight hurdle: suppose our time series begins in January ($t = 1$) and we average the values up to December ($t = 12$). This average corresponds to time $t = 6.5$ - a point between June and July.
When we come to estimate seasonal effects, we need a moving average at integer times. This can be achieved by averaging the average of January to December and the average of February ($t = 2$) to January ($t = 13$). This average of the two moving averages corresponds to $t = 7$, and the process is called centering.
Thus, the trend at time $t$ can be estimated by the centered moving average:
Two-sided averaging: $$\widehat{T}_t = \dfrac{(Y_{t-6} +...+ Y_{t+5})/12 + (Y_{t-5} +...+ Y_{t+6})/12}{2} = \dfrac{(1/2)Y_{t-6} + Y_{t-5} +...+ Y_{t+5} + (1/2)Y_{t+6}}{12}$$
One-sided averaging: $$\widehat{T}_t = \dfrac{(Y_{t-12} +...+ Y_{t-1})/12 + (Y_{t-11} +...+ Y_{t})/12}{2} = \dfrac{(1/2)Y_{t-12} + Y_{t-11} +...+ Y_{t-1} + (1/2)Y_{t}}{12}$$
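As a quick sanity check of the two-sided formula (an illustrative sketch, assuming log_passengers as defined above; the variables ma_weights and trend_check are not used later), we can build the centered-average weight vector and apply it with np.convolve:
#Centered moving-average weights for l = 12 (the "2x12" moving average):
ma_weights = np.concatenate([[1/24], np.repeat(1/12, 11), [1/24]])
#A "valid" convolution drops 6 observations at each end of the series:
trend_check = np.convolve(log_passengers, ma_weights, mode = "valid")
print(trend_check[:5])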
Note that the weights of the first and last elements used in the averaging differ from the rest. We can use statsmodels.tsa.seasonal.seasonal_decompose to automatically decompose our series:
decomposition_1 = smt.seasonal.seasonal_decompose(log_passengers, model = "additive", freq = 12, two_sided = True)
trend_1 = decomposition_1.trend
plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_1.plot()
plt.show()
We can see that, because we used a two-sided decomposition, we cannot calculate the trend for the first and last 6 months of our monthly time series.
If we specify a one-sided decomposition, then we will not be able to calculate the trend for the first 12 months:
decomposition_2 = smt.seasonal.seasonal_decompose(log_passengers, model = "additive", freq = 12, two_sided = False)
trend_2 = decomposition_2.trend
plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_2.plot()
plt.show()
If we reduce the frequency, then we will begin to capture the seasonal part with our trend filter. For example, selecting the moving average length $l = 3$:
decomposition_3 = smt.seasonal.seasonal_decompose(log_passengers, model = "additive", freq = 3, two_sided = False)
trend_3 = decomposition_3.trend
plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_3.plot()
plt.show()
This follows from the formulas: with a shorter window we use fewer observations around time $t$, so the estimated trend values are closer to the overall series values (including their seasonal fluctuations), and we no longer estimate the trend component correctly. As such, it is important to correctly determine whether your data exhibits a seasonal component and, if so, to correctly determine the seasonal frequency.
We could also try to specify our own function to decompose the trend only:
import math as math
def moving_average_decompose(y, l = 12):
tmp_index = y.index
#Time Series:
y = np.array(y)
#Trend Component:
    t_two_side = np.zeros(y.size)
    t_one_side = np.zeros(y.size)
tmp_val = 0
#Loop through each time point:
for j in range(0, y.size):
###########################################################
# The length is even - use centered averaging:
if l % 2 == 0:
# The centered averaging method weights - see the formula example with l = 12.
tmp_weights = np.concatenate([np.array([1 / (l * 2)]),
np.repeat(1 / l, l - 1),
np.array([1 / (l * 2)])])
#Two-sided
if j + 1 <= l / 2 or j + 1 > y.size - l / 2:
# We do not have enough data to the left or right of the series to calculate trend at time t = j:
t_two_side[j] = np.nan
else:
tmp_val = y[int(j - (l / 2)):int(j + (l / 2 + 1))] * tmp_weights
t_two_side[j] = tmp_val.sum()
#One-sided
if j + 1 <= l:
t_one_side[j] = np.nan
else:
tmp_val = y[int(j - l):int(j + 1)] * tmp_weights
t_one_side[j] = tmp_val.sum()
###########################################################
# The length is odd:
else:
#For the non-centered averaging the weights are simply the mean, i.e. the length l.
#tmp_weights = np.repeat(1 / l, l)
#Two-sided
if j + 1 <= math.floor(l / 2) or j + 1 > y.size - math.floor(l / 2):
# We do not have enough data to the left or right of the series to calculate trend at time t = j:
t_two_side[j] = np.nan
else:
tmp_val = y[int(j - math.floor(l / 2)):int(j + (math.floor(l / 2) + 1))]
t_two_side[j] = tmp_val.mean()
#One-sided
if j * 2 <= l:
t_one_side[j] = np.nan
else:
tmp_val = y[int(j - l + 1):int(j + 1)]
t_one_side[j] = tmp_val.mean()
#Return the same time series:
t_one_side = pd.Series(t_one_side, name = "One-Sided MA with length = " + str(l))
t_one_side.index = tmp_index
t_two_side = pd.Series(t_two_side, name = "Two-Sided MA with length = " + str(l))
t_two_side.index = tmp_index
return {"Two-sided": t_two_side, "One-sided": t_one_side}
trend_rezults = moving_average_decompose(log_passengers)
plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_rezults["One-sided"].plot()
trend_rezults["Two-sided"].plot()
plt.legend()
plt.show()
trend_low_frq = moving_average_decompose(log_passengers, l = 3)
plt.figure(figsize = (10, 6))
log_passengers.plot()
trend_low_frq["One-sided"].plot()
plt.legend()
plt.show()
We can compare the two functions and we see that they produce the same results:
comparison = trend_rezults["Two-sided"] - trend_1
print("Two-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
comparison = trend_rezults["One-sided"] - trend_2
print("One-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
comparison = trend_low_frq["One-sided"] - trend_3
print("One-Sided MA, l = 3: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
Having estimated the trend, we can move on to estimating the seasonal effect. A simple procedure is the following: for each season $j = 1,...,12$, average the detrended values $Y_t - \widehat{T}_t$ over all years, and then subtract the overall mean of these twelve averages so that the estimated seasonal effects sum to zero.
We will estimate the seasonal effect using both trend averaging methods and examine how the decomposition fits the data.
Let's begin by defining the above procedure as a function (note - we created this procedure with the assumption that the time series starts from the first season, i.e. January):
def decomp_seas(y, trend, period = 12, title = ""):
#Initial seasonal component estimation:
seas_0 = y - trend
    seas_1 = np.zeros(period)
#Seasonal component averaging:
for j in range(0, period):
#Select every j-th month from each year in the dataset:
seas_1[j] = seas_0[j::period].mean()
#Adjustment factor:
w = seas_1.mean()
seas_1 = seas_1 - w
#Save the seasonal component in a time series format:
    seasonal_results = np.zeros(y.size)
for j in range(0, period):
#Select every j-th month from each year in the dataset:
seasonal_results[j::period] = seas_1[j]
seasonal_results = pd.Series(seasonal_results, name = "Seasonal " + title)
seasonal_results.index = trend.index
return(seasonal_results)
The estimated seasonal component:
seas_results = {"One-sided": decomp_seas(y = log_passengers, trend = trend_rezults["One-sided"], period = 12, title = "(One-sided)"),
"Two-sided": decomp_seas(y = log_passengers, trend = trend_rezults["Two-sided"], period = 12, title = "(Two-sided)")}
We can also see how the seasonal components look on their own:
plt.figure(figsize = (15, 6))
seas_results["One-sided"].plot()
seas_results["Two-sided"].plot()
plt.legend()
plt.tight_layout()
plt.show()
As well as add them to the trend component $\widehat{T}_t + \tilde{S}_t$:
T_S = {"One-sided": trend_rezults["One-sided"] + seas_results["One-sided"],
"Two-sided": trend_rezults["Two-sided"] + seas_results["Two-sided"]}
T_S["One-sided"].name = "Trend + Season"
T_S["Two-sided"].name = "Trend + Season"
fig = plt.figure(figsize = (15, 8))
log_passengers.plot(ax = fig.add_subplot(211), title = "One-sided decomposition")
trend_rezults["One-sided"].plot()
T_S["One-sided"].plot()
plt.legend()
log_passengers.plot(ax = fig.add_subplot(212), title = "Two-sided decomposition")
trend_rezults["Two-sided"].plot()
T_S["Two-sided"].plot()
plt.legend()
plt.tight_layout()
plt.show()
The combined trend and seasonal components fit the data much better than the trend component alone.
We can do the same with the results from smt.seasonal.seasonal_decompose. First, we note that the estimated seasonal component does not differ from the one we estimated manually:
comparison = seas_results["Two-sided"] - decomposition_1.seasonal
print("Two-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
comparison = seas_results["One-sided"] - decomposition_2.seasonal
print("One-Sided MA, l = 12: max. difference = ", round(comparison.max(), 5), "; min. difference = ", round(comparison.min(), 5))
We can also print the output from smt.seasonal.seasonal_decompose:
print(decomposition_1.plot())
fig = plt.figure(figsize = (15, 6))
log_passengers.plot(ax = fig.add_subplot(212), title = "One-sided decomposition")
decomposition_2.trend.plot()
(decomposition_2.seasonal + decomposition_2.trend).plot()
log_passengers.plot(ax = fig.add_subplot(211), title = "Two-sided decomposition")
decomposition_1.trend.plot()
(decomposition_1.seasonal + decomposition_1.trend).plot()
plt.tight_layout()
plt.show()
Finally, we can check whether the remainder is stationary. We can either calculate it manually:
fig = plt.figure(figsize = (10, 4))
(log_passengers - T_S["One-sided"]).plot(ax = fig.add_subplot(121), title = "One-sided decomposition Residuals")
(log_passengers - T_S["Two-sided"]).plot(ax = fig.add_subplot(122), title = "Two-sided decomposition Residuals")
plt.tight_layout()
plt.show()
Or extract it from the smt.seasonal.seasonal_decompose output:
decomposition_1.resid
fig = plt.figure(figsize = (10, 4))
decomposition_2.resid.plot(ax = fig.add_subplot(121), title = "One-sided decomposition Residuals")
decomposition_1.resid.plot(ax = fig.add_subplot(122), title = "Two-sided decomposition Residuals")
plt.tight_layout()
plt.show()
We note that the residuals might not be stationary - compare the variances from different time periods:
print("Two-sided decomp. Variance for [1949-01-01 - 1954-01-01]: ", round(decomposition_1.resid.loc['19490101':'19540101'].var(), 6))
print("Two-sided decomp. Variance for [1954-02-01 - 1960-12-01]: ", round(decomposition_1.resid.loc['19540201':'19600101'].var(), 6))
print("-----------------------------------------------------------")
print("One-sided decomp. Variance for [1949-01-01 - 1954-01-01]: ", round(decomposition_2.resid.loc['19490101':'19540101'].var(), 6))
print("One-sided decomp. Variance for [1954-02-01 - 1960-12-01]: ", round(decomposition_2.resid.loc['19540201':'19600101'].var(), 6))
The residuals in the second-half of the time series appear to have a smaller variance.
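For a more formal check (an illustrative addition, not part of the original analysis; the variable names are hypothetical), we can compare the two sub-sample variances of the two-sided decomposition residuals with Levene's test from scipy.stats:
#Levene's test of equal variances for the two sub-samples:
from scipy import stats
resid_first = decomposition_1.resid.loc['19490101':'19540101'].dropna()
resid_second = decomposition_1.resid.loc['19540201':'19601201'].dropna()
print(stats.levene(resid_first, resid_second))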
Let's also look at the residual correlation plots:
fig = plt.figure(figsize = (14, 5))
fig = sm.graphics.tsa.plot_acf(decomposition_1.resid.dropna(), lags=40, ax = fig.add_subplot(221), title = "ACF of Two-sided decomposition Residuals")
fig = sm.graphics.tsa.plot_acf(decomposition_2.resid.dropna(), lags=40, ax = fig.add_subplot(222), title = "ACF of One-sided decomposition Residuals")
fig = sm.graphics.tsa.plot_pacf(decomposition_1.resid.dropna(), lags=40, ax = fig.add_subplot(223), title = "PACF of Two-sided decomposition Residuals")
fig = sm.graphics.tsa.plot_pacf(decomposition_2.resid.dropna(), lags=40, ax = fig.add_subplot(224), title = "PACF of One-sided decomposition Residuals")
plt.tight_layout()
plt.show()
The series is not White Noise, so we would need to estimate an $ARMA$ model for it. However, it seems that the two-sided decomposition still leaves a seasonal effect, as shown by the oscillating residual ACF plot.
Looking at the residual plots, we also note that the residuals do not appear to be stationary for the two-sided decomposition method on this data - the variance is smaller in the middle of the sample than at the beginning of the series. The same seems to hold for the one-sided decomposition residuals.
We can see that this decomposition method does not fully eliminate the deterministic components from this specific time series sample.
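As mentioned above, we could try fitting an $ARMA$ model to these residuals. Below is a minimal sketch using the state-space SARIMAX class (the order (2, 0, 1) is an arbitrary illustrative choice, not a recommendation):
#Fit an ARMA(2, 1) to the two-sided decomposition residuals (illustrative only):
arma_resid_fit = sm.tsa.statespace.SARIMAX(decomposition_1.resid.dropna(), order = (2, 0, 1)).fit(disp = False)
print(arma_resid_fit.summary())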
Note that this method does not allow us to produce forecasts.
The OLS method estimates the coefficients of an additive model. If needed, we can specify not only a linear but also a polynomial trend. We will estimate three different models via OLS: a quadratic trend model, a quadratic trend model with monthly dummy variables, and a quadratic trend model with monthly dummy variables for the logarithm of the series:
$$ \begin{aligned} Y_t &= \beta_0 + \beta_1 t + \beta_2 t^2 + \epsilon_t \\ Y_t &= \beta_0 + \beta_1 t + \beta_2 t^2 + \gamma_1 \text{dm}_{1,t} + ... + \gamma_{11} \text{dm}_{11,t} + \epsilon_t \\ \log(Y_t) &= \beta_0 + \beta_1 t + \beta_2 t^2 + \gamma_1 \text{dm}_{1,t} + ... + \gamma_{11} \text{dm}_{11,t} + \epsilon_t \end{aligned} $$
where $\text{dm}_{k,t} = 1$ if the observation at time $t$ falls in month $k$, and $\text{dm}_{k,t} = 0$ otherwise.
Recall that, in order to avoid the dummy variable trap (a scenario in which the independent variables are multicollinear, i.e. highly correlated), we have to exclude one dummy variable - in this case $\text{dm}_{12,t}$ - from our regression models.
tsdisplay(data)
The slow, wavy decay of the ACF suggests that we have both a trend and a seasonal component. From the time series plot it seems that the magnitude of the seasonal variation increases with time, so the series appears to grow either exponentially or polynomially.
We need to create the additional variables:
AP = data
#Create trend
time = np.array(range(1, len(data.index) + 1))
time_sq = time ** 2
#Create a dummy variable matrix of zeroes:
dm = np.column_stack(np.zeros((11, AP.size)))
#Create a DataFrame
dm = pd.DataFrame(dm)
#Add values to dummy variables:
for j in range(0, len(dm.columns)):
#Select every j-th month from each year in the dataset:
dm.iloc[j::12, j] = 1
#x_matrix = np.column_stack((a,b))
dtf = np.column_stack((AP, time, time_sq, dm))
#Create a DataFrame
dtf = pd.DataFrame(dtf)
#Rename the columns:
dtf.columns = ["AP"] + ["time"] + ["time_sq"] + ["dm_" + str(number) for number in range(1, 12)]
dtf.index = [str(number) for number in range(1, len(dtf.index) + 1)]
dtf.head(13)
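As a side note, an equivalent way to construct the same dummy matrix (a sketch, assuming data has kept its monthly DatetimeIndex; the variable dm_alt is not used later) is pd.get_dummies:
#Build monthly dummies from the date index and drop December (the baseline month):
dm_alt = pd.get_dummies(pd.Series(data.index.month, index = data.index), prefix = "dm").astype(int)
dm_alt = dm_alt.drop(columns = ["dm_12"])
print(dm_alt.head())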
Now we can estimate the models:
def get_coef(my_fitted_model):
#Get the parameters and their pvalues
my_index = my_fitted_model.pvalues.index.format()
rez = np.array([my_fitted_model.params, my_fitted_model.pvalues]).T
rez = pd.DataFrame(rez)
rez.index = my_index
rez.columns = ["coef", "p_value"]
return rez
AP_OLS_1 = smf.ols(formula = 'AP ~ time + time_sq', data = dtf)
AP_OLS_2 = smf.ols(formula = 'AP ~ time + time_sq + dm_1 + dm_2 + dm_3 + dm_4 + dm_5 + dm_6 + dm_7 + dm_8 + dm_9 + dm_10 + dm_11', data = dtf)
AP_OLS_3 = smf.ols(formula = 'np.log(AP) ~ time + time_sq + dm_1 + dm_2 + dm_3 + dm_4 + dm_5 + dm_6 + dm_7 + dm_8 + dm_9 + dm_10 + dm_11', data = dtf)
In the first, quadratic model all coefficients are statistically significant:
fit_1 = AP_OLS_1.fit()
print(round(get_coef(fit_1), 5))
We reach similar conclusions for the second model:
fit_2 = AP_OLS_2.fit()
print(round(get_coef(fit_2), 5))
While the January and February dummy variables are not statistically significantly different from zero (i.e. those months do not differ significantly from the baseline month of December), most of the remaining dummy variables are statistically significant, so we can conclude that the seasonal effect is significant in our data.
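We can also test the joint significance of the seasonal dummies by comparing the second model with the first one via an F-test (a sketch using the compare_f_test method of the fitted OLS results; the variable names are illustrative):
#H0: all seasonal dummy coefficients are equal to zero (i.e. the restricted model 1 is sufficient):
f_stat, p_value, df_diff = fit_2.compare_f_test(fit_1)
print("F statistic = ", round(f_stat, 4), "; p-value = ", round(p_value, 4), "; df difference = ", df_diff)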
In the final model we "linearized" the time series by taking the logarithm. We see that we come to the same conclusions regarding coefficient significance as in the second model.
fit_3 = AP_OLS_3.fit()
print(round(get_coef(fit_3), 5))
Finally, it would be interesting to forecast the series with each model. For that, we need to create a new sample of explanatory variables for the forecast horizon, which we can then predict and plot.
dtf_forc = dtf.iloc[0:36, 1:]
dtf_forc.index = [str(number) for number in range(len(dtf.index) + 1, len(dtf.index) + len(dtf_forc) + 1)]
dtf_forc["time"] = np.array(range(int(dtf_forc.index[0]), int(dtf_forc.index[-1]) + 1))
dtf_forc["time_sq"] = dtf_forc["time"] ** 2
dtf_forc.head()
Forecast the data:
#Calculate forecasts
AP_forc_1 = fit_1.predict(dtf_forc)
AP_forc_2 = fit_2.predict(dtf_forc)
AP_forc_3 = fit_3.predict(dtf_forc)
Combine into a single Time Series with fitted and forecast values:
#Combine into a single time series with FITTED and FORECAST values:
AP_fited_1 = np.concatenate((fit_1.predict(), np.array(AP_forc_1)))
AP_fited_1 = pd.Series(AP_fited_1)
#Add the correct time index:
AP_fited_1.index = pd.date_range(start = "1949-01", periods = len(AP_fited_1.index), freq = "M").to_period()
AP_fited_1.index = AP_fited_1.index.to_timestamp()
AP_fited_1.name = "Fitted"
AP_fited_2 = np.concatenate((fit_2.predict(), np.array(AP_forc_2)))
AP_fited_2 = pd.Series(AP_fited_2)
AP_fited_2.index = AP_fited_1.index
AP_fited_2.name = AP_fited_1.name
AP_fited_3 = np.concatenate((fit_3.predict(), np.array(AP_forc_3)))
AP_fited_3 = pd.Series(AP_fited_3)
AP_fited_3.index = AP_fited_1.index
AP_fited_3.name = AP_fited_1.name
Plot the predicted time series alongside the actual data:
fig = plt.figure(figsize = (15, 5))
AP_fited_1.plot(ax = fig.add_subplot(131), linestyle = "--", title = "Quadratic")
AP.plot()
plt.legend()
AP_fited_2.plot(ax = fig.add_subplot(132), linestyle = "--", title = "Quadratic + Seasonal")
AP.plot()
plt.legend()
AP_fited_3.plot(ax = fig.add_subplot(133), linestyle = "--", title = "Quadratic + Seasonal for log(Airpass)")
log_passengers.plot()
plt.legend()
plt.tight_layout()
plt.show()
While the model in the central graph attempts to capture the seasonal effect, and does so better than the one in the left-side graph, we can see that the fluctuations in the actual data increase in size together with the level of the series, and an additive model is unable to capture this effect. The common approach to this problem is to take logarithms and estimate the additive model on the log scale (right-side graph).
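Since the third model is estimated on the log scale, we may also want to compare it with the actual data on the original scale. A simple sketch is to exponentiate its fitted and forecast values (note that exponentiating the conditional mean of the logarithms is only an approximate back-transformation; the variable name is illustrative):
#Back-transform the log-scale fitted/forecast values to the original scale:
AP_fited_3_bt = np.exp(AP_fited_3)
AP_fited_3_bt.name = "Fitted (back-transformed)"
plt.figure(figsize = (10, 5))
AP.plot()
AP_fited_3_bt.plot(linestyle = "--")
plt.legend()
plt.show()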
We will begin by examining the Shampoo sales data (available in the fma package in R):
shampoo = pd.read_csv("https://raw.githubusercontent.com/blue-yonder/pydse/master/pydse/data/sales-of-shampoo-over-a-three-ye.csv", sep =";")
shampoo = pd.Series(shampoo.iloc[:, 1])
shampoo.plot()
plt.show()
If an economic indicator increases every year by the same rate, then it exhibits exponential growth. To model this kind of time series, an exponential model is needed.
However, modelling becomes rather complicated because to extract the trend we need to estimate a non-linear regression: $$Y_t = \alpha + \beta \cdot \exp \left( \gamma \cdot t \right) + \epsilon_t$$ Let's begin by defining the function which we will be minimizing: $$ \begin{aligned} S &= \sum_{t = 1}^T r_t^2 \left(\alpha, \beta, \gamma \right) \rightarrow \min_{\alpha, \beta, \gamma} \\ \text{where }& r_t\left(\alpha, \beta, \gamma \right) = Y_t - \alpha - \beta \cdot \exp \left( \gamma \cdot t \right) \end{aligned} $$
def func(pars, y_t, time):
a = pars[0]
b = pars[1]
g = pars[2]
r_t = a + b * np.exp(g * time)
residuals = y_t - r_t
return residuals
Now, we can optimize this function with the following method (note - we need to specify some initial parameter guess):
pars = [1.0, 1.0, 1.0]
time = np.array(range(1, len(shampoo.index) + 1))
final_pars = optimize.leastsq(func, # specify the function to optimize
pars, # specify the initial parameters
args = (shampoo, time) #specify the additional fixed parameters to pass to the function which we are optimizing
)
print(final_pars[0])
This method minimizes the sum of squares of a set of equations. It assumes that the objective function is based on the difference between some observed target data (ydata) and a (non-linear) function of the parameters f(xdata, params), i.e. func(params) = ydata - f(xdata, params).
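Alternatively (an illustrative sketch, not used further on; the names exp_trend, curve_pars and curve_cov are hypothetical), scipy.optimize.curve_fit works directly with the model function f(xdata, params) rather than with the residual function:
#Define the trend function itself (rather than the residuals):
def exp_trend(t, a, b, g):
    return a + b * np.exp(g * t)
#curve_fit minimizes the sum of squared residuals internally:
curve_pars, curve_cov = optimize.curve_fit(exp_trend, time, shampoo, p0 = [1.0, 1.0, 1.0])
print(curve_pars)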
Now we can fit the data:
fitted_vals = final_pars[0][0] + final_pars[0][1] * np.exp(final_pars[0][2] * time)
fitted_vals = pd.Series(fitted_vals)
fitted_vals.index = shampoo.index
fitted_vals.name = "Fitted values via NLS"
shampoo.plot()
fitted_vals.plot()
plt.legend()
plt.show()
Let us also check the residuals (remember that our defined function already calculates them):
resid = func(final_pars[0], shampoo, time)
#Plot residuals:
fig = plt.figure(figsize = (15, 5))
fig.add_subplot(121)
resid.plot(linestyle = '--', marker='o')
plt.axhline(y = 0, color='r', linestyle='-')
#Plot residual histogram
fig.add_subplot(122)
plt.hist(resid, histtype='bar', rwidth = 0.9, edgecolor = "black")
plt.show()
The spread of the residuals is more or less constant; however, they do not appear to be normally distributed.
One of the drawbacks of OLS is that it is a global method - if the time series departs from the trend for some reason at the final moment $T$, this changes all the coefficients of the model and therefore also the predicted value at moment $t = 1$.
Local methods are free of this defect - their forecast at moment $t$ is (mostly) influenced by the values at nearby time moments.
Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation or to make forecasts.
While in the moving-average decomposition the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights to observations as they age. It is an easily applied procedure that incorporates prior assumptions by the user (such as seasonality) and is often used for the analysis of time series data.
We state the simple exponential smoothing procedure as an algorithm for converting the observed series $Y_t$ into a smoothed series $\widehat{Y}_t$, $t = 1,...,T$, and forecasts $\widehat{Y}_{T+h,T}$:
Initialize at $t = 1$: $\widehat{Y}_1 = Y_1$;
Update for $t > 1$: $\widehat{Y}_t = \alpha Y_t + (1 - \alpha) \widehat{Y}_{t-1}$, where $0 < \alpha < 1$;
Forecast: $\widehat{Y}_{T+h,T} = \widehat{Y}_T$.
def exp_smooth_simple(pars, y_t, return_type = "simple", forc_period = 12):
a = pars[0]
T = len(y_t)
#print(T)
new_y = np.zeros(T)
    for i in range(0, T): #note - the first element (when i = 0) is unchanged
if i == 0:
new_y[i] = y_t[i]
else:
new_y[i] = a * y_t[i] + (1.0 - a) * new_y[i-1]
#Forecast ahead:
y_forc = np.zeros(forc_period)
for j in range(0, forc_period):
y_forc[j] = new_y[T - 1]
if return_type == "simple":
return new_y
else:
return {"Y": new_y, "Forecast": y_forc}
The smaller $\alpha$ is, the smoother the estimated level. As $\alpha$ approaches 0, the smoothed series approaches a constant. As $\alpha$ approaches 1, the smoothed series approaches point-by-point interpolation.
smoothed_data_1 = exp_smooth_simple([0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast")
smoothed_data_2 = exp_smooth_simple([0.5], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast")
smoothed_data_3 = exp_smooth_simple([0.8], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast")
smoothed_data_1 = pd.Series(np.append(smoothed_data_1["Y"], smoothed_data_1["Forecast"]))
smoothed_data_2 = pd.Series(np.append(smoothed_data_2["Y"], smoothed_data_2["Forecast"]))
smoothed_data_3 = pd.Series(np.append(smoothed_data_3["Y"], smoothed_data_3["Forecast"]))
smoothed_data_1.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_1.index), freq = "M").to_period()
smoothed_data_1.index = smoothed_data_1.index.to_timestamp()
smoothed_data_2.index = smoothed_data_3.index = smoothed_data_1.index
smoothed_data_1.name = "alpha = 0.1"
smoothed_data_2.name = "alpha = 0.5"
smoothed_data_3.name = "alpha = 0.8"
#pd.date_range(start = "1949-01", periods = len(data.index), freq = "M").to_period()
plt.figure(figsize = (15, 10))
plt.plot(log_passengers)
plt.plot(smoothed_data_1, linestyle = "--")
plt.plot(smoothed_data_2, linestyle = "--")
plt.plot(smoothed_data_3, linestyle = "--", color = "red")
plt.legend()
plt.show()
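For comparison (a sketch, assuming the SimpleExpSmoothing class is available in your statsmodels version), the built-in simple exponential smoothing produces very similar output; note that its fitted values are one-step-ahead forecasts, so they will not match our smoother exactly:
#Built-in simple exponential smoothing with a fixed alpha = 0.5:
ses_fit = sm.tsa.SimpleExpSmoothing(log_passengers).fit(smoothing_level = 0.5, optimized = False)
plt.figure(figsize = (15, 5))
log_passengers.plot()
ses_fit.fittedvalues.plot(linestyle = "--", label = "SimpleExpSmoothing, alpha = 0.5")
plt.legend()
plt.show()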
Now imagine that we have not only a slowly evolving local level, but also a trend with a slowly evolving local slope. In such cases simple exponential smoothing does not do well. Then the optimal smoothing algorithm (Holt’s Linear Method) is as follows:
Initialize at $t = 2$:
1.1 $\widehat{Y_2} = Y_2$;
1.2 $F_2 = Y_2 - Y_1$;
Update for $t > 2$:
2.1 $\widehat{Y_t} = \alpha Y_t + (1- \alpha)\cdot (\widehat{Y}_{t-1} + F_{t-1})$, where $0 < \alpha < 1$;
2.2 $F_t = \beta (\widehat{Y}_t - \widehat{Y}_{t-1}) + (1 - \beta) F_{t-1}$, where $0 < \beta < 1$;
Forecast: $\widehat{Y}_{T+h,T} = \widehat{Y}_T + h \cdot F_T$
Here $F_t$ is the best estimate of the trend and $\widehat{Y}_t$ is the smoothed series.
def exp_smooth_double(pars, y_t, return_type = "simple", forc_period = 12):
a = pars[0]
b = pars[1]
T = len(y_t)
new_y = np.zeros(T)
new_f = np.zeros(T)
for i in range(0, T):
if i == 0:
new_y[i] = y_t[i]
new_f[i] = y_t[i]
elif i == 1:
new_y[i] = y_t[i]
new_f[i] = y_t[i] - y_t[i-1]
else:
new_y[i] = a * y_t[i] + (1.0 - a) * (new_y[i-1] + new_f[i-1])
new_f[i] = b * (new_y[i] - new_y[i-1]) + (1.0 - b) * new_f[i-1]
#Forecast ahead:
y_forc = np.zeros(forc_period)
for j in range(0, forc_period):
y_forc[j] = new_y[T - 1] + (j + 1) * new_f[T - 1]
if return_type == "simple":
return new_y
else:
return {"Y": new_y, "F": new_f, "Forecast": y_forc}
The parameter $\alpha$ controls smoothing of the level and $\beta$ controls smoothing of the slope.
#Example of double exponential smoothing with different alpha parameters:
smoothed_data_1 = pd.Series(exp_smooth_double([0.1, 0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_2 = pd.Series(exp_smooth_double([0.3, 0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_3 = pd.Series(exp_smooth_double([0.6, 0.1], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_1 = pd.Series(np.append(smoothed_data_1["Y"], smoothed_data_1["Forecast"]))
smoothed_data_2 = pd.Series(np.append(smoothed_data_2["Y"], smoothed_data_2["Forecast"]))
smoothed_data_3 = pd.Series(np.append(smoothed_data_3["Y"], smoothed_data_3["Forecast"]))
smoothed_data_1.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_1.index), freq = "M").to_period()
smoothed_data_1.index = smoothed_data_1.index.to_timestamp()
smoothed_data_2.index = smoothed_data_3.index = smoothed_data_1.index
smoothed_data_1.name = "alpha = 0.1, beta = 0.1"
smoothed_data_2.name = "alpha = 0.3, beta = 0.1"
smoothed_data_3.name = "alpha = 0.6, beta = 0.1"
fig = plt.figure(figsize = (20, 10))
fig.add_subplot(121)
plt.plot(log_passengers)
plt.plot(smoothed_data_1, linestyle = "--")
plt.plot(smoothed_data_2, linestyle = "--")
plt.plot(smoothed_data_3, linestyle = "--", color = "red")
plt.legend()
#Example of double exponential smoothing with different beta parameters:
smoothed_data_1 = pd.Series(exp_smooth_double([0.1, 0.01], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_2 = pd.Series(exp_smooth_double([0.1, 0.3], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_3 = pd.Series(exp_smooth_double([0.1, 0.9], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_1 = pd.Series(np.append(smoothed_data_1["Y"], smoothed_data_1["Forecast"]))
smoothed_data_2 = pd.Series(np.append(smoothed_data_2["Y"], smoothed_data_2["Forecast"]))
smoothed_data_3 = pd.Series(np.append(smoothed_data_3["Y"], smoothed_data_3["Forecast"]))
smoothed_data_1.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_1.index), freq = "M").to_period()
smoothed_data_1.index = smoothed_data_1.index.to_timestamp()
smoothed_data_2.index = smoothed_data_3.index = smoothed_data_1.index
smoothed_data_1.name = "alpha = 0.1, beta = 0.01"
smoothed_data_2.name = "alpha = 0.1, beta = 0.3"
smoothed_data_3.name = "alpha = 0.1, beta = 0.9"
fig.add_subplot(122)
plt.plot(log_passengers)
plt.plot(smoothed_data_1, linestyle = "--")
plt.plot(smoothed_data_2, linestyle = "--")
plt.plot(smoothed_data_3, linestyle = "--", color = "red")
plt.legend()
plt.show()
The Holt-Winters method is based on three smoothing equations - one for the level, one for the trend and one for seasonality. It is similar to Holt's linear method, with one additional equation dealing with seasonality.
Triple exponential smoothing applies exponential smoothing three times, which is commonly used when there are three high frequency signals to be removed from a time series under study. The use of a triple application is considered a rule of thumb technique, rather than one based on theoretical foundations and has often been over-emphasized by practitioners.
The smoothing algorithm is as follows (both the additive and the multiplicative seasonal variants are shown below):
Initialize:
1.1 Set $\widehat{Y}_L = \dfrac{Y_1 + ... + Y_L}{L}$
1.2 The general formula for the initial trend: $F_L = \dfrac{1}{L} \left(\dfrac{Y_{L+1} - Y_1}{L} + \dfrac{Y_{L+2} - Y_2}{L} + ... + \dfrac{Y_{L+L} - Y_L}{L} \right)$
1.3 Suppose a cycle of seasonal change is of length $L$, so that $G_t = G_{t+L}$, and let $N$ be the number of complete cycles in the data. As a rule of thumb, a minimum of two full seasons (i.e. $2L$ periods) of historical data is needed to initialize the seasonal factors. The initial estimates of the seasonal indices are: $$ \begin{aligned} G_i &= \dfrac{1}{N}\sum_{j = 1}^N{\dfrac{Y_{L(j-1)+i}}{A_j}}, \quad i = 1,...,L\\ \text{where: } A_j &= \dfrac{1}{L} \sum_{k = 1}^L Y_{L(j-1)+k}, \quad j = 1,...,N \end{aligned} $$
i.e. $A_j$ is the average value of $Y_t$ in the $j$-th cycle of the data and $G_i$ is the average of the weighted values of the $i$-th seasonal index across all cycles.
For example, if we have monthly data covering $N$ years, then $A_1$ would be the average of the first 12 months $Y_1,...,Y_{12}$; $A_2$ - the average of $Y_{13},...,Y_{24}$, etc.
Then $G_1$ would be the average of the weighted values $\dfrac{Y_1}{A_1}, \dfrac{Y_{12 + 1}}{A_2},..., \dfrac{Y_{12(N-1) + 1}}{A_N}$, $G_2$ - the average of $\dfrac{Y_2}{A_1}, \dfrac{Y_{12 + 2}}{A_2},..., \dfrac{Y_{12(N-1) + 2}}{A_N}$, etc.
Update for $t > L$ - there are two possible scenarios.
Additive seasonality:
2.1 $S_t = \alpha \left( Y_t - G_{t-L}\right) + (1- \alpha)\cdot (S_{t-1} + F_{t-1})$, where $0 < \alpha < 1$;
2.2 $F_t = \beta (S_t - S_{t-1}) + (1 - \beta) F_{t-1}$, where $0 < \beta < 1$;
2.3 $G_t = \gamma \left( Y_t-S_{t} \right) + (1 - \gamma) G_{t-L}$
2.4 Forecast: $\widehat{Y}_{t+h} = S_t + h \cdot F_t + G_{t + h - L}$
Multiplicative seasonality:
2.1 $S_t = \alpha \left( \dfrac{Y_t}{G_{t-L}}\right) + (1- \alpha)\cdot (S_{t-1} + F_{t-1})$, where $0 < \alpha < 1$;
2.2 $F_t = \beta (S_t - S_{t-1}) + (1 - \beta) F_{t-1}$, where $0 < \beta < 1$;
2.3 $G_t = \gamma \left( \dfrac{Y_t}{S_t} \right) + (1 - \gamma) G_{t-L}$
2.4 Forecast: $\widehat{Y}_{t+h} = (S_t + h \cdot F_t) \cdot G_{t + h - L}$
Here $F_t$ is the best estimate of the trend, $G_t$ is the best estimate of the seasonal factors and $S_t$ is the smoothed series. We will show an example of a multiplicative model implementation:
def exp_smooth_triple(pars, y_t, return_type = "simple", L = 12):
a = pars[0]
b = pars[1]
g = pars[2]
T = len(y_t)
cycles = np.floor(T / L)
A = np.zeros(int(cycles))
new_y = np.zeros(T)
new_f = np.zeros(T)
new_g = np.zeros(T)
#Initialize values:
new_y[0] = y_t[0] #np.mean(y_t[0:L])
new_f[0] = np.mean(y_t[L:(2*L)] / L - y_t[0:L] / L)
#Calculate the seasonal average::
for j in range(0, int(cycles)):
A[j] = np.mean(y_t[(j*L):(j*L+L)])
#Calculate the initial seasonal factors:
for j in range(0, L):
new_g[j] = np.mean(y_t[j::L] / A)
for i in range(1, T):
if i < L:
#Double exponential smoothing for initial values:
new_y[i] = a * y_t[i] + (1.0 - a) * (new_y[i-1] + new_f[i-1])
new_f[i] = b * (new_y[i] - new_y[i-1]) + (1.0 - b) * new_f[i-1]
elif i >= L:
new_y[i] = a * (y_t[i] / new_g[i - L]) + (1.0 - a) * (new_y[i-1] + new_f[i-1])
new_f[i] = b * (new_y[i] - new_y[i-1]) + (1.0 - b) * new_f[i-1]
new_g[i] = g * (y_t[i] / new_y[i]) + (1.0 - g) * new_g[i - L]
#Forecast ahead:
forc_period = L
h = list(range(1, int(forc_period + 1)))
h = [float(i) for i in h]
y_forc = (new_y[T - 1] + np.multiply(h, new_f[T - 1])) * new_g[-L:]
#Fitted values for each cycle:
y_fitted = np.zeros(T)
#Set first period values to actual values to avoid any errors
y_fitted[:L] = y_t[:L]
for j in range(1, int(cycles)):
y_fitted[(j*L):(j*L + L)] = (new_y[(j*L + L) - 1] + np.multiply(h, new_f[(j*L + L) - 1])) * new_g[((j - 1)*L):((j - 1)*L + L)]
#Return the results
if return_type == "simple":
return y_fitted
else:
return {"Y": new_y, "F": new_f, "G": new_g, "Forecast": y_forc, "Fitted": y_fitted}
For the exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter $\alpha$. A robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.
The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE:
$$ \begin{aligned} SSE &= \sum_{t=1}^T e_t ^2\\ \text{where }& e_t = Y_t − \widehat{Y}_t \end{aligned} $$
Unlike the ordinary regression case (where we have formulas that return the values of the regression coefficients which minimize the SSE) this involves a non-linear minimization problem.
def exp_smooth_resid(pars, *args): # *args - lets us pass any number of additional arguments, where we only care about their number and not their name
y_t = args[0]
exp_type = args[1]
if exp_type == "double":
new_y = exp_smooth_double(pars, y_t, return_type = "simple")
elif exp_type == "triple":
L = args[2]
new_y = exp_smooth_triple(pars, y_t, return_type = "simple", L = L)
else:
new_y = exp_smooth_simple(pars, y_t, return_type = "simple")
residuals = y_t - new_y
residuals_sq = sum([i ** 2 for i in residuals])
return residuals_sq
We can optimize the functions via the L-BFGS-B algorithm:
pars = [0.5]
final_pars_1 = optimize.fmin_l_bfgs_b(func = exp_smooth_resid, # specify the function to optimize
x0 = pars, # specify the initial parameters
args = (np.array(pd.Series.tolist(log_passengers)), "simple"), #specify the additional fixed parameters to pass to the function which we are optimizing
bounds = [(0.01, 0.99)], approx_grad=True) #unless we can provide the analytical gradient expression
print("Simple Exponential smoothing alpha = ", round(final_pars_1[0][0], 4))
pars = [0.5, 0.5]
final_pars_2 = optimize.fmin_l_bfgs_b(func = exp_smooth_resid,
x0 = pars,
args = (np.array(pd.Series.tolist(log_passengers)), "double"),
bounds = [(0.01, 0.99), (0.01, 0.99)], approx_grad=True)
print("Double Exponential smoothing alpha = ", round(final_pars_2[0][0], 4), " beta = ", round(final_pars_2[0][1], 4))
pars = [0.5, 0.5, 0.5]
final_pars_3 = optimize.fmin_l_bfgs_b(func = exp_smooth_resid,
x0 = pars,
args = (np.array(pd.Series.tolist(log_passengers)), "triple", 12),
bounds = [(0.0, 1), (0.0, 1.0), (0.0, 1.0)], approx_grad=True)
print("Triple Exponential smoothing alpha = ", round(final_pars_3[0][0], 2), " beta = ", round(final_pars_3[0][1], 4), " gamma = ", round(final_pars_3[0][2], 4))
smoothed_data = pd.Series(exp_smooth_triple(final_pars_3[0], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast", L = 12))
forecast_data = pd.Series(np.append(smoothed_data["Y"], smoothed_data["Forecast"]))
forecast_data[:len(smoothed_data["Y"])] = None
fitted_data = pd.Series(smoothed_data["Fitted"])
smoothed_data = pd.Series(smoothed_data["Y"])
forecast_data.index = pd.date_range(start = "1949-01", periods = len(forecast_data.index), freq = "M").to_period()
forecast_data.index = forecast_data.index.to_timestamp()
smoothed_data.index = pd.date_range(start = "1949-01", periods = len(smoothed_data.index), freq = "M").to_period()
smoothed_data.index = smoothed_data.index.to_timestamp()
fitted_data.index = smoothed_data.index
smoothed_data.name = "Triple Exponential Smoothing: Overall Smoothing"
forecast_data.name = "Triple Exponential Smoothing: Forecasts"
fitted_data.name = "Triple Exponential Smoothing: Fitted values"
plt.figure(figsize = (15, 10))
plt.plot(log_passengers)
plt.plot(fitted_data, linestyle = "--", color = "red")
plt.plot(smoothed_data, linestyle = "--", color = "orange")
plt.plot(forecast_data, linestyle = "--", color = "green")
plt.legend()
plt.show()
We can also look at the double exponential smoothing with the optimal parameters:
smoothed_data_double_result = pd.Series(exp_smooth_double(final_pars_2[0], np.array(pd.Series.tolist(log_passengers)), return_type = "forcast"))
smoothed_data_double = pd.Series(np.append(smoothed_data_double_result["Y"], smoothed_data_double_result["Forecast"]))
smoothed_data_double.index = pd.date_range(start = "1949-01", periods = len(smoothed_data_double.index), freq = "M").to_period()
smoothed_data_double.index = smoothed_data_double.index.to_timestamp()
smoothed_data_double.name = "Double Exponential Smoothing"
plt.figure(figsize = (15, 10))
plt.plot(log_passengers)
plt.plot(smoothed_data_double, linestyle = "--", color = "orange")
plt.legend()
plt.show()
Note: we have omitted checking the residual plots. Look at the residual series and ACF plots - are the residuals stationary? Are they WN? Some examples are provided below.
resid_OLS = fit_3.resid
tsdisplay(resid_OLS, max_lag = 40)
Note that only the first couple of lags are significant, while the later lags are mostly insignificant.
resid_ets = log_passengers - smoothed_data_double_result["Y"]
tsdisplay(resid_ets, max_lag = 40)
Note that when using double exponential smoothing there appears to be a remaining seasonal or cyclical effect, as we see a spike in the residual ACF every so often.
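To follow up on the questions above, we can also apply our tsdiag() function to the residual series; for example (an illustrative sketch) for the OLS residuals:
#Ljung-Box p-values for the OLS residuals:
tsdiag(resid_OLS)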