4.2 Example
Considerations
The following libraries will be useful for this exercise.
#Import the required modules for vectors and matrix operations, data generation
import numpy as np
#Import the required modules for plot creation:
import matplotlib.pyplot as plt
#import the required modules for TimeSeries data generation:
import statsmodels.api as sm
#Import the required modules for test statistic calculation:
import statsmodels.stats as sm_stat
#Import the required modules for model estimation:
import statsmodels.tsa as smt
#Import dmatrix to fit regression splines
from patsy import dmatrix
#Import the required modules for time series data types
import pandas as pd
# Show all columns of a pd.DataFrame to prevent truncation of wide tables
pd.set_option('display.max_columns', None)
# pd.set_option('display.max_colwidth', -1)
pd.set_option('display.expand_frame_repr', False)
We will also need some custom functions to make plotting easier:
def tsdisplay(y, figsize = (10, 8), title = "", lags = 20):
    tmp_data = pd.Series(y)
    fig = plt.figure(figsize = figsize)
    #Plot the time series
    _ = tmp_data.plot(ax = fig.add_subplot(311), title = "$Time\ Series\ " + title + "$", legend = False)
    #Plot the ACF:
    _ = sm.graphics.tsa.plot_acf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(323))
    _ = plt.xticks(np.arange(1, lags + 1, 1.0))
    #Plot the PACF:
    _ = sm.graphics.tsa.plot_pacf(tmp_data, lags = lags, zero = False, ax = fig.add_subplot(324))
    _ = plt.xticks(np.arange(1, lags + 1, 1.0))
    #Plot the QQ plot of the data:
    _ = sm.qqplot(tmp_data, line = 's', ax = fig.add_subplot(325))
    _ = plt.title("QQ Plot")
    #Plot the histogram of the data:
    _ = fig.add_subplot(326).hist(tmp_data, bins = 40, density = 1)
    _ = plt.title("Histogram")
    #Fix the layout of the plots:
    _ = plt.tight_layout()
    plt.show()
tsdiag2 <- function(y, title = "", lags = 30, df = 0){
  rez <- matrix(0, nrow = 4, ncol = lags)
  rownames(rez) <- c("Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value")
  colnames(rez) <- 1:lags
  suppressWarnings({
    rez[1, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Ljung-Box")$statistic})
    rez[2, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Ljung-Box")$p.value})
    rez[3, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Box-Pierce")$statistic})
    rez[4, ] <- sapply(1:lags, function(z){Box.test(y, lag = z, fitdf = df, type = "Box-Pierce")$p.value})
  })
  # Fix for when lag - fitdf = 0, since Box.test(y, lag = 0) and Box.test(y, lag = 1, fitdf = 1) do not always return NA
  rez[c(2, 4), 1:lags == df] <- NA
  print(rez)
  # Plot the Ljung-Box and Box-Pierce statistic p-values:
  plot(1:lags, rez[2, ], pch = 19, col = "blue", main = paste0("Time Series, with model fitted df = ", df),
       ylim = c(0, max(rez[c(2, 4), ], 0.05, na.rm = TRUE)))
  points(1:lags, rez[4, ], pch = 19, col = "darkgreen")
  abline(h = 0.05, col = "red")
  axis(1, at = 1:lags, labels = 1:lags)
  legend("topleft", legend = c("Ljung-Box", "Box-Pierce", "5% critical value"),
         lty = c(NA, NA, 1), pch = c(19, 19, NA), col = c("blue", "darkgreen", "red"))
  # Return nothing:
  return(NULL)
}
def tsdiag(y, fig_size = (10, 8), title = "", lags = 30, df = 0):
    tmp_data = pd.Series(y.copy()).reset_index(drop = True)
    tmp_data.index = list(range(1, len(tmp_data) + 1))
    # Carry out the Ljung-Box and Box-Pierce tests:
    tmp_acor = list(sm_stat.diagnostic.acorr_ljungbox(tmp_data, lags = lags, model_df = df, boxpierce = True))
    #
    col_index = ["Ljung-Box: X-squared", "Ljung-Box: p-value", "Box-Pierce: X-squared", "Box-Pierce: p-value"]
    rez = pd.DataFrame(tmp_acor, index = col_index, columns = range(1, len(tmp_acor[0]) + 1))
    print(rez)
    # Plot the Ljung-Box and Box-Pierce statistic p-values:
    fig = plt.figure(figsize = fig_size)
    _ = plt.plot(range(1, len(tmp_acor[0]) + 1), tmp_acor[1], 'bo', label = "Ljung-Box values")
    _ = plt.plot(range(1, len(tmp_acor[0]) + 1), tmp_acor[3], 'go', label = "Box-Pierce values")
    _ = plt.xticks(np.arange(1, len(tmp_acor[0]) + 1, 1.0))
    _ = plt.axhline(y = 0.05, color = "blue", linestyle = "--", label = "5% critical value")
    _ = plt.title("$Time\ Series\ " + title + "$, with model fitted df = " + str(df))
    _ = plt.legend()
    # Get the p-values of the Ljung-Box test:
    p_vals = pd.Series(tmp_acor[1])
    # Start the index from 1 instead of 0 (because the Ljung-Box test is for lag values from 1 to k):
    p_vals.index += 1
    # Annotate the p-value points above and to the left of the vertex:
    x = np.arange(p_vals.size) + 1
    for X, Y, Z in zip(x, p_vals, p_vals):
        plt.annotate(round(Z, 4), xy = (X, Y), xytext = (-5, 5), ha = 'left', textcoords = 'offset points')
    plt.show()
    # Return nothing
    return None
Note that these functions are not necessary, but they make plotting much easier since we will frequently need to plot the data, its autocorrelation, as well as test whether the autocorrelation is significant.
4.2.1 Exercise Set 1: Exploratory Data Analysis (EDA)
We will show an example with the nyse dataset.
We will begin by reading in the data:
4.2.1.1 Task 1
We will begin by plotting our series:
Note that the growth appears to be exponential - it is more convenient to take logarithms to linearize the series.
Considerations
We can also compare the original series, the logarithm of the series, the differences of the original series and the differences of the logarithms
par(mfrow = c(2, 2))
plot(nyse, main = bquote(Y[t]))
plot(log(nyse), main = bquote(log(Y[t])))
plot(diff(nyse), main = bquote(Delta~Y[t]))
plot(diff(log(nyse)), main = bquote(Delta~log(Y[t])))
While there does also appear to be a larger downward spike for \(\Delta \log(Y_t)\), the variance appears to be increasing in \(\Delta Y_t\). Consequently, we will work with \(\log(Y_t)\).
4.2.1.2 Task 2
We see a slow ACF decline - an indication of non-stationarity (a unit root and/or a drift and/or a deterministic trend). It is impossible to tell from the graphs alone whether the data is trend-stationary (TS) or difference-stationary (DS) - to determine this, we need to carry out unit root testing.
In R, we can always load the relevant libraries to avoid the use of :: notation. However, multiple libraries may have functions with the same name. In such cases, using the :: notation to indicate the library avoids those possible problems.
4.2.2 Exercise Set 2: Unit Root Testing
Unit root tests consist of:
- Selecting the lag order of the autoregressive model \(\Delta Y_t = \alpha + \delta t + \rho Y_{t-1} + \sum_{j = 1}^p \gamma_j \Delta Y_{t-j} + \epsilon_t\);
- Determining whether the trend coefficient, \(\widehat{\delta}\), is significant;
- Depending on the final model, assessing whether \(\widehat{\rho}\) is significant (a sketch of this procedure is given below).
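A minimal sketch (not part of the original walkthrough) of how this backward lag-order selection could be automated in R, assuming lnyse denotes the logged series used below and the dynlm package is loaded; the sections that follow carry out the same steps one model at a time:
library(dynlm)
p_max <- 4
for(p in p_max:1){
  mdl <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:p) + time(lnyse))
  # p-value of the highest-order lagged difference (rows: intercept, L(lnyse), lags 1:p, trend):
  last_p_val <- coef(summary(mdl))[2 + p, 4]
  if(last_p_val < 0.05) break
  # otherwise drop the highest lag and re-estimate
}
if(last_p_val >= 0.05){
  # none of the lagged differences were significant - estimate the model without them:
  mdl <- dynlm(d(lnyse) ~ L(lnyse) + time(lnyse))
}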
4.2.2.1 Task 3: Testing manually
Lag Order selection
We will examine three cases:
- Starting with \(p_\max = 4\);
- Starting with \(p_\max = 5\);
- Starting with \(p_\max = \biggr\lfloor 12 \cdot \left( \dfrac{T}{100} \right)^{1/4} \biggr\rfloor\);
4.2.2.1.1 Starting with \(p_\max = 4\)
##
## Time series regression with "ts" data:
## Start = 1952(6), End = 1995(13)
##
## Call:
## dynlm(formula = d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:4) + time(lnyse))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.220902 -0.024758 0.001637 0.026627 0.152618
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.3687707 1.1485800 -2.062 0.0397 *
## L(lnyse) -0.0172567 0.0083590 -2.064 0.0395 *
## L(d(lnyse), 1:4)1 0.0544182 0.0439376 1.239 0.2161
## L(d(lnyse), 1:4)2 -0.0282788 0.0439795 -0.643 0.5205
## L(d(lnyse), 1:4)3 0.0150921 0.0439341 0.344 0.7313
## L(d(lnyse), 1:4)4 0.0364522 0.0439285 0.830 0.4070
## time(lnyse) 0.0012584 0.0006077 2.071 0.0389 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04061 on 517 degrees of freedom
## Multiple R-squared: 0.01244, Adjusted R-squared: 0.0009837
## F-statistic: 1.086 on 6 and 517 DF, p-value: 0.3696
The last lag coefficient (L(d(lnyse), 1:4)4 in R) is insignificant, since its \(p-value = 0.4070 > 0.05\). We will reduce the maximum lag and continue the procedure.
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:3) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.2563 1.1441 -1.9721 0.0491
## L(lnyse) -0.0162 0.0083 -1.9420 0.0527
## L(d(lnyse), 1:3)1 0.0514 0.0439 1.1689 0.2430
## L(d(lnyse), 1:3)2 -0.0274 0.0439 -0.6227 0.5338
## L(d(lnyse), 1:3)3 0.0151 0.0439 0.3438 0.7311
## time(lnyse) 0.0012 0.0006 1.9789 0.0484
The p-value of L(d(lnyse), 1:3)3 is > 0.05, so we reduce the lag order again:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:2) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.2314 1.1372 -1.9621 0.0503
## L(lnyse) -0.0161 0.0083 -1.9483 0.0519
## L(d(lnyse), 1:2)1 0.0493 0.0438 1.1244 0.2614
## L(d(lnyse), 1:2)2 -0.0263 0.0439 -0.6004 0.5485
## time(lnyse) 0.0012 0.0006 1.9698 0.0494
The p-value of L(d(lnyse), 1:2)2 is > 0.05, so we reduce the lag order once more:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.2880 1.1308 -2.0233 0.0435
## L(lnyse) -0.0164 0.0082 -1.9956 0.0465
## L(d(lnyse), 1) 0.0480 0.0438 1.0956 0.2737
## time(lnyse) 0.0012 0.0006 2.0302 0.0428
The p-value of L(d(lnyse), 1) is > 0.05, so we drop the last remaining lag:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.1695 1.1246 -1.9292 0.0542
## L(lnyse) -0.0155 0.0082 -1.8995 0.0580
## time(lnyse) 0.0012 0.0006 1.9362 0.0534
The final model contains no lags of \(\Delta Y_{t-j}\) - so there is no autoregressive coefficient for the difference model.
4.2.2.1.2 Starting with \(p_\max = 5\)
We will repeat the procedure as before.
mod.2 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:5) + time(lnyse))
print(round(coef(summary(mod.2)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.6325 1.1505 -2.2882 0.0225
## L(lnyse) -0.0192 0.0084 -2.2920 0.0223
## L(d(lnyse), 1:5)1 0.0540 0.0439 1.2319 0.2185
## L(d(lnyse), 1:5)2 -0.0286 0.0439 -0.6523 0.5145
## L(d(lnyse), 1:5)3 0.0208 0.0439 0.4738 0.6359
## L(d(lnyse), 1:5)4 0.0329 0.0438 0.7516 0.4527
## L(d(lnyse), 1:5)5 0.1030 0.0438 2.3508 0.0191
## time(lnyse) 0.0014 0.0006 2.2962 0.0221
L(d(lnyse), 1:5)5 has a p-value < 0.05, so the final lag order is 5.
4.2.2.1.3 Starting with \(p_\max = \biggr\lfloor 12 \cdot \left( \dfrac{T}{100} \right)^{1/4} \biggr\rfloor\)
The maximum lag via the rule-of-thumb is:
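The rule-of-thumb value itself can be computed directly from the sample size; a short sketch (assuming lnyse is the logged series), consistent with the 18 lags used in the model below:
floor(12 * (length(lnyse) / 100)^(1/4))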
So, we begin with such a model:
mod.3 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:18) + time(lnyse))
print(round(coef(summary(mod.3)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.0342 1.2305 -1.6532 0.0989
## L(lnyse) -0.0152 0.0091 -1.6706 0.0954
## L(d(lnyse), 1:18)1 0.0509 0.0453 1.1234 0.2618
## L(d(lnyse), 1:18)2 -0.0321 0.0454 -0.7058 0.4806
## L(d(lnyse), 1:18)3 0.0307 0.0453 0.6776 0.4984
## L(d(lnyse), 1:18)4 0.0279 0.0454 0.6162 0.5381
## L(d(lnyse), 1:18)5 0.1048 0.0451 2.3241 0.0205
## L(d(lnyse), 1:18)6 -0.0660 0.0453 -1.4561 0.1460
## L(d(lnyse), 1:18)7 -0.0292 0.0454 -0.6423 0.5210
## L(d(lnyse), 1:18)8 -0.0719 0.0454 -1.5853 0.1135
## L(d(lnyse), 1:18)9 0.0048 0.0455 0.1060 0.9156
## L(d(lnyse), 1:18)10 -0.0205 0.0455 -0.4518 0.6516
## L(d(lnyse), 1:18)11 0.0159 0.0453 0.3505 0.7261
## L(d(lnyse), 1:18)12 0.0322 0.0453 0.7105 0.4777
## L(d(lnyse), 1:18)13 0.0164 0.0451 0.3640 0.7160
## L(d(lnyse), 1:18)14 -0.1022 0.0450 -2.2723 0.0235
## L(d(lnyse), 1:18)15 0.0043 0.0452 0.0948 0.9245
## L(d(lnyse), 1:18)16 -0.0426 0.0451 -0.9442 0.3455
## L(d(lnyse), 1:18)17 0.0341 0.0451 0.7574 0.4492
## L(d(lnyse), 1:18)18 -0.0507 0.0451 -1.1236 0.2618
## time(lnyse) 0.0011 0.0007 1.6627 0.0970
L(d(lnyse), 1:18)18 is insignificant. We will continue reducing the lag order until the last lag is significant. We omit the intermediate model output with insignificant \(p_\max\) coefficients. We get that:
mod.3 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:14) + time(lnyse))
print(round(coef(summary(mod.3)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.0920 1.2078 -1.7320 0.0839
## L(lnyse) -0.0151 0.0089 -1.7043 0.0890
## L(d(lnyse), 1:14)1 0.0510 0.0448 1.1383 0.2555
## L(d(lnyse), 1:14)2 -0.0233 0.0449 -0.5198 0.6034
## L(d(lnyse), 1:14)3 0.0283 0.0449 0.6307 0.5285
## L(d(lnyse), 1:14)4 0.0301 0.0448 0.6722 0.5018
## L(d(lnyse), 1:14)5 0.1041 0.0448 2.3227 0.0206
## L(d(lnyse), 1:14)6 -0.0664 0.0451 -1.4723 0.1416
## L(d(lnyse), 1:14)7 -0.0299 0.0450 -0.6637 0.5072
## L(d(lnyse), 1:14)8 -0.0658 0.0450 -1.4631 0.1441
## L(d(lnyse), 1:14)9 0.0047 0.0449 0.1057 0.9159
## L(d(lnyse), 1:14)10 -0.0163 0.0447 -0.3638 0.7162
## L(d(lnyse), 1:14)11 0.0120 0.0447 0.2690 0.7880
## L(d(lnyse), 1:14)12 0.0389 0.0446 0.8727 0.3832
## L(d(lnyse), 1:14)13 0.0100 0.0446 0.2247 0.8223
## L(d(lnyse), 1:14)14 -0.0972 0.0446 -2.1787 0.0298
## time(lnyse) 0.0011 0.0006 1.7390 0.0826
The final model has a \(p_\max = 14\) lag order.
4.2.2.1.4 Comparison of Different Models
If we compare the \(\rm AIC\) and \(\rm BIC\) values of the three models:
## mdl_1 (p = 0) mdl_2 (p = 5) mdl_3 (p = 14)
## AIC -1881.349 -1860.785 -1813.996
## BIC -1864.273 -1822.449 -1737.636
The \(\rm AIC\) and \(\rm BIC\) values are smallest for the first model, where we started with an arbitrary value of \(p_\max = 4\) (and ended up with \(p = 0\)).
Having determined the lag order, we assess the trend significance. Note that we will examine the final models for each case.
Considerations
In real applications, if we chose to test for a unit root manually, we would select the maximum lag ourselves.
4.2.2.1.5 Determining the trend significance
For the 1st model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.1695 1.1246 -1.9292 0.0542
## L(lnyse) -0.0155 0.0082 -1.8995 0.0580
## time(lnyse) 0.0012 0.0006 1.9362 0.0534
The trend is insignificant, so we remove it.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0077 0.0122 0.6352 0.5256
## L(lnyse) -0.0001 0.0019 -0.0720 0.9426
For the 2nd model:
mod.2 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:5) + time(lnyse))
print(round(coef(summary(mod.2)), 4))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.6325 1.1505 -2.2882 0.0225
## L(lnyse) -0.0192 0.0084 -2.2920 0.0223
## L(d(lnyse), 1:5)1 0.0540 0.0439 1.2319 0.2185
## L(d(lnyse), 1:5)2 -0.0286 0.0439 -0.6523 0.5145
## L(d(lnyse), 1:5)3 0.0208 0.0439 0.4738 0.6359
## L(d(lnyse), 1:5)4 0.0329 0.0438 0.7516 0.4527
## L(d(lnyse), 1:5)5 0.1030 0.0438 2.3508 0.0191
## time(lnyse) 0.0014 0.0006 2.2962 0.0221
The trend is significant, so we leave it.
For the 3rd model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.0920 1.2078 -1.7320 0.0839
## L(lnyse) -0.0151 0.0089 -1.7043 0.0890
## L(d(lnyse), 1:14)1 0.0510 0.0448 1.1383 0.2555
## L(d(lnyse), 1:14)2 -0.0233 0.0449 -0.5198 0.6034
## L(d(lnyse), 1:14)3 0.0283 0.0449 0.6307 0.5285
## L(d(lnyse), 1:14)4 0.0301 0.0448 0.6722 0.5018
## L(d(lnyse), 1:14)5 0.1041 0.0448 2.3227 0.0206
## L(d(lnyse), 1:14)6 -0.0664 0.0451 -1.4723 0.1416
## L(d(lnyse), 1:14)7 -0.0299 0.0450 -0.6637 0.5072
## L(d(lnyse), 1:14)8 -0.0658 0.0450 -1.4631 0.1441
## L(d(lnyse), 1:14)9 0.0047 0.0449 0.1057 0.9159
## L(d(lnyse), 1:14)10 -0.0163 0.0447 -0.3638 0.7162
## L(d(lnyse), 1:14)11 0.0120 0.0447 0.2690 0.7880
## L(d(lnyse), 1:14)12 0.0389 0.0446 0.8727 0.3832
## L(d(lnyse), 1:14)13 0.0100 0.0446 0.2247 0.8223
## L(d(lnyse), 1:14)14 -0.0972 0.0446 -2.1787 0.0298
## time(lnyse) 0.0011 0.0006 1.7390 0.0826
The trend is insignificant, so we remove it:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0083 0.0129 0.6479 0.5173
## L(lnyse) -0.0001 0.0020 -0.0471 0.9624
## L(d(lnyse), 1:14)1 0.0427 0.0446 0.9574 0.3388
## L(d(lnyse), 1:14)2 -0.0317 0.0447 -0.7094 0.4784
## L(d(lnyse), 1:14)3 0.0198 0.0447 0.4433 0.6577
## L(d(lnyse), 1:14)4 0.0217 0.0447 0.4859 0.6273
## L(d(lnyse), 1:14)5 0.0957 0.0447 2.1419 0.0327
## L(d(lnyse), 1:14)6 -0.0753 0.0449 -1.6792 0.0937
## L(d(lnyse), 1:14)7 -0.0379 0.0449 -0.8456 0.3982
## L(d(lnyse), 1:14)8 -0.0733 0.0449 -1.6329 0.1031
## L(d(lnyse), 1:14)9 -0.0017 0.0449 -0.0374 0.9701
## L(d(lnyse), 1:14)10 -0.0232 0.0446 -0.5202 0.6031
## L(d(lnyse), 1:14)11 0.0051 0.0446 0.1139 0.9093
## L(d(lnyse), 1:14)12 0.0320 0.0445 0.7184 0.4729
## L(d(lnyse), 1:14)13 0.0031 0.0445 0.0688 0.9452
## L(d(lnyse), 1:14)14 -0.1046 0.0445 -2.3500 0.0192
So, we have that:
- The 1st model does not contain a trend;
- The 2nd model does contain a trend;
- The 3rd model does not contain a trend.
4.2.2.1.6 Carrying out the unit root test
The \(\rm ADF\) test tests the null hypothesis: \[ H_0: \rho = 0 \text{ (i.e. the series contains a unit root)} \]
Depending on the trend, the critical values are:
- The 1st model does not contain a trend - we need to compare the t-statistic of \(\widehat{\rho}\) with -2.89;
- The 2nd model does contain a trend - we need to compare the t-statistic of \(\widehat{\rho}\) with -3.45;
- The 3rd model does not contain a trend - we need to compare the t-statistic of \(\widehat{\rho}\) with -2.89.
Looking at the 1st model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0077 0.0122 0.6352 0.5256
## L(lnyse) -0.0001 0.0019 -0.0720 0.9426
The coefficient \(\widehat{\rho}\) t-value is L(lnyse) = -0.0720 > -2.89. So we have no grounds to reject the null hypothesis and conclude that the series contains a unit root.
Since we have determined that L(lnyse) is insignificant, we could remove it from the model, if we wanted to consider this model as a contender for the best model for this series:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0069 0.0018 3.8836 1e-04
Then the estimated model for \(\Delta Y_t\) is: \(\Delta Y_t = 0.0069 + \epsilon_t\), which we can write as a model for \(Y_t\): \(Y_t = 0.0069 + Y_{t-1} + \epsilon_t\), i.e. a random walk with drift (an \(\rm AR(1)\) process with a unit autoregressive coefficient).
Looking at the 2nd model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.6325 1.1505 -2.2882 0.0225
## L(lnyse) -0.0192 0.0084 -2.2920 0.0223
## L(d(lnyse), 1:5)1 0.0540 0.0439 1.2319 0.2185
## L(d(lnyse), 1:5)2 -0.0286 0.0439 -0.6523 0.5145
## L(d(lnyse), 1:5)3 0.0208 0.0439 0.4738 0.6359
## L(d(lnyse), 1:5)4 0.0329 0.0438 0.7516 0.4527
## L(d(lnyse), 1:5)5 0.1030 0.0438 2.3508 0.0191
## time(lnyse) 0.0014 0.0006 2.2962 0.0221
The coefficient \(\widehat{\rho}\) t-value is L(lnyse) = -2.2920 > -3.45. So we have no grounds to reject the null hypothesis and conclude that the series contains a unit root.
Since we have determined that L(lnyse) is insignificant, we could remove it from the model, if we wanted to consider this model as a contender for the best model for this series:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.07357 0.27887 -0.26382 0.79203
## L(d(lnyse), 1:5)1 0.04398 0.04382 1.00357 0.31606
## L(d(lnyse), 1:5)2 -0.03932 0.04380 -0.89786 0.36968
## L(d(lnyse), 1:5)3 0.01035 0.04380 0.23620 0.81337
## L(d(lnyse), 1:5)4 0.02288 0.04376 0.52299 0.60121
## L(d(lnyse), 1:5)5 0.09225 0.04373 2.10951 0.03538
## time(lnyse) 0.00004 0.00014 0.28548 0.77539
Since we needed the trend for unit root testing - having removed the insignificant \(\rho\) coefficient, we could try to remove the insignificant trend and improve our model. But we will assume that this model is acceptable as is.
Then the estimated model for \(\Delta Y_t\) is:
\(\Delta Y_t = -0.07357 + 0.00004 \cdot t + 0.04398 \Delta Y_{t-1} -0.03932 \Delta Y_{t-2} + 0.01035 \Delta Y_{t-3} + 0.02288 \Delta Y_{t-4} + 0.09225 \Delta Y_{t-5} + \epsilon_t\),
which we can re-write as a model for \(Y_t\).
We use the property \(\gamma_j = -[\phi_{j+1} + ... + \phi_p]\), \(j = 1, ..., p - 1\) and the expression of \(\rho\) to get the coefficients:
\[ \begin{aligned} \gamma_5 &= - \phi_6 &&\Longrightarrow \phi_6 = - \gamma_5 = -0.09225\\ \gamma_4 &= - [\phi_5 + \phi_6] &&\Longrightarrow \phi_5 = - \gamma_4 - \phi_6 = -0.02288+0.09225 = 0.06937\\ \gamma_3 &= - [\phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_4 = - \gamma_3 - [\phi_5 + \phi_6] = - \gamma_3 + \gamma_4 = -0.01035+0.02288 = 0.01253\\ \gamma_2 &= - [\phi_3 + \phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_3 = - \gamma_2 + \gamma_3 = 0.03932 + 0.01035 = 0.04967\\ \gamma_1 &= - [\phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_2 = - \gamma_1 + \gamma_2 = -0.04398 -0.03932 = -0.0833\\ \rho &= \phi_1 + \phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6 - 1,\text{ since } \rho = 0 &&\Longrightarrow \phi_1 = 1 - (\phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6) = 1 + \gamma_1= 1 + 0.04398 = 1.04398 \end{aligned} \] So, the equivalent model for \(Y_t\) is:
\(Y_t = -0.07357 + 0.00004 \cdot t + 1.04398 Y_{t-1} -0.0833 Y_{t-2} + 0.04967 Y_{t-3} + 0.01253 Y_{t-4} + 0.06937 Y_{t-5} -0.09225 Y_{t-6} + \epsilon_t\)
Considerations
To verify that this is indeed the correct transformation, we can try to simulate a few observations with the same residuals:
First, we will simulate \(Y_t\):
Y <- NULL
Y[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
Y[2] <- -0.07357 + 0.00004 * tt[2] + 1.04398 * Y[1] + eps[2]
Y[3] <- -0.07357 + 0.00004 * tt[3] + 1.04398 * Y[2] - 0.0833 * Y[1] + eps[3]
Y[4] <- -0.07357 + 0.00004 * tt[4] + 1.04398 * Y[3] - 0.0833 * Y[2] + 0.04967 * Y[1] + eps[4]
Y[5] <- -0.07357 + 0.00004 * tt[5] + 1.04398 * Y[4] - 0.0833 * Y[3] + 0.04967 * Y[2] + 0.01253 * Y[1] + eps[5]
Y[6] <- -0.07357 + 0.00004 * tt[6] + 1.04398 * Y[5] - 0.0833 * Y[4] + 0.04967 * Y[3] + 0.01253 * Y[2] + 0.06937 * Y[1] + eps[6]
for(j in 7:N){
  Y[j] <- -0.07357 + 0.00004 * tt[j] + 1.04398 * Y[j-1] - 0.0833 * Y[j-2] + 0.04967 * Y[j - 3] + 0.01253 * Y[j - 4] + 0.06937 * Y[j - 5] - 0.09225 * Y[j - 6] + eps[j]
}
Secondly, we will simulate \(\Delta Y_t\):
dY <- NULL
dY[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
dY[2] <- -0.07357 + 0.00004 * tt[2] + 0.04398 * dY[1] + eps[2]
dY[3] <- -0.07357 + 0.00004 * tt[3] + 0.04398 * dY[2] - 0.03932 * dY[1] + eps[3]
dY[4] <- -0.07357 + 0.00004 * tt[4] + 0.04398 * dY[3] - 0.03932 * dY[2] + 0.01035 * dY[1] + eps[4]
dY[5] <- -0.07357 + 0.00004 * tt[5] + 0.04398 * dY[4] - 0.03932 * dY[3] + 0.01035 * dY[2] + 0.02288 * dY[1] + eps[5]
for(j in 6:N){
  dY[j] <- -0.07357 + 0.00004 * tt[j] + 0.04398 * dY[j-1] - 0.03932 * dY[j-2] + 0.01035 * dY[j - 3] + 0.02288 * dY[j - 4] + 0.09225 * dY[j - 5] + eps[j]
}
Finally, we will assume that the first 100, or so, observations of \(Y_t\) are burn-in values (since the first couple of values assume that \(Y_j = \epsilon_j = 0\), for \(j \leq 0\)). This means that we treat the first 101 \(\Delta Y_t\) values as burn-in values as well and drop them:
We can compare \(\Delta Y_t\) with \(Y_t - Y_{t-1}\):
dt_compare <- data.frame(diff(Y), dY, check.names = FALSE)
dt_compare$DIFFERENCE <- round(diff(Y) - dY, 10)
head(dt_compare)
## diff(Y) dY DIFFERENCE
## 1 0.4191779 0.4191779 0
## 2 -0.1389262 -0.1389262 0
## 3 -0.5043527 -0.5043527 0
## 4 -1.1470432 -1.1470432 0
## 5 -0.2097488 -0.2097488 0
## 6 -0.7880470 -0.7880470 0
We see that there is no difference between \(Y_t - Y_{t-1}\) (column diff(Y)), which was calculated from the simulated \(Y_t\) series, and the separately simulated \(\Delta Y_t\) (column dY) series.
We can also visually compare the charts:
So, the two models are equivalent. For simulation and forecasting we may be more interested in the model for \(Y_t\), as it is the original series that we are interested in.
We may also be interested in verifying that this is indeed the same model by simulating it for a larger sample:
Y <- NULL
Y[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
Y[2] <- -0.07357 + 0.00004 * tt[2] + 1.04398 * Y[1] + eps[2]
Y[3] <- -0.07357 + 0.00004 * tt[3] + 1.04398 * Y[2] - 0.0833 * Y[1] + eps[3]
Y[4] <- -0.07357 + 0.00004 * tt[4] + 1.04398 * Y[3] - 0.0833 * Y[2] + 0.04967 * Y[1] + eps[4]
Y[5] <- -0.07357 + 0.00004 * tt[5] + 1.04398 * Y[4] - 0.0833 * Y[3] + 0.04967 * Y[2] + 0.01253 * Y[1] + eps[5]
Y[6] <- -0.07357 + 0.00004 * tt[6] + 1.04398 * Y[5] - 0.0833 * Y[4] + 0.04967 * Y[3] + 0.01253 * Y[2] + 0.06937 * Y[1] + eps[6]
for(j in 7:N){
  Y[j] <- -0.07357 + 0.00004 * tt[j] + 1.04398 * Y[j-1] - 0.0833 * Y[j-2] + 0.04967 * Y[j - 3] + 0.01253 * Y[j - 4] + 0.06937 * Y[j - 5] - 0.09225 * Y[j - 6] + eps[j]
}
and estimating the model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.09239 0.05048 -1.83037 0.06735
## L(d(Y), 1:5)1 0.00563 0.02293 0.24545 0.80614
## L(d(Y), 1:5)2 -0.07151 0.02291 -3.12086 0.00183
## L(d(Y), 1:5)3 0.00008 0.02297 0.00331 0.99736
## L(d(Y), 1:5)4 0.02966 0.02291 1.29427 0.19573
## L(d(Y), 1:5)5 0.08955 0.02292 3.90678 0.00010
## time(Y) 0.00008 0.00004 1.95842 0.05033
The coefficients are close to the true values, used in model simulation:
## Estimate
## (Intercept) -0.07357
## L(d(lnyse), 1:5)1 0.04398
## L(d(lnyse), 1:5)2 -0.03932
## L(d(lnyse), 1:5)3 0.01035
## L(d(lnyse), 1:5)4 0.02288
## L(d(lnyse), 1:5)5 0.09225
## time(lnyse) 0.00004
This is a nice example of how we can combine an existing model with data simulation to confirm various properties of a series.
Looking at the final, 3rd model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0083 0.0129 0.6479 0.5173
## L(lnyse) -0.0001 0.0020 -0.0471 0.9624
## L(d(lnyse), 1:14)1 0.0427 0.0446 0.9574 0.3388
## L(d(lnyse), 1:14)2 -0.0317 0.0447 -0.7094 0.4784
## L(d(lnyse), 1:14)3 0.0198 0.0447 0.4433 0.6577
## L(d(lnyse), 1:14)4 0.0217 0.0447 0.4859 0.6273
## L(d(lnyse), 1:14)5 0.0957 0.0447 2.1419 0.0327
## L(d(lnyse), 1:14)6 -0.0753 0.0449 -1.6792 0.0937
## L(d(lnyse), 1:14)7 -0.0379 0.0449 -0.8456 0.3982
## L(d(lnyse), 1:14)8 -0.0733 0.0449 -1.6329 0.1031
## L(d(lnyse), 1:14)9 -0.0017 0.0449 -0.0374 0.9701
## L(d(lnyse), 1:14)10 -0.0232 0.0446 -0.5202 0.6031
## L(d(lnyse), 1:14)11 0.0051 0.0446 0.1139 0.9093
## L(d(lnyse), 1:14)12 0.0320 0.0445 0.7184 0.4729
## L(d(lnyse), 1:14)13 0.0031 0.0445 0.0688 0.9452
## L(d(lnyse), 1:14)14 -0.1046 0.0445 -2.3500 0.0192
The coefficient \(\widehat{\rho}\) t-value is L(lnyse) = -0.0471 > -2.89. So we have no grounds to reject the null hypothesis and conclude that the series contains a unit root.
We will not re-write this model's equations, as they can be determined by applying the same methods as before.
Overall, regardless of the lag order we see that the series has a unit root.
Considerations
It is important to note that removing the trend increased the t-statistic of \(\widehat{\rho}\), so it is very possible that, if we incorrectly remove the trend, we may reach an incorrect test conclusion.
4.2.2.2 Task 3: Testing using built-in functions/methods
We will carry out different tests from different packages. For completeness' sake, we will provide the same test results from more than one package.
4.2.2.2.1 ADF Test
The \(\rm ADF\) test tests the null hypothesis: \(H_0: \rho = 0 \text{ (i.e. the series contains a unit root)}\)
We can set the maximum lag order and choose the criterion for the lag order selection:
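The estimation chunk is not shown in the text; one possible form of the call with the urca package (the lags and selectlags settings here are illustrative assumptions and may not exactly reproduce the output below):
urca::ur.df(lnyse, type = "trend", lags = 18, selectlags = "BIC")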
##
## ###############################################################
## # Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
## ###############################################################
##
## The value of the test statistic is: -2.0965 6.0649 2.2091
We can examine the model:
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.219561 -0.024529 0.003247 0.026918 0.152312
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.001e-02 4.010e-02 2.244 0.0252 *
## z.lag.1 -1.755e-02 8.370e-03 -2.096 0.0365 *
## tt 1.044e-04 5.041e-05 2.070 0.0390 *
## z.diff.lag 5.043e-02 4.444e-02 1.135 0.2570
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0409 on 506 degrees of freedom
## Multiple R-squared: 0.01033, Adjusted R-squared: 0.004462
## F-statistic: 1.76 on 3 and 506 DF, p-value: 0.1538
The critical values can be extracted via @cval:
## 1pct 5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2 6.09 4.68 4.03
## phi3 8.27 6.25 5.34
To carry out the unit root test, we examine the t-statistic of z.lag.1 - it is -2.096 (note that this is the same as the first value in the output The value of the test statistic is: -2.0965 6.0649 2.2091). The relevant critical value is tau3, which is -3.41 for the \(5\%\) significance level.
Since the test statistic -2.096 > -3.41, we do not reject the null hypothesis of a unit root.
The second way to carry out an \(\rm ADF\) test:
We can either manually specify the lag order, or let the function select the maximum lag order itself.
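A sketch of the corresponding tseries call, assuming the default automatic lag order (which for this sample length equals the 8 reported below):
tseries::adf.test(lnyse, alternative = "stationary")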
##
## Augmented Dickey-Fuller Test
##
## data: lnyse
## Dickey-Fuller = -1.9019, Lag order = 8, p-value = 0.6198
## alternative hypothesis: stationary
This output is a bit easier to read - the \(p-value = 0.6198 > 0.05\), so we have no grounds to reject the null hypothesis of a unit root.
4.2.2.2.2 KPSS test
The \(\rm KPSS\) test tests the null hypothesis: \(H_0: \text{ the data is stationary}\)
The test involves specifying and estimating the following equation with a random-walk component: \[ Y_t = r_t + \delta \cdot t + \epsilon_t,\quad r_t = r_{t-1} + u_t \]
The null hypothesis of stationarity is equivalent to the hypothesis: \(H_0: \sigma^2_u = 0\), which would mean that \(r_t = r_{t-1} = ... = r_0 = \rm const\) (i.e. a constant intercept).
We can set the number of lags to use the rule of thumb (lags = "long", which is equivalent to \(\lfloor 12 \cdot (n/100)^{1/4} \rfloor\)), or lags = "short", which is equivalent to \(\lfloor 4 \cdot (n/100)^{1/4} \rfloor\). The test types specify the deterministic component: either a constant mu, or a constant with a linear trend tau.
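A sketch of a call consistent with the output below (a trend-type test with the longer rule-of-thumb lag):
urca::ur.kpss(lnyse, type = "tau", lags = "long")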
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.3909
We see that the test statistic is 0.3909. The critical values are:
Since 0.3909 > 0.146 (the \(5\%\) critical value), we reject the null hypothesis of stationarity and conclude that the series is not stationary.
The second way to carry out the KPSS test (where lshort = FALSE indicates to use the rule-of-thumb larger lag selection):
## Warning in tseries::kpss.test(lnyse, null = "Trend", lshort = FALSE): p-value smaller than printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: lnyse
## KPSS Trend = 0.39095, Truncation lag parameter = 18, p-value = 0.01
Since p-value = 0.01 < 0.05, we reject the null hypothesis of stationarity and conclude that the series is not stationary.
4.2.2.2.3 PP test
The \(\rm PP\) test builds on the \(\rm ADF\) test and tests the null hypothesis: \(H_0: \text{the series contains a unit root}\).
We begin with the built-in function:
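A sketch of the call, assuming the built-in stats::PP.test() with the longer truncation lag:
stats::PP.test(lnyse, lshort = FALSE)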
##
## Phillips-Perron Unit Root Test
##
## data: lnyse
## Dickey-Fuller = -1.9395, Truncation lag parameter = 18, p-value = 0.6039
p-value = 0.6039 > 0.05, so we have no grounds to reject the null hypothesis of a unit root.
The urca package also contains the \(PP\) test:
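A sketch of the urca call, assuming a trend specification and the longer rule-of-thumb lag:
urca::ur.pp(lnyse, type = "Z-tau", model = "trend", lags = "long")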
##
## ##################################################
## # Phillips-Perron Unit Root / Cointegration Test #
## ##################################################
##
## The value of the test statistic is: -1.9393
Note: use type = "Z-tau", otherwise the critical values will be missing!
If we are interested, we can examine the specified regression, which was used for this test:
##
## Call:
## lm(formula = y ~ y.l1 + trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.221927 -0.025066 0.002876 0.026943 0.147915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.045e-01 5.145e-02 2.032 0.0427 *
## y.l1 9.845e-01 8.151e-03 120.792 <2e-16 ***
## trend 9.600e-05 4.958e-05 1.936 0.0534 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04055 on 525 degrees of freedom
## Multiple R-squared: 0.9981, Adjusted R-squared: 0.9981
## F-statistic: 1.38e+05 on 2 and 525 DF, p-value: < 2.2e-16
Although the test statistic is calculated differently than the one in the \(ADF\) test, the critical values for the PP test are the same as for the DF test.
The critical values are:
## 1pct 5pct 10pct
## critical values -3.97979 -3.420314 -3.132507
Since the value of the test statistic -1.9393 is greater than the \(5\%\) critical value -3.420314, we have no grounds to reject the null hypothesis of a unit root.
The tseries package also has the \(PP\) test:
tst_pp_2 <- tseries::pp.test(lnyse, alternative = "stationary", type = "Z(t_alpha)", lshort = FALSE)
tst_pp_2
##
## Phillips-Perron Unit Root Test
##
## data: lnyse
## Dickey-Fuller Z(t_alpha) = -1.9395, Truncation lag parameter = 18, p-value = 0.6039
## alternative hypothesis: stationary
The p-value = 0.6039 > 0.05, so we have no grounds to reject the null hypothesis and conclude that the series has a unit root.
So, all tests indicate that the series has a unit root.
4.2.2.3 Task 4
Since we have determined that the series has a unit root (and we see no indication of seasonality), we can take the differences of the series:
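The differencing itself is a one-liner; a sketch, using the name dlnyse assumed by the tests below:
dlnyse <- diff(lnyse)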
If we examine the series:
We see that it appears to be stationary. We will test this by repeating the unit root tests on the differenced data.
For simplicity, we will use the tests from the tseries package, which report p-values:
## Warning in tseries::adf.test(dlnyse, alternative = "stationary"): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dlnyse
## Dickey-Fuller = -8.1328, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
\(p-value = 0.01 < 0.05\), so we reject the null hypothesis of a unit root.
## Warning in tseries::kpss.test(dlnyse, null = "Level", lshort = FALSE): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: dlnyse
## KPSS Level = 0.099543, Truncation lag parameter = 18, p-value = 0.1
\(p-value = 0.1 > 0.05\), so we have no grounds to reject the null hypothesis of stationarity.
## Warning in tseries::pp.test(dlnyse, alternative = "stationary", type = "Z(t_alpha)", : p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: dlnyse
## Dickey-Fuller Z(t_alpha) = -22.021, Truncation lag parameter = 18, p-value = 0.01
## alternative hypothesis: stationary
\(p-value = 0.01 < 0.05\), so we reject the null hypothesis of a unit root.
In other words, the differences, \(\Delta Y_t\), are stationary.
4.2.3 Exercise Set 3: Model Specification
4.2.3.1 Task 5
We will examine three different models:
- A model based on one selected from the \(\rm ADF\) test;
- A model based on the \(\rm ACF\), \(\rm PACF\) of \(\Delta Y_t\);
- An automatically specified model (e.g. via auto.arima in R), without any restrictions.
4.2.3.1.1 From the ADF test we select and re-estimate the mod.2 model (this is a different parametrization than the one in dynlm)
Furthermore, we will only be able to estimate the drift component, since if \(Y_t\) contains a constant \(\alpha\) and trend \(\delta \cdot t\), then \(\Delta Y_t\) contains \(\delta\).
In this specification \(\alpha\) is not identifiable.
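The estimation chunk is not shown; a sketch of a call that could produce the output below, assuming the trend regressor is supplied as a named column tt holding a simple time index:
tt <- 1:length(lnyse)
mdl_5_1 <- forecast::Arima(lnyse, order = c(5, 1, 0), xreg = cbind(tt = tt))
print(mdl_5_1)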
## Series: lnyse
## Regression with ARIMA(5,1,0) errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 tt
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001645: log likelihood=945.98
## AIC=-1877.96 AICc=-1877.75 BIC=-1848.08
The residuals of this model appear to be white noise.
Considerations
This mainly concerns R.
We will now expand upon what a different parametrization entails for models specified with dynlm versus the ones specified with Arima().
- The model, specified via forecast::Arima (or via stats::arima), is specified as a process with a linear trend, whose deviations from the trend are described by an AR(1) process:
For example, assume that we want to specify a stationary \(\rm AR(1)\) model with a trend using forecast::Arima (or stats::arima); the specification is as follows:
\[ \begin{cases} Y_t &= \alpha + \delta \cdot t + u_t,\\ u_t &= \phi u_{t-1} + \epsilon_t \end{cases} \]
##
## Call:
## stats::arima(x = lnyse, order = c(1, 0, 0), xreg = time(lnyse))
##
## Coefficients:
## ar1 intercept time(lnyse)
## 0.9832 -144.2528 0.0763
## s.e. 0.0076 12.4969 0.0063
##
## sigma^2 estimated as 0.001632: log likelihood = 945.18, aic = -1882.37
## Series: lnyse
## Regression with ARIMA(1,0,0) errors
##
## Coefficients:
## ar1 intercept xreg
## 0.9832 -144.2528 0.0763
## s.e. 0.0076 12.4969 0.0063
##
## sigma^2 estimated as 0.001642: log likelihood=945.18
## AIC=-1882.37 AICc=-1882.29 BIC=-1865.28
The estimated model is:
\[ \begin{cases} Y_t &= -144.2528 + 0.0763 \cdot t + u_t,\\ u_t &= 0.9832 u_{t-1} + \epsilon_t \end{cases} \] In this case, we would want to test whether the residuals exhibit a unit root by testing whether \(\phi = 1\).
- The model, specified via dynlm::dynlm, is specified as an autoregressive process with a linear trend:
From the previous expression we would have that: \[ \begin{cases} u_t &= Y_t - \alpha - \delta \cdot t,\\ u_t &= \phi u_{t-1} + \epsilon_t \end{cases} \Longrightarrow Y_t - \alpha - \delta \cdot t = \phi \left(Y_{t-1} - \alpha - \delta \cdot (t-1) \right) + \epsilon_t \]
which means that an alternative specification involves a reparametrization of the previous process as: \[ \begin{aligned} Y_t &= \tilde{\alpha} + \tilde{\delta} \cdot t + \phi Y_{t-1} + \epsilon_t = \left[ (1 - \phi) \alpha + \phi \delta\right] + \left[ (1 - \phi)\delta \right] \cdot t + \phi Y_{t-1} + \epsilon_t \end{aligned} \]
which is as an autoregressive process with a linear trend.
This is the specification that dynlm::dynlm(...) uses:
##
## Time series regression with "ts" data:
## Start = 1952(2), End = 1996(1)
##
## Call:
## dynlm::dynlm(formula = lnyse ~ time(lnyse) + L(lnyse))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.221927 -0.025066 0.002876 0.026943 0.147915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.169524 1.124569 -1.929 0.0542 .
## time(lnyse) 0.001152 0.000595 1.936 0.0534 .
## L(lnyse) 0.984518 0.008150 120.792 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04055 on 525 degrees of freedom
## Multiple R-squared: 0.9981, Adjusted R-squared: 0.9981
## F-statistic: 1.38e+05 on 2 and 525 DF, p-value: < 2.2e-16
The estimated model is: \[ \begin{aligned} Y_t &= -2.169524 + 0.001152 \cdot t + 0.984518 Y_{t-1} + \epsilon_t \end{aligned} \]
The models are equivalent, but due to the parametrization \(\alpha \neq \tilde \alpha\) and \(\delta \neq \tilde{\delta}\). The estimates of the autoregressive coefficient \(\widehat{\phi}\) also differ due to different parameter estimation methods (which again may further differ for different parametrization options). However, they are fairly close: ar1 = 0.9832 versus L(lnyse) = 0.984518.
If we use the parametrization on the estimated coefficients from the first model, we would have that:
\[ \begin{aligned} \tilde{\alpha} &= \left[ (1 - \phi) \alpha + \phi \delta\right]\\ \tilde{\delta} &=\left[ (1 - \phi)\delta \right] \end{aligned} \]
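A sketch (not in the original chunk) of plugging the stats::arima estimates into these expressions:
cfs_ar <- coef(stats::arima(lnyse, order = c(1, 0, 0), xreg = time(lnyse)))
phi_est   <- unname(cfs_ar["ar1"])
alpha_est <- unname(cfs_ar["intercept"])
delta_est <- unname(cfs_ar["time(lnyse)"])
print((1 - phi_est) * alpha_est + phi_est * delta_est) # tilde(alpha)
print((1 - phi_est) * delta_est)                       # tilde(delta)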
We see that they are fairly close to (Intercept) = -2.169524 and time(lnyse) = 0.001152, respectively.
4.2.3.1.2 Selecting the model based on the ACF and PACF of the series:
We can test the lag significance via the Ljung-Box test. The null hypothesis is:
\[
\begin{aligned}
H_0&: \rho(1) = ... = \rho(k) = 0\\
H_1&: \exists j: \rho(j) \neq 0
\end{aligned}
\]
We will test whether the first 20 lags have a significant autocorrelation, using different values of \(k\):
for(i in 1:20){
aa <- Box.test(diff(lnyse), lag = i, type = "Ljung-Box")
print(paste0("lag = ", i, " p-value = ", round(aa$p.value, 5)))
}
## [1] "lag = 1 p-value = 0.36736"
## [1] "lag = 2 p-value = 0.49395"
## [1] "lag = 3 p-value = 0.70145"
## [1] "lag = 4 p-value = 0.76131"
## [1] "lag = 5 p-value = 0.25479"
## [1] "lag = 6 p-value = 0.20216"
## [1] "lag = 7 p-value = 0.20613"
## [1] "lag = 8 p-value = 0.1685"
## [1] "lag = 9 p-value = 0.2293"
## [1] "lag = 10 p-value = 0.29176"
## [1] "lag = 11 p-value = 0.36542"
## [1] "lag = 12 p-value = 0.40789"
## [1] "lag = 13 p-value = 0.48727"
## [1] "lag = 14 p-value = 0.21937"
## [1] "lag = 15 p-value = 0.27295"
## [1] "lag = 16 p-value = 0.31138"
## [1] "lag = 17 p-value = 0.34642"
## [1] "lag = 18 p-value = 0.30703"
## [1] "lag = 19 p-value = 0.17814"
## [1] "lag = 20 p-value = 0.20485"
We see that in all cases the \(p-value > 0.05\), so we have no grounds to reject the null hypothesis. This indicates that the differences, \(\Delta Y_t\), are white noise. We can specify the model as:
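The estimation chunk is not shown; a sketch of a call consistent with the output below:
mdl_5_2 <- forecast::Arima(lnyse, order = c(0, 1, 0), include.drift = TRUE)
print(mdl_5_2)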
## Series: lnyse
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0069
## s.e. 0.0018
##
## sigma^2 estimated as 0.00165: log likelihood=942.79
## AIC=-1881.59 AICc=-1881.56 BIC=-1873.05
for(i in (length(coef(mdl_5_2)) + 1):20){
aa <- Box.test(mdl_5_2$residuals, lag = i, type = "Ljung-Box", fitdf = length(coef(mdl_5_2)))
print(paste0("lag = ", i, " p-value = ", round(aa$p.value, 5)))
}
## [1] "lag = 2 p-value = 0.233"
## [1] "lag = 3 p-value = 0.48918"
## [1] "lag = 4 p-value = 0.60088"
## [1] "lag = 5 p-value = 0.15918"
## [1] "lag = 6 p-value = 0.12887"
## [1] "lag = 7 p-value = 0.13701"
## [1] "lag = 8 p-value = 0.1124"
## [1] "lag = 9 p-value = 0.16249"
## [1] "lag = 10 p-value = 0.21714"
## [1] "lag = 11 p-value = 0.28468"
## [1] "lag = 12 p-value = 0.32625"
## [1] "lag = 13 p-value = 0.40394"
## [1] "lag = 14 p-value = 0.16598"
## [1] "lag = 15 p-value = 0.21319"
## [1] "lag = 16 p-value = 0.2484"
## [1] "lag = 17 p-value = 0.28167"
## [1] "lag = 18 p-value = 0.24748"
## [1] "lag = 19 p-value = 0.1376"
## [1] "lag = 20 p-value = 0.1608"
The residuals are white noise. So this model appears to be adequate. The residuals of the previous model:
for(i in (length(coef(mdl_5_1))+1):20){
aa <- Box.test(mdl_5_1$residuals, lag = i, type = "Ljung-Box", fitdf = length(coef(mdl_5_1)))
print(paste0("lag = ", i, " p-value = ", round(aa$p.value, 5)))
}
## [1] "lag = 7 p-value = 0.06977"
## [1] "lag = 8 p-value = 0.06436"
## [1] "lag = 9 p-value = 0.13757"
## [1] "lag = 10 p-value = 0.21511"
## [1] "lag = 11 p-value = 0.32666"
## [1] "lag = 12 p-value = 0.38641"
## [1] "lag = 13 p-value = 0.49784"
## [1] "lag = 14 p-value = 0.22004"
## [1] "lag = 15 p-value = 0.29774"
## [1] "lag = 16 p-value = 0.35561"
## [1] "lag = 17 p-value = 0.3971"
## [1] "lag = 18 p-value = 0.36719"
## [1] "lag = 19 p-value = 0.23814"
## [1] "lag = 20 p-value = 0.28593"
are also WN.
4.2.3.1.3 Automatic model selection without any restrictions
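A sketch of the automatic selection call, assuming the default settings of forecast::auto.arima:
mdl_5_3 <- forecast::auto.arima(lnyse)
print(mdl_5_3)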
## Series: lnyse
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0069
## s.e. 0.0018
##
## sigma^2 estimated as 0.00165: log likelihood=942.79
## AIC=-1881.59 AICc=-1881.56 BIC=-1873.05
So, mdl_5_3 is the same as mdl_5_2.
If we’d like, we can compare the models via AIC/BIC:
mdl_ic <- data.frame(mdl_5_1 = c(AIC(mdl_5_1), BIC(mdl_5_1)),
mdl_5_2 = c(AIC(mdl_5_2), BIC(mdl_5_2)))
rownames(mdl_ic) <- c("AIC", "BIC")
print(mdl_ic)
## mdl_5_1 mdl_5_2
## AIC -1877.962 -1881.587
## BIC -1848.078 -1873.049
Overall, the second model, mdl_5_2, appears to be better, since its AIC and BIC values are smaller.
Considerations
Intuitively, we may assume that we can simply difference the data and then fit the model on the differences.
For example, the \(\rm ARIMA(0, 1, 0)\) on \(Y_t\):
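The call itself is not shown; a sketch consistent with the output below:
m_tst_1 <- forecast::Arima(lnyse, order = c(0, 1, 0), include.drift = TRUE)
print(m_tst_1)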
## Series: lnyse
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0069
## s.e. 0.0018
##
## sigma^2 estimated as 0.00165: log likelihood=942.79
## AIC=-1881.59 AICc=-1881.56 BIC=-1873.05
Can be written as an \(\rm ARIMA(0, 0, 0)\) model on the differences \(\Delta Y_t\):
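Again, a sketch of a call consistent with the output below:
m_tst_2 <- forecast::Arima(diff(lnyse), order = c(0, 0, 0), include.mean = TRUE)
print(m_tst_2)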
## Series: diff(lnyse)
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 0.0069
## s.e. 0.0018
##
## sigma^2 estimated as 0.00165: log likelihood=942.79
## AIC=-1881.59 AICc=-1881.56 BIC=-1873.05
While the model output seems to be the same, if we examine the fit, the fitted values are now calculated for \(\widehat{\Delta Y_t}\):
## Feb Mar Apr May Jun Jul Aug Sep Oct Nov
## 1952 0.00686458 0.00686458 0.00686458 0.00686458 0.00686458 0.00686458 0.00686458 0.00686458 0.00686458 0.00686458
which are constant values. This is to be expected, since our model for \(\widehat{\Delta Y_t}\) is, in fact, a constant. Unfortunately, in this case the forecasts are NOT the same as for the \(\rm ARIMA(0, 1, 0)\) model on \(Y_t\), even if we compare the differences:
plot(diff(lnyse), lwd = 2)
lines(fitted(m_tst_2), col = "red")
lines(diff(m_tst_1$fitted), col = "blue", lty = 2)
legend("bottomleft", y.intersp = 3, cex = 0.8,
legend = c(as.expression(bquote(Delta~Y[t])), as.expression(bquote(widehat(Delta~Y)[t])), as.expression(bquote(widehat(Y)[t] - Y[t-1]))),
lty = c(1, 1, 2), col = c("black", "red", "blue"), lwd = 2)
Check out a comment on stackoverflow by the author of the forecast package. For \(\Delta Y_t \sim AR(1)\) the difference is that:
When fitting a \(I(1)\) model on \(Y_t\), the fitted values are: \[\widehat{Y}_t = Y_{t-1} + \widehat{\gamma}\cdot(Y_{t-1} - Y_{t-2})\]
When fitting a \(I(0)\) model on \(\Delta Y_t\), the fitted values are: \[\widehat{\Delta {Y}}_t = \widehat{\gamma}\cdot(Y_{t-1} - Y_{t-2})\]
In our case this is equivalent to comparing the fitted values of \(\widehat{(Y_{t-1} - Y_{t-2})} = \widehat{\alpha}\) versus \(\widehat{Y}_t = \widehat{\alpha} + Y_{t-1}\).
The problem is that in general \(\widehat{Y}_t - Y_{t-1} \neq \widehat{\Delta {Y}}_t\).
Fortunately, once we have the coefficient estimates we can use the parametrization (assuming that \(H_0: \rho = 0\) is not rejected): \(\Delta Y_t = \alpha + \delta \cdot t + \sum_{j = 1}^{p-1} \gamma_j \Delta Y_{t-j} + \epsilon_t\)
In this case, our equation is: \(\Delta Y_t = \alpha + \epsilon_t\).
Since we do not want to calculate \(\widehat{\Delta Y}_t\), but rather \(\widehat{Y}_t\), we simply apply the definition of \(\Delta\) and get: \(Y_t = \alpha + Y_{t-1} + \epsilon_t\).
The two equations are equivalent, just with a different parametrization. If we fit a model on manually-calculated \(\Delta Y_t\), there is no way for the software to detect that we took the differences manually.
This is also evident from the fact that the coefficients are identical (though they may sometimes differ due to different starting values for differenced versus non-differenced data).
Remember that the fitted values are calculated as a conditional expectation on \(Y_t\), conditionally that any variables on the right-hand-side are given, hence (replacing \(\epsilon_t\) with its mean of zero): \[ \widehat{Y}_t = \widehat{\alpha} + Y_{t-1},\ t = 2,...,T \] Assuming that our series starts at \(Y_1\) with \(Y_s = 0\) for \(s \leq 0\).
(if we had a moving-average part, we could calculate \(\widehat{\epsilon}_s = Y_s - \widehat{Y}_s\) for \(s < t\) and use the residuals for \(t + 1\)).
In the case of the differences model, we have that:
The first fitted value is equal to the true value
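The helper y_fit used in the comparison below is not constructed in the shown text; a minimal sketch, following the definition \(\widehat{Y}_t = \widehat{\alpha} + Y_{t-1}\) above:
alpha_hat <- unname(coef(m_tst_2)["mean"])
# fitted values for t = 2, ..., T (the first observation has no lagged value):
y_fit <- alpha_hat + as.numeric(lnyse)[-length(lnyse)]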
We can compare the fitted values with the ones from the ARIMA model on \(Y_t\):
aa <- data.frame(m_tst_1$fitted, c(NA, y_fit))
colnames(aa) <- c("Y_ARIMA", "dY_ARMA")
aa$diff <- aa$Y_ARIMA - aa$dY_ARMA
head(aa)
## Y_ARIMA dY_ARMA diff
## 1 4.600572 NA NA
## 2 4.612035 4.612035 -2.486900e-14
## 3 4.627495 4.627495 1.154632e-14
## 4 4.595446 4.595446 1.154632e-14
## 5 4.636687 4.636687 1.154632e-14
## 6 4.586286 4.586286 1.154632e-14
Obviously, diff(aa$Y_ARIMA) and diff(aa$dY_ARMA) will also be equal.
However, this is different from fitted(m_tst_2):
bbb <- data.frame(diff(m_tst_1$fitted), m_tst_2$fitted)
colnames(bbb) <- c("Y_ARIMA_diff", "dY_ARMA_diff")
bbb$diff <- bbb$Y_ARIMA_diff - bbb$dY_ARMA_diff
head(bbb)
## Y_ARIMA_diff dY_ARMA_diff diff
## 1 0.01146288 0.00686458 0.004598303
## 2 0.01546000 0.00686458 0.008595420
## 3 -0.03204900 0.00686458 -0.038913580
## 4 0.04124100 0.00686458 0.034376420
## 5 -0.05040100 0.00686458 -0.057265580
## 6 0.02504100 0.00686458 0.018176420
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.2253176 -0.0242888 0.0024959 -0.0000115 0.0260387 0.1579624
All in all, it is much easier to specify the model for \(Y_t\), instead of manually calculating the differences and estimating a model on \(\Delta Y_t\), as most forecasting and fitting functions are readily available.
4.2.3.2 Task 6
Assume that we have selected the following model as the “best” one:
## Series: lnyse
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0069
## s.e. 0.0018
##
## sigma^2 estimated as 0.00165: log likelihood=942.79
## AIC=-1881.59 AICc=-1881.56 BIC=-1873.05
which we can write as: \(\Delta Y_{t} = 0.0069 + \epsilon_t\), or equivalently as: \(Y_t = 0.0069 + Y_{t-1} + \epsilon_t\). In other words, this is a random walk with drift.
Considerations
Assume that we are not so “lucky” and our model is much more complex. For example
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.07357 0.27887 -0.26382 0.79203
## L(d(lnyse), 1:5)1 0.04398 0.04382 1.00357 0.31606
## L(d(lnyse), 1:5)2 -0.03932 0.04380 -0.89786 0.36968
## L(d(lnyse), 1:5)3 0.01035 0.04380 0.23620 0.81337
## L(d(lnyse), 1:5)4 0.02288 0.04376 0.52299 0.60121
## L(d(lnyse), 1:5)5 0.09225 0.04373 2.10951 0.03538
## time(lnyse) 0.00004 0.00014 0.28548 0.77539
Since we needed the trend for unit root testing - having removed the insignificant \(\rho\) coefficient, we could try to remove the insignificant trend and improve our model. But we will assume that this model is acceptable as is.
Remembering the model parametrization of dynlm, the estimated model for \(\Delta Y_t\) is:
\(\Delta Y_t = -0.07357 + 0.00004 \cdot t + 0.04398 \Delta Y_{t-1} -0.03932 \Delta Y_{t-2} + 0.01035 \Delta Y_{t-3} + 0.02288 \Delta Y_{t-4} + 0.09225 \Delta Y_{t-5} + \epsilon_t\),
which we can re-write as a model for \(Y_t\).
We use the property \(\gamma_j = -[\phi_{j+1} + ... + \phi_p]\), \(j = 1, ..., p - 1\) and the expression of \(\rho\) to get the coefficients:
\[ \begin{aligned} \gamma_5 &= - \phi_6 &&\Longrightarrow \phi_6 = - \gamma_5 = -0.09225\\ \gamma_4 &= - [\phi_5 + \phi_6] &&\Longrightarrow \phi_5 = - \gamma_4 - \phi_6 = -0.02288+0.09225 = 0.06937\\ \gamma_3 &= - [\phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_4 = - \gamma_3 - [\phi_5 + \phi_6] = - \gamma_3 + \gamma_4 = -0.01035+0.02288 = 0.01253\\ \gamma_2 &= - [\phi_3 + \phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_3 = - \gamma_2 + \gamma_3 = 0.03932 + 0.01035 = 0.04967\\ \gamma_1 &= - [\phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_2 = - \gamma_1 + \gamma_2 = -0.04398 -0.03932 = -0.0833\\ \rho &= \phi_1 + \phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6 - 1,\text{ since } \rho = 0 &&\Longrightarrow \phi_1 = 1 - (\phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6) = 1 + \gamma_1= 1 + 0.04398 = 1.04398 \end{aligned} \]
So, the equivalent model for \(Y_t\) is:
\(Y_t = -0.07357 + 0.00004 \cdot t + 1.04398 Y_{t-1} -0.0833 Y_{t-2} + 0.04967 Y_{t-3} + 0.01253 Y_{t-4} + 0.06937 Y_{t-5} -0.09225 Y_{t-6} + \epsilon_t\)
To verify that this is indeed the correct transformation, we can try to simulate a few observations with the same residuals:
First, we will simulate \(Y_t\):
Y <- NULL
Y[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
Y[2] <- -0.07357 + 0.00004 * tt[2] + 1.04398 * Y[1] + eps[2]
Y[3] <- -0.07357 + 0.00004 * tt[3] + 1.04398 * Y[2] - 0.0833 * Y[1] + eps[3]
Y[4] <- -0.07357 + 0.00004 * tt[4] + 1.04398 * Y[3] - 0.0833 * Y[2] + 0.04967 * Y[1] + eps[4]
Y[5] <- -0.07357 + 0.00004 * tt[5] + 1.04398 * Y[4] - 0.0833 * Y[3] + 0.04967 * Y[2] + 0.01253 * Y[1] + eps[5]
Y[6] <- -0.07357 + 0.00004 * tt[6] + 1.04398 * Y[5] - 0.0833 * Y[4] + 0.04967 * Y[3] + 0.01253 * Y[2] + 0.06937 * Y[1] + eps[6]
for(j in 7:N){
  Y[j] <- -0.07357 + 0.00004 * tt[j] + 1.04398 * Y[j-1] - 0.0833 * Y[j-2] + 0.04967 * Y[j - 3] + 0.01253 * Y[j - 4] + 0.06937 * Y[j - 5] - 0.09225 * Y[j - 6] + eps[j]
}
Secondly, we will simulate \(\Delta Y_t\):
dY <- NULL
dY[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
dY[2] <- -0.07357 + 0.00004 * tt[2] + 0.04398 * dY[1] + eps[2]
dY[3] <- -0.07357 + 0.00004 * tt[3] + 0.04398 * dY[2] - 0.03932 * dY[1] + eps[3]
dY[4] <- -0.07357 + 0.00004 * tt[4] + 0.04398 * dY[3] - 0.03932 * dY[2] + 0.01035 * dY[1] + eps[4]
dY[5] <- -0.07357 + 0.00004 * tt[5] + 0.04398 * dY[4] - 0.03932 * dY[3] + 0.01035 * dY[2] + 0.02288 * dY[1] + eps[5]
for(j in 6:N){
  dY[j] <- -0.07357 + 0.00004 * tt[j] + 0.04398 * dY[j-1] - 0.03932 * dY[j-2] + 0.01035 * dY[j - 3] + 0.02288 * dY[j - 4] + 0.09225 * dY[j - 5] + eps[j]
}
Finally, we will assume that the first 100, or so, observations of \(Y_t\) are burn-in values (since the first couple of values assume that \(Y_j = \epsilon_j = 0\), for \(j \leq 0\)). This means that we treat the first 101 \(\Delta Y_t\) values as burn-in values as well and drop them:
We can compare \(\Delta Y_t\) with \(Y_t - Y_{t-1}\):
dt_compare <- data.frame(diff(Y), dY)
dt_compare$DIFFERENCE <- round(diff(Y) - dY, 10)
head(dt_compare)
## diff.Y. dY DIFFERENCE
## 1 0.4191779 0.4191779 0
## 2 -0.1389262 -0.1389262 0
## 3 -0.5043527 -0.5043527 0
## 4 -1.1470432 -1.1470432 0
## 5 -0.2097488 -0.2097488 0
## 6 -0.7880470 -0.7880470 0
We see that there is no difference between the simulated \(Y_t - Y_{t-1}\) (diff.Y.), which was calculated from the simulated \(Y_t\) series, and the separately simulated \(\Delta Y_t\) (dY).
We can also visually compare the charts:
So, the two models are equivalent. For simulation and forecasting we may be more interested in the model for \(Y_t\), as it is the original series that we are interested in.
We may also be interested in verifying that this is indeed the same model by simulating it for a larger sample:
Y <- NULL
Y[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
Y[2] <- -0.07357 + 0.00004 * tt[2] + 1.04398 * Y[1] + eps[2]
Y[3] <- -0.07357 + 0.00004 * tt[3] + 1.04398 * Y[2] - 0.0833 * Y[1] + eps[3]
Y[4] <- -0.07357 + 0.00004 * tt[4] + 1.04398 * Y[3] - 0.0833 * Y[2] + 0.04967 * Y[1] + eps[4]
Y[5] <- -0.07357 + 0.00004 * tt[5] + 1.04398 * Y[4] - 0.0833 * Y[3] + 0.04967 * Y[2] + 0.01253 * Y[1] + eps[5]
Y[6] <- -0.07357 + 0.00004 * tt[6] + 1.04398 * Y[5] - 0.0833 * Y[4] + 0.04967 * Y[3] + 0.01253 * Y[2] + 0.06937 * Y[1] + eps[6]
for(j in 7:N){
  Y[j] <- -0.07357 + 0.00004 * tt[j] + 1.04398 * Y[j-1] - 0.0833 * Y[j-2] + 0.04967 * Y[j - 3] + 0.01253 * Y[j - 4] + 0.06937 * Y[j - 5] - 0.09225 * Y[j - 6] + eps[j]
}
and estimating the model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.09239 0.05048 -1.83037 0.06735
## L(d(Y), 1:5)1 0.00563 0.02293 0.24545 0.80614
## L(d(Y), 1:5)2 -0.07151 0.02291 -3.12086 0.00183
## L(d(Y), 1:5)3 0.00008 0.02297 0.00331 0.99736
## L(d(Y), 1:5)4 0.02966 0.02291 1.29427 0.19573
## L(d(Y), 1:5)5 0.08955 0.02292 3.90678 0.00010
## time(Y) 0.00008 0.00004 1.95842 0.05033
The coefficients are close to the true values, used in model simulation:
On the other hand, if our model was specified via Arima, we would need to specify the model differently.
## Series: lnyse
## Regression with ARIMA(5,1,0) errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 tt
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001645: log likelihood=945.98
## AIC=-1877.96 AICc=-1877.75 BIC=-1848.08
which is equivalent to:
## Series: lnyse
## ARIMA(5,1,0) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 drift
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001645: log likelihood=945.98
## AIC=-1877.96 AICc=-1877.75 BIC=-1848.08
or
##
## Call:
## arima(x = lnyse, order = c(5, 1, 0), xreg = 1:length(lnyse))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 1:length(lnyse)
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001627: log likelihood = 945.98, aic = -1877.96
This model, estimated via Arima for \(\Delta Y_t\), is:
\[ \begin{cases} \Delta Y_t &= \tilde{\delta} + u_t \\ u_t &= \gamma_1 u_{t-1} + \gamma_2 u_{t-2} + \gamma_3 u_{t-3} + \gamma_4 u_{t-4} + \gamma_5 u_{t-5} + \epsilon_t \end{cases} \]
which is equivalent to:
\[ \Delta Y_t = \tilde{\delta} (1 - \gamma_1 - \gamma_2 - \gamma_3 - \gamma_4 - \gamma_5) + \gamma_1 \Delta Y_{t-1} + \gamma_2 \Delta Y_{t-2} + \gamma_3 \Delta Y_{t-3} + \gamma_4 \Delta Y_{t-4} + \gamma_5 \Delta Y_{t-5} + \epsilon_t \]
The constant term can be calculated as:
cfs <- coef(forecast::Arima(lnyse, order = c(5, 1, 0), include.drift = TRUE))
print(round(unname(cfs["drift"]) * (1 - sum(cfs[paste0("ar", 1:5)])), 4))
## [1] 0.006
Along with the coefficient values this gives us:
\[ \Delta Y_t = 0.006 + 0.0380 \Delta Y_{t-1} - 0.0350 \Delta Y_{t-2} + 0.0089\Delta Y_{t-3} + 0.0228 \Delta Y_{t-4} + 0.0923 \Delta Y_{t-5} + \epsilon_t \]
An important distinction is how we arrived at such a model. In the previous case, we transformed \(Y_t\) into an equivalent model via differences in order to test whether a unit root exists. Since it did, we removed an additional insignificant parameter and were left with a model which included both a constant and a trend in the differences model.
Assuming that the parametrization is the same as before, with \(\rho = 0\) - we arrive at the following coefficient parametrizations:
- \(\tilde{\delta} (1 - \gamma_1 - \gamma_2 - \gamma_3 - \gamma_4 - \gamma_5)\) is the constant.
- The remaining coefficients have the same expressions as before:
\[ \begin{aligned} \phi_1 &= 1 + \gamma_1 \\ \phi_2 &= -\gamma_1 + \gamma_2\\ \phi_3 &= -\gamma_2 + \gamma_3\\ \phi_4 &= -\gamma_3 + \gamma_4\\ \phi_5 &= -\gamma_4 + \gamma_5\\ \phi_6 &= -\gamma_5 \end{aligned} \]
So the final equation is \[ \begin{aligned} Y_t &= \tilde{\delta} (1 - \gamma_1 - \gamma_2 - \gamma_3 - \gamma_4 - \gamma_5) + (1 + \gamma_1) Y_{t-1} + (\gamma_2-\gamma_1) Y_{t-2} + (\gamma_3 - \gamma_2) Y_{t-3} + (\gamma_4-\gamma_3) Y_{t-4} + (\gamma_5-\gamma_4) Y_{t-5} + (-\gamma_5)Y_{t-6} + \epsilon_t \end{aligned} \]
We can use the above equations to calculate the \(\widehat{\phi}_j\) coefficients:
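A minimal sketch of this calculation, assuming cfs (defined above) holds the estimated ar1-ar5 coefficients:
gamma <- unname(cfs[paste0("ar", 1:5)])
phi   <- c(1 + gamma[1], diff(gamma), -gamma[5]) # phi_1 = 1 + gamma_1; phi_j = gamma_j - gamma_{j-1}; phi_6 = -gamma_5
for(j in 1:6) print(sprintf("phi_%d = %f", j, phi[j]))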
## [1] "phi_1 = 1.038000"
## [1] "phi_2 = -0.072900"
## [1] "phi_3 = 0.043800"
## [1] "phi_4 = 0.013900"
## [1] "phi_5 = 0.069500"
## [1] "phi_6 = -0.092300"
The model along with the coefficients can be written as:
\[ \begin{aligned} Y_t &= 0.006 + 1.038 Y_{t-1} -0.0729 Y_{t-2} + 0.0438 Y_{t-3} + 0.0139 Y_{t-4} + 0.0695 Y_{t-5} -0.0923Y_{t-6} + \epsilon_t \end{aligned} \]
We will show a simulation example to highlight that the two equations produce the same processes.
Considerations
Comparison with dynlm
If we want to directly compare with dynlm, we need to specify the trend to be \(t = 1,...\), instead of time(), which takes into account the frequency and year.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.005143 0.003664 1.403733 0.161000
## L(d(lnyse), 1:5)1 0.043978 0.043822 1.003570 0.316056
## L(d(lnyse), 1:5)2 -0.039325 0.043798 -0.897864 0.369677
## L(d(lnyse), 1:5)3 0.010346 0.043802 0.236199 0.813372
## L(d(lnyse), 1:5)4 0.022884 0.043757 0.522990 0.601206
## L(d(lnyse), 1:5)5 0.092252 0.043731 2.109512 0.035381
## lnyse_tt 0.000003 0.000012 0.285478 0.775393
The dynlm model is:
\(\Delta Y_t = 0.005143 + 0.000003 \cdot t + 0.04398 \Delta Y_{t-1} -0.03932 \Delta Y_{t-2} + 0.01035 \Delta Y_{t-3} + 0.02288 \Delta Y_{t-4} + 0.09225 \Delta Y_{t-5} + \epsilon_t\),
Using the coefficient equalities \(\phi_6 = -\gamma_5\), \(\phi_5 = -\gamma_4 + \gamma_5\), …, we have that the above model for \(Y_t\) is:
\(Y_t = 0.005143 + 0.000003 \cdot t + 1.04398 Y_{t-1} -0.0833 Y_{t-2} + 0.04967 Y_{t-3} + 0.01253 Y_{t-4} + 0.06937 Y_{t-5} -0.09225 Y_{t-6} + \epsilon_t\)
The difference from the Arima results is due to the different parametrization and the different parameter estimation methods employed.
Note that since we changed the trend component to have values \(t = 1,...\), only the (intercept) and trend parameters changed in the dynlm model.
Considerations
If we want our model to have a trend component, we need to use the fact that \(Y_t = c_1 + c_2 \cdot t + \phi Y_{t-1} + w_t\) can be expressed as
\(Y_t - Y_{t-1} = c_1 - c_1 + c_2 \cdot t - c_2 \cdot (t-1) + \phi Y_{t-1} - \phi Y_{t-2} + w_t - w_{t-1}\)
or
\(\Delta Y_t = c_2 + \phi \Delta Y_{t-1} + \Delta w_t\). In other words, we cannot identify the constant \(c_1\) and so we can set it to \(0\).
Considerations
To verify the equivalence between the models once again, we will simulate \(Y_t\):
Y <- NULL
Y[1] <- delta_coef + eps[1]
Y[2] <- delta_coef + (1 + 0.0380) * Y[1] + eps[2]
Y[3] <- delta_coef + (1 + 0.0380) * Y[2] + (-0.0350 - 0.0380) * Y[1] + eps[3]
Y[4] <- delta_coef + (1 + 0.0380) * Y[3] + (-0.0350 - 0.0380) * Y[2] + (0.0089 + 0.0350) * Y[1] + eps[4]
Y[5] <- delta_coef + (1 + 0.0380) * Y[4] + (-0.0350 - 0.0380) * Y[3] + (0.0089 + 0.0350) * Y[2] + (0.0228 - 0.0089) * Y[1] + eps[5]
Y[6] <- delta_coef + (1 + 0.0380) * Y[5] + (-0.0350 - 0.0380) * Y[4] + (0.0089 + 0.0350) * Y[3] + (0.0228 - 0.0089) * Y[2] + (0.0923 - 0.0228) * Y[1] + eps[6]
for(j in 7:N){
Y[j] <- delta_coef + (1 + 0.0380) * Y[j-1] + (-0.0350 - 0.0380) * Y[j-2] + (0.0089 + 0.0350) * Y[j - 3] + (0.0228 - 0.0089) * Y[j - 4] + (0.0923 - 0.0228) * Y[j - 5] - 0.0923 * Y[j - 6] + eps[j]
}
Secondly, we will simulate \(\Delta Y_t\):
dY <- NULL
dY[1] <- delta_coef + eps[1]
dY[2] <- delta_coef + 0.0380 * dY[1] + eps[2]
dY[3] <- delta_coef + 0.0380 * dY[2] -0.0350 * dY[1] + eps[3]
dY[4] <- delta_coef + 0.0380 * dY[3] -0.0350 * dY[2] + 0.0089 * dY[1] + eps[4]
dY[5] <- delta_coef + 0.0380 * dY[4] -0.0350 * dY[3] + 0.0089 * dY[2] + 0.0228 * dY[1] + eps[5]
for(j in 6:N){
dY[j] <- delta_coef + 0.0380 * dY[j-1] -0.0350 * dY[j-2] + 0.0089 * dY[j - 3] + 0.0228 * dY[j - 4] + 0.0923 * dY[j - 5] + eps[j]
}
Finally, we will assume that the first 100, or so, observations of \(Y_t\) are burn-in values (since the first couple of values assume that \(Y_j = \epsilon_j = 0\), for \(j \leq 0\)). This means that we treat the first 101 \(\Delta Y_t\) values as burn-in values as well and drop them:
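A minimal sketch of dropping the burn-in values (the counts follow the text above, so that diff(Y) and dY line up):
Y  <- Y[-(1:100)]  # drop the first 100 values of Y_t
dY <- dY[-(1:101)] # drop the first 101 values of Delta Y_t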
We can compare \(\Delta Y_t\) with \(Y_t - Y_{t-1}\):
dt_compare <- data.frame(diff(Y), dY)
dt_compare$DIFFERENCE <- round(diff(Y) - dY, 10)
head(dt_compare)
## diff.Y. dY DIFFERENCE
## 1 0.5067329 0.5067329 0
## 2 -0.0581769 -0.0581769 0
## 3 -0.4142284 -0.4142284 0
## 4 -1.0578262 -1.0578262 0
## 5 -0.1185169 -0.1185169 0
## 6 -0.7046021 -0.7046021 0
We see that there is no difference between the simulated \(Y_t - Y_{t-1}\) (diff.Y.), which was calculated from the simulated \(Y_t\) series, and \(\Delta Y_t\) (dY).
We can also visually compare the charts:
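A minimal sketch of such a comparison plot, using the objects defined above:
plot.ts(ts(diff(Y)), col = "red", lwd = 2, main = "Simulated differences")
lines(ts(dY), col = "blue", lty = 2, lwd = 2)
legend("topright", legend = c("diff(Y)", "dY"), col = c("red", "blue"), lty = c(1, 2), lwd = 2)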
Furthermore, we can compare with the dynlm coefficient values:
Y_dynlm <- NULL
Y_dynlm[1] <- 0.005143 + 0.000003 * tt[1] + eps[1]
Y_dynlm[2] <- 0.005143 + 0.000003 * tt[2] + 1.04398 * Y_dynlm[1] + eps[2]
Y_dynlm[3] <- 0.005143 + 0.000003 * tt[3] + 1.04398 * Y_dynlm[2] -0.0833 * Y_dynlm[1] + eps[3]
Y_dynlm[4] <- 0.005143 + 0.000003 * tt[4] + 1.04398 * Y_dynlm[3] -0.0833 * Y_dynlm[2] + 0.04967 * Y_dynlm[1] + eps[4]
Y_dynlm[5] <- 0.005143 + 0.000003 * tt[5] + 1.04398 * Y_dynlm[4] -0.0833 * Y_dynlm[3] + 0.04967 * Y_dynlm[2] + 0.01253 * Y_dynlm[1] + eps[5]
Y_dynlm[6] <- 0.005143 + 0.000003 * tt[6] + 1.04398 * Y_dynlm[5] -0.0833 * Y_dynlm[4] + 0.04967 * Y_dynlm[3] + 0.01253 * Y_dynlm[2] + 0.06937 * Y_dynlm[1] + eps[6]
for(j in 7:N){
Y_dynlm[j] <- 0.005143 + 0.000003 * tt[j] + 1.04398 * Y_dynlm[j-1] -0.0833 * Y_dynlm[j-2] + 0.04967 * Y_dynlm[j - 3] + 0.01253 * Y_dynlm[j - 4] + 0.06937 * Y_dynlm[j - 5] - 0.09225 * Y_dynlm[j - 6] + eps[j]
}
plot.ts(ts(Y), col = "red", lwd = 2, ylim = c(min(Y, Y_dynlm), max(Y, Y_dynlm)))
lines(ts(Y_dynlm), col = "blue", lty = 2, lwd = 2)
legend("topright", c("Y via forecast::Arima() parametrization", "", "Y via dynlm() parametrization"), col = c("red", NA, "blue"), lty = c(1, NA, 2))
Again, we may want to estimate the model on a larger simulated dataset:
Y <- NULL
Y[1] <- delta_coef + eps[1]
Y[2] <- delta_coef + (1 + 0.0380) * Y[1] + eps[2]
Y[3] <- delta_coef + (1 + 0.0380) * Y[2] + (-0.0350 - 0.0380) * Y[1] + eps[3]
Y[4] <- delta_coef + (1 + 0.0380) * Y[3] + (-0.0350 - 0.0380) * Y[2] + (0.0089 + 0.0350) * Y[1] + eps[4]
Y[5] <- delta_coef + (1 + 0.0380) * Y[4] + (-0.0350 - 0.0380) * Y[3] + (0.0089 + 0.0350) * Y[2] + (0.0228 - 0.0089) * Y[1] + eps[5]
Y[6] <- delta_coef + (1 + 0.0380) * Y[5] + (-0.0350 - 0.0380) * Y[4] + (0.0089 + 0.0350) * Y[3] + (0.0228 - 0.0089) * Y[2] + (0.0923 - 0.0228) * Y[1] + eps[6]
for(j in 7:N){
Y[j] <- delta_coef + (1 + 0.0380) * Y[j-1] + (-0.0350 - 0.0380) * Y[j-2] + (0.0089 + 0.0350) * Y[j - 3] + (0.0228 - 0.0089) * Y[j - 4] + (0.0923 - 0.0228) * Y[j - 5] - 0.0923 * Y[j - 6] + eps[j]
}
and estimating the model:
## Series: Y
## ARIMA(5,1,0) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 drift
## 0.0120 -0.0258 0.0022 0.0248 0.0996 0.0063
## s.e. 0.0112 0.0112 0.0112 0.0112 0.0112 0.0126
##
## sigma^2 estimated as 0.9913: log likelihood=-11170.79
## AIC=22355.57 AICc=22355.59 BIC=22404.39
The estimated coefficients are close to the true values used in the model simulation:
## Series: lnyse
## ARIMA(5,1,0) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 drift
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001645: log likelihood=945.98
## AIC=-1877.96 AICc=-1877.75 BIC=-1848.08
4.2.4 Exercise Set 4: Forecasting The Future
4.2.4.1 Task 7
We can calculate the forecasts as follows (note that if we include a trend in the \(ARIMA\) as an exogenous regressor, we will need to specify the future trend values as well):
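A minimal sketch of the forecasting step, assuming mdl_5_1 is the ARIMA(5,1,0)-with-drift model and mdl_5_2 the ARIMA(0,1,0)-with-drift model (as suggested by the legend in the plotting code below), with an illustrative 2-year horizon:
h <- 24
mdl_5_1_forc <- forecast::forecast(mdl_5_1, h = h)
mdl_5_2_forc <- forecast::forecast(mdl_5_2, h = h)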
We will plot the original historical data, the fitted values and the forecasts.
plot.ts(exp(lnyse),
xlim = c(floor(min(time(lnyse))), floor(max(time(lnyse))) + 1),
ylim = c(0, max(exp(mdl_5_1_forc$mean), exp(mdl_5_2_forc$mean))))
#
lines(exp(fitted(mdl_5_1)), col = "red", lty = 2)
lines(exp(fitted(mdl_5_2)), col = "blue", lty = 2)
#
lines(exp(mdl_5_1_forc$mean), col = "red", lty = 2)
lines(exp(mdl_5_2_forc$mean), col = "blue", lty = 2)
As it is difficult to compare the values, we will also shrink the time window to see the last few years:
plot.ts(window(exp(lnyse), start = 1990),
xlim = c(1990, floor(max(time(lnyse))) + 1),
ylim = c(1800, max(exp(mdl_5_1_forc$mean), exp(mdl_5_2_forc$mean))),
lwd = 2)
#
lines(exp(fitted(mdl_5_1)), col = "red", lty = 2)
lines(exp(fitted(mdl_5_2)), col = "blue", lty = 2)
#
lines(exp(mdl_5_1_forc$mean), col = "red", lty = 2)
lines(exp(mdl_5_2_forc$mean), col = "blue", lty = 2)
#
legend("topleft", legend = c(as.expression(bquote("Actual"~Y[t])),
"ARIMA(5, 1, 0) with drift",
"ARIMA(0, 1, 0) with drift"),
lty = 1, col = c("black", "red", "blue"))
We see that the two models' forecasts do not differ by much.
4.2.5 Exercise Set 5: Model Cross Validation
4.2.5.1 Task 8
Assume that our best model (again, we can select it via AIC or BIC, but we choose a more complex one for a more general example) is:
## Series: lnyse
## Regression with ARIMA(5,1,0) errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 tt
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001645: log likelihood=945.98
## AIC=-1877.96 AICc=-1877.75 BIC=-1848.08
## Series: lnyse
## ARIMA(5,1,0) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 drift
## 0.0380 -0.0350 0.0089 0.0228 0.0923 0.0069
## s.e. 0.0433 0.0434 0.0434 0.0434 0.0434 0.0020
##
## sigma^2 estimated as 0.001645: log likelihood=945.98
## AIC=-1877.96 AICc=-1877.75 BIC=-1848.08
So, we will take \(k = \lfloor 0.2 \cdot T \rfloor\), i.e. roughly \(20\%\) of our series, create \(k\) expanding window samples, and for each sample we will:
- estimate the model;
- forecast one period ahead;
- calculate the error between the forecast and the true value;
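Before running the loop below, the objects that it fills need to be initialized; a minimal sketch (the names follow the loop):
k <- floor(0.2 * length(lnyse)) # number of expanding-window samples
mdl_params     <- NULL          # coefficient estimates of each re-estimated model
forcast_errors <- NULL          # one-step-ahead forecast errors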
for(i in k:1){
# get the sample
smpl <- lnyse[1:(length(lnyse) - i)]
# re-estimate the model:
tt <- data.frame(tt = 1:length(smpl))
smpl_mdl <- forecast::Arima(smpl, order = c(5, 1, 0), include.drift = TRUE)
# calculate the one-step ahead forecast
smpl_forc_1 <- forecast(smpl_mdl, h = 1)
# (had the trend been included as an exogenous regressor, its future values would also need to be supplied here;
#  see the code in this file on how to include it automatically)
# calculate the forecast error:
tmp_e <- lnyse[length(smpl) + 1] - smpl_forc_1$mean
# save the parameters:
mdl_params <- rbind(mdl_params, coef(smpl_mdl))
# Save the forecast errors:
forcast_errors <- c(forcast_errors, tmp_e)
}
## [1] 0.017620987 0.019993786 0.024964316 0.010563250 0.022469878 -0.005528249 0.029867446 -0.023850741 0.037985968 0.006078093
The root mean squared error is:
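A minimal sketch of the calculation, using the forcast_errors vector filled in the loop above:
print(paste0("RMSE (cross-validation) = ", sqrt(mean(forcast_errors^2))))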
We can compare with the RMSE of the same model in TASK 5 (which uses the full sample):
## [1] "RMSE (Task 5) = 0.04029411"
or, using the built-in functions:
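A minimal sketch, assuming mdl_5_1 is the full-sample ARIMA(5,1,0)-with-drift model from TASK 5:
forecast::accuracy(mdl_5_1)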
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.503454e-05 0.04029411 0.03042603 -0.002528188 0.4917919 0.2247457 0.006285276
We see that the out-of-sample RMSE from the cross-validation is close to the in-sample RMSE from the model on the full sample.
Considerations
Remember that we are calculating the in-sample (i.e. from the data that was used to estimate the parameters) RMSE, \(RMSE_{in}\), for the model in TASK 5, while for the cross-validation we have calculated the out-of-sample (i.e. from data which was not used in parameter estimation) RMSE, \(RMSE_{out}\), for the one-step-ahead forecasts. It should always be the case that \(RMSE_{in} \leq RMSE_{out}\), since the model parameters are chosen so as to minimize the error on the training set.
In this case, we could say that the one-step-ahead forecasts are just as accurate as the fitted model values, since their error is similar to the in-sample error, in terms of RMSE.
The mean of each estimated parameter is:
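A minimal sketch (mdl_params is the coefficient matrix collected in the loop above):
t(colMeans(mdl_params))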
## ar1 ar2 ar3 ar4 ar5 drift
## [1,] 0.04456041 -0.04141763 0.01444384 0.02751923 0.09970742 0.006606406
and the variance is:
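Continuing from the same mdl_params matrix, a minimal sketch:
t(apply(mdl_params, 2, var))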
## ar1 ar2 ar3 ar4 ar5 drift
## [1,] 9.209398e-05 5.081817e-05 0.0001011694 9.729149e-05 7.961683e-05 2.978902e-08
We can visualize the estimated parameter values for each sample as separate plots:
par(mfrow = c(3, 2))
for(i in 1:ncol(mdl_params)){
plot(mdl_params[, i], main = colnames(mdl_params)[i], xlab = "sample", ylab = "coef. estimate", type = "o")
points(length(mdl_params[, i]) + 1, coef(mdl_5_1)[i], col = "red", pch = 19)
}
The red dot coincides with the full sample estimate (we can also use abline to draw it as a horizontal line, but we do not want to treat it as a true coefficient value).
We see that as the sample size increases (here the sample size is T - k + 'sample'), the estimates converge to some value.
Considerations
If we had more data, would the values continue to converge to some unknown value? If the coefficient values vary quite a lot with the sample size, it is likely that our model is not stable and would produce poor forecasts over a longer future horizon.
The drift estimate exhibits more variation compared to the remaining parameters, so it is unclear how stable it would be if we had a larger sample size.
Considerations
We can also use a built-in function to carry out the cross validation for us. We do need to specify our model as a separate function for this process to work:
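A minimal sketch, assuming the built-in function referred to is forecast::tsCV() (the function and object names below are illustrative):
# The model, wrapped as a forecasting function of the series and the horizon:
f_arima_510_drift <- function(y, h){
  forecast(forecast::Arima(y, order = c(5, 1, 0), include.drift = TRUE), h = h)
}
# One-step-ahead cross-validation errors over the whole series:
cv_errors <- forecast::tsCV(lnyse, f_arima_510_drift, h = 1)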
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1952 NA NA NA NA NA NA 0.0570292521 -0.0717783295 NA -0.0143642770 0.0072660919 0.0122336911
## 1953 NA NA NA -0.0326470999 NA -0.0652138685 -0.0002246052 -0.0585932384
Note that this process is long and performs a cross-validation on the whole dataset. We can compare the last few values with our manually calculated ones:
## [1] 0.017620987 0.019993786 0.024964316 0.010563250 0.022469878 -0.005528249 0.029867446 -0.023850741 0.037985968 0.006078093 NA
We see that the errors are the same as those from our manual calculations.
4.2.6 Additional Analysis
In the above tasks, we only considered unit root and trend cases. We now briefly turn to the airpassangers dataset.
The airpassangers series, which we analysed in our lectures, may also be seasonally integrated. This would require us to repeat the previous steps not only on the non-seasonal differences, but also on the seasonal differences (as well as on combinations of both seasonal and non-seasonal differences). Here we will simply provide the output of the automatic model order selection function (auto.arima in R):
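A minimal sketch of the call implied by the output below (mdl_auto follows the name used further down):
mdl_auto <- forecast::auto.arima(log(fma::airpass))
mdl_auto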
## Series: log(fma::airpass)
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.001371: log likelihood=244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
It appears that the series exhibited unit roots in both the seasonal and non-seasonal autoregressive components (this depends on the documentation of the automatic model order selection function; in R, differences are taken if a unit root is present).
We can forecast the logarithms of the series, as well as plot the fitted values as follows:
plot(forecast(mdl_auto), main = forecast(mdl_auto)$method)
lines(fitted.values(mdl_auto), col = "red")
In order to get the original values, we would need to take exponents of the appropriate elements from the forecast(mdl_auto) variable. A more convenient way is fitting the original series and specifying a logarithmic transformation:
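A minimal sketch of such a call (the output below suggests the series was available as airpass, e.g. via library(fma)):
mdl_auto <- forecast::auto.arima(airpass, lambda = 0) # lambda = 0 applies a log transformation
mdl_auto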
## Series: airpass
## ARIMA(0,1,1)(0,1,1)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.001371: log likelihood=244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
We can forecast the series, as well as plot the fitted values of the original data:
plot(forecast(mdl_auto), main = forecast(mdl_auto)$method)
lines(fma::airpass, col = "darkgreen", lwd = 2) # to verify that it is indeed the original series
lines(fitted.values(mdl_auto), col = "red")