In [1]:
# Set plot options in Jupyter Lab
options(repr.plot.width = 20)
options(repr.plot.height = 6)



Revisiting Time Series Decomposition and (S)ARMA models

We will revisit some of the examples for Time Series Decomposition and Seasonal ARMA model estimation

In [2]:
?fma::airpass
airpass {fma}    R Documentation

Monthly Airline Passenger Numbers 1949-1960

Description

The classic Box & Jenkins airline data. Monthly totals of international airline passengers (1949–1960).

Usage

airpass

Format

A monthly time series, in thousands.

Source

Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, John Wiley & Sons: New York. Exercise 2.4, Chapter 3, Exercise 4.7.

References

Box, Jenkins and Reinsell (1994) Time series analysis: forecasting and control, 3rd edition, Holden-Day: San Francisco. Series G.

Examples

plot(airpass)
seasonplot(airpass)
tsdisplay(airpass)

[Package fma version 2.3 ]
In [3]:
airpass <- fma::airpass

We can examine the data plot:

In [4]:
plot.ts(airpass)

Note the total number of years in the series:

In [5]:
total_years <- length(unique(floor(time(airpass))))
print(total_years)
[1] 12

We can also plot the seasonal plot:

In [6]:
forecast::seasonplot(airpass, year.labels = TRUE, year.labels.left = TRUE, col = 1:total_years)

Looking at the $\rm ACF$ and $\rm PACF$ plots:

In [7]:
forecast::tsdisplay(airpass)

and note that the ACF declines slowly, in waves that appear to repeat every 12 lags - this indicates seasonality. The time series chart also indicates an increasing trend.

Time Series Transformation

The variation appears to be increasing, which also indicates that the series may be multiplicative: $Y_t = T_t \cdot S_t \cdot E_t$.

We want to take logarithms of the data, to get an additive form: $\log(Y_t) = \widetilde{Y}_t$ so that $\widetilde{Y}_t = \widetilde{T}_t + \widetilde{S}_t + \widetilde{E}_t$

In [8]:
log_passengers = log(airpass)
In [9]:
forecast::tsdisplay(log_passengers)

Now the series appears to be additive. We can move on to decomposition of the series.

Decomposition via Moving-average

Trend Decomposition

We can decompose the series manually via the filter(...) function:

In [10]:
one_sided_trd <- filter(log_passengers, filter = c(1/24, rep(1/12, total_years - 1), 1/24), sides = 1)
two_sided_trd <- filter(log_passengers, filter = c(1/24, rep(1/12, total_years - 1), 1/24), sides = 2)
In [11]:
plot.ts(log_passengers, main = "Log of Air Passengers")
lines(one_sided_trd, col = "red")
lines(two_sided_trd, col = "blue")
legend(x = "topleft", legend = c("Actual", "One-sided MA Trend", "Two-sided MA Trend"), lty = 1, col = c("black", "red", "blue"))

Seasonal Decomposition using the decomposed trend

Next, we move to decompose the seasonality

In [12]:
#seasonal component of one-sided averaging
one_sided_season <- log_passengers - one_sided_trd
one_sided_season <- matrix(one_sided_season, ncol = 12, byrow = TRUE)
one_sided_season <- apply(one_sided_season, 2, mean, na.rm = TRUE)
#seasonal component of two-sided averaging
two_sided_season <- log_passengers - two_sided_trd
two_sided_season <- matrix(two_sided_season, ncol = 12, byrow = TRUE)
two_sided_season <- apply(two_sided_season, 2, mean, na.rm = TRUE)

We then scale the seasons to sum up to 0:

In [13]:
one_sided_season <- one_sided_season - sum(one_sided_season) / length(one_sided_season)
In [14]:
two_sided_season <- two_sided_season - sum(two_sided_season) / length(two_sided_season)
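As a quick check (a small sketch), each adjusted seasonal component should now sum to (numerically) zero:

# Both sets of seasonal effects should sum to ~0 after centering (up to floating point error)
c(one_sided = sum(one_sided_season), two_sided = sum(two_sided_season))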
In [15]:
par(mfrow = c(1, 2))
plot.ts(one_sided_season, main = "Seasonality from One-sided trend decomposition")
plot.ts(two_sided_season, main = "Seasonality from Two-sided trend decomposition")

Using $S_t = S_{t+d}$, create the series for the full years:

In [16]:
one_sided_season <- rep(one_sided_season, times = total_years)
one_sided_season <- ts(one_sided_season, start = start(log_passengers), freq = frequency(log_passengers))
#
two_sided_season <- rep(two_sided_season, times = total_years)
two_sided_season <- ts(two_sided_season, start = start(log_passengers), freq = frequency(log_passengers))

Next, we can examine the decomposition visually

In [17]:
par(mfrow = c(1, 2))
#
plot(log_passengers, lwd = 2, main = "One-sided averaging decomposition")
lines(one_sided_trd, col = "blue", lwd = 2)
lines(one_sided_trd + one_sided_season, col = "red", lwd = 2)
#
plot(log_passengers, lwd = 2, main = "Two-sided averaging decomposition")
lines(two_sided_trd, col = "blue", lwd = 2)
lines(two_sided_trd + two_sided_season, col = "red", lwd = 2)

We can also use the decompose(...) function to do all of this:

In [18]:
two_sided_decomp <- decompose(log_passengers, type = "additive")

Compared with the filter() function:

In [19]:
two_sided_trd - two_sided_decomp$trend
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949  NA  NA  NA  NA  NA  NA   0   0   0   0   0   0
1950   0   0   0   0   0   0   0   0   0   0   0   0
1951   0   0   0   0   0   0   0   0   0   0   0   0
1952   0   0   0   0   0   0   0   0   0   0   0   0
1953   0   0   0   0   0   0   0   0   0   0   0   0
1954   0   0   0   0   0   0   0   0   0   0   0   0
1955   0   0   0   0   0   0   0   0   0   0   0   0
1956   0   0   0   0   0   0   0   0   0   0   0   0
1957   0   0   0   0   0   0   0   0   0   0   0   0
1958   0   0   0   0   0   0   0   0   0   0   0   0
1959   0   0   0   0   0   0   0   0   0   0   0   0
1960   0   0   0   0   0   0  NA  NA  NA  NA  NA  NA

The results are the same. The upside is we automatically get the seasonal decomposition as well, which are the same as our manually calculated ones:

In [20]:
two_sided_season - two_sided_decomp$seasonal
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949   0   0   0   0   0   0   0   0   0   0   0   0
1950   0   0   0   0   0   0   0   0   0   0   0   0
1951   0   0   0   0   0   0   0   0   0   0   0   0
1952   0   0   0   0   0   0   0   0   0   0   0   0
1953   0   0   0   0   0   0   0   0   0   0   0   0
1954   0   0   0   0   0   0   0   0   0   0   0   0
1955   0   0   0   0   0   0   0   0   0   0   0   0
1956   0   0   0   0   0   0   0   0   0   0   0   0
1957   0   0   0   0   0   0   0   0   0   0   0   0
1958   0   0   0   0   0   0   0   0   0   0   0   0
1959   0   0   0   0   0   0   0   0   0   0   0   0
1960   0   0   0   0   0   0   0   0   0   0   0   0
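The remainder component can be checked in the same way (a small sketch; random is the remainder returned by decompose()):

# Manual remainder versus the one returned by decompose()
manual_random <- log_passengers - two_sided_trd - two_sided_season
summary(as.numeric(manual_random - two_sided_decomp$random))  # differences should be ~0 (NA at the ends)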

We can also plot the series:

In [21]:
plot.ts(log_passengers, main = "Log of Air Passengers")
lines(two_sided_decomp$trend, col = "blue", lwd = 2)
lines(two_sided_decomp$trend + two_sided_decomp$seasonal, col = "red", lwd = 2)

and examine the residuals:

In [22]:
forecast::tsdisplay(two_sided_decomp$random)

Looking at the $\rm ACF$, the residuals still appear to have some seasonality.


Exponential Smoothing examples

Simple exponential smoothing:

In [23]:
s_es <- forecast::ses(log_passengers)
In [24]:
par(mfrow = c(1, 2))
#
plot.ts(log_passengers)#, main = "Simple Exponential smoothing")
lines(fitted.values(s_es), col = "blue", lty = 2, lwd = 2)
lines(fitted.values(forecast::ses(log_passengers, alpha = 0.1)), col = "darkorange", lty = 2, lwd = 2)
#
plot(forecast::forecast(s_es, h = 10))

Double Exponential Smoothing (Holt's Linear Method)

In [25]:
d_es <- forecast::holt(log_passengers)
In [26]:
par(mfrow = c(1, 2))
#
plot.ts(log_passengers)#, main = "Double Exponential smoothing")
lines(fitted.values(d_es), col = "blue", lty = 2, lwd = 2)
lines(fitted.values(forecast::holt(log_passengers, alpha = 0.1, beta = 0.09)), col = "darkorange", lty = 2, lwd = 2)
#
plot(forecast::forecast(d_es, h = 10))

Triple Exponential Smoothing (Holt-Winters' Method)

In [27]:
t_es <- forecast::hw(log_passengers)
In [28]:
par(mfrow = c(1, 2))
#
plot.ts(log_passengers)#, main = "Triple Exponential smoothing")
lines(fitted.values(t_es), col = "blue", lty = 2, lwd = 2)
#
plot(forecast::forecast(t_es, h = 10))

R allows us to use an automated procedure to choose the smoothing parameters - the function ets selects the best variant automatically (by minimizing an information criterion, the AICc by default) and estimates its parameters:

In [29]:
ets_es <- forecast::ets(log_passengers)
In [30]:
par(mfrow = c(1, 2))
#
plot.ts(log_passengers)
lines(fitted.values(ets_es), col = "blue", lty = 2, lwd = 2)
#
plot(forecast::forecast(ets_es, h = 10))

Note the help file:

  • The first letter denotes the error type ("A", "M" or "Z");
  • The second letter denotes the trend type ("N","A","M" or "Z");
  • The third letter denotes the season type ("N","A","M" or "Z").

In all cases, "N"=none, "A"=additive, "M"=multiplicative and "Z"=automatically selected. So, for example, "ANN" is simple exponential smoothing with additive errors, "MAM" is multiplicative Holt-Winters' method with multiplicative errors, and so on.
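To see which variant was selected for this series, the fitted ets object can be inspected directly (a small sketch; the exact label depends on the data):

ets_es$method      # the selected ETS(error, trend, season) specification, e.g. "ETS(A,A,A)"
summary(ets_es)    # smoothing parameters, initial states and information criteria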

In [31]:
forecast::tsdisplay(ets_es$residuals)

The residuals appear to have a slight autocorrelation.

SARMA models with trend

In [32]:
tt_dt <- data.frame(tt = 1:length(log_passengers))
tt_dt$tt_sq <- tt_dt$tt^2
#
mdl_auto = forecast::auto.arima(airpass, max.d = 0, max.D = 0, xreg = tt_dt)
In [33]:
mdl_auto
Series: airpass 
Regression with ARIMA(0,0,3)(0,0,1)[12] errors 

Coefficients:
         ma1     ma2     ma3    sma1  intercept      tt   tt_sq
      0.9875  0.7195  0.3621  0.7592   117.1154  1.5895  0.0068
s.e.  0.0822  0.1077  0.0914  0.0668    20.7197  0.6575  0.0044

sigma^2 estimated as 359.3:  log likelihood=-630.18
AIC=1276.35   AICc=1277.42   BIC=1300.11
In [34]:
plot(airpass, main = "SARMA with trend")
lines(mdl_auto$fitted, col = "red")

We can examine the residuals

In [35]:
forecast::tsdisplay(mdl_auto$residuals)

Note that there appears to be a large autocorrelation at every 12th lag. So the restriction of no differencing causes auto.arima to fail to find a suitable model.
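For reference, we could also simply lift the restriction and let auto.arima difference the data if it sees fit (a sketch; mdl_auto_free is a name chosen here, and the selected orders depend on the data and package version):

# Same call, but without the max.d / max.D restrictions
mdl_auto_free <- forecast::auto.arima(airpass, xreg = tt_dt)
mdl_auto_free
forecast::tsdisplay(mdl_auto_free$residuals)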

For comparison, suppose we specify a $\rm SARIMA(1, 0, 1)\times(1, 0, 1)_{12}$ model manually. If we use the default estimation method, we may get an error that the series is not stationary:

In [36]:
tryCatch({
    forecast::Arima(airpass, order = c(1, 0, 1), seasonal = c(1, 0, 1), xreg = tt_dt)
}, warning = function(w) {
   # warning-handler-code
}, error = function(e) {
    print(e)
}, finally = {
    # cleanup-code
})
<simpleError in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,     include.mean = include.mean, method = method, ...): non-stationary seasonal AR part from CSS>

If we are sure that the process is stationary, we can specify a different parameter estimation method:

In [37]:
mdl_manual = forecast::Arima(airpass, order = c(1, 0, 1), seasonal = c(1, 0, 1), xreg = tt_dt, method = "ML")
In [38]:
mdl_manual
Series: airpass 
Regression with ARIMA(1,0,1)(1,0,1)[12] errors 

Coefficients:
         ar1      ma1    sar1     sma1  intercept      tt   tt_sq
      0.7540  -0.1003  0.9630  -0.1336   119.0055  1.4857  0.0077
s.e.  0.0724   0.1045  0.0161   0.0925    38.2025  0.5299  0.0033

sigma^2 estimated as 128:  log likelihood=-564.67
AIC=1145.35   AICc=1146.41   BIC=1169.11
In [39]:
plot(airpass, main = "SARMA with trend")
lines(mdl_manual$fitted, col = "red")
In [40]:
forecast::tsdisplay(mdl_manual$residuals)

The residuals look much better compared to the auto.arima results with the restriction that the data is not to be differenced.

Finally, we can forecast the process. We do need the future values of the trend regressors. Thankfully, we have opted to use the standard indexing of $t = 1, 2, ...$ for the trend.

In [41]:
tt_forc <- data.frame(tt = tail(tt_dt$tt, 1) + 1:20)
tt_forc$tt_sq <- tt_forc$tt^2
In [42]:
plot(forecast::forecast(mdl_manual, xreg = tt_forc))
lines(mdl_manual$fitted, col = "red")



Time series with unit root

In [43]:
# Set plot options in Jupyter Lab
options(repr.plot.width = 15)
options(repr.plot.height = 4)

(1) Series plot and transformation

In [44]:
suppressPackageStartupMessages({library(forecast)})
In [45]:
nyse <- read.table(url("http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/nyse.txt"), header = TRUE)
nyse <- ts(nyse, start = 1952, frequency = 12)
In [46]:
head(nyse, 36)
           Jan       Feb       Mar       Apr       May       Jun       Jul
1952 100.00000 101.55801  98.35478 102.49584  97.45797  99.92923 103.53153
1953 107.99719 107.70146 106.74088 104.92693 101.94905 101.89004  99.83214
1954 102.88020 108.08621 109.38209 113.14183 117.91136 120.87454 121.79292
           Aug       Sep       Oct       Nov       Dec
1952 104.61471 103.15134 100.79363 100.10055 105.16128
1953 102.22141  97.10542  96.98586 101.49628 103.35702
1954 127.93562 124.25727 132.08256 129.77409 141.50459
In [47]:
par(mfrow = c(1, 3))
plot(nyse, main = bquote(Y[t])) 
plot(log(nyse), main = bquote(log(Y[t])))
plot(diff(log(nyse)), main = bquote(Delta~log(Y[t])))

Note that the growth appears to be exponential - it is more convenient to take logarithms to linearize the series.

In [48]:
lnyse <- log(nyse)

(2) $\rm ACF$ and $\rm PACF$ plots

In [49]:
par(mfrow = c(1, 2))
forecast::Acf(lnyse)
forecast::Pacf(lnyse)

The slow ACF decay is an indication of non-stationarity and/or a drift and/or a trend. We need to test whether the data is trend stationary (TS) or difference stationary (DS). To do this, we need to carry out unit root testing.

We can always load the relevant libraries to avoid the use of :: notation.


(3) Unit root testing

In [50]:
library(forecast)
library(dynlm)
library(urca)
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

(3.1) Manual Unit root testing

Unit root tests consist of:

  • Selecting the lag order of the autoregressive model $\Delta Y_t = \alpha + \delta t + \rho Y_{t-1} + \sum_{j = 1}^p \gamma_j \Delta Y_{t-j} + \epsilon_t$;
  • Determining whether the trend coefficient, $\widehat{\delta}$, is significant;
  • Depending on the final model, assessing whether $\widehat{\rho}$ is significant.

Lag Order selection

We will examine three cases:

  • Starting with $p_\max = 4$;
  • Starting with $p_\max = 5$;
  • Starting with $p_\max = \biggr\lfloor 12 \cdot \left( \dfrac{T}{100} \right)^{1/4} \biggr\rfloor$;

Starting with $p_\max = 4$

In [51]:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:4) + time(lnyse))
summary(mod.1)
Time series regression with "ts" data:
Start = 1952(6), End = 1995(13)

Call:
dynlm(formula = d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:4) + time(lnyse))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.220902 -0.024758  0.001637  0.026627  0.152618 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)  
(Intercept)       -2.3687707  1.1485800  -2.062   0.0397 *
L(lnyse)          -0.0172567  0.0083590  -2.064   0.0395 *
L(d(lnyse), 1:4)1  0.0544182  0.0439376   1.239   0.2161  
L(d(lnyse), 1:4)2 -0.0282788  0.0439795  -0.643   0.5205  
L(d(lnyse), 1:4)3  0.0150921  0.0439341   0.344   0.7313  
L(d(lnyse), 1:4)4  0.0364522  0.0439285   0.830   0.4070  
time(lnyse)        0.0012584  0.0006077   2.071   0.0389 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04061 on 517 degrees of freedom
Multiple R-squared:  0.01244,	Adjusted R-squared:  0.0009837 
F-statistic: 1.086 on 6 and 517 DF,  p-value: 0.3696

The last lag coefficient, L(d(lnyse), 1:4)4, is insignificant since its $p$-value $= 0.4070 > 0.05$. We will reduce the maximum lag and continue the procedure.

In [52]:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:3) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.2563     1.1441 -1.9721   0.0491
L(lnyse)           -0.0162     0.0083 -1.9420   0.0527
L(d(lnyse), 1:3)1   0.0514     0.0439  1.1689   0.2430
L(d(lnyse), 1:3)2  -0.0274     0.0439 -0.6227   0.5338
L(d(lnyse), 1:3)3   0.0151     0.0439  0.3438   0.7311
time(lnyse)         0.0012     0.0006  1.9789   0.0484

p-value of L(d(lnyse), 1:3)3 is > 0.05.

In [53]:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:2) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.2314     1.1372 -1.9621   0.0503
L(lnyse)           -0.0161     0.0083 -1.9483   0.0519
L(d(lnyse), 1:2)1   0.0493     0.0438  1.1244   0.2614
L(d(lnyse), 1:2)2  -0.0263     0.0439 -0.6004   0.5485
time(lnyse)         0.0012     0.0006  1.9698   0.0494

p-value of L(d(lnyse), 1:2)2 is > 0.05.

In [54]:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     -2.2880     1.1308 -2.0233   0.0435
L(lnyse)        -0.0164     0.0082 -1.9956   0.0465
L(d(lnyse), 1)   0.0480     0.0438  1.0956   0.2737
time(lnyse)      0.0012     0.0006  2.0302   0.0428

p-value of L(d(lnyse), 1) is > 0.05.

In [55]:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse) + time(lnyse))
print(round(coef(summary(mod.1)), 4))
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.1695     1.1246 -1.9292   0.0542
L(lnyse)     -0.0155     0.0082 -1.8995   0.0580
time(lnyse)   0.0012     0.0006  1.9362   0.0534

The final model contains no lags of $\Delta Y_{t-j}$ - so there is no autoregressive coefficient for the difference model.


Starting with $p_\max = 5$

We will repeat the procedure as before.

In [56]:
mod.2 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:5) + time(lnyse))
print(round(coef(summary(mod.2)), 4))
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.6325     1.1505 -2.2882   0.0225
L(lnyse)           -0.0192     0.0084 -2.2920   0.0223
L(d(lnyse), 1:5)1   0.0540     0.0439  1.2319   0.2185
L(d(lnyse), 1:5)2  -0.0286     0.0439 -0.6523   0.5145
L(d(lnyse), 1:5)3   0.0208     0.0439  0.4738   0.6359
L(d(lnyse), 1:5)4   0.0329     0.0438  0.7516   0.4527
L(d(lnyse), 1:5)5   0.1030     0.0438  2.3508   0.0191
time(lnyse)         0.0014     0.0006  2.2962   0.0221

L(d(lnyse), 1:5)5 has p-value < 0.05, so the final lag order is 5.


Starting with $p_\max = \biggr\lfloor 12 \cdot \left( \dfrac{T}{100} \right)^{1/4} \biggr\rfloor$

The maximum lag via the rule-of-thumb is:

In [57]:
floor(12 * (length(lnyse) / 100)^(1/4))
18
In [58]:
mod.3 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:18) + time(lnyse))
print(round(coef(summary(mod.3)), 4))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          -2.0342     1.2305 -1.6532   0.0989
L(lnyse)             -0.0152     0.0091 -1.6706   0.0954
L(d(lnyse), 1:18)1    0.0509     0.0453  1.1234   0.2618
L(d(lnyse), 1:18)2   -0.0321     0.0454 -0.7058   0.4806
L(d(lnyse), 1:18)3    0.0307     0.0453  0.6776   0.4984
L(d(lnyse), 1:18)4    0.0279     0.0454  0.6162   0.5381
L(d(lnyse), 1:18)5    0.1048     0.0451  2.3241   0.0205
L(d(lnyse), 1:18)6   -0.0660     0.0453 -1.4561   0.1460
L(d(lnyse), 1:18)7   -0.0292     0.0454 -0.6423   0.5210
L(d(lnyse), 1:18)8   -0.0719     0.0454 -1.5853   0.1135
L(d(lnyse), 1:18)9    0.0048     0.0455  0.1060   0.9156
L(d(lnyse), 1:18)10  -0.0205     0.0455 -0.4518   0.6516
L(d(lnyse), 1:18)11   0.0159     0.0453  0.3505   0.7261
L(d(lnyse), 1:18)12   0.0322     0.0453  0.7105   0.4777
L(d(lnyse), 1:18)13   0.0164     0.0451  0.3640   0.7160
L(d(lnyse), 1:18)14  -0.1022     0.0450 -2.2723   0.0235
L(d(lnyse), 1:18)15   0.0043     0.0452  0.0948   0.9245
L(d(lnyse), 1:18)16  -0.0426     0.0451 -0.9442   0.3455
L(d(lnyse), 1:18)17   0.0341     0.0451  0.7574   0.4492
L(d(lnyse), 1:18)18  -0.0507     0.0451 -1.1236   0.2618
time(lnyse)           0.0011     0.0007  1.6627   0.0970

L(d(lnyse), 1:18)18 is insignificant. We will continue reducing the lag order until the last lag is significant. We omit the intermediate model output with insignificant $p_\max$ coefficients. The final model

In [59]:
mod.3 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:14) + time(lnyse))
print(round(coef(summary(mod.3)), 4))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          -2.0920     1.2078 -1.7320   0.0839
L(lnyse)             -0.0151     0.0089 -1.7043   0.0890
L(d(lnyse), 1:14)1    0.0510     0.0448  1.1383   0.2555
L(d(lnyse), 1:14)2   -0.0233     0.0449 -0.5198   0.6034
L(d(lnyse), 1:14)3    0.0283     0.0449  0.6307   0.5285
L(d(lnyse), 1:14)4    0.0301     0.0448  0.6722   0.5018
L(d(lnyse), 1:14)5    0.1041     0.0448  2.3227   0.0206
L(d(lnyse), 1:14)6   -0.0664     0.0451 -1.4723   0.1416
L(d(lnyse), 1:14)7   -0.0299     0.0450 -0.6637   0.5072
L(d(lnyse), 1:14)8   -0.0658     0.0450 -1.4631   0.1441
L(d(lnyse), 1:14)9    0.0047     0.0449  0.1057   0.9159
L(d(lnyse), 1:14)10  -0.0163     0.0447 -0.3638   0.7162
L(d(lnyse), 1:14)11   0.0120     0.0447  0.2690   0.7880
L(d(lnyse), 1:14)12   0.0389     0.0446  0.8727   0.3832
L(d(lnyse), 1:14)13   0.0100     0.0446  0.2247   0.8223
L(d(lnyse), 1:14)14  -0.0972     0.0446 -2.1787   0.0298
time(lnyse)           0.0011     0.0006  1.7390   0.0826

has a $p_\max = 14$ lag order.
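As an aside, this stepwise reduction can be automated with a small loop (a sketch; the 5% threshold and the stopping rule are assumptions). Starting from $p_\max = 18$ it should stop at the same $p = 14$ as the manual procedure above:

p <- 18
repeat {
    mdl <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:p) + time(lnyse))
    p_last <- coef(summary(mdl))[2 + p, 4]  # p-value of the highest-order lagged difference
    if (p_last < 0.05 || p == 1) break
    p <- p - 1
}
p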


If we compare the $\rm AIC$ and $\rm BIC$ values of the three models:

In [60]:
a <- data.frame('mdl_1 (p = 0)' = c(AIC(mod.1), BIC(mod.1)), 
                'mdl_2 (p = 5)' = c(AIC(mod.2), BIC(mod.2)), 
                'mdl_3 (p = 14)' = c(AIC(mod.3), BIC(mod.3)),
                check.names = FALSE)
rownames(a) <- c("AIC", "BIC")
a
     mdl_1 (p = 0)  mdl_2 (p = 5)  mdl_3 (p = 14)
AIC      -1881.349      -1860.785       -1813.996
BIC      -1864.273      -1822.449       -1737.636

The $\rm AIC$ and $\rm BIC$ values are smallest for the model where we started with the arbitrary value of $p_\max = 4$. Having determined the lag order, we assess the trend significance.

Note: we will examine the final models for each case. In reality, we would need to choose the starting lag order ourselves if we opt to manually test for a unit root.

Determining the trend significance

For the 1st model:

In [61]:
print(round(coef(summary(mod.1)), 4))
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.1695     1.1246 -1.9292   0.0542
L(lnyse)     -0.0155     0.0082 -1.8995   0.0580
time(lnyse)   0.0012     0.0006  1.9362   0.0534

The trend is insignificant, so we remove it.

In [62]:
mod.1 <- dynlm(d(lnyse) ~ L(lnyse))
print(round(coef(summary(mod.1)), 4))
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0077     0.0122  0.6352   0.5256
L(lnyse)     -0.0001     0.0019 -0.0720   0.9426

For the 2nd model:

In [63]:
mod.2 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:5) + time(lnyse))
print(round(coef(summary(mod.2)), 4))
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.6325     1.1505 -2.2882   0.0225
L(lnyse)           -0.0192     0.0084 -2.2920   0.0223
L(d(lnyse), 1:5)1   0.0540     0.0439  1.2319   0.2185
L(d(lnyse), 1:5)2  -0.0286     0.0439 -0.6523   0.5145
L(d(lnyse), 1:5)3   0.0208     0.0439  0.4738   0.6359
L(d(lnyse), 1:5)4   0.0329     0.0438  0.7516   0.4527
L(d(lnyse), 1:5)5   0.1030     0.0438  2.3508   0.0191
time(lnyse)         0.0014     0.0006  2.2962   0.0221

The trend is significant, so we leave it.


For the 3rd model:

In [64]:
print(round(coef(summary(mod.3)), 4))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          -2.0920     1.2078 -1.7320   0.0839
L(lnyse)             -0.0151     0.0089 -1.7043   0.0890
L(d(lnyse), 1:14)1    0.0510     0.0448  1.1383   0.2555
L(d(lnyse), 1:14)2   -0.0233     0.0449 -0.5198   0.6034
L(d(lnyse), 1:14)3    0.0283     0.0449  0.6307   0.5285
L(d(lnyse), 1:14)4    0.0301     0.0448  0.6722   0.5018
L(d(lnyse), 1:14)5    0.1041     0.0448  2.3227   0.0206
L(d(lnyse), 1:14)6   -0.0664     0.0451 -1.4723   0.1416
L(d(lnyse), 1:14)7   -0.0299     0.0450 -0.6637   0.5072
L(d(lnyse), 1:14)8   -0.0658     0.0450 -1.4631   0.1441
L(d(lnyse), 1:14)9    0.0047     0.0449  0.1057   0.9159
L(d(lnyse), 1:14)10  -0.0163     0.0447 -0.3638   0.7162
L(d(lnyse), 1:14)11   0.0120     0.0447  0.2690   0.7880
L(d(lnyse), 1:14)12   0.0389     0.0446  0.8727   0.3832
L(d(lnyse), 1:14)13   0.0100     0.0446  0.2247   0.8223
L(d(lnyse), 1:14)14  -0.0972     0.0446 -2.1787   0.0298
time(lnyse)           0.0011     0.0006  1.7390   0.0826

The trend is insignificant, so we remove it.

In [65]:
mod.3 <- dynlm(d(lnyse) ~ L(lnyse) + L(d(lnyse), 1:14))
print(round(coef(summary(mod.3)), 4))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.0083     0.0129  0.6479   0.5173
L(lnyse)             -0.0001     0.0020 -0.0471   0.9624
L(d(lnyse), 1:14)1    0.0427     0.0446  0.9574   0.3388
L(d(lnyse), 1:14)2   -0.0317     0.0447 -0.7094   0.4784
L(d(lnyse), 1:14)3    0.0198     0.0447  0.4433   0.6577
L(d(lnyse), 1:14)4    0.0217     0.0447  0.4859   0.6273
L(d(lnyse), 1:14)5    0.0957     0.0447  2.1419   0.0327
L(d(lnyse), 1:14)6   -0.0753     0.0449 -1.6792   0.0937
L(d(lnyse), 1:14)7   -0.0379     0.0449 -0.8456   0.3982
L(d(lnyse), 1:14)8   -0.0733     0.0449 -1.6329   0.1031
L(d(lnyse), 1:14)9   -0.0017     0.0449 -0.0374   0.9701
L(d(lnyse), 1:14)10  -0.0232     0.0446 -0.5202   0.6031
L(d(lnyse), 1:14)11   0.0051     0.0446  0.1139   0.9093
L(d(lnyse), 1:14)12   0.0320     0.0445  0.7184   0.4729
L(d(lnyse), 1:14)13   0.0031     0.0445  0.0688   0.9452
L(d(lnyse), 1:14)14  -0.1046     0.0445 -2.3500   0.0192

So, we have that:

  • The 1st model does not contain a trend;
  • The 2nd model does contain a trend;
  • The 3rd model does not contain a trend.

Carrying out the unit root test

The $\rm ADF$ test tests the null hypothesis: $$ H_0: \rho = 0 \text{ (i.e. the series contains a unit root)} $$

Depending on the trend, the critical values are:

  • The 1st model does not contain a trend - we need to compare the t-statistic of $\widehat{\rho}$ with -2.89;
  • The 2nd model does contain a trend - we need to compare the t-statistic of $\widehat{\rho}$ with -3.45;
  • The 3rd model does not contain a trend - we need to compare the t-statistic of $\widehat{\rho}$ with -2.89.

Looking at the 1st model:

In [66]:
print(round(coef(summary(mod.1)), 4))
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0077     0.0122  0.6352   0.5256
L(lnyse)     -0.0001     0.0019 -0.0720   0.9426

The coefficient $\widehat{\rho}$ t-value is L(lnyse) = -0.0720 > -2.89. So we have no grounds to reject the null hypothesis and conclude that the series contains a unit root.



Since we have determined that L(lnyse) is insignificant, we could remove it from the model, if we wanted to consider this model as a contender for the best model for this series:

In [67]:
mod.1 <- dynlm(d(lnyse) ~ 1)
print(round(coef(summary(mod.1)), 4))
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0069     0.0018  3.8836    1e-04

Then the estimated model for $\Delta Y_t$ is: $\Delta Y_t = 0.0069 + \epsilon_t$, which we can rewrite as a model for $Y_t$: $Y_t = 0.0069 + Y_{t-1} + \epsilon_t$, i.e. a random walk with drift (an $\rm AR(1)$ with a unit autoregressive coefficient).
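As a quick illustration (a minimal sketch; the seed, sample size and innovation standard deviation are arbitrary), a series following this random-walk-with-drift equation can be simulated directly:

set.seed(42)                              # arbitrary seed
n_sim <- 250                              # arbitrary sample size
eps   <- rnorm(n_sim, mean = 0, sd = 0.04)  # innovation sd chosen only for illustration
y_sim <- cumsum(0.0069 + eps)             # Y_t = 0.0069 + Y_{t-1} + eps_t, with Y_0 = 0
plot.ts(y_sim, main = "Simulated random walk with drift")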



Looking at the 2nd model:

In [68]:
print(round(coef(summary(mod.2)), 4))
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.6325     1.1505 -2.2882   0.0225
L(lnyse)           -0.0192     0.0084 -2.2920   0.0223
L(d(lnyse), 1:5)1   0.0540     0.0439  1.2319   0.2185
L(d(lnyse), 1:5)2  -0.0286     0.0439 -0.6523   0.5145
L(d(lnyse), 1:5)3   0.0208     0.0439  0.4738   0.6359
L(d(lnyse), 1:5)4   0.0329     0.0438  0.7516   0.4527
L(d(lnyse), 1:5)5   0.1030     0.0438  2.3508   0.0191
time(lnyse)         0.0014     0.0006  2.2962   0.0221

The coefficient $\widehat{\rho}$ t-value is L(lnyse) = -2.2920 > -3.45. So we have no grounds to reject the null hypothesis and conclude that the series contains a unit root.



Since we have determined that L(lnyse) is insignificant, we could remove it from the model, if we wanted to consider this model as a contender for the best model for this series:

In [69]:
mod.2 <- dynlm(d(lnyse) ~ L(d(lnyse), 1:5) + time(lnyse))
print(round(coef(summary(mod.2)), 5))
                  Estimate Std. Error  t value Pr(>|t|)
(Intercept)       -0.07357    0.27887 -0.26382  0.79203
L(d(lnyse), 1:5)1  0.04398    0.04382  1.00357  0.31606
L(d(lnyse), 1:5)2 -0.03932    0.04380 -0.89786  0.36968
L(d(lnyse), 1:5)3  0.01035    0.04380  0.23620  0.81337
L(d(lnyse), 1:5)4  0.02288    0.04376  0.52299  0.60121
L(d(lnyse), 1:5)5  0.09225    0.04373  2.10951  0.03538
time(lnyse)        0.00004    0.00014  0.28548  0.77539

We only needed the trend for unit root testing - having removed the insignificant $\rho$ coefficient, we could also try removing the insignificant trend to improve our model. But we will assume that this model is acceptable as is.

Then the estimated model for $\Delta Y_t$ is:

$\Delta Y_t = -0.07357 + 0.00004 \cdot t + 0.04398 \Delta Y_{t-1} -0.03932 \Delta Y_{t-2} + 0.01035 \Delta Y_{t-3} + 0.02288 \Delta Y_{t-4} + 0.09225 \Delta Y_{t-5} + \epsilon_t$,

which we can re-write as a model for $Y_t$.

We use the property $\gamma_j = -[\phi_{j+1} + ... + \phi_p]$, $j = 1, ..., p - 1$ and the expression of $\rho$ to get the coefficients:

$ \begin{aligned} \gamma_5 &= - \phi_6 &&\Longrightarrow \phi_6 = - \gamma_5 = -0.09225\\ \gamma_4 &= - [\phi_5 + \phi_6] &&\Longrightarrow \phi_5 = - \gamma_4 - \phi_6 = -0.02288+0.09225 = 0.06937\\ \gamma_3 &= - [\phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_4 = - \gamma_3 - [\phi_5 + \phi_6] = - \gamma_3 + \gamma_4 = -0.01035+0.02288 = 0.01253\\ \gamma_2 &= - [\phi_3 + \phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_3 = - \gamma_2 + \gamma_3 = 0.03932 + 0.01035 = 0.04967\\ \gamma_1 &= - [\phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6] &&\Longrightarrow \phi_2 = - \gamma_1 + \gamma_2 = -0.04398 -0.03932 = -0.0833\\ \rho &= \phi_1 + \phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6 - 1,\text{ since } \rho = 0 &&\Longrightarrow \phi_1 = 1 - (\phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6) = 1 + \gamma_1= 1 + 0.04398 = 1.04398 \end{aligned} $


So, the equivalent model for $Y_t$ is:

$Y_t = -0.07357 + 0.00004 \cdot t + 1.04398 Y_{t-1} -0.0833 Y_{t-2} + 0.04967 Y_{t-3} + 0.01253 Y_{t-4} + 0.06937 Y_{t-5} -0.09225 Y_{t-6} + \epsilon_t$
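The same back-transformation can also be carried out numerically (a small sketch using the rounded $\gamma_j$ estimates from the output above):

gam <- c(0.04398, -0.03932, 0.01035, 0.02288, 0.09225)  # gamma_1, ..., gamma_5
p   <- length(gam) + 1                                  # AR order of the model in levels
phi <- numeric(p)
phi[p] <- -gam[p - 1]                                   # phi_p = -gamma_{p-1}
for (j in (p - 1):2) phi[j] <- -gam[j - 1] + gam[j]     # phi_j = -gamma_{j-1} + gamma_j
phi[1] <- 1 + gam[1]                                    # from rho = phi_1 + ... + phi_p - 1 = 0
round(phi, 5)
# 1.04398 -0.08330  0.04967  0.01253  0.06937 -0.09225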




[INFO: START]

To verify that this is indeed the correct transformation, we can try to simulate a few observations with the same residuals:

In [70]:
set.seed(123)
N <- 200
eps <- rnorm(mean = 0, sd = 1, n = N)
tt  <- 1:length(eps)

First, we will simulate $Y_t$:

In [71]:
Y <- NULL
Y[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
Y[2] <- -0.07357 + 0.00004 * tt[2] + 1.04398 * Y[1] + eps[2]
Y[3] <- -0.07357 + 0.00004 * tt[3] + 1.04398 * Y[2] - 0.0833 * Y[1] + eps[3]
Y[4] <- -0.07357 + 0.00004 * tt[4] + 1.04398 * Y[3] - 0.0833 * Y[2] + 0.04967 * Y[1] + eps[4]
Y[5] <- -0.07357 + 0.00004 * tt[5] + 1.04398 * Y[4] - 0.0833 * Y[3] + 0.04967 * Y[2] + 0.01253 * Y[1] + eps[5]
Y[6] <- -0.07357 + 0.00004 * tt[6] + 1.04398 * Y[5] - 0.0833 * Y[4] + 0.04967 * Y[3] + 0.01253 * Y[2] + 0.06937 * Y[1] + eps[6]
for(j in 7:N){
    Y[j] <- -0.07357 + 0.00004 * tt[j] + 1.04398 * Y[j-1] - 0.0833 * Y[j-2] + 0.04967 * Y[j - 3] + 0.01253 * Y[j - 4] + 0.06937 * Y[j - 5] - 0.09225 * Y[j - 6] + eps[j]
}

Secondly, we will simulate $\Delta Y_t$:

In [72]:
dY <- NULL
dY[1] <- -0.07357 + 0.00004 * tt[1] + eps[1]
dY[2] <- -0.07357 + 0.00004 * tt[2] + 0.04398 * dY[1] + eps[2]
dY[3] <- -0.07357 + 0.00004 * tt[3] + 0.04398 * dY[2] - 0.03932 * dY[1] + eps[3]
dY[4] <- -0.07357 + 0.00004 * tt[4] + 0.04398 * dY[3] - 0.03932 * dY[2] + 0.01035 * dY[1] + eps[4]
dY[5] <- -0.07357 + 0.00004 * tt[5] + 0.04398 * dY[4] - 0.03932 * dY[3] + 0.01035 * dY[2] + 0.02288 * dY[1] + eps[5]
for(j in 6:N){
    dY[j] <- -0.07357 + 0.00004 * tt[j] + 0.04398 * dY[j-1] - 0.03932 * dY[j-2] + 0.01035 * dY[j - 3] + 0.02288 * dY[j - 4] + 0.09225 * dY[j - 5] + eps[j]
}

Finally, we will assume that the first 100, or so, observations of $Y_t$ are burn-in values (since the first couple of values assume that $Y_j = \epsilon_j = 0$, for $j \leq 0$). This means that we treat the first 101 $\Delta Y_t$ values as burn-in values as well and drop them:

In [73]:
Y <- Y[-c(1:100)]
In [74]:
dY <- dY[-c(1:101)]

We can compare $\Delta Y_t$ with $Y_t - Y_{t-1}$:

In [75]:
dt_compare <- data.frame(diff(Y), dY)
dt_compare$DIFFERENCE <- round(diff(Y) - dY, 10)
head(dt_compare)
    diff.Y.          dY  DIFFERENCE
  0.4191779   0.4191779           0
 -0.1389262  -0.1389262           0
 -0.5043527  -0.5043527           0
 -1.1470432  -1.1470432           0
 -0.2097488  -0.2097488           0
 -0.7880470  -0.7880470           0

We see that there is no difference between the simulated $Y_t - Y_{t-1}$ (diff.Y.), which was calculated from the simulated $Y_t$ series, and $\Delta Y_t$ (dY).

We can also visually compare the charts:

In [76]:
plot.ts(ts(diff(Y)), col = "red", lwd = 2)
lines(ts(dY), col = "blue", lty = 2, lwd = 2)

So, the two models are equivalent. For simulation and forecasting we may be more interested in the $Y_t$ model, as it describes the original series, which is what we are interested in.

[INFO: END]




The final, 3rd model

In [77]:
print(round(coef(summary(mod.3)), 4))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.0083     0.0129  0.6479   0.5173
L(lnyse)             -0.0001     0.0020 -0.0471   0.9624
L(d(lnyse), 1:14)1    0.0427     0.0446  0.9574   0.3388
L(d(lnyse), 1:14)2   -0.0317     0.0447 -0.7094   0.4784
L(d(lnyse), 1:14)3    0.0198     0.0447  0.4433   0.6577
L(d(lnyse), 1:14)4    0.0217     0.0447  0.4859   0.6273
L(d(lnyse), 1:14)5    0.0957     0.0447  2.1419   0.0327
L(d(lnyse), 1:14)6   -0.0753     0.0449 -1.6792   0.0937
L(d(lnyse), 1:14)7   -0.0379     0.0449 -0.8456   0.3982
L(d(lnyse), 1:14)8   -0.0733     0.0449 -1.6329   0.1031
L(d(lnyse), 1:14)9   -0.0017     0.0449 -0.0374   0.9701
L(d(lnyse), 1:14)10  -0.0232     0.0446 -0.5202   0.6031
L(d(lnyse), 1:14)11   0.0051     0.0446  0.1139   0.9093
L(d(lnyse), 1:14)12   0.0320     0.0445  0.7184   0.4729
L(d(lnyse), 1:14)13   0.0031     0.0445  0.0688   0.9452
L(d(lnyse), 1:14)14  -0.1046     0.0445 -2.3500   0.0192

The coefficient $\widehat{\rho}$ t-value is L(lnyse) = -0.0471 > -2.89. So we have no grounds to reject the null hypothesis and conclude that the series contains a unit root.

We will not re-write this model's equations, as they can be determined by applying the same methods as before.


Overall, regardless of the lag order we see that the series has a unit root.

It is important to note that removing the trend increased the t-statistic of $\widehat{\rho}$, so if we incorrectly remove the trend, we may reach an incorrect test conclusion.



(3.2) Automatic Unit root testing

We will carry out different tests from different packages. For completeness' sake, we will provide the same test results from more than one package.

ADF test

The $\rm ADF$ test tests the null hypothesis: $H_0: \rho = 0 \text{ (i.e. the series contains a unit root)}$

We can set the maximum lag order, and choose the criterion for the order selection:

In [78]:
tst_adf_1 <- urca::ur.df(lnyse, type = "trend", lags = 18, selectlags = "BIC")
print(tst_adf_1)
############################################################### 
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
############################################################### 

The value of the test statistic is: -2.0965 6.0649 2.2091 

We can examine the model

In [79]:
tst_adf_1@testreg
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.219561 -0.024529  0.003247  0.026918  0.152312 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  9.001e-02  4.010e-02   2.244   0.0252 *
z.lag.1     -1.755e-02  8.370e-03  -2.096   0.0365 *
tt           1.044e-04  5.041e-05   2.070   0.0390 *
z.diff.lag   5.043e-02  4.444e-02   1.135   0.2570  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0409 on 506 degrees of freedom
Multiple R-squared:  0.01033,	Adjusted R-squared:  0.004462 
F-statistic:  1.76 on 3 and 506 DF,  p-value: 0.1538

The critical values can be extracted via @cval:

In [80]:
print(tst_adf_1@cval)
      1pct  5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2  6.09  4.68  4.03
phi3  8.27  6.25  5.34

To carry out the unit root test, we examine the t-statistic of z.lag.1 - it is -2.096 (note that this is the same as the first value in the output The value of the test statistic is: -2.0965 6.0649 2.2091). The relevant critical value is tau3, which is -3.41 at the $5\%$ significance level.

Since the test statistic -2.096 > -3.41, we do not reject the null hypothesis of a unit root.
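The same comparison can be made programmatically from the slots of the returned urca object (a small sketch):

# TRUE would mean the statistic is below the 5% critical value (reject H0); here it is FALSE
tst_adf_1@teststat[1, "tau3"] < tst_adf_1@cval["tau3", "5pct"]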


A second way to carry out an $\rm ADF$ test:

In [81]:
tst_adf_2 <- tseries::adf.test(lnyse, alternative = "stationary")

We can either manually specify the lag order, or let the function select the maximum lag order itself.

In [82]:
tst_adf_2
	Augmented Dickey-Fuller Test

data:  lnyse
Dickey-Fuller = -1.9019, Lag order = 8, p-value = 0.6198
alternative hypothesis: stationary

This output is a bit easier to read - the $p-value = 0.6198 > 0.05$, so we have no grounds to reject the null hypothesis of a unit root.
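As noted above, the lag order can also be fixed manually via the k argument (a small sketch; k = 5 is an arbitrary choice here):

# Fix the ADF lag order instead of using the default rule
tseries::adf.test(lnyse, alternative = "stationary", k = 5)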


KPSS test

The $\rm KPSS$ test tests the null hypothesis: $H_0: \text{ the data is stationary}$

The test involves specifying and estimating the following equation with a random-walk component: $$ Y_t = r_t + \delta \cdot t + \epsilon_t,\quad r_t = r_{t-1} + u_t $$

The null hypothesis of stationarity is equivalent to the hypothesis: $H_0: \sigma^2_u = 0$, which would mean that $r_t = r_{t-1} = ... = r_0 = \rm const$ (i.e. a constant intercept).

We can set the number of lags via a rule of thumb: lags = "long" corresponds to $12 \cdot (n/100)^{1/4}$ lags, while lags = "short" corresponds to $4 \cdot (n/100)^{1/4}$ lags (truncated to integers). The test type specifies the deterministic component: either a constant mu, or a constant with a linear trend tau.
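For reference, a quick sketch of the two rule-of-thumb lag lengths (as defined above) for this series:

n_obs <- length(lnyse)
c(short = trunc(4 * (n_obs / 100)^0.25), long = trunc(12 * (n_obs / 100)^0.25))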

In [83]:
tst_kpss_1 <- urca::ur.kpss(lnyse, type = "tau", lags = "long")
tst_kpss_1
####################################### 
# KPSS Unit Root / Cointegration Test # 
####################################### 

The value of the test statistic is: 0.3909 

We see that the test statistic is 0.3909. The critical values are:

In [84]:
tst_kpss_1@cval
                 10pct   5pct  2.5pct   1pct
critical values  0.119  0.146   0.176  0.216

Since 0.3909 > 0.146, we reject the null hypothesis of stationarity and conclude that the series is not stationary.


The second way to carry out the KPSS test (where lshort = FALSE indicates to use the rule-of-thumb larger lag selection):

In [85]:
tst_kpss_2 <- tseries::kpss.test(lnyse, null = "Trend", lshort = FALSE)
Warning message in tseries::kpss.test(lnyse, null = "Trend", lshort = FALSE):
"p-value smaller than printed p-value"
In [86]:
print(tst_kpss_2)
	KPSS Test for Trend Stationarity

data:  lnyse
KPSS Trend = 0.43099, Truncation lag parameter = 16, p-value = 0.01

Since p-value = 0.01 < 0.05, we reject the null hypothesis of stationarity and conclude that the series is not stationary.


PP test

The $\rm PP$ test builds on the ADF test and tests the null hypothesis: $H_0: \text{the series contains a unit root}$.

We begin with the built-in function:

In [87]:
tst_pp_0 <- stats::PP.test(lnyse, lshort = FALSE)
In [88]:
tst_pp_0
	Phillips-Perron Unit Root Test

data:  lnyse
Dickey-Fuller = -1.9395, Truncation lag parameter = 18, p-value =
0.6039

p-value = 0.6039 > 0.05, so we have no grounds to reject the null hypothesis of a unit root.


The urca package also contains the $PP$ test:

In [89]:
tst_pp_1 <- urca::ur.pp(lnyse, model = "trend", type = "Z-tau", lags = "long")
tst_pp_1
################################################## 
# Phillips-Perron Unit Root / Cointegration Test # 
################################################## 

The value of the test statistic is: -1.9393 

Note: use type = "Z-tau", otherwise the critical values will be missing.

If we are interested, we can examine the specified regression, which was used for this test:

In [90]:
tst_pp_1@testreg
Call:
lm(formula = y ~ y.l1 + trend)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.221927 -0.025066  0.002876  0.026943  0.147915 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.045e-01  5.145e-02   2.032   0.0427 *  
y.l1        9.845e-01  8.151e-03 120.792   <2e-16 ***
trend       9.600e-05  4.958e-05   1.936   0.0534 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04055 on 525 degrees of freedom
Multiple R-squared:  0.9981,	Adjusted R-squared:  0.9981 
F-statistic: 1.38e+05 on 2 and 525 DF,  p-value: < 2.2e-16

Although the test statistic is calculated differently than the one in the $\rm ADF$ test, the critical values for the $\rm PP$ test are the same as for the $\rm DF$ test.

The critical values are:

In [91]:
tst_pp_1@cval
                     1pct       5pct      10pct
critical values  -3.97979  -3.420314  -3.132507

Since the value of the test statistic -1.9393 is greater than the $5\%$ critical value -3.420314, we have no grounds to reject the null hypothesis of a unit root.


The tseries package also has the $PP$ test:

In [92]:
tst_pp_2 <- tseries::pp.test(lnyse, alternative = "stationary", type = "Z(t_alpha)", lshort = FALSE)
In [93]:
tst_pp_2
	Phillips-Perron Unit Root Test

data:  lnyse
Dickey-Fuller Z(t_alpha) = -1.9395, Truncation lag parameter = 18,
p-value = 0.6039
alternative hypothesis: stationary

The p-value = 0.6039 > 0.05, so we have no grounds to reject the null hypothesis and conclude that the series has a unit root.


So, we can conclude that all tests indicate that the series has a unit root.


In [94]:
# Set plot options in Jupyter Lab
options(repr.plot.width = 10)
options(repr.plot.height = 4)

(4) Model Building

Since we have determined that the series has a unit root (and we see no indication of seasonality), we can take the differences of the series:

In [95]:
dlnyse <- diff(lnyse)

If we examine the series:

In [96]:
forecast::tsdisplay(dlnyse)

The series appears to be stationary. We will verify this by repeating the unit root tests on the differenced data. For simplicity, we will use the tests from the tseries package, which report p-values:

In [97]:
tseries::adf.test(dlnyse, alternative = "stationary")
Warning message in tseries::adf.test(dlnyse, alternative = "stationary"):
"p-value smaller than printed p-value"
	Augmented Dickey-Fuller Test

data:  dlnyse
Dickey-Fuller = -8.1328, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary

$p-value = 0.01 < 0.05$, so we reject the null hypothesis of a unit root.

In [98]:
tseries::kpss.test(dlnyse, null = "Level", lshort = FALSE)
Warning message in tseries::kpss.test(dlnyse, null = "Level", lshort = FALSE):
"p-value greater than printed p-value"
	KPSS Test for Level Stationarity

data:  dlnyse
KPSS Level = 0.095985, Truncation lag parameter = 16, p-value = 0.1

$p-value = 0.1 > 0.05$, so we have no grounds to reject the null hypothesis of stationarity.

In [99]:
tseries::pp.test(dlnyse, alternative = "stationary", type = "Z(t_alpha)", lshort = FALSE)
Warning message in tseries::pp.test(dlnyse, alternative = "stationary", type = "Z(t_alpha)", :
"p-value smaller than printed p-value"
	Phillips-Perron Unit Root Test

data:  dlnyse
Dickey-Fuller Z(t_alpha) = -22.021, Truncation lag parameter = 18,
p-value = 0.01
alternative hypothesis: stationary

$p-value = 0.01 < 0.05$, so we reject the null hypothesis of a unit root.

In other words, the differences, $\Delta Y_t$, are stationary.


To make it interesting, we will specify three different models:

  • A model based on one selected from the $\rm ADF$ test;
  • A model based on the $\rm ACF$, $\rm PACF$ of $\Delta Y_t$;
  • An automatically specified model via auto.arima, without any restrictions.

- From the ADF test we select and re-estimate the mod.2 model (note that this is a different parametrization than the one used by dynlm).

Furthermore, we will only be able to estimate the drift component, since if $Y_t$ contains a constant $\alpha$ and trend $\delta \cdot t$, then $\Delta Y_t$ contains $\delta$.

In this specification $\alpha$ is not identifiable.

In [100]:
mdl_4_1 <- forecast::Arima(lnyse, order = c(5, 1, 0), xreg = data.frame(tt = 1:length(lnyse)))
In [101]:
print(mdl_4_1)
Series: lnyse 
Regression with ARIMA(5,1,0) errors 

Coefficients:
         ar1      ar2     ar3     ar4     ar5      tt
      0.0380  -0.0350  0.0089  0.0228  0.0923  0.0069
s.e.  0.0433   0.0434  0.0434  0.0434  0.0434  0.0020

sigma^2 estimated as 0.001645:  log likelihood=945.98
AIC=-1877.96   AICc=-1877.75   BIC=-1848.08

The residuals of this model:

In [102]:
forecast::tsdisplay(mdl_4_1$residuals)

Appear to be white noise.




[INFO: START]

We will now expand upon what a different parametrization entails for models specified with dynlm versus the ones specified with Arima().

- The model specified via forecast::Arima (or via stats::arima) is a process with a linear trend, whose deviations from the trend are described by an AR(1) process.

For example, if we want to specify a stationary $\rm AR(1)$ model with a trend using forecast::Arima (or stats::arima), the specification is as follows:

$ \begin{cases} Y_t &= \alpha + \delta \cdot t + u_t,\\ u_t &= \phi u_{t-1} + \epsilon_t \end{cases} $

In [103]:
mod.a1 <- stats::arima(lnyse, order = c(1,0,0), xreg=time(lnyse))
mod.a1
Call:
stats::arima(x = lnyse, order = c(1, 0, 0), xreg = time(lnyse))

Coefficients:
         ar1  intercept  time(lnyse)
      0.9832  -144.2528       0.0763
s.e.  0.0076    12.4969       0.0063

sigma^2 estimated as 0.001632:  log likelihood = 945.18,  aic = -1882.37
In [104]:
mod.a2 <- forecast::Arima(lnyse, order = c(1,0,0), xreg=time(lnyse))
mod.a2
Series: lnyse 
Regression with ARIMA(1,0,0) errors 

Coefficients:
         ar1  intercept  time(lnyse)
      0.9832  -144.2528       0.0763
s.e.  0.0076    12.4969       0.0063

sigma^2 estimated as 0.001642:  log likelihood=945.18
AIC=-1882.37   AICc=-1882.29   BIC=-1865.28

The estimated model is:

$ \begin{cases} Y_t &= -144.2528 + 0.0763 \cdot t + u_t,\\ u_t &= 0.9832 u_{t-1} + \epsilon_t \end{cases} $

In this case, we would want to test whether the residuals exhibit a unit root by testing whether $\phi = 1$.

- The model specified via dynlm::dynlm is an autoregressive process with a linear trend:

From the previous specification we would have that:

$ \begin{cases} u_t &= Y_t - \alpha - \delta \cdot t,\\ u_t &= \phi u_{t-1} + \epsilon_t \end{cases} \Longrightarrow Y_t - \alpha - \delta \cdot t = \phi \left(Y_{t-1} - \alpha - \delta \cdot (t-1) \right) + \epsilon_t $

which means that an alternative specification involves a reparametrization of the previous process as:

$ \begin{aligned} Y_t &= \tilde{\alpha} + \tilde{\delta} \cdot t + \phi Y_{t-1} + \epsilon_t = \left[ (1 - \phi) \alpha + \phi \delta\right] + \left[ (1 - \phi)\delta \right] \cdot t + \phi Y_{t-1} + \epsilon_t \end{aligned} $

which is an autoregressive process with a linear trend.

This is the specification that dynlm::dynlm(...) uses:

In [105]:
mod.b1 <- dynlm::dynlm(lnyse ~ time(lnyse) + L(lnyse))
summary(mod.b1)
Time series regression with "ts" data:
Start = 1952(2), End = 1996(1)

Call:
dynlm::dynlm(formula = lnyse ~ time(lnyse) + L(lnyse))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.221927 -0.025066  0.002876  0.026943  0.147915 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.169524   1.124569  -1.929   0.0542 .  
time(lnyse)  0.001152   0.000595   1.936   0.0534 .  
L(lnyse)     0.984518   0.008150 120.792   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04055 on 525 degrees of freedom
Multiple R-squared:  0.9981,	Adjusted R-squared:  0.9981 
F-statistic: 1.38e+05 on 2 and 525 DF,  p-value: < 2.2e-16

The estimated model is:

$ \begin{aligned} Y_t &= -2.169524 + 0.001152 \cdot t + 0.984518 Y_{t-1} + \epsilon_t \end{aligned} $

The models are equivalent, but due to the different parametrization $\alpha \neq \tilde \alpha$ and $\delta \neq \tilde{\delta}$. The autoregressive coefficient estimates $\widehat{\phi}$ also differ due to the different parameter estimation methods (which may further differ across parametrization options). However, they are fairly close: ar1 = 0.9832 versus L(lnyse) = 0.984518.

If we use the parametrization on the estimated coefficients from the first model, we would have that:

$ \begin{aligned} \tilde{\alpha} &= \left[ (1 - \phi) \alpha + \phi \delta\right]\\ \tilde{\delta} &=\left[ (1 - \phi)\delta \right] \end{aligned} $

In [106]:
# tilde{alpha}
(1 - 0.9832) * (-144.2528) + 0.9832 * 0.0763
-2.34842888000001
In [107]:
# tilde{delta}
(1 - 0.9832) * 0.0763
0.00128184

We see that they are fairly close to (Intercept) = -2.169524 and time(lnyse) = 0.001152 respectively.

[INFO: END]




- Selecting the model based on the ACF and PACF of the series:

In [108]:
par(mfrow = c(1, 2))
forecast::Acf(diff(lnyse))
forecast::Pacf(diff(lnyse))

We can test the lag significance via the Ljung-Box test. The null and alternative hypotheses are: $$ \begin{aligned} H_0&: \rho(1) = ... = \rho(k) = 0\\ H_1&: \exists j: \rho(j) \neq 0 \end{aligned} $$ We will test whether the first lags exhibit significant autocorrelation for $k = 1, ..., 20$:

In [109]:
for(i in 1:20){
    aa <- Box.test(diff(lnyse), lag = i, type = "Ljung-Box")
    print(paste0("lag = ", i, " p-value = ", round(aa$p.value, 5)))
}
[1] "lag = 1 p-value = 0.36736"
[1] "lag = 2 p-value = 0.49395"
[1] "lag = 3 p-value = 0.70145"
[1] "lag = 4 p-value = 0.76131"
[1] "lag = 5 p-value = 0.25479"
[1] "lag = 6 p-value = 0.20216"
[1] "lag = 7 p-value = 0.20613"
[1] "lag = 8 p-value = 0.1685"
[1] "lag = 9 p-value = 0.2293"
[1] "lag = 10 p-value = 0.29176"
[1] "lag = 11 p-value = 0.36542"
[1] "lag = 12 p-value = 0.40789"
[1] "lag = 13 p-value = 0.48727"
[1] "lag = 14 p-value = 0.21937"
[1] "lag = 15 p-value = 0.27295"
[1] "lag = 16 p-value = 0.31138"
[1] "lag = 17 p-value = 0.34642"
[1] "lag = 18 p-value = 0.30703"
[1] "lag = 19 p-value = 0.17814"
[1] "lag = 20 p-value = 0.20485"

We see that in all cases the $p-value > 0.05$, so we have no grounds to reject the null hypothesis. This indicates that the differences, $\Delta Y_t$, are white noise. We can specify the model as:

In [110]:
mdl_4_2 <- forecast::Arima(lnyse, order = c(0, 1, 0), include.drift = TRUE)
In [111]:
print(mdl_4_2)
Series: lnyse 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.0069
s.e.  0.0018

sigma^2 estimated as 0.00165:  log likelihood=942.79
AIC=-1881.59   AICc=-1881.56   BIC=-1873.05

The residuals

In [112]:
forecast::tsdisplay(mdl_4_2$residuals)
In [113]:
for(i in (length(coef(mdl_4_2)) + 1):