Data

A number of equations are provided for data simulation.

In addition, there are various time-series data available, such as:

  • datasets - a built-in package with a collection of datasets;
  • series hosted on GitHub, which are also available in R’s fma package.

Data: Simulated

  • Generate the following process with linear trend and correlated errors: \[Y_t = -7 + 0.3 \cdot t + 5 e_t, \text{ where } e_t = 0.88 e_{t-1} - 0.53 e_{t-2} + w_t,\ t = 1,...,150\]
  • Generate a Random Walk with drift \(0.1\) (\(T = 150\)); a simulation sketch for both processes is given after this list.
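A minimal simulation sketch for both processes, using stats::arima.sim for the AR(2) errors; the seed and object names are illustrative assumptions:

set.seed(123)                                     # illustrative seed
n <- 150
# (1) linear trend with AR(2) errors: e_t = 0.88 e_{t-1} - 0.53 e_{t-2} + w_t
e <- stats::arima.sim(model = list(ar = c(0.88, -0.53)), n = n)
y_trend <- stats::ts(-7 + 0.3 * (1:n) + 5 * e)
# (2) random walk with drift 0.1: Y_t = 0.1 + Y_{t-1} + w_t
y_rw <- stats::ts(cumsum(0.1 + stats::rnorm(n)))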

Data: Historical

Until 1982, when Nelson and Plosser published their analysis of macroeconomic time series, most economists believed that all economic time series were \(TS\) (trend-stationary, i.e. stationary after removing a deterministic trend). Nelson and Plosser showed that most economic series were in fact \(DS\) (difference-stationary, i.e. their differences were stationary). Verify whether this holds on some sample datasets with real-world data.

  • Fourteen U.S. economic time series from 1860 to 1970. See the documentation for the variable descriptions:

    • Work with gnp.r and cpi (a short loading sketch is given after this list)
data(nporg, package = "urca")
  • UK data frame of quarterly data ranging from 1955:Q1 until 1984:Q4. The data is expressed in natural logarithms:

    • consl - The log of total real consumption in the U.K.
    • incl - The log of real disposable income in the U.K.
data(UKconinc, package = "urca")
  • UK data frame of quarterly data ranging from 1957:Q1 until 1975:Q4:

    • cons - Consumers’ non-durable expenditure in the U.K. in 1970 prices.
    • inc - Personal disposable income in the U.K. in 1970 prices.
    • price - Consumers’ expenditure deflator index, 1970 = 100.
data(UKconsumption, package = "urca")
  • Number of users logged on to an internet server each minute over a 100-minute period.
internet <- stats::ts(c(88, 84, 85, 85, 84, 85, 83, 85, 88, 89, 91, 99,
                        104, 112, 126, 138, 146, 151, 150, 148, 147, 149, 143, 132, 131,
                        139, 147, 150, 148, 145, 140, 134, 131, 131, 129, 126, 126, 132,
                        137, 140, 142, 150, 159, 167, 170, 171, 172, 172, 174, 175, 172,
                        172, 174, 174, 169, 165, 156, 142, 131, 121, 112, 104, 102, 99,
                        99, 95, 88, 84, 84, 87, 89, 88, 85, 86, 89, 91, 91, 94, 101, 110,
                        121, 135, 145, 149, 156, 165, 171, 175, 177, 182, 193, 204, 208,
                        210, 215, 222, 228, 226, 222, 220), start = 1, frequency = 1)
  • Price of chicken in the US (constant dollars), 1924–1993.
chicken <- stats::ts(c(164.16, 169.17, 180.65, 168.30, 180.73, 192.55,
                       159.43, 150.11, 126.05, 106.08, 119.92, 157.06, 156.59, 161.21,
                       151.94, 137.47, 134.10, 153.25, 166.02, 203.24, 194.83, 208.18,
                       204.40, 171.61, 180.87, 154.12, 133.40, 139.22, 120.43, 119.53,
                       90.41, 100.48, 85.16, 70.41, 70.04, 54.59, 59.59, 48.84, 48.78,
                       47.25, 42.90, 40.80, 43.23, 34.23, 34.09, 38.27, 33.90, 27.48,
                       31.12, 49.16, 28.44, 26.60, 33.02, 29.34, 27.49, 27.67, 19.29,
                       17.65, 15.43, 18.43, 22.12, 19.88, 16.48, 14.00, 11.25, 17.38,
                       16.45, 15.69, 15.25, 14.64), start = 1924, frequency = 1)
  • Daily air quality measurements in New York, May to September 1973. Examine the Temp (temperature) variable.
airquality <- datasets::airquality
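One way to pull the individual series out of these datasets and turn them into ts objects is sketched below; the start dates follow the descriptions above, and the chosen variables (gnp.r, consl) are just examples:

data(nporg, package = "urca")
gnp <- stats::ts(nporg$gnp.r, start = 1860, frequency = 1)             # annual; early years may be NA
gnp <- stats::na.omit(gnp)                                             # drop leading missing values, if any
data(UKconinc, package = "urca")
consl <- stats::ts(UKconinc$consl, start = c(1955, 1), frequency = 4)  # quarterly, from 1955:Q1
plot(gnp); plot(consl)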

Note: It may very well be the case that some (or even all) of the data do not have unit roots. The idea is to carry out the unit root testing and model building procedures, as you would when working with any other empirical data.

Tasks

  1. Plot the series - do they appear stationary? Do they exhibit exponential changes? If needed, transform the series and continue working with the transformed data.
  2. Plot their \(\rm ACF\) and \(\rm PACF\) - do the series appear to be autocorrelated?
  3. Carry out unit root tests in the following ways (see the sketch after this list):

    • Manually (i.e. sequentially) by using dynlm to estimate the relevant models for unit root testing. Don’t forget to write down the null hypothesis for the unit root test.
    • Use the built-in functions to carry out the ADF, KPSS and PP tests. Write down the null hypothesis of each test.
  4. Depending on the results, transform the series to induce stationarity and examine its \(\rm ACF\) and \(\rm PACF\) plots. Select an appropriate model either manually or using auto.arima (remember the drawback of automated procedures - if needed, restrict the maximum number of differences and the seasonal/non-seasonal lag orders).

  5. Write down the model equation for \(\Delta Y_t\) and the equation for \(Y_t\) (Note: you are free to use either dynlm or Arima to specify your model equation, as long as it is the one you used in the previous tasks. You can also consult the auto.arima documentation on its author’s website for a more general model formula written with lag polynomials.)

  6. Calculate the \(10\)-step ahead forecasts for the original series.

  7. Carry out cross-validation of one-step-ahead forecasts by creating between 5 and 20 subsets (use the largest \(k\) that is feasible for your dataset); a sketch of the loop is given after this list:

    • Create \(k\) different samples: \((Y_1, ..., Y_{T-k})\), \((Y_1, ..., Y_{T-k+1})\), …, \((Y_1, ..., Y_{T-1})\);
    • For each sample, re-estimate the model from (4) and calculate its one-step ahead forecast;
    • Calculate the error \(e_i\) between the true value and its one-step ahead forecast \(e_i = Y_i - \widehat{Y}_i\), where \(Y\) is the transformed series;
    • Save the model coefficient estimates.
    • Calculate the \(RMSE = \sqrt{\dfrac{1}{k} \sum_{i = 1}^k e_i^2}\) and compare with the \(\rm RMSE\) from the model in (4) - are they close? If they are - what does it say about your model?
    • Plot the model coefficient estimates - do they exhibit large changes as the sample size increases?
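The sketches below are one possible way to approach tasks 3, 4, 6 and 7. They assume a series named y (one of the ts objects above, possibly log-transformed) and are meant as a starting point rather than a definitive solution.

Unit root testing (task 3), first manually with dynlm and then with built-in tests (here taken from the urca package); the deterministic terms and the lag order are illustrative assumptions to be chosen for each series:

library(dynlm)

# manual ADF-type regression: Delta y_t = alpha + beta*t + rho*y_{t-1} + gamma_1*Delta y_{t-1} + w_t
# H0 of the unit root test: rho = 0; compare the t-statistic of L(y)
# with the Dickey-Fuller critical values, not the usual t-distribution
adf_manual <- dynlm(d(y) ~ L(y) + L(d(y), 1) + trend(y))
summary(adf_manual)

# built-in tests
summary(urca::ur.df(y, type = "trend", lags = 1))          # ADF:  H0 - unit root
summary(urca::ur.kpss(y, type = "tau"))                    # KPSS: H0 - (trend) stationarity
summary(urca::ur.pp(y, type = "Z-tau", model = "trend"))   # PP:   H0 - unit root

Model selection and the 10-step-ahead forecast (tasks 4 and 6) with forecast::auto.arima; the order restrictions shown are assumptions - tighten or relax them for your series:

library(forecast)

fit <- auto.arima(y, d = 1, max.p = 3, max.q = 3, seasonal = FALSE)
summary(fit)                  # selected orders and coefficient estimates

fc <- forecast(fit, h = 10)   # 10-step-ahead forecasts
plot(fc)
# if you log-transformed the series, you can instead pass lambda = 0 to auto.arima
# so that the forecasts are back-transformed to the original scale automatically

One possible loop for the one-step-ahead cross-validation (task 7); the number of folds k is an assumption, and the ARIMA order selected in task 4 is kept fixed while the coefficients are re-estimated on each subsample:

k     <- 10                                  # number of one-step-ahead forecasts (assumption)
T_len <- length(y)
ord   <- forecast::arimaorder(fit)           # (p, d, q) selected above
err   <- numeric(k)
coefs <- vector("list", k)

for (i in 1:k) {
  end_i  <- T_len - k + i - 1                # last observation of the training sample
  fit_i  <- forecast::Arima(stats::window(y, end = stats::time(y)[end_i]), order = ord[1:3])
  fc_i   <- forecast::forecast(fit_i, h = 1)
  err[i] <- y[end_i + 1] - fc_i$mean[1]      # one-step-ahead forecast error
  coefs[[i]] <- coef(fit_i)
}

sqrt(mean(err^2))                            # cross-validated RMSE; compare with accuracy(fit)
coef_mat <- do.call(rbind, coefs)
graphics::matplot(coef_mat, type = "l", lty = 1,
                  xlab = "subsample", ylab = "coefficient estimate")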