# Dataset

suppressPackageStartupMessages({
library(plm)
})
data("Gasoline", package = "plm")
head(Gasoline)

The dataset contains annual observations on gasoline consumption for 18 countries from 1960 to 1978 (18 countries × 19 years, for a total of 342 observations). In the dataset:

• country - a factor with 18 levels;
• year - the year;
• lgaspcar - logarithm of motor gasoline consumption per car;
• lincomep - logarithm of real per-capita income;
• lrpmg - logarithm of real price of motor gasoline;
• lcarpcap - logarithm of the stock of cars per capita.

Visually inspect the lgaspcar data for each country. Is the (logarithm of) gas consumption per car similar across countries, or could there be some country-specific (fixed) effects that influence gas consumption? If so, what, in your opinion, could these (fixed) effects be?
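One possible way to carry out the visual inspection is sketched below. It assumes the lattice package (shipped with standard R installations) is available; the object names `country_plot` and `country_means` are illustrative only.

```r
library(lattice)
data("Gasoline", package = "plm")

# One panel per country: log gasoline consumption per car over time
country_plot <- xyplot(lgaspcar ~ year | country, data = Gasoline,
                       type = "l", xlab = "Year", ylab = "lgaspcar")
print(country_plot)

# Country averages give a rough numeric view of differences in level
country_means <- aggregate(lgaspcar ~ country, data = Gasoline, FUN = mean)
country_means[order(country_means$lgaspcar), ]
```

Persistent differences in level between the panels would be one informal hint that country-specific effects are worth modelling.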

Assume that you are tasked with creating a model for the (log of) gasoline consumption using the data available in this dataset. Which exogenous variables would you include in your model, and what would you expect their signs to be?
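Before settling on expected signs, a quick look at the pooled correlations may help. This is only a rough sketch: pooled correlations mix within-country and between-country variation, so they need not match the signs of the regression coefficients. The names `model_vars` and `cor_mat` are illustrative.

```r
data("Gasoline", package = "plm")

# Candidate dependent and explanatory variables
model_vars <- c("lgaspcar", "lincomep", "lrpmg", "lcarpcap")

# Pooled (all countries, all years) correlation matrix
cor_mat <- round(cor(Gasoline[, model_vars]), 2)
cor_mat
```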

# Task 3: POLS (Pooled OLS)

Create a pooled OLS model for lgaspcar using relevant exogenous variables. Then answer the following:

• Are the coefficients significant?
• Are the signs of the coefficients the same as you expected in Task 2?
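A minimal sketch of a pooled OLS fit with plm follows. Including all three remaining variables is an assumption made for illustration; your own specification may differ.

```r
suppressPackageStartupMessages(library(plm))
data("Gasoline", package = "plm")

# Pooled OLS: ignores the panel structure and treats all rows alike
pols <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap,
            data = Gasoline, index = c("country", "year"),
            model = "pooling")

# Coefficient estimates, standard errors and p-values
summary(pols)
```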

Taking into account the results from Task 3 and the overview of the data from Task 1:

• Specify a Fixed Effects model and estimate it. Are the exogenous predictor coefficients different from those in Task 3?
• Test whether the country-specific fixed effects are statistically significant. Which model would you choose - FE or POLS?
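The two steps above could be sketched as follows, again assuming the illustrative specification with all three regressors. `pFtest()` from plm compares the within (FE) model against the pooling model with an F test for the individual effects.

```r
suppressPackageStartupMessages(library(plm))
data("Gasoline", package = "plm")

form <- lgaspcar ~ lincomep + lrpmg + lcarpcap  # assumed specification

pols <- plm(form, data = Gasoline, index = c("country", "year"),
            model = "pooling")
fe   <- plm(form, data = Gasoline, index = c("country", "year"),
            model = "within")

summary(fe)   # slope coefficients of the FE model
fixef(fe)     # estimated country-specific intercepts

# F test: H0 = no country-specific effects (pooled OLS is adequate)
fe_vs_pols <- pFtest(fe, pols)
fe_vs_pols
```

A small p-value would speak against pooling and in favour of keeping the country effects.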

Suppose we believe that the variation across entities (people, cities, countries, etc.) is random and uncorrelated with the predictor (i.e. independent) variables included in the model. Estimate a Random Effects model and compare the predictor coefficients with the ones from POLS and FE.
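A possible sketch of the RE estimation and the coefficient comparison is given below. It relies on plm's default random-effects method and on the same assumed specification as before; `coef_table` is an illustrative name.

```r
suppressPackageStartupMessages(library(plm))
data("Gasoline", package = "plm")

form <- lgaspcar ~ lincomep + lrpmg + lcarpcap  # assumed specification
idx  <- c("country", "year")

pols <- plm(form, data = Gasoline, index = idx, model = "pooling")
fe   <- plm(form, data = Gasoline, index = idx, model = "within")
re   <- plm(form, data = Gasoline, index = idx, model = "random")

# Compare slope coefficients only (the FE model has no common intercept)
slopes <- names(coef(fe))
coef_table <- cbind(POLS = coef(pols)[slopes],
                    FE   = coef(fe)[slopes],
                    RE   = coef(re)[slopes])
round(coef_table, 4)
```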

# Task 6: FE vs RE

We would prefer the RE estimator if we can be sure that the individual-specific effect really is unrelated to the regressors (see slide 8). Test whether the RE estimator is consistent relative to the FE estimator. Based on this test result and the result from Task 4, which model would you choose - POLS, FE, or RE?
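One common way to run this comparison is a Hausman-type test, available in plm as `phtest()`. The sketch below assumes the same illustrative specification as in the earlier tasks.

```r
suppressPackageStartupMessages(library(plm))
data("Gasoline", package = "plm")

form <- lgaspcar ~ lincomep + lrpmg + lcarpcap  # assumed specification
idx  <- c("country", "year")

fe <- plm(form, data = Gasoline, index = idx, model = "within")
re <- plm(form, data = Gasoline, index = idx, model = "random")

# Hausman test: H0 = RE is consistent (effects uncorrelated with regressors)
hausman <- phtest(fe, re)
hausman
```

Rejecting the null hypothesis would suggest that the RE estimator is inconsistent, so FE would be the safer choice.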

# Task 7: Plotting the fitted model

Plot the fitted data alongside your actual data for:

• The best model (based on your conclusion in Task 6);
• One of the two (or both) remaining models of your choosing.

Visually inspect the data - does the best model fit the countries equally well?
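A sketch of such a plot is given below. Purely for illustration it treats FE as the chosen model and POLS as the comparison; substitute whichever models you selected. Fitted values are computed as actual values minus residuals, and the data are sorted by country and year first so that the rows line up with the residuals returned by plm.

```r
suppressPackageStartupMessages(library(plm))
library(lattice)
data("Gasoline", package = "plm")

# Sort by country and year so rows match the order of plm residuals
gas  <- Gasoline[order(Gasoline$country, Gasoline$year), ]
form <- lgaspcar ~ lincomep + lrpmg + lcarpcap  # assumed specification
idx  <- c("country", "year")

fe   <- plm(form, data = gas, index = idx, model = "within")
pols <- plm(form, data = gas, index = idx, model = "pooling")

plot_df <- data.frame(
  country     = gas$country,
  year        = gas$year,
  actual      = gas$lgaspcar,
  fitted_fe   = gas$lgaspcar - as.numeric(residuals(fe)),
  fitted_pols = gas$lgaspcar - as.numeric(residuals(pols))
)

# Actual and fitted series, one panel per country
fit_plot <- xyplot(actual + fitted_fe + fitted_pols ~ year | country,
                   data = plot_df, type = "l", ylab = "lgaspcar",
                   auto.key = list(columns = 3))
print(fit_plot)
```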

Calculate the mean squared error for each country separately - which country has the largest MSE, and does it align with your conclusions from the plots?
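The per-country MSE can be sketched as the mean of the squared residuals within each country. The example assumes, for illustration only, that the FE model is the one being evaluated.

```r
suppressPackageStartupMessages(library(plm))
data("Gasoline", package = "plm")

# Sort by country and year so rows match the order of plm residuals
gas <- Gasoline[order(Gasoline$country, Gasoline$year), ]
fe  <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data = gas,
           index = c("country", "year"), model = "within")

# Mean squared residual for each country
mse_by_country <- tapply(as.numeric(residuals(fe))^2, gas$country, mean)
sort(mse_by_country, decreasing = TRUE)

# Country with the largest MSE
worst_country <- names(which.max(mse_by_country))
worst_country
```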

# Optional Task: Forecasting when we do not have exogenous variable forecasts

Note that in order to forecast lgaspcar, your specified models require forecasts of the exogenous variables, which we usually do not have. We can think of two quick (but not necessarily the best) ways to remedy this:

(NOTE: you can take 80% of the dataset and re-fit your previous panel data model. Then you can compare the exogenous variable forecasts, as well as the panel data model forecasts, with the actual values.)

• For each country and each exogenous variable included in your model (some, or all, of lincomep, lrpmg, lcarpcap), use auto.arima to fit a separate model. Forecast each model $h = 5$ periods ahead.

• For each country, fit a VAR (or VECM) model on the vector of included exogenous variables (some, or all, of lincomep, lrpmg, lcarpcap).
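Both approaches are sketched below for a single country; in practice the same steps would be repeated for every country. The sketch assumes the forecast and vars packages are installed, uses the full sample rather than an 80% split for brevity, and picks a VAR(1) with a constant purely for illustration (no lag selection or cointegration testing is done here).

```r
suppressPackageStartupMessages({
  library(forecast)
  library(vars)
})
data("Gasoline", package = "plm")

h <- 5
exog_names <- c("lincomep", "lrpmg", "lcarpcap")

# Take the first country as an example and order its rows by year
cty <- levels(Gasoline$country)[1]
one <- Gasoline[Gasoline$country == cty, ]
one <- one[order(one$year), ]
exog_ts <- ts(one[, exog_names], start = min(one$year))

# Method 1: a separate auto.arima model for each exogenous variable
arima_fc <- sapply(exog_names, function(v) {
  fit <- auto.arima(exog_ts[, v])
  as.numeric(forecast(fit, h = h)$mean)
})

# Method 2: one VAR(1) for the whole vector (lag order is an assumption)
var_fit  <- VAR(exog_ts, p = 1, type = "const")
var_pred <- predict(var_fit, n.ahead = h)
var_fc   <- sapply(exog_names, function(v) var_pred$fcst[[v]][, "fcst"])

arima_fc  # h x 3 matrix of ARIMA point forecasts
var_fc    # h x 3 matrix of VAR point forecasts
```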

Once we obtain the forecasts for the exogenous variables, we can move on to forecasting our variable of interest:

• Use the forecasted exogenous variables (from either one, or both, forecasting methods) to compute a forecast of lgaspcar from your panel data model.

• Examine the forecasts - would you consider them adequate? (Take note of the historical increase/decrease in the data and check whether the forecasts make sense.)
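A sketch of the final forecasting step for one country follows, assuming an FE model was chosen. To keep the example self-contained, the exogenous forecasts are a placeholder (the last observed values carried forward); replace `exog_fc` with your ARIMA or VAR forecasts. The forecast is computed by hand as the country's fixed effect plus the forecasted regressors times the slope coefficients.

```r
suppressPackageStartupMessages(library(plm))
data("Gasoline", package = "plm")

fe <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data = Gasoline,
          index = c("country", "year"), model = "within")

h <- 5
exog_names <- names(coef(fe))

# Example country, rows ordered by year
cty <- levels(Gasoline$country)[1]
one <- Gasoline[Gasoline$country == cty, ]
one <- one[order(one$year), ]

# PLACEHOLDER exogenous forecasts: last observed values repeated h times
last_obs <- unlist(one[nrow(one), exog_names])
exog_fc  <- matrix(last_obs, nrow = h, ncol = length(exog_names),
                   byrow = TRUE, dimnames = list(NULL, exog_names))

# FE forecast: country intercept + forecasted regressors %*% slopes
alpha_i     <- as.numeric(fixef(fe)[cty])
lgaspcar_fc <- alpha_i + as.numeric(exog_fc %*% coef(fe))

data.frame(country = cty,
           year = max(one$year) + seq_len(h),
           lgaspcar_fc = lgaspcar_fc)
```

Plotting `lgaspcar_fc` after the historical series for the same country is a simple way to judge whether the forecasts continue the observed trend plausibly.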

Something to think about: your panel data model forecasts will depend not only on the accuracy of the panel data model, but also on the accuracy of the exogenous variable models.