We will illustrate some of the methods outlined during the lectures. Note that we will update the tasks or data as necessary for new methods introduced during upcoming lectures. In some cases, we may need to present an additional dataset to carry out some examples.

Generally, a number of example datasets can be found on the Principles of Econometrics book website, together with their dataset definitions, or among the R datasets, which are available in both R and Python.

Data Overview

Use the following datasets:

  1. food (data definition). Suppose that we are interested in estimating how the expenditure on food, food_exp, depends on income. The dataset can be loaded in both R and Python:
# R
dt1 <- read.csv(file = "http://www.principlesofeconometrics.com/poe5/data/csv/food.csv", sep = ",", dec = ".", header = TRUE)
# Python
import pandas as pd
dt1 = pd.read_csv("http://www.principlesofeconometrics.com/poe5/data/csv/food.csv")
  2. nuclear. Suppose that we are interested in estimating how the cost depends on the power capacity, cap. The dataset can be loaded in both R and Python:
# R
dt2 <- boot::nuclear
# Python
import statsmodels.api as sm
dt2 = sm.datasets.get_rdataset("nuclear", "boot")
#print(dt2.__doc__) #documentation about the data
dt2 = dt2.data
  3. stockton5_small (data definition) contains data on houses sold in Stockton, California in 1996-1998. Assume that we are interested in how the sale price, sprice, is affected by the house living area, livarea.
# R
dt3 <- read.csv(file = "http://www.principlesofeconometrics.com/poe5/data/csv/stockton5_small.csv", sep = ",", dec = ".", header = TRUE)
# Python (pandas imported above as pd)
dt3 = pd.read_csv("http://www.principlesofeconometrics.com/poe5/data/csv/stockton5_small.csv")
  4. cps5_small (data definition) contains data on hourly wage rates, education, etc. from the 2013 Current Population Survey. Suppose we are interested in examining how education, educ, affects wage.
# R
dt4 <- read.csv(file = "http://www.principlesofeconometrics.com/poe5/data/csv/cps5_small.csv", sep = ",", dec = ".", header = TRUE)
# Python (pandas imported above as pd)
dt4 = pd.read_csv("http://www.principlesofeconometrics.com/poe5/data/csv/cps5_small.csv")
  5. tuna (data definition) contains weekly data (we will ignore the time dimension for now) on the number of cans sold of brand \(\#1\) tuna (sal1). Consider examining how the ratio of brand \(\#1\) tuna prices, apr1, to brand \(\#3\) tuna prices, apr3, affects sal1 in thousands of units. Do the same with apr2 in place of apr3.
# R
dt5 <- read.csv(file = "http://www.principlesofeconometrics.com/poe5/data/csv/tuna.csv", sep = ",", dec = ".", header = TRUE)
# Python (pandas imported above as pd)
dt5 = pd.read_csv("http://www.principlesofeconometrics.com/poe5/data/csv/tuna.csv")

Note: either dt3 or dt4 will be selected and analysed during lectures.




Tasks

Below are the tasks that you should carry out for the datasets:

(2018-10-25)

  1. Draw scatter plots of the dependent variable \(Y\) against each of the independent variables \(X_1,...,X_k\). Which variables \(X_j\) visually appear to be related to \(Y\)? Are there any variables \(X_i\), \(X_j\) that appear to be linearly related to one another?
  2. Specify one regression in mathematical notation, based on economic theory. What signs do you expect the coefficients \(\beta_1,\beta_2,...\) to have? Explain. Note: this is not necessarily the best regression; it is simply one that you think makes economic sense.
  3. Estimate the regression via OLS. Are the signs of \(\beta_1,\beta_2,...\) the same as you expected?
  4. Test which variables are statistically significant. Remove the insignificant variables (keep the initial estimated model stored in a separate variable).
  5. Write down the final estimated regression formula (with the significant variables only).
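
The estimation and significance steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the data are synthetic (not one of the course datasets), the variable names x1, x2, x3 and y are hypothetical, and a 5% level with individual t-tests is assumed as the significance rule. A correlation matrix is used here as a numeric complement to the scatter plots.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption, not one of the course datasets)
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(10, 2, n),
    "x2": rng.normal(5, 1, n),
    "x3": rng.normal(0, 1, n),  # unrelated to y by construction
})
df["y"] = 1.0 + 2.0 * df["x1"] - 1.5 * df["x2"] + rng.normal(0, 1, n)

# Pairwise correlations: a numeric complement to the scatter plots
corr = df.corr()
print(corr.round(2))

# Initial model, kept in its own variable
mdl_full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(mdl_full.summary())

# Keep the regressors whose t-test p-value is below the assumed 5% level
pvals = mdl_full.pvalues.drop("Intercept")
keep = pvals[pvals < 0.05].index.tolist()

# Final model with the remaining regressors only
mdl_final = smf.ols("y ~ " + " + ".join(keep), data=df).fit()
print(mdl_final.params)
```

With one of the course datasets, df would be replaced by the loaded data frame (for example dt1) and the formula by the regression specified in task 2.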

(2018-11-15)

  1. Examine the residual plots. Test for normality, autocorrelation, heteroskedasticity. Do the residuals violate our (MR.3) - (MR.6) model assumptions?
  2. Add interaction variables to your model and explain what sign you expect them to have. Then estimate the model and check whether they are significant. If they are, re-examine the residuals.
  3. Are there any economic restrictions you could evaluate on the estimated model? If so, test them; otherwise, think of some arbitrary ones based on the model output and test them.
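
One possible way to carry out the residual tests, the interaction term and a linear restriction test with statsmodels is sketched below. The data are synthetic with errors that are heteroskedastic by construction, the variable names are hypothetical, and the particular tests (Jarque-Bera, Breusch-Pagan, Breusch-Godfrey, Durbin-Watson) and the restriction \(\beta_1 = 2\beta_2\) are assumptions chosen for illustration; the lectures may use different ones.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import jarque_bera, durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan, acorr_breusch_godfrey

# Synthetic data (an assumption): the error spread grows with x1
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"x1": rng.uniform(1, 10, n), "x2": rng.uniform(1, 5, n)})
df["y"] = 2.0 + 1.0 * df["x1"] + 0.5 * df["x2"] + rng.normal(0, 1, n) * df["x1"]

mdl = smf.ols("y ~ x1 + x2", data=df).fit()

# Normality of the residuals (H0: residuals are normal)
jb_stat, jb_pval, skew, kurt = jarque_bera(mdl.resid)

# Heteroskedasticity (H0: homoskedastic errors)
bp_lm, bp_lm_pval, bp_f, bp_f_pval = het_breuschpagan(mdl.resid, mdl.model.exog)

# Autocorrelation (H0: no serial correlation up to lag 2)
bg_lm, bg_lm_pval, bg_f, bg_f_pval = acorr_breusch_godfrey(mdl, nlags=2)
dw = durbin_watson(mdl.resid)  # values near 2 suggest no first-order autocorrelation

print("Jarque-Bera p-value:     ", jb_pval)
print("Breusch-Pagan p-value:   ", bp_lm_pval)
print("Breusch-Godfrey p-value: ", bg_lm_pval)
print("Durbin-Watson statistic: ", dw)

# Interaction term: 'x1 * x2' expands to x1 + x2 + x1:x2
mdl_int = smf.ols("y ~ x1 * x2", data=df).fit()
print("Interaction p-value:", mdl_int.pvalues["x1:x2"])

# Linear restriction H0: beta_x1 - 2 * beta_x2 = 0 (an arbitrary example)
restr_test = mdl.f_test("x1 - 2 * x2 = 0")
print(restr_test)
```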

(2018-11-22)

  1. If you do not reject the null hypothesis of your specified linear restrictions, re-estimate the model via RLS. What changes (if any) do you notice in your model coefficients and their significance?
  2. Using the model with OLS estimates, check if any variables are collinear in your model. If so, try to account for multicollinearity in some way.
  3. Use the residuals of your finalized model, with OLS estimates, and test them for autocorrelation and heteroskedasticity.
  4. If autocorrelation or heteroskedasticity is present in the residuals (of the model with OLS estimates), do the following (based on the test results):
    • use a consistent error variance estimator to re-estimate the standard errors;
    • specify the form of the variance-covariance matrix of the errors and use an FGLS estimator to re-estimate the parameters.
  5. Compare the parameter estimates: if there are any differences between the FGLS estimates and the OLS estimates with consistent standard errors, are they cause for concern?