3.9 Chapter Summary

As we have learned, to fit a univariate regression model to a data sample, we should go through the following steps:

  1. Read in and plot the data: plot the histograms of the dependent variable \(Y\) and the independent variable \(X\), as well as the scatter plot of \(Y\) against \(X\). This helps to identify the form of the relationship between the variables.
  2. Based on economic theory, specify a theoretical model (linear, log-linear, linear-log, log-log, quadratic, etc.) which you expect (but are not yet certain) to fit the data and make economic sense. The following questions can help: can \(Y\) take positive, negative or zero values? What sign should the coefficient of the independent variable \(X\) have in the specified model?
  3. Estimate the model and examine the results: are the coefficients significant, and do they have the signs expected from the previous step? Provide an interpretation of the coefficient of the independent variable. If the coefficients are insignificant, or their signs do not make economic sense, try to re-specify the model, or re-examine whether the estimated signs could make economic sense after all.
  4. Once the model has significant parameters with the expected signs, examine its goodness-of-fit and its residuals to make sure that the model assumptions (UR.2)-(UR.4) hold:
    • Look at the \(R^2\), provide its interpretation;
    • Examine the residual histogram and Q-Q plot: do the residuals follow a normal distribution?
    • Examine the residual vs. fitted and residual vs. \(X\) scatter plots: are there any patterns in these plots? A pattern means that the specified model does not account for some non-linear relationship, so the residuals remain related to \(X\), which violates assumption (UR.2). In that case, consider transforming \(Y\) and/or \(X\) (take note of which transformations can be applied to your data).
    • Examine the residual variance in the previously plotted scatter plots: is the variance constant, or is it increasing/decreasing? Under heteroskedasticity the OLS estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, so the test statistics based on them are unreliable.
    • In addition to the plots, carry out formal tests on the residuals: the Breusch–Pagan test for heteroskedasticity, the Durbin–Watson test for autocorrelation and the Shapiro–Wilk test for normality.
  5. Once the residuals of the specified model follow a normal distribution with a constant variance, are independent of \(X\) and are not serially correlated, we can move on to predicting values and calculating prediction intervals, either for the existing data or for new values of \(X\) that were not in the sample. Make sure that the predictions (i.e. the forecasts) make economic sense: can the predicted values be very large, zero, or negative for your data?
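
The numerical part of steps 3-5 can be sketched in a few lines of code. The following is a minimal illustration and not a reference implementation. It uses only the Python standard library, and everything in it is an assumption made for the example: the simulated data, the function names `ols_fit`, `durbin_watson`, `breusch_pagan` and `prediction_interval`, the use of the studentized (\(n R^2\)) form of the Breusch–Pagan statistic, and the normal quantile used in place of the \(t_{n-2}\) quantile for the prediction interval. The plots and the Shapiro–Wilk test are omitted.

```python
import math
import random
from statistics import NormalDist, fmean


def ols_fit(x, y):
    """Closed-form OLS estimates for y = b0 + b1 * x + e."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in resid)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = rss / (n - 2)              # estimate of the error variance
    se_b1 = math.sqrt(sigma2 / s_xx)    # standard error of the slope
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
            "r2": 1 - rss / tss, "resid": resid, "sigma2": sigma2,
            "x_bar": x_bar, "s_xx": s_xx, "n": n}


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e ** 2 for e in resid)


def breusch_pagan(x, resid):
    """Studentized LM statistic: n * R^2 from regressing squared residuals on x."""
    aux = ols_fit(x, [e ** 2 for e in resid])
    lm = max(len(x) * aux["r2"], 0.0)
    # With one regressor, LM is chi-squared(1) under H0, i.e. a squared N(0, 1),
    # so the p-value can be computed from the normal CDF.
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lm)))
    return lm, p_value


def prediction_interval(fit, x0, level=0.95):
    """Interval for a new observation at x0 (normal approximation to the t quantile)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = fit["b0"] + fit["b1"] * x0
    se = math.sqrt(fit["sigma2"]
                   * (1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["s_xx"]))
    return y_hat - z * se, y_hat, y_hat + z * se


# Simulated sample (assumption): Y = 2 + 0.5 * X + e, with e ~ N(0, 1)
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

fit = ols_fit(x, y)                                      # step 3
dw = durbin_watson(fit["resid"])                         # step 4
lm, bp_p = breusch_pagan(x, fit["resid"])                # step 4
lower, y_hat, upper = prediction_interval(fit, x0=12.0)  # step 5

print(f"b0 = {fit['b0']:.3f}, b1 = {fit['b1']:.3f}, t(b1) = {fit['t_b1']:.2f}")
print(f"R^2 = {fit['r2']:.3f}, DW = {dw:.3f}, BP p-value = {bp_p:.3f}")
print(f"prediction at X = 12: {y_hat:.3f} in [{lower:.3f}, {upper:.3f}]")
```

Here the new value \(X = 12\) lies outside the simulated sample range, which is exactly the situation in which the last check of step 5 matters: the interval is only as trustworthy as the assumption that the fitted relationship continues to hold there. For small samples the normal quantile understates the interval width, so the \(t_{n-2}\) quantile should be used instead.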