Example: Individual equation estimation

Consider the following IS-LM model:

\[\begin{aligned} \begin{cases} R_t = \beta_{11} + \beta_{12} M_t + \beta_{13} Y_t + \beta_{14} M_{t-1} + u_{1t}\\ Y_t = \beta_{21} + \beta_{22} R_t + \beta_{23} I_t + u_{2t} \end{cases} \end{aligned} \]

Where:

  • \(R_t\) denotes the interest rate;
  • \(M_t\) denotes the money stock;
  • \(Y_t\) is GDP;
  • \(I_t\) is investment expenditure.

The example data are annual time series from 1969 to 1977 for the UK economy:

suppressPackageStartupMessages({
  library(readxl)
  library(systemfit)
  library(AER)
})
# Download the example dataset and read it into a data frame
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "simult.xls"
tmp  <- tempfile(fileext = ".xls")
download.file(url = paste0(txt1, txt2), 
              destfile = tmp, mode = "wb")
sim.dt <- data.frame(read_excel(path = tmp))
# Declare an annual time series starting in 1969 (the first column is dropped)
data1 <- ts(sim.dt[, -1], frequency = 1, start = 1969)
# Add the lagged money stock, M.L1 (its first value is NA)
data1 <- data.frame(data1, 
                    M.L1 = c(NA, data1[-nrow(data1), 2]))
  • In this equation system, \(R_t\) and \(Y_t\) are endogenous variables and \(M_t\), \(M_{t-1}\) and \(I_t\) are exogenous variables.
  • The first (LM) equation is exactly identified and the second (IS) one is overidentified (i.e. we have more instruments, counting the exogenous variables and the constant, than coefficients to estimate); a quick order-condition count is sketched below.
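In terms of the order condition, the count is as follows: each equation has the same \(4\) available instruments (the constant, \(M_t\), \(M_{t-1}\) and \(I_t\)), the LM equation has \(4\) coefficients and the IS equation has \(3\):

\[\begin{aligned} \text{LM equation: } & 4 \text{ instruments} = 4 \text{ coefficients} \; \Rightarrow \; \text{exactly identified},\\ \text{IS equation: } & 4 \text{ instruments} > 3 \text{ coefficients} \; \Rightarrow \; \text{overidentified}. \end{aligned}\]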

Estimating via Single-Equation OLS

We could estimate the individual equations via OLS (for comparability with the other estimation methods, we exclude the first row, which contains an NA value in the lagged variable):

R.ols <- lm(R ~ M + Y + M.L1, data = data1[-1, ])
Y.ols <- lm(Y ~ R + I, data = data1[-1, ])

round(summary(R.ols)$coefficients, 4)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  19.5581    10.1861  1.9201   0.0668
## M             0.0013     0.0018  0.6979   0.4920
## Y            -0.1031     0.2048 -0.5035   0.6192
## M.L1         -0.0015     0.0017 -0.8406   0.4089
round(summary(Y.ols)$coefficients, 4)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  14.9134     8.0532  1.8519   0.0759
## R            -0.1691     0.3165 -0.5342   0.5979
## I             0.0007     0.0001 11.8330   0.0000

However, in this case we would ignore the fact that there are endogenous variables on the right-hand side of our equations: these endogenous variables are correlated with the error terms, so the OLS estimates are biased and inconsistent (and certainly not BLUE), and we need to account for this by carrying out a two-stage least squares procedure.
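To see why this correlation arises, substitute the \(Y_t\) equation into the \(R_t\) equation (a brief sketch, assuming \(\beta_{13}\beta_{22} \neq 1\)):

\[ R_t = \frac{1}{1-\beta_{13}\beta_{22}}\left( \beta_{11} + \beta_{13}\beta_{21} + \beta_{12} M_t + \beta_{13}\beta_{23} I_t + \beta_{14} M_{t-1} + u_{1t} + \beta_{13} u_{2t} \right) \]

so \(R_t\) depends on \(u_{2t}\) and is therefore correlated with the error term of the \(Y_t\) equation; by the same argument, \(Y_t\) is correlated with \(u_{1t}\).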

Estimating via 2SLS (Two-Stage Least Squares)

The 2SLS procedure is carried out as follows:

  1. Regress each right-hand-side endogenous variable on all the exogenous variables of the system (i.e. every exogenous variable that could be correlated with it) and obtain its fitted values.
  2. Replace the right-hand-side endogenous variables with these fitted values and estimate the equation via OLS (a by-hand sketch of these two steps follows the ivreg results below).
# ivreg formula syntax: response ~ regressors | instruments (all exogenous variables)
R.2sls <- ivreg(R ~ M + Y + M.L1 | M + M.L1 + I, data = data1[-1, ])
Y.2sls <- ivreg(Y ~ R + I | M + M.L1 + I, data = data1[-1, ])

round(summary(R.2sls)$coefficients, 4)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  27.5275    11.1348  2.4722   0.0209
## M             0.0019     0.0019  1.0091   0.3230
## Y            -0.2647     0.2241 -1.1809   0.2492
## M.L1         -0.0017     0.0018 -0.9855   0.3342
round(summary(Y.2sls)$coefficients, 4)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  98.7996    68.7067  1.4380   0.1628
## R            -4.0430     3.1306 -1.2914   0.2084
## I             0.0002     0.0004  0.5329   0.5988
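To make the two steps explicit, here is a minimal by-hand sketch for the \(Y_t\) equation (R.hat is just an illustrative name). The point estimates should coincide with those of Y.2sls, but the second-stage standard errors reported by lm are not valid, which is why ivreg is used in practice:

# Stage 1: regress the endogenous regressor R on all exogenous variables (the instruments)
stage1 <- lm(R ~ M + M.L1 + I, data = data1[-1, ])
# Stage 2: replace R with its fitted values and estimate the equation via OLS
stage2 <- lm(Y ~ R.hat + I, data = transform(data1[-1, ], R.hat = fitted(stage1)))
coef(stage2)  # matches coef(Y.2sls); the lm() standard errors do not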

The coefficients of \(Y_t\) and \(M_t\) are insignificant, suggesting that the LM function is very flat. Overall, accounting for the endogeneity changed the coefficient values.

However, in this case the coefficients remain insignificant, which suggests that, besides endogeneity, other issues should not be overlooked. One of them is the possibility that the residuals (or, more intuitively, the random [economic] shocks) are correlated between the equations. If that is the case, we need to estimate the equations jointly, as a system.
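As a quick informal check (a sketch using the single-equation 2SLS fits from above), we can look at the correlation between the residuals of the two equations; a sizeable correlation would motivate estimating the system jointly:

# correlation between the 2SLS residuals of the R and Y equations
cor(residuals(R.2sls), residuals(Y.2sls))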

Equation system estimation

We will begin by estimating the whole equation system in the same way as before (i.e. as independent equations); the only difference is that we estimate both equations at once. (Note: systemfit cannot handle NA values, so those rows need to be dropped.)

  • The OLS and 2SLS methods (the disturbances are assumed to be uncorrelated across equations, i.e. this is equivalent to single-equation estimation):
# Specify each equation as a formula and collect them into a named list
eq.R <- R ~ 1 + M + Y + M.L1
eq.Y <- Y ~ 1 + R + I
eq.sys <- list(R = eq.R, Y = eq.Y)
# The same instrument set (all exogenous variables) is used for both equations
instr  <- ~ M + M.L1 + I

eq.sys.ols  <- systemfit(eq.sys, method = "OLS",  data = data1[-1, ])
eq.sys.2sls <- systemfit(eq.sys, method = "2SLS", inst = instr, data = data1[-1, ])

round(summary(eq.sys.ols)$coefficients, 4)
##               Estimate Std. Error t value Pr(>|t|)
## R_(Intercept)  19.5581    10.1861  1.9201   0.0668
## R_M             0.0013     0.0018  0.6979   0.4920
## R_Y            -0.1031     0.2048 -0.5035   0.6192
## R_M.L1         -0.0015     0.0017 -0.8406   0.4089
## Y_(Intercept)  14.9134     8.0532  1.8519   0.0759
## Y_R            -0.1691     0.3165 -0.5342   0.5979
## Y_I             0.0007     0.0001 11.8330   0.0000
round(summary(eq.sys.2sls)$coefficients, 4)
##               Estimate Std. Error t value Pr(>|t|)
## R_(Intercept)  27.5275    11.1348  2.4722   0.0209
## R_M             0.0019     0.0019  1.0091   0.3230
## R_Y            -0.2647     0.2241 -1.1809   0.2492
## R_M.L1         -0.0017     0.0018 -0.9855   0.3342
## Y_(Intercept)  98.7996    68.7067  1.4380   0.1628
## Y_R            -4.0430     3.1306 -1.2914   0.2084
## Y_I             0.0002     0.0004  0.5329   0.5988

The results are identical to the ones from lm and ivreg. A caveat: systemfit does not handle missing values automatically, whereas lm and ivreg drop them by default, so if the rows with NA values are not excluded from the dataset consistently, the estimated coefficients may differ between the two approaches.
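For instance, dropping the incomplete rows once, up front, keeps the estimation sample identical across lm, ivreg and systemfit (a minimal sketch; data1.cc is just an illustrative name):

# keep only the complete rows so every estimator uses exactly the same sample
data1.cc <- na.omit(data1)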

  • SUR (the disturbances are allowed to be correlated across equations; however, the endogeneity of the right-hand-side predictors is not accounted for):
eq.sys.sur <- systemfit(eq.sys, method = "SUR", data = data1[-1, ])

round(summary(eq.sys.sur)$coefficients, 4)
##               Estimate Std. Error t value Pr(>|t|)
## R_(Intercept)  10.3175     9.8370  1.0488   0.3047
## R_M             0.0004     0.0017  0.2083   0.8368
## R_Y             0.0987     0.1968  0.5014   0.6206
## R_M.L1         -0.0010     0.0016 -0.6187   0.5419
## Y_(Intercept)  23.8657     7.8012  3.0592   0.0052
## Y_R            -0.5928     0.3021 -1.9623   0.0610
## Y_I             0.0007     0.0001 11.1904   0.0000

While we are now accounting for the correlation of the shocks, we are still not accounting for the endogenous predictors on the right-hand side of the equations.

  • 3SLS (the disturbances are allowed to be correlated, and instrumental variables are used to account for the endogeneity problem):
eq.sys.3sls <- systemfit(eq.sys, method = "3SLS", inst = instr, data = data1[-1, ])

round(summary(eq.sys.3sls)$coefficients, 4)
##               Estimate Std. Error t value Pr(>|t|)
## R_(Intercept)  18.7421     8.9367  2.0972   0.0467
## R_M            -0.0003     0.0009 -0.3496   0.7297
## R_Y            -0.0697     0.1689 -0.4130   0.6833
## R_M.L1          0.0001     0.0011  0.0572   0.9548
## Y_(Intercept)  98.7996    68.7067  1.4380   0.1628
## Y_R            -4.0430     3.1306 -1.2914   0.2084
## Y_I             0.0002     0.0004  0.5329   0.5988

For easier comparison, we can combine all the model results (remember that single-equation estimation can be specified with systemfit as well):

cof <- cbind(coef(eq.sys.ols), coef(eq.sys.2sls), 
             coef(eq.sys.sur), coef(eq.sys.3sls))
colnames(cof) <- c("OLS", "2SLS", "SUR", "3SLS")
round(cof, 4)
##                   OLS    2SLS     SUR    3SLS
## R_(Intercept) 19.5581 27.5275 10.3175 18.7421
## R_M            0.0013  0.0019  0.0004 -0.0003
## R_Y           -0.1031 -0.2647  0.0987 -0.0697
## R_M.L1        -0.0015 -0.0017 -0.0010  0.0001
## Y_(Intercept) 14.9134 98.7996 23.8657 98.7996
## Y_R           -0.1691 -4.0430 -0.5928 -4.0430
## Y_I            0.0007  0.0002  0.0007  0.0002

We can see that, once we incorporate the fact that we have an equation system where the shocks could be correlated across equations, the estimated values do differ. If we also address the endogeneity problem, the coefficients change even more. Some of the most notable differences:

  • in the \(R_t\) equation, the sign of the \(M_t\) coefficient becomes negative under 3SLS, which indicates that an increase in the money supply lowers the interest rate (note that the lag \(M_{t-1}\) is also included in the equation, so if we expect the immediate effect to be negative, 3SLS gives the expected result);
  • in the \(Y_t\) equation, the negative effect of \(R_t\) is much larger (in absolute value) once we account for the endogenous right-hand-side predictors by using instrumental variables.

We can also compare the standard errors of the coefficients:

coef.se <- cbind(summary(eq.sys.ols)$coefficients[, 2], summary(eq.sys.2sls)$coefficients[, 2],
                 summary(eq.sys.sur)$coefficients[, 2], summary(eq.sys.3sls)$coefficients[, 2])
colnames(coef.se) <- c("OLS", "2SLS", "SUR", "3SLS")
round(coef.se, 4)
##                   OLS    2SLS    SUR    3SLS
## R_(Intercept) 10.1861 11.1348 9.8370  8.9367
## R_M            0.0018  0.0019 0.0017  0.0009
## R_Y            0.2048  0.2241 0.1968  0.1689
## R_M.L1         0.0017  0.0018 0.0016  0.0011
## Y_(Intercept)  8.0532 68.7067 7.8012 68.7067
## Y_R            0.3165  3.1306 0.3021  3.1306
## Y_I            0.0001  0.0004 0.0001  0.0004

While the standard errors decrease for some predictors, they increase for others, such as \(R_t\) in the \(Y_t\) equation; for that equation, however, the 3SLS standard errors coincide with the 2SLS ones.

We note that the second equation is overidentified - we are using 3 + 1 = 4 instrumental variables (the exogenous predictors plus the constant) for the 2nd equation, which has only 3 parameters - so we could try to decrease the number of instruments, or to find instruments that have a higher correlation with \(R_t\).
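Since the \(Y_t\) equation is overidentified, the instruments can also be checked directly: the summary method for ivreg objects in the AER package reports diagnostic tests (a weak-instrument test, the Wu-Hausman endogeneity test and the Sargan test of the overidentifying restrictions), for example:

# instrument diagnostics for the overidentified Y equation
summary(Y.2sls, diagnostics = TRUE)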