Example: binary classification

We will use the following libraries in some capacity:
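At a minimum, the code in this example needs the packages below. `foreign` supplies `read.dta()`, `data.table` supplies `as.data.table()`, and the pipe `%>%` is assumed to come from `magrittr` (`dplyr` re-exports the same operator). This sketch installs any missing package from CRAN and then attaches the two packages that are used without a `::` prefix. The mirror URL is an assumption.

```r
# Packages used in this example; magrittr (for %>%) is an assumption
pkgs <- c("foreign", "data.table", "magrittr")

# Install whatever is not yet available (the CRAN mirror URL is an assumption)
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# foreign is only called as foreign::read.dta(), so it does not need attaching
library(data.table)
library(magrittr)
```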

We will use the MROZ dataset, which contains cross-sectional data on labor force participation. Rather than reading it directly from the website on every run, we download the file, save it as a .csv file in the same directory as our code, and read it from there. This reduces the load on the website that hosts the dataset and avoids problems during periods when the website is in maintenance mode and the files are temporarily unavailable.

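if (!file.exists("./mroz.csv")) {
  # First run only: download the Stata file and cache it locally as a CSV
  dt <- foreign::read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")
  write.csv(dt, "./mroz.csv", row.names = FALSE)
}
# Every run: read the local copy instead of contacting the server
dt <- read.csv(file = "./mroz.csv", header = TRUE) %>% as.data.table()
dt %>% head(10) %>% print()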
    inlf hours kidslt6 kidsge6 age educ   wage repwage hushrs husage huseduc huswage faminc    mtr motheduc fatheduc unem city exper  nwifeinc     lwage expersq
 1:    1  1610       1       0  32   12 3.3540    2.65   2708     34      12  4.0288  16310 0.7215       12        7  5.0    0    14 10.910060 1.2101541     196
 2:    1  1656       0       2  30   12 1.3889    2.65   2310     30       9  8.4416  21800 0.6615        7        7 11.0    1     5 19.499981 0.3285121      25
 3:    1  1980       1       3  35   12 4.5455    4.04   3072     40      12  3.5807  21040 0.6915       12        7  5.0    0    15 12.039910 1.5141380     225
 4:    1   456       0       3  34   12 1.0965    3.25   1920     53      10  3.5417   7300 0.7815        7        7  5.0    0     6  6.799996 0.0921233      36
 5:    1  1568       1       2  31   14 4.5918    3.60   2000     32      12 10.0000  27300 0.6215       12       14  9.5    1     7 20.100060 1.5242720      49
 6:    1  2032       0       0  54   12 4.7421    4.70   1040     57      11  6.7106  19495 0.6915       14        7  7.5    1    33  9.859054 1.5564801    1089
 7:    1  1440       0       2  37   16 8.3333    5.95   2670     37      12  3.4277  21152 0.6915       14        7  5.0    0    11  9.152048 2.1202600     121
 8:    1  1020       0       0  54   12 7.8431    9.98   4120     53       8  2.5485  18900 0.6915        3        3  5.0    0    35 10.900040 2.0596340    1225
 9:    1  1458       0       2  48   12 2.1262    0.00   1995     52       4  4.2206  20405 0.7515        7        7  3.0    0    24 17.305000 0.7543364     576
10:    1  1600       0       2  39   12 4.6875    4.15   2100     43      12  5.7143  20425 0.6915        7        7  5.0    0    21 12.925000 1.5448990     441
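Three of the columns are transformations of the others. In the standard Wooldridge data description, `nwifeinc` is `(faminc - wage*hours)/1000`, `lwage` is `log(wage)`, and `expersq` is `exper^2`. The base-R sketch below checks these relationships on the ten rows printed above, copied by hand into a data frame. Because `wage` is rounded to four decimals in the printout, the first two columns only match up to a small rounding error; the tolerance used to judge that is our choice.

```r
# The ten rows printed above (wage is rounded to 4 decimals in the printout)
shown <- data.frame(
  hours    = c(1610, 1656, 1980, 456, 1568, 2032, 1440, 1020, 1458, 1600),
  wage     = c(3.3540, 1.3889, 4.5455, 1.0965, 4.5918,
               4.7421, 8.3333, 7.8431, 2.1262, 4.6875),
  faminc   = c(16310, 21800, 21040, 7300, 27300,
               19495, 21152, 18900, 20405, 20425),
  exper    = c(14, 5, 15, 6, 7, 33, 11, 35, 24, 21),
  nwifeinc = c(10.910060, 19.499981, 12.039910, 6.799996, 20.100060,
               9.859054, 9.152048, 10.900040, 17.305000, 12.925000),
  lwage    = c(1.2101541, 0.3285121, 1.5141380, 0.0921233, 1.5242720,
               1.5564801, 2.1202600, 2.0596340, 0.7543364, 1.5448990),
  expersq  = c(196, 25, 225, 36, 49, 1089, 121, 1225, 576, 441)
)

# Recompute each derived column from its inputs and record the largest gap
max_diff <- with(shown, c(
  nwifeinc = max(abs(nwifeinc - (faminc - wage * hours) / 1000)),
  lwage    = max(abs(lwage - log(wage))),
  expersq  = max(abs(expersq - exper^2))
))
print(max_diff)
```

All three gaps are tiny, and `expersq` matches exactly, since it is computed from the integer `exper`.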

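The variable list is as follows:
inlf: 1 if in the labor force in 1975, 0 otherwise
hours: hours worked, 1975
kidslt6: number of kids younger than 6 years
kidsge6: number of kids aged 6 to 18
age: woman's age in years
educ: years of schooling
wage: estimated wage from earnings and hours
repwage: reported wage at interview in 1976
hushrs: hours worked by husband, 1975
husage: husband's age
huseduc: husband's years of schooling
huswage: husband's hourly wage, 1975
faminc: family income, 1975
mtr: federal marginal tax rate facing the woman
motheduc: mother's years of schooling
fatheduc: father's years of schooling
unem: unemployment rate in county of residence
city: 1 if living in an SMSA, 0 otherwise
exper: actual labor market experience
nwifeinc: (faminc - wage*hours)/1000
lwage: log(wage)
expersq: exper^2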

Important: we want to identify which factors determined participation in the labor force (inlf).
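Because inlf takes only the values 0 and 1, a natural first model for this question is a logit, fitted with base R's `glm()` and `family = binomial(link = "logit")`. The sketch below is a minimal illustration under stated assumptions. The helper names `fit_inlf_logit`, `classify_inlf` and `classification_accuracy` are hypothetical. The set of regressors is our assumption; it follows a specification commonly used with MROZ in Wooldridge's textbook. The 0.5 cutoff for turning probabilities into classes is also an assumption.

```r
# Hypothetical helper: logit model for inlf (the regressor set is an assumption)
fit_inlf_logit <- function(data) {
  glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6,
      data = data, family = binomial(link = "logit"))
}

# Hypothetical helper: turn fitted probabilities into 0/1 predictions
# using a cutoff (0.5 by default, an assumption)
classify_inlf <- function(model, cutoff = 0.5) {
  as.integer(fitted(model) > cutoff)
}

# Hypothetical helper: share of observations whose predicted class
# matches the observed response stored by glm() in model$y
classification_accuracy <- function(model, cutoff = 0.5) {
  mean(classify_inlf(model, cutoff) == model$y)
}
```

With the `dt` table loaded earlier, usage would look like `model <- fit_inlf_logit(dt)`, followed by `summary(model)` to inspect which coefficients are significant and `classification_accuracy(model)` to see how often the model predicts participation correctly at the chosen cutoff.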