Example: binary classification
We will use the following libraries in some capacity:
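Judging from the calls used in this section (`foreign::read.dta`, `as.data.table`, `%>%`), the code needs at least `foreign`, `data.table` and a package that provides the pipe. A minimal sketch that checks they are installed, assuming `magrittr` as the source of `%>%`:

```r
# Packages inferred from the calls used in this section.
# Assumption: %>% comes from magrittr (dplyr re-exports it too).
pkgs <- c("foreign", "data.table", "magrittr")

# Check which of them are installed without attaching anything.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  message("Install first: ", paste(pkgs[!installed], collapse = ", "))
} else {
  library(data.table)  # as.data.table()
  library(magrittr)    # %>%
}
```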
We will use the MROZ dataset, which contains cross-sectional labor force participation data. Rather than reading it from the web on every run, we download the file once, save it as a .csv file, and place it in the same directory as our code. This reduces the load on the website that hosts the dataset and protects us from periods when the site is in maintenance mode and the files are temporarily unavailable.
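# Read the Stata file directly from the web (needs the foreign package)
dt <- foreign::read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")
# Convert the data frame to a data.table
dt <- dt %>% as.data.table()
# Save a local copy next to the code
write.csv(dt, "./mroz.csv", row.names = FALSE)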
inlf hours kidslt6 kidsge6 age educ wage repwage hushrs husage huseduc huswage faminc mtr motheduc fatheduc unem city exper nwifeinc lwage expersq
1: 1 1610 1 0 32 12 3.3540 2.65 2708 34 12 4.0288 16310 0.7215 12 7 5.0 0 14 10.910060 1.2101541 196
2: 1 1656 0 2 30 12 1.3889 2.65 2310 30 9 8.4416 21800 0.6615 7 7 11.0 1 5 19.499981 0.3285121 25
3: 1 1980 1 3 35 12 4.5455 4.04 3072 40 12 3.5807 21040 0.6915 12 7 5.0 0 15 12.039910 1.5141380 225
4: 1 456 0 3 34 12 1.0965 3.25 1920 53 10 3.5417 7300 0.7815 7 7 5.0 0 6 6.799996 0.0921233 36
5: 1 1568 1 2 31 14 4.5918 3.60 2000 32 12 10.0000 27300 0.6215 12 14 9.5 1 7 20.100060 1.5242720 49
6: 1 2032 0 0 54 12 4.7421 4.70 1040 57 11 6.7106 19495 0.6915 14 7 7.5 1 33 9.859054 1.5564801 1089
7: 1 1440 0 2 37 16 8.3333 5.95 2670 37 12 3.4277 21152 0.6915 14 7 5.0 0 11 9.152048 2.1202600 121
8: 1 1020 0 0 54 12 7.8431 9.98 4120 53 8 2.5485 18900 0.6915 3 3 5.0 0 35 10.900040 2.0596340 1225
9: 1 1458 0 2 48 12 2.1262 0.00 1995 52 4 4.2206 20405 0.7515 7 7 3.0 0 24 17.305000 0.7543364 576
10: 1 1600 0 2 39 12 4.6875 4.15 2100 43 12 5.7143 20425 0.6915 7 7 5.0 0 21 12.925000 1.5448990 441
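The download-once workflow described above can be wrapped in a small helper so that the website is contacted only when no local copy exists. This is a sketch, not part of the original code: `read_cached` and its `fetch` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: fetch the data only when no local copy exists.
# `fetch` is any zero-argument function that returns a data frame.
read_cached <- function(path, fetch) {
  if (!file.exists(path)) {
    write.csv(fetch(), path, row.names = FALSE)
  }
  read.csv(path)
}

# Usage with the download call from the text (needs the foreign package,
# and network access on the first run only):
# dt <- read_cached("./mroz.csv", function() {
#   foreign::read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")
# })
```

Always reading back from the .csv file means every run sees the same column types, whether or not the download happened in that session. Note that `read.csv` returns a plain data frame, so a conversion with `as.data.table()` would still be needed afterwards.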
The variable list is as follows:
- inlf - \(=1\) if in labor force, 1975
- hours - hours worked, 1975
- kidslt6 - number of kids < 6 years
- kidsge6 - number of kids 6-18
- age - woman’s age in yrs
- educ - years of schooling
- wage - estimated wage from earns., hours
- repwage - reported wage at interview in 1976
- hushrs - hours worked by husband, 1975
- husage - husband’s age
- huseduc - husband’s years of schooling
- huswage - husband’s hourly wage, 1975
- faminc - family income, 1975
- mtr - fed. marginal tax rate facing woman
- motheduc - mother’s years of schooling
- fatheduc - father’s years of schooling
- unem - unem. rate in county of resid.
- city - \(=1\) if live in SMSA
- exper - Actual years of wife’s previous labor market experience
- nwifeinc - \((faminc - wage\times hours)/1000\)
- lwage - \(\log(wage)\)
- expersq - \(exper^2\)
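The definitions of the three derived variables (nwifeinc, lwage, expersq) can be checked against the first rows of the printed data. The sketch below copies only the needed columns of rows 1 to 3 by hand; the name `chk` and the tolerance are choices made here, not part of the original code.

```r
# First three rows of the printed data (only the columns needed here).
chk <- data.frame(
  hours    = c(1610, 1656, 1980),
  wage     = c(3.3540, 1.3889, 4.5455),
  faminc   = c(16310, 21800, 21040),
  exper    = c(14, 5, 15),
  nwifeinc = c(10.910060, 19.499981, 12.039910),
  lwage    = c(1.2101541, 0.3285121, 1.5141380),
  expersq  = c(196, 25, 225)
)

# Recompute each derived variable from its definition.
nwifeinc_calc <- (chk$faminc - chk$wage * chk$hours) / 1000
lwage_calc    <- log(chk$wage)
expersq_calc  <- chk$exper^2

# Printed wages are rounded to four decimals, so compare with a tolerance.
tol <- 1e-4
max(abs(nwifeinc_calc - chk$nwifeinc)) < tol
max(abs(lwage_calc - chk$lwage)) < tol
all(expersq_calc == chk$expersq)
```

For the first row, for example, \((16310 - 3.3540 \times 1610)/1000 = 10.91006\), which agrees with the printed nwifeinc value.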
Important
We want to identify which factors determined participation in the labor force (inlf).
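Since inlf takes only the values 0 and 1, one common way to approach this question is a logistic regression. The sketch below is an illustration under stated assumptions rather than the model used in this text: it runs on simulated stand-in data (not the MROZ values), and the choice of regressors and coefficients in the simulation is arbitrary.

```r
# Simulated stand-in data (NOT the MROZ values); column names mirror the list above.
set.seed(123)
n <- 500
sim <- data.frame(
  educ    = sample(8:17, n, replace = TRUE),
  kidslt6 = sample(0:2, n, replace = TRUE),
  age     = sample(30:60, n, replace = TRUE)
)

# Assumed data-generating process, chosen only for illustration.
eta <- -1 + 0.2 * sim$educ - 1.0 * sim$kidslt6 - 0.02 * sim$age
sim$inlf <- rbinom(n, size = 1, prob = plogis(eta))

# Logit model: P(inlf = 1 | x) = 1 / (1 + exp(-x'beta))
fit <- glm(inlf ~ educ + kidslt6 + age, data = sim,
           family = binomial(link = "logit"))

# Estimates, standard errors, z-values and p-values
summary(fit)$coefficients

# Fitted participation probabilities, one per observation
p_hat <- predict(fit, type = "response")
```

With the real data, the same call would take `data = dt` and whichever regressors from the variable list are of interest.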