Datasets
A number of example datasets can be found on the Principles of Econometrics book website, along with their dataset definitions, or among the R datasets, which are available in both R and Python.
Of course, you are encouraged to find any other data that may be interesting to you. The data used in this section was chosen for its ease of access.
The datasets can be loaded in both R and Python.
Dataset 1: Choice of transportation
The dataset: transport (data definition).
We want to evaluate whether someone will choose to travel by bus or by car.
# R: read the CSV file from the book website into a data frame
data_source <- "http://www.principlesofeconometrics.com/poe5/data/csv/transport.csv"
dt1 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Python: read the CSV file from the book website into a pandas DataFrame
data_source = "http://www.principlesofeconometrics.com/poe5/data/csv/transport.csv"
dt1 = pd.read_csv(data_source)
Dataset 2: Titanic survivor data
The dataset: new titanic data (a previous, much smaller version: titanic).
Let’s say that we are interested in estimating whether a passenger will survive¹ based on their age, gender, economic status and other factors.
# R: read the Titanic training data from its URL into a data frame
data_source <- "https://raw.githubusercontent.com/paulhendricks/titanic/master/inst/data-raw/train.csv"
dt2 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Python: read the Titanic training data from its URL into a pandas DataFrame
data_source = "https://raw.githubusercontent.com/paulhendricks/titanic/master/inst/data-raw/train.csv"
dt2 = pd.read_csv(data_source)
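Before modelling survival, it can help to check for missing values and to look at the outcome by group. The following is a minimal sketch on a few made-up rows, so that it runs without network access. The column names `Survived`, `Sex` and `Age` are assumed to match those in the training file; with the real data, `titanic_toy` would be replaced by `dt2`.

```python
import pandas as pd

# Made-up rows that mimic a few columns of the Titanic training data.
# The values are illustrative only and are NOT taken from the real file.
titanic_toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, float("nan"), 35.0],
})

# Number of missing values in each column.
missing_counts = titanic_toy.isna().sum()

# Share of survivors within each group of the Sex column.
survival_by_sex = titanic_toy.groupby("Sex")["Survived"].mean()

print(missing_counts)
print(survival_by_sex)
```

Columns with many missing values (such as age in this kind of data) need a decision on whether to drop or impute them before a model is estimated.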
Dataset 3: Coke or Pepsi?
The dataset: coke (data definition).
We want to evaluate whether a customer will choose Coke or Pepsi.
# R: read the CSV file from the book website into a data frame
data_source <- "http://www.principlesofeconometrics.com/poe5/data/csv/coke.csv"
dt3 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Python: read the CSV file from the book website into a pandas DataFrame
data_source = "http://www.principlesofeconometrics.com/poe5/data/csv/coke.csv"
dt3 = pd.read_csv(data_source)
Dataset 4: Defaulting on debt
The dataset: credit card default.
The aim is to predict which customers will default on their credit card debt.
# R: the Default data ships with the ISLR package, which must be installed
dt4 <- ISLR::Default
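import statsmodels.api as sm
# Python: fetch the Default data of the R package ISLR via statsmodels
dt4 = sm.datasets.get_rdataset("Default", "ISLR")
#print(dt4.__doc__) #documentation about the data
# Keep only the DataFrame holding the observations
dt4 = dt4.data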
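In the ISLR `Default` data the outcome is stored as the labels `Yes` and `No`, so it usually has to be turned into a 0/1 variable before it can serve as a binary response. A minimal sketch of that step follows, using a few made-up rows with the same column layout (`default`, `student`, `balance`, `income`) so that it runs offline. The names `default_toy` and `default_flag` are introduced here for illustration only.

```python
import pandas as pd

# Made-up rows with the column layout of the ISLR Default data.
# The values are illustrative only and are NOT taken from the real dataset.
default_toy = pd.DataFrame({
    "default": ["No", "Yes", "No", "No"],
    "student": ["No", "Yes", "Yes", "No"],
    "balance": [729.5, 1861.6, 817.2, 529.3],
    "income": [44361.6, 14891.6, 12106.1, 35704.5],
})

# Map the Yes/No labels to 1/0 so the outcome can be used as a binary response.
default_toy["default_flag"] = (default_toy["default"] == "Yes").astype(int)

# Share of customers who defaulted in this toy sample.
default_rate = default_toy["default_flag"].mean()
print(default_rate)
```

With the real data, the same two steps would be applied to `dt4` instead of `default_toy`.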
Dataset 5: U.S. Women’s Labor-Force Participation
The dataset: MROZ (definition) (more data).
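# R: read the Stata file from its URL using the foreign package
dt5 <- foreign::read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")
# Make sure the result is a plain data frame
dt5 <- data.frame(dt5)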
import pandas as pd
# Python: read the Stata file from its URL into a pandas DataFrame
dt5 = pd.read_stata("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")
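The Python loading calls in this section differ mainly in the reader they use: `pd.read_csv` for `.csv` files and `pd.read_stata` for `.dta` files. As a rough sketch, this choice can be wrapped in a small helper that picks the reader from the file extension. `load_dataset` is a hypothetical name introduced here, not a pandas function, and the demonstration uses a tiny made-up CSV file so that it runs without network access.

```python
import os
import tempfile
from urllib.parse import urlparse

import pandas as pd


def load_dataset(source):
    """Load a CSV or Stata file (local path or URL) into a DataFrame.

    Hypothetical helper: the reader is chosen from the file extension.
    """
    # Look only at the path part, so a URL with a query string still works.
    extension = os.path.splitext(urlparse(source).path)[1].lower()
    if extension == ".csv":
        return pd.read_csv(source)
    if extension == ".dta":
        return pd.read_stata(source)
    raise ValueError(f"Unsupported file extension: {extension!r}")


# Demonstration on a tiny made-up CSV file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]}).to_csv(demo_path, index=False)
    demo = load_dataset(demo_path)

print(demo.shape)
```

Under these assumptions, a call such as `load_dataset("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")` would be dispatched to `pd.read_stata`.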
¹ Or, in this case, could have survived.