Datasets
A number of example datasets, along with their data definitions, can be found on the Principles of Econometrics book website, or among the R datasets, which are available in both R and Python.
Of course, you are encouraged to look for any other data that may interest you. The data used in this section were chosen for their ease of access.
The datasets can be loaded in both R and Python.
Dataset 1: Food & income
The dataset: food (data definition).
Let’s say that we are interested in estimating how the expenditure on food, food_exp (\(Y\)), depends on the income variable and its polynomial transformations.
# Load the food dataset from the POE5 website (R)
data_source <- "http://www.principlesofeconometrics.com/poe5/data/csv/food.csv"
dt1 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Load the food dataset from the POE5 website (Python)
data_source = "http://www.principlesofeconometrics.com/poe5/data/csv/food.csv"
dt1 = pd.read_csv(data_source)
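Once dt1 is loaded, a polynomial specification such as \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon\) can be estimated by OLS on a design matrix whose columns are successive powers of income. The following is a minimal sketch, assuming dt1 has the columns food_exp and income; the helper name fit_polynomial is hypothetical, and NumPy's least-squares solver is used here in place of a dedicated regression package.

```python
import numpy as np
import pandas as pd


def fit_polynomial(df: pd.DataFrame, y: str = "food_exp", x: str = "income",
                   degree: int = 2) -> pd.Series:
    """Hypothetical helper: OLS of y on an intercept and x, x**2, ..., x**degree.

    Returns the estimated coefficients as a Series indexed by term name.
    """
    xv = df[x].to_numpy(dtype=float)
    # Design matrix: x**0 (a column of ones, i.e. the intercept), then x**1 .. x**degree
    X = np.column_stack([xv ** p for p in range(degree + 1)])
    yv = df[y].to_numpy(dtype=float)
    # The least-squares solution of X b = y is the OLS coefficient vector
    coef, *_ = np.linalg.lstsq(X, yv, rcond=None)
    names = ["const"] + [x if p == 1 else f"{x}^{p}" for p in range(1, degree + 1)]
    return pd.Series(coef, index=names)
```

For example, fit_polynomial(dt1, degree=2) would return the intercept and the coefficients on income and income^2; the degree is only an illustrative choice and can be changed to compare specifications.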
Dataset 2: Nuclear Power Station Construction Data
Let’s say that we are interested in estimating how the construction cost, cost (\(Y\)), depends on (some of) the remaining explanatory variables.
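# Load the nuclear dataset from the boot package (R)
dt2 <- boot::nuclear
import statsmodels.api as sm
# Load the nuclear dataset from the R boot package (Python)
dt2 = sm.datasets.get_rdataset("nuclear", "boot")
# print(dt2.__doc__)  # documentation about the data
dt2 = dt2.data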
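To estimate how cost depends on some of the remaining variables, one can regress it on a chosen subset of columns. This is a minimal sketch, assuming dt2 is the pandas DataFrame obtained above; the helper name ols_subset is hypothetical, and the standard errors assume homoskedastic errors. It computes \(\hat{\beta}=(X^\top X)^{-1}X^\top Y\), \(s^2=e^\top e/(n-k)\) and \(\widehat{\rm Var}(\hat{\beta})=s^2(X^\top X)^{-1}\), where \(e\) are the residuals, \(n\) the number of observations and \(k\) the number of estimated coefficients.

```python
import numpy as np
import pandas as pd


def ols_subset(df: pd.DataFrame, y: str, regressors: list) -> pd.DataFrame:
    """Hypothetical helper: OLS of df[y] on an intercept plus the chosen columns.

    Returns coefficient estimates, homoskedastic standard errors and
    t-statistics, one row per term.
    """
    X = np.column_stack([np.ones(len(df))]
                        + [df[c].to_numpy(dtype=float) for c in regressors])
    yv = df[y].to_numpy(dtype=float)
    n, k = X.shape
    # OLS estimates b = (X'X)^{-1} X'y, computed via least squares
    b, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ b
    # Unbiased estimate of the error variance: s^2 = e'e / (n - k)
    s2 = resid @ resid / (n - k)
    # Estimated covariance matrix of b: s^2 (X'X)^{-1}; standard errors on its diagonal
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return pd.DataFrame({"coef": b, "std_err": se, "t": b / se},
                        index=["const"] + list(regressors))
```

For example, ols_subset(dt2, "cost", ["date", "cap"]) regresses cost on the date and cap columns; which variables to include is left to you.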
Dataset 3: Home sales
The dataset: stockton5_small (data definition) contains data on houses sold in Stockton, California in 1996-1998.
Assume that we are interested in how the sale price, sprice (\(Y\)), is affected by (some of) the remaining explanatory variables.
# Load the stockton5_small dataset from the POE5 website (R)
data_source <- "http://www.principlesofeconometrics.com/poe5/data/csv/stockton5_small.csv"
dt3 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Load the stockton5_small dataset from the POE5 website (Python)
data_source = "http://www.principlesofeconometrics.com/poe5/data/csv/stockton5_small.csv"
dt3 = pd.read_csv(data_source)
Dataset 4: 2013 Current Population Survey data
The dataset: cps5_small (data definition) contains data on hourly wage rates, education, etc. from the 2013 Current Population Survey.
Suppose we are interested in examining which of the various explanatory variables affect wage (\(Y\)).
# Load the cps5_small dataset from the POE5 website (R)
data_source <- "http://www.principlesofeconometrics.com/poe5/data/csv/cps5_small.csv"
dt4 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Load the cps5_small dataset from the POE5 website (Python)
data_source = "http://www.principlesofeconometrics.com/poe5/data/csv/cps5_small.csv"
dt4 = pd.read_csv(data_source)
Dataset 5: Canned tuna sales
The dataset: tuna (data definition) contains weekly data (we will ignore the time dimension for now) on the number of cans of brand 1 tuna sold (sal1).
Consider examining how the ratio of the price of brand 1 tuna, apr1, to the price of brand 3 tuna, apr3, affects sal1, measured in thousands of units. To do this, you will need to:
- Firstly, scale sal1 so that it measures sales in thousands (instead of single units).
- Secondly, calculate the ratio as \(\rm price\_ratio=100\cdot(apr1/apr3)\). This ratio expresses the price of brand 1 tuna as a percentage of the price of brand 3 tuna. When \(\rm price\_ratio>100\), brand 1 tuna is more expensive, and when \(\rm price\_ratio<100\), it is less expensive. For example:
  - if the ratio equals \(100\), then the price of both brands is the same;
  - if it is equal to \(90\), then brand 1 is \(10\%\) cheaper than brand 3;
  - if it is equal to \(110\), then brand 1 is \(10\%\) more expensive than brand 3.
- Finally, estimate how the price ratio affects the sales of brand 1.
# Load the tuna dataset from the POE5 website (R)
data_source <- "http://www.principlesofeconometrics.com/poe5/data/csv/tuna.csv"
dt5 <- read.csv(file = data_source, sep = ",", dec = ".", header = TRUE)
import pandas as pd
# Load the tuna dataset from the POE5 website (Python)
data_source = "http://www.principlesofeconometrics.com/poe5/data/csv/tuna.csv"
dt5 = pd.read_csv(data_source)
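The three steps listed for the tuna data can be sketched in Python as follows, assuming dt5 is the DataFrame loaded above with the columns sal1, apr1 and apr3. The helper names add_tuna_variables and fit_simple are hypothetical, and the last step is a plain simple linear regression of the rescaled sal1 on price_ratio, fitted with NumPy's least-squares solver.

```python
import numpy as np
import pandas as pd


def add_tuna_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy of df with sal1 in thousands and price_ratio added."""
    out = df.copy()
    # Step 1: measure brand 1 sales in thousands of cans instead of single units
    out["sal1"] = out["sal1"] / 1000
    # Step 2: price of brand 1 as a percentage of the price of brand 3
    out["price_ratio"] = 100 * out["apr1"] / out["apr3"]
    return out


def fit_simple(df: pd.DataFrame, y: str, x: str) -> tuple:
    """Hypothetical helper for step 3: OLS of y on an intercept and x.

    Returns (intercept, slope).
    """
    X = np.column_stack([np.ones(len(df)), df[x].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X, df[y].to_numpy(dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])
```

For example, dt5 = add_tuna_variables(dt5) followed by fit_simple(dt5, "sal1", "price_ratio") returns the intercept and slope; the slope is the estimated change in sales (in thousands of cans) associated with a one-percentage-point increase in price_ratio. Returning a copy leaves the originally loaded data untouched.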