1.2 Statistical Data Types
This section gives a brief introduction to three of the most common data types used in data analysis (specifically, in statistical and econometric analysis and modeling).
1.2.1 Cross-sectional Data
Cross-sectional data is collected in a single time period and is characterized by individual units (people, companies, countries, etc.). Some examples include:
- Student grades at the end of the current semester;
- Household data for the previous year: expenditure on food, unemployment, income, etc.;
- Car data: average speed, horsepower, color, etc.
With cross-sectional data, the ordering of the data does not matter. In other words, we can sort the data in ascending, descending or even random order and this will not affect our modeling results.
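The order-invariance claim above can be checked directly. The following is a minimal sketch, assuming a simple least-squares line as the model; the helper `ols_fit` and the study-hours and grade values are hypothetical and made up purely for illustration.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical cross-section: one row per student (illustrative values only).
hours = [2.0, 3.5, 5.0, 6.0, 8.0, 9.5]
grade = [4.0, 5.0, 6.5, 6.0, 8.5, 9.0]
rows = list(zip(hours, grade))

# Reorder the rows in two ways: descending by grade, and at random.
descending_rows = sorted(rows, key=lambda row: row[1], reverse=True)
random_rows = rows[:]
random.shuffle(random_rows)

original = ols_fit(hours, grade)
descending = ols_fit(*zip(*descending_rows))
randomized = ols_fit(*zip(*random_rows))

# The estimates agree up to floating-point rounding.
print(original)
print(descending)
print(randomized)
```

Each row is moved as a whole, so every student's hours stay paired with that student's grade; only the position of the row in the data set changes.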
The following data sample gives the speed of cars and the distances taken to stop. The data were recorded in the 1920s.
1.2.2 Time Series Data
Data collected at a number of specific points in time is called time series data. Examples include stock prices, interest rates, exchange rates, product prices, GDP, etc. Time series data can be observed at many different frequencies (hourly, daily, weekly, monthly, quarterly, annually, etc.).
Unlike cross-sectional data, the ordering of the data is important in time series data. Each observation represents the value at a specific point in time. As such, time series data are typically presented in chronological order. Reordering the data discards its time dimension.
The following data sample consists of quarterly observations of money, GDP and the interest rate in Canada, where \(m\) is the log of the real money supply, \(y\) is the log of GDP in 1992 dollars, seasonally adjusted, \(p\) is the log of the price level and \(r\) is the 3-month Treasury bill rate.
## m y p r
## 1 11.21111 12.62052 -1.49969 4.46333
## 2 11.21075 12.64173 -1.48955 4.17333
## 3 11.20382 12.64643 -1.48414 4.47333
## 4 11.17621 12.65076 -1.47146 5.45333
## 5 11.14330 12.65842 -1.45747 6.69000
## 6 11.11438 12.68715 -1.45569 6.83333
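To see why ordering matters for time series, consider the period-to-period change in the interest rate. The following is a minimal sketch using the six values of \(r\) from the sample above; the helpers `first_differences` and `mean` are illustrative names, not part of any library.

```python
# Interest rate r from the six quarterly observations in the sample above.
r = [4.46333, 4.17333, 4.47333, 5.45333, 6.69000, 6.83333]

def first_differences(series):
    """Return the period-to-period changes x_t - x_(t-1)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

chronological_changes = first_differences(r)
sorted_changes = first_differences(sorted(r))

# An order-free summary such as the mean is unaffected by sorting...
print(mean(r), mean(sorted(r)))

# ...but the quarter-to-quarter changes are not: the initial drop in the
# rate disappears once the values are sorted in ascending order.
print(chronological_changes)
print(sorted_changes)
```

Any quantity defined in terms of neighbouring observations (changes, lags, growth rates) depends on the chronological order, which is why reordering a time series loses information.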
1.2.3 Panel (or Longitudinal) Data
Panel data combines cross-sectional and time series data: the same individuals (persons, firms, cities, etc.) are observed at several points in time (days, years, before and after treatment, etc.). Panel data allows you to control for variables you cannot observe or measure, such as:
- cultural (e.g. country- or region-specific) factors;
- differences in business practices across companies.
If we have the same number of time period observations for each individual, then we have a balanced panel.
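The balanced-panel condition can be checked by counting the time periods observed for each individual. The following is a minimal sketch, assuming the panel is given as (firm, year) pairs; the function name `is_balanced` and the dropped 1940 observation are hypothetical and used only for illustration.

```python
from collections import Counter

def is_balanced(observations):
    """Return True if every unit has the same number of distinct time periods."""
    # set() removes repeated (unit, period) pairs before counting.
    periods_per_unit = Counter(unit for unit, _ in set(observations))
    return len(set(periods_per_unit.values())) == 1

# Two firms observed in every year from 1935 to 1954: 20 periods each.
balanced_panel = [(firm, year) for firm in (1, 2) for year in range(1935, 1955)]

# Hypothetical unbalanced variant: firm 2 is missing its 1940 observation.
unbalanced_panel = [obs for obs in balanced_panel if obs != (2, 1940)]

print(is_balanced(balanced_panel))    # True
print(is_balanced(unbalanced_panel))  # False
```

This check follows the definition above and compares only the number of periods per individual; it does not verify that the individuals are observed in the same periods.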
The following data sample is from the Grunfeld Investment Data, a panel of 10 firms observed from 1935 to 1954 in the US (the first two firms are shown), where firm is the firm ID, year is the date, inv is the gross investment, value is the value of the firm and capital is the stock of plant and equipment.
## firm year inv value capital
## 1 1 1935 317.6 3078.5 2.8
## 2 1 1936 391.8 4661.7 52.6
## 3 1 1937 410.6 5387.1 156.9
## 4 1 1938 257.7 2792.2 209.2
## 5 1 1939 330.8 4313.2 203.4
## 6 1 1940 461.2 4643.9 207.2
## 7 1 1941 512.0 4551.2 255.2
## 8 1 1942 448.0 3244.1 303.7
## 9 1 1943 499.6 4053.7 264.1
## 10 1 1944 547.5 4379.3 201.6
## 11 1 1945 561.2 4840.9 265.0
## 12 1 1946 688.1 4900.9 402.2
## 13 1 1947 568.9 3526.5 761.5
## 14 1 1948 529.2 3254.7 922.4
## 15 1 1949 555.1 3700.2 1020.1
## 16 1 1950 642.9 3755.6 1099.0
## 17 1 1951 755.9 4833.0 1207.7
## 18 1 1952 891.2 4924.9 1430.5
## 19 1 1953 1304.4 6241.7 1777.3
## 20 1 1954 1486.7 5593.6 2226.3
## 21 2 1935 209.9 1362.4 53.8
## 22 2 1936 355.3 1807.1 50.5
## 23 2 1937 469.9 2676.3 118.1
## 24 2 1938 262.3 1801.9 260.2
## 25 2 1939 230.4 1957.3 312.7
## 26 2 1940 361.6 2202.9 254.2
## 27 2 1941 472.8 2380.5 261.4
## 28 2 1942 445.6 2168.6 298.7
## 29 2 1943 361.6 1985.1 301.8
## 30 2 1944 288.2 1813.9 279.1
## 31 2 1945 258.7 1850.2 213.8
## 32 2 1946 420.3 2067.7 132.6
## 33 2 1947 420.5 1796.7 264.8
## 34 2 1948 494.5 1625.8 306.9
## 35 2 1949 405.1 1667.0 351.1
## 36 2 1950 418.8 1677.4 357.8
## 37 2 1951 588.2 2289.5 342.1
## 38 2 1952 645.5 2159.4 444.2
## 39 2 1953 641.0 2031.3 623.6
## 40 2 1954 459.3 2115.5 669.7
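Because a panel is indexed by both individual and time, fixing the year yields a cross-section and fixing the firm yields a time series. The following is a minimal sketch using the 1935 to 1937 values of inv for firms 1 and 2 from the sample above; the dictionary layout and the helpers `cross_section` and `time_series` are illustrative choices, not a standard API.

```python
# Gross investment (inv) keyed by (firm, year); values copied from the sample above.
inv = {
    (1, 1935): 317.6, (1, 1936): 391.8, (1, 1937): 410.6,
    (2, 1935): 209.9, (2, 1936): 355.3, (2, 1937): 469.9,
}

def cross_section(panel, year):
    """All firms observed in a single year; the order of firms is irrelevant."""
    return {firm: value for (firm, y), value in panel.items() if y == year}

def time_series(panel, firm):
    """One firm followed over time, returned in chronological order."""
    return sorted((year, value) for (f, year), value in panel.items() if f == firm)

print(cross_section(inv, 1936))  # {1: 391.8, 2: 355.3}
print(time_series(inv, 2))       # [(1935, 209.9), (1936, 355.3), (1937, 469.9)]
```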