The beginning of this .Rmd file has the following code:
---
title: "R Notebook Example"
output:
  html_document:
    toc: true
    toc_depth: 2
    toc_float: true
---
This means that we want to generate an .html document with a table of contents that floats as we scroll up or down.
If we want to add an author name and email address, we can use:
---
title: "Task X"
author: "Student Name, student.name@mif.stud.vu.lt"
...
---
This markdown file shows the basic functionality of how the code is evaluated and how to write formulas, and is meant as a proof of concept rather than a full-fledged tutorial.
A more complete introduction to RMarkdown can be found here.
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
plot(cars)
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
Note the difference between:
x1 <- rnorm(100)
print(mean(x1))
## [1] 0.1143196
and
x2 <- rnorm(100)
print(mean(x2))
By specifying eval = FALSE for the second chunk, we are telling R not to evaluate that code. We can check this by looking at the list of variables in our environment:
print(ls())
## [1] "x1"
We can see that only the variable x1 was created.
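As a rough illustration, the sketch below mimics the two chunks in a plain R session. It assumes that the header of the second chunk in the .Rmd source reads {r, eval = FALSE} (the header itself is not visible in the rendered output), so its assignment is left commented out here.

```r
# Chunk with the header {r}: evaluated as usual.
x1 <- rnorm(100)

# Chunk with the (assumed) header {r, eval = FALSE}: knitr shows the code but
# does not run it, which is mimicked here by commenting the assignment out.
# x2 <- rnorm(100)

print(exists("x1"))  # TRUE: x1 was created
print(exists("x2"))  # FALSE in a fresh session: x2 was never created
```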
Open this notebook in RStudio and try executing the following code. If you click the green arrow a couple of times, the resulting output will be the same.
set.seed(1233)
mean(rnorm(100))
## [1] -0.162077
Now, try executing the code above and then immediately execute the code below:
mean(rnorm(100))
## [1] -0.03236137
Now execute only the code block directly above (the one without set.seed) a few times: you will notice that the output is different each time! To avoid this, call the set.seed function in those code chunks where you are generating data. (Note: if you use the 'Run All' command, then both of the above chunks will produce the same results on every run.)
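The effect of the seed can be checked directly. This is a minimal sketch; the names m1, m2 and m3 are placeholders introduced only for illustration.

```r
set.seed(1233)
m1 <- mean(rnorm(100))

# Resetting the same seed restarts the random number stream,
# so the same 100 draws (and hence the same mean) are produced again.
set.seed(1233)
m2 <- mean(rnorm(100))

# Without resetting the seed, the stream simply continues with new draws.
m3 <- mean(rnorm(100))

print(identical(m1, m2))  # TRUE
print(identical(m2, m3))  # FALSE
```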
To evaluate all of the chunks in an *.Rmd file, press Ctrl+Alt+Enter ('Run All' command).
We can print a model output:
my.ols <- lm(mpg ~ disp + cyl, data = mtcars)
summary(my.ols)
##
## Call:
## lm(formula = mpg ~ disp + cyl, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4213 -2.1722 -0.6362 1.1899 7.0516
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
## disp -0.02058 0.01026 -2.007 0.0542 .
## cyl -1.58728 0.71184 -2.230 0.0337 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.055 on 29 degrees of freedom
## Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
## F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09
We can also print the data output:
print(head(mtcars))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
However, if we have a large data frame, then all of the data will be printed. If we want to format it differently, we can use a number of libraries:
DT::datatable(mtcars, width = 400)
Note that this interactivity will not work in a .pdf file.
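Where interactivity is unavailable, a static table of only the first few rows is one possible alternative. The sketch below assumes that the knitr package is installed and falls back to print() otherwise; first_rows is an illustrative name.

```r
# Keep only the first five rows so that the printed table stays short.
first_rows <- head(mtcars, n = 5)

if (requireNamespace("knitr", quietly = TRUE)) {
  # kable() builds a static table, which also renders in .pdf output.
  print(knitr::kable(first_rows, digits = 1))
} else {
  # Fallback if knitr is not available.
  print(first_rows)
}
```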
In most cases, the standard print() function is enough for displaying output. For models that produce a large amount of output, please print() only the required results. Example:
The coefficients:
print(summary(my.ols)$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.66099474 2.54700388 13.608536 4.022869e-14
## disp -0.02058363 0.01025748 -2.006696 5.418572e-02
## cyl -1.58727681 0.71184427 -2.229809 3.366495e-02
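Other pieces of the fitted model can be extracted in the same way. The following is a short sketch using standard accessors; the model is re-fitted so that the block is self-contained, and fit_summary is an illustrative name.

```r
my.ols <- lm(mpg ~ disp + cyl, data = mtcars)
fit_summary <- summary(my.ols)

# Point estimates only, as a named numeric vector.
print(coef(my.ols))

# A single row of the coefficient table (estimate, std. error, t value, p-value).
print(fit_summary$coefficients["disp", ])

# Goodness-of-fit measures, rounded for reporting.
print(round(c(r.squared = fit_summary$r.squared,
              adj.r.squared = fit_summary$adj.r.squared), 4))
```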
We can also use inline code with $R^2=$ `r round(summary(my.ols)$r.squared, 4)`:
Our \(R^2=\) 0.7596.
More info on LaTeX formulas, matrices, etc.
Formulas in Markdown are written between $ symbols for inline formulas, and between $$ symbols for centered formulas.
For example, writing $X_t = \sum_{j = 1}^t \epsilon_j$ produces the following output: \(X_t = \sum_{j = 1}^t \epsilon_j\).
Writing $$X_t = \sum_{j = 1}^t \epsilon_j$$ produces: \[X_t = \sum_{j = 1}^t \epsilon_j\]
If we want to write a matrix, we use the following (either with $ or $$, and using \quad to separate the different matrices):
$$
\begin{bmatrix}
\alpha& \beta^{*}\\
\gamma^{*}& \delta
\end{bmatrix}, \quad
\begin{pmatrix}
\alpha& \beta^{*}\\
\gamma^{*}& \delta
\end{pmatrix}
$$
which produces the following output: \[ \begin{bmatrix} \alpha& \beta^{*}\\ \gamma^{*}& \delta \end{bmatrix}, \quad \begin{pmatrix} \alpha& \beta^{*}\\ \gamma^{*}& \delta \end{pmatrix} \]
Using the align environment and writing & next to the symbols we want to align in each row lets us specify multiple equations aligned by the same symbol, for example:
$$
\begin{align}
Y_{1,t} &= \alpha_1 + \beta_1 X_{1,t} + \epsilon_{1,t} \\
Y_{2,t} &= \alpha_1 + \beta_1 X_{1,t} + \epsilon_{1,t}
\end{align}
$$
Produces:
\[ \begin{align} Y_{1,t} &= \alpha_1 + \beta_1 X_{1,t} + \epsilon_{1,t} \\ Y_{2,t} &= \alpha_1 + \beta_1 X_{1,t} + \epsilon_{1,t} \end{align} \]
Or if we need to write our equation in a different form:
$$
\begin{align}
f(x) & = (a+b)^2 \\
& = a^2+2ab+b^2
\end{align}
$$
\[ \begin{align} f(x) &= (a+b)^2 \\ &= a^2+2ab+b^2 \end{align} \]
Or if we simply have longer names for our variables:
\begin{align}
Population_t &= \alpha_1 + \gamma_1 X_{1,t} + \gamma_2 X_{1,t-1} \\
Price_t &= \alpha_2 + \beta_1 Z_{1,t}
\end{align}
\[ \begin{align} Population_t &= \alpha_1 + \gamma_1 X_{1,t} + \gamma_2 X_{1,t-1} \\ Price_t &= \alpha_2 + \beta_1 Z_{1,t} \end{align} \]
We can write equation systems using the cases environment (note: we are also using the & symbol to align our equations):
$$
f(n) = \begin{cases}
n/2 &\mbox{if } n \equiv 0 \\
(3n +1)/2 & \mbox{if } n \equiv 1.
\end{cases} \pmod{2}
$$
\[ f(n) = \begin{cases} n/2 &\mbox{if } n \equiv 0 \\ (3n +1)/2 & \mbox{if } n \equiv 1. \end{cases} \pmod{2} \]
$$
\text{wage} = \beta_0 + \beta_1 \cdot \text{educ}^2 + \epsilon
$$
\[ \text{wage} = \beta_0 + \beta_1 \cdot \text{educ}^2 + \epsilon \]
We can write down the estimated regression model, along with the standard errors, using the following code:
$\underset{(se)}{\widehat{\log(\text{wage})}} = \underset{(0.0702)}{1.5968} + \underset{(0.0048)}{0.0988} \cdot \text{educ}$
\(\underset{(se)}{\widehat{\log(\text{wage})}} = \underset{(0.0702)}{1.5968} + \underset{(0.0048)}{0.0988} \cdot \text{educ}\)
We can use $$ instead of $ around the formula in order to center the equation.
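Rather than typing the estimates by hand, the formula string can also be assembled from a fitted model. The sketch below is one possible approach, not the method used for the wage equation above: it reuses the mtcars model from earlier, and fmt_term, sgn and eq are hypothetical names introduced for illustration. Inside a chunk with the knitr option results = "asis", the printed string should be rendered as a formula.

```r
my.ols <- lm(mpg ~ disp + cyl, data = mtcars)
est <- summary(my.ols)$coefficients

# Hypothetical helper: one "\underset{(se)}{|estimate|}" term for row i.
fmt_term <- function(i) {
  sprintf("\\underset{(%.4f)}{%.4f}",
          est[i, "Std. Error"], abs(est[i, "Estimate"]))
}

# The sign of each coefficient is written explicitly between the terms.
sgn <- ifelse(est[, "Estimate"] < 0, "-", "+")

eq <- paste0(
  "$\\underset{(se)}{\\widehat{\\text{mpg}}} = ",
  if (est[1, "Estimate"] < 0) "-" else "", fmt_term(1),
  " ", sgn[2], " ", fmt_term(2), " \\cdot \\text{disp}",
  " ", sgn[3], " ", fmt_term(3), " \\cdot \\text{cyl}$"
)

# Print the raw LaTeX string.
cat(eq, "\n")
```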
Important: you can right-click on the formulas in the .html file to see the code for the mathematical expressions.
nsample <- 1000  # number of observations to simulate
set.seed(123)    # fix the seed so that the simulated data are reproducible
#
x <- seq(from = 0, to = 100, length.out = nsample)
y1 <- rnorm(n = nsample)
y2 <- rexp(n = nsample)
print(mean(y1))
## [1] 0.01612787
print(mean(y2))
## [1] 0.9836951
We can see from the output that the default value of the mean is:
0 for the rnorm() function;
1 for the rexp() function.
# a 1-row, 2-column figure:
par(mfrow=c(1, 2))
# plots are added in the order that we plot them:
plot(x, y1, col = "cornflowerblue", type = "l",
main = bquote("Plot of"~X~"~N("~mu~","~sigma^2~"), "~mu==0~", "~sigma==1))
plot(x, y2, col = "orange", type = "l",
main = bquote("Plot of"~X~"~Exp("~lambda~"), "~lambda==1))
Again, examine the lecture notes/lecture slides for additional ways to plot the data.
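Returning to the defaults noted earlier (mean 0 for rnorm() and mean 1 for rexp()), both can be overridden through the function arguments. This is a minimal sketch; y3, y4 and the chosen parameter values are arbitrary illustrations.

```r
set.seed(123)
nsample <- 1000

# rnorm() with a non-default mean and standard deviation.
y3 <- rnorm(n = nsample, mean = 5, sd = 2)

# rexp() is parameterised by its rate; the theoretical mean is 1 / rate.
y4 <- rexp(n = nsample, rate = 0.5)

print(mean(y3))  # close to 5
print(mean(y4))  # close to 1 / 0.5 = 2
```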