Introduction

The beginning of this .Rmd file has the following code:

---
title: "R Notebook Example"
output: 
  html_document:
    toc: true
    toc_depth: 2
    toc_float: true
---

This means that we want to generate an .html document with a table of contents (including headings up to level 2) that floats as we scroll up or down.

To add an author name and email address, we can use:

---
title: "Task X"
author: "Student Name, student.name@mif.stud.vu.lt"

...

---

This markdown file shows the basic functionality of how code is evaluated and how formulas are written, and is meant as a proof of concept rather than a full-fledged tutorial.

A more complete introduction to R Markdown can be found here.

Some code examples

Basics

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

plot(cars)

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

Note the difference between:

x1 <- rnorm(100)
print(mean(x1))

## [1] 0.1143196

and

x2 <- rnorm(100)
print(mean(x2))

By specifying the chunk option eval = FALSE for the second chunk, we tell knitr not to evaluate its code when the document is knitted. We can check this by looking at the list of variables in our environment:

print(ls())

## [1] "x1"

We can see that only the variable x1 was created.
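To check the effect of eval = FALSE outside RStudio, the sketch below knits a tiny, hypothetical R Markdown snippet (stored in a character vector) with knitr::knit() into a fresh environment and then lists the variables it created. It assumes the knitr package is installed; the snippet and the names rmd and env are made up for illustration.

```r
library(knitr)

# A hypothetical two-chunk R Markdown snippet: the second chunk has eval = FALSE
rmd <- c(
  "```{r}",
  "x1 <- rnorm(100)",
  "```",
  "",
  "```{r, eval = FALSE}",
  "x2 <- rnorm(100)",
  "```"
)

# Knit the snippet into a fresh environment, so that variables already in
# the global environment do not affect the check
env <- new.env()
md <- knit(text = rmd, quiet = TRUE, envir = env)

# Only x1 should be listed: the eval = FALSE chunk is shown but not run
print(ls(env))
```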

Random data generation example

Open this notebook in RStudio and try executing the following code. If you click the green arrow a couple of times, the resulting output will be the same.

set.seed(1233)
mean(rnorm(100))

## [1] -0.162077

Now, run the chunk above and then immediately run the chunk below:

mean(rnorm(100))

## [1] -0.03236137

Now, run only the chunk above (the one without set.seed()) several times: the output is different each time. To avoid this, call the set.seed function in every code chunk that generates data. (Note: if you use the ‘Run All’ command, both of the above chunks produce the same results on every run, because the second chunk continues the random number sequence started by set.seed() in the first one.)
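The same behaviour can be checked directly in code: resetting the seed before a draw reproduces exactly the same numbers, while a draw made without resetting it continues the random number stream. A minimal sketch; the variable names a, b and c_next are illustrative.

```r
# Two draws with the same seed produce identical numbers
set.seed(1233)
a <- rnorm(100)
set.seed(1233)
b <- rnorm(100)
print(identical(a, b))

# Without resetting the seed, the next draw continues the random stream,
# so it differs from the first one
c_next <- rnorm(100)
print(identical(a, c_next))
```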

To evaluate all of the chunks in an *.Rmd file, press Ctrl+Alt+Enter (‘Run All’ command).

Printing output

We can print the output of a model:

my.ols <- lm(mpg ~ disp + cyl, data = mtcars)
summary(my.ols)

## 
## Call:
## lm(formula = mpg ~ disp + cyl, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4213 -2.1722 -0.6362  1.1899  7.0516 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.66099    2.54700  13.609 4.02e-14 ***
## disp        -0.02058    0.01026  -2.007   0.0542 .  
## cyl         -1.58728    0.71184  -2.230   0.0337 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.055 on 29 degrees of freedom
## Multiple R-squared:  0.7596, Adjusted R-squared:  0.743 
## F-statistic: 45.81 on 2 and 29 DF,  p-value: 1.058e-09

We can also print the data (here, only the first six rows, via head()):

print(head(mtcars))

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

However, if we print a large data frame in full, all of its rows are printed. To format such output differently, we can use one of a number of libraries, for example DT:

DT::datatable(mtcars, width = 400)

Note that this interactivity will not work in a .pdf file.
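If the table also has to appear in a .pdf file, a static table is an alternative. A minimal sketch with knitr::kable(), assuming the knitr package is installed; the selected columns and digits = 2 are arbitrary choices for illustration.

```r
library(knitr)

# Static table of the first rows and a few columns of mtcars,
# with numbers rounded to 2 decimal places
tbl <- kable(head(mtcars[, c("mpg", "cyl", "disp", "hp")]), digits = 2)
print(tbl)
```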

In most cases, the standard print() function is enough. For models with a large amount of output, please print() only the required results, for example the coefficients:

print(summary(my.ols)$coefficients)

##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 34.66099474 2.54700388 13.608536 4.022869e-14
## disp        -0.02058363 0.01025748 -2.006696 5.418572e-02
## cyl         -1.58727681 0.71184427 -2.229809 3.366495e-02
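A few more ways to extract only the parts of the fit that are needed, sketched with standard base R accessors (coef(), confint() and the coefficient matrix stored in summary()). The row and column names used for indexing are the ones printed above; the name p_disp is illustrative.

```r
my.ols <- lm(mpg ~ disp + cyl, data = mtcars)

# Point estimates only
print(coef(my.ols))

# A single entry of the coefficient table: the p-value of disp
p_disp <- summary(my.ols)$coefficients["disp", "Pr(>|t|)"]
print(round(p_disp, 4))

# 95% confidence intervals for the coefficients
print(confint(my.ols, level = 0.95))
```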

We can also insert results into the text with inline R code. For example, writing $R^2=$ `r round(summary(my.ols)$r.squared, 4)` produces:

Our $R^2=$ 0.7596.
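The inline expression is ordinary R code, so it can be tried in the console first. A minimal sketch; the names r2 and r2_adj are illustrative, and the adjusted R^2 is read from the same summary object.

```r
my.ols <- lm(mpg ~ disp + cyl, data = mtcars)

# The value that the inline `r ...` expression inserts into the text
r2 <- round(summary(my.ols)$r.squared, 4)
print(r2)

# The adjusted R-squared is stored in the same summary object
r2_adj <- round(summary(my.ols)$adj.r.squared, 4)
cat("R^2 =", r2, "; adjusted R^2 =", r2_adj, "\n")
```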

Formula examples

More info on LaTeX formulas, matrices, etc.

General formulas

Formulas in Markdown are written between single $ symbols for inline formulas, and between double $$ symbols for centered (display) formulas.

For example, writing $X_t = \sum_{j = 1}^t \epsilon_j$ produces the following output: $X_t = \sum_{j = 1}^t \epsilon_j$.

Writing $$X_t = \sum_{j = 1}^t \epsilon_j$$ produces: \[X_t = \sum_{j = 1}^t \epsilon_j\]
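The formula above defines a random walk: X_t is the running sum of the shocks epsilon_1, ..., epsilon_t. As a sketch, assuming standard normal shocks (the distribution, the seed and the length of 100 are arbitrary choices), it can be computed in R with cumsum():

```r
set.seed(123)
n_steps <- 100

# Shocks epsilon_1, ..., epsilon_t (assumed standard normal here)
eps <- rnorm(n_steps)

# X_t = sum_{j = 1}^t epsilon_j is the cumulative sum of the shocks
X <- cumsum(eps)

# Check the definition for one value of t, e.g. t = 10
print(all.equal(X[10], sum(eps[1:10])))

plot(X, type = "l", main = "Random walk")
```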

Matrices

To write a matrix, we use the bmatrix (square brackets) or pmatrix (round brackets) environment, inside either $ or $$, with \quad to add space between the matrices:

$$
\begin{bmatrix}
\alpha& \beta^{*}\\
\gamma^{*}& \delta
\end{bmatrix} \quad 
\begin{pmatrix}
\alpha& \beta^{*}\\
\gamma^{*}& \delta
\end{pmatrix}
$$

which produces the following output: \[ \begin{bmatrix} \alpha& \beta^{*}\\ \gamma^{*}& \delta \end{bmatrix} \quad \begin{pmatrix} \alpha& \beta^{*}\\ \gamma^{*}& \delta \end{pmatrix} \]

Equation aligning

Using the align environment and writing & next to the symbols we want to align in each row lets us specify multiple equations aligned by the same symbol, for example:

$$
\begin{align}
Y_{1,t} &= \alpha_1 + \beta_1 X_{1,t} + \epsilon_{1,t} \\
Y_{2,t} &= \alpha_2 + \beta_2 X_{2,t} + \epsilon_{2,t}
\end{align}
$$

Produces:

\[ \begin{align} Y_{1,t} &= \alpha_1 + \beta_1 X_{1,t} + \epsilon_{1,t} \\ Y_{2,t} &= \alpha_2 + \beta_2 X_{2,t} + \epsilon_{2,t} \end{align} \]

Or, if we need to write an expression as a chain of equalities:

$$
\begin{align}
f(x) & = (a+b)^2 \\
& = a^2+2ab+b^2
\end{align}
$$

Produces:

\[ \begin{align} f(x) &= (a+b)^2 \\ &= a^2+2ab+b^2 \end{align} \]

Or if we simply have longer names for our variables:

$$
\begin{align}
Population_t &= \alpha_1 + \gamma_1 X_{1,t} + \gamma_2 X_{1,t-1} \\
Price_t &= \alpha_2 + \beta_1 Z_{1,t}
\end{align}
$$

Produces:

\[ \begin{align} Population_t &= \alpha_1 + \gamma_1 X_{1,t} + \gamma_2 X_{1,t-1} \\ Price_t &= \alpha_2 + \beta_1 Z_{1,t} \end{align} \]

Writing equation systems

We can write equation systems using the cases environment (note that we again use the & symbol to align the conditions):

$$
f(n) = \begin{cases} 
n/2 &\mbox{if } n \equiv 0 \\
(3n +1)/2 & \mbox{if } n \equiv 1. 
\end{cases} \pmod{2} 
$$

Produces:

\[ f(n) = \begin{cases} n/2 &\mbox{if } n \equiv 0 \\ (3n +1)/2 & \mbox{if } n \equiv 1. \end{cases} \pmod{2} \]
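The piecewise definition translates directly into R with if/else on the remainder n %% 2. A minimal sketch; the function name f and the assumption that n is an integer are ours.

```r
# f(n) = n / 2 if n is even, (3n + 1) / 2 if n is odd
f <- function(n) {
  if (n %% 2 == 0) {
    n / 2
  } else {
    (3 * n + 1) / 2
  }
}

f(4)  # even case: 4 / 2 = 2
f(3)  # odd case: (3 * 3 + 1) / 2 = 5
```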

Regression models

$$
\text{wage} = \beta_0 + \beta_1 \cdot \text{educ}^2 + \epsilon
$$

Produces:

\[ \text{wage} = \beta_0 + \beta_1 \cdot \text{educ}^2 + \epsilon \]
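When a model like this is estimated with lm(), the square has to be wrapped in I(): inside an R formula, ^ is the crossing operator, so educ^2 on its own reduces to the linear term educ. The sketch below uses simulated, hypothetical data (the data frame wages, its coefficients and the sample size are invented) only to show the formula syntax.

```r
set.seed(42)
n <- 200

# Hypothetical data: years of education and a wage that depends on educ^2
wages <- data.frame(educ = sample(8:20, n, replace = TRUE))
wages$wage <- 5 + 0.1 * wages$educ^2 + rnorm(n)

# I() makes ^ an arithmetic square instead of a formula operator
fit_sq <- lm(wage ~ I(educ^2), data = wages)
print(coef(fit_sq))

# Without I(), educ^2 is interpreted as just educ (a linear term)
fit_lin <- lm(wage ~ educ^2, data = wages)
print(names(coef(fit_lin)))
```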

Estimated regression models, along with their standard errors

We can write down the estimated regression model, with the standard errors shown below the coefficient estimates, using the following code:

$\underset{(se)}{\widehat{\log(\text{wage})}} = \underset{(0.0702)}{1.5968} + \underset{(0.0048)}{0.0988} \cdot \text{educ}$

Produces:

$\underset{(se)}{\widehat{\log(\text{wage})}} = \underset{(0.0702)}{1.5968} + \underset{(0.0048)}{0.0988} \cdot \text{educ}$

We can use $$ instead of $ around the formula in order to center the equation.
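Instead of typing the estimates by hand, the LaTeX string can be assembled from a fitted model, so that it stays in sync with the estimates. A minimal sketch that uses the mtcars model mpg ~ disp as a stand-in (the wage data behind the equation above is not part of this notebook); the helper se_line is hypothetical, and the resulting string could then be inserted into the text with inline R code.

```r
# Hypothetical helper: LaTeX for an estimated simple regression,
# with the standard errors written under the coefficients
se_line <- function(fit, y_name, x_name, digits = 4) {
  est <- summary(fit)$coefficients
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  slope <- est[2, "Estimate"]
  sprintf(
    "\\underset{(se)}{\\widehat{\\text{%s}}} = \\underset{(%s)}{%s} %s \\underset{(%s)}{%s} \\cdot \\text{%s}",
    y_name,
    fmt(est[1, "Std. Error"]), fmt(est[1, "Estimate"]),
    if (slope < 0) "-" else "+",  # write the sign of the slope explicitly
    fmt(est[2, "Std. Error"]), fmt(abs(slope)),
    x_name
  )
}

fit <- lm(mpg ~ disp, data = mtcars)
latex <- se_line(fit, "mpg", "disp")
cat("$", latex, "$\n", sep = "")
```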

Important: you can right-click on the formulas in the .html file to see the code for the mathematical expressions.

Multiple plots example

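nsample <- 1000

# fix the seed so that the simulated data are reproducible
set.seed(123)
x  <- seq(from = 0, to = 100, length.out = nsample)
y1 <- rnorm(n = nsample)  # standard normal sample
y2 <- rexp(n = nsample)   # exponential sample with rate = 1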

print(mean(y1))

## [1] 0.01612787

print(mean(y2))

## [1] 0.9836951

We can see from the output that, with the default parameters, the sample mean is:

around 0 for the rnorm() function (default mean = 0);
around 1 for the rexp() function (default rate = 1, and the mean is 1/rate).
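With a much larger sample, the sample means get closer to these theoretical values, and changing rate shifts the exponential mean accordingly. A minimal sketch, where the sample size of 10^6 and rate = 2 are arbitrary choices:

```r
set.seed(123)
big_n <- 1e6

# Sample means of large samples are close to the theoretical means
m_norm <- mean(rnorm(big_n))           # theoretical mean 0
m_exp1 <- mean(rexp(big_n))            # rate = 1, theoretical mean 1
m_exp2 <- mean(rexp(big_n, rate = 2))  # rate = 2, theoretical mean 0.5

print(round(c(m_norm, m_exp1, m_exp2), 3))
```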

# a 1-row, 2-column figure:
par(mfrow=c(1, 2))
# plots are added in the order that we plot them:
plot(x, y1, col = "cornflowerblue", type = "l", 
     main = bquote("Plot of"~X~"~N("~mu~","~sigma^2~"), "~mu==0~", "~sigma==1))
plot(x, y2, col = "orange", type = "l", 
     main = bquote("Plot of"~X~"~Exp("~lambda~"), "~lambda==1))
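par(mfrow = ...) changes the layout for all later plots on the same graphics device. A common idiom, sketched here, is to keep the old settings that par() returns and restore them afterwards; the data are regenerated so that the block runs on its own, and the plot titles are simplified.

```r
set.seed(123)
nsample <- 1000
x  <- seq(from = 0, to = 100, length.out = nsample)
y1 <- rnorm(n = nsample)
y2 <- rexp(n = nsample)

# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))
plot(x, y1, col = "cornflowerblue", type = "l", main = "Normal sample")
plot(x, y2, col = "orange", type = "l", main = "Exponential sample")

# Restore the previous layout (here the default of one plot per figure)
par(old_par)
```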

Again, examine the lecture notes/lecture slides for additional ways to plot the data.