Chapter 18 Reporting

In this chapter we will jump into ‘reporting’ using the R packages rmarkdown and knitr. This allows us to easily create (interactive) documents in different formats, such as HTML (like this document), PDF, or even Microsoft Word.

As motivation: we want to bridge the gap between pure code (source code; .R scripts) and a readable file. This file can be for yourself, your colleague, sister, mum, or boss. It helps to structure and document code and possible solutions. As an illustrative example (Rhine river runoff forecasts), we would like to convert our code (left) into a nice HTML document (right):

18.1 Getting Started

As the title already indicates, R Markdown combines R and markdown. Markdown is a simple but powerful markup language. In contrast to programming languages, markup languages are used to structure text. Some well-known markup languages are:

markdown (which we will use here)
HTML (HyperText Markup Language)
TeX/LaTeX

Markup languages have specific elements which do not count as text (content) of the document, but allow us to, e.g., mark some text as a title, make text bold face or italic, or set up enumerated lists. One example: if we want to define a section title we use:

# in markdown,
<h1>...</h1> in HTML,
or \section{...} in LaTeX.

Create New Document

While R scripts have the file ending .R, R Markdown files have the ending .Rmd (for R Markdown). RStudio helps you create these files. Let us create our first R Markdown file:

Open RStudio.
Create a new document by navigating through: File > New File > R Markdown.
A new window opens where you can select which type of document you want to create (Document, Presentation, Shiny, From Template). Select Document. You can also specify the title of the document and in which format it should be rendered. Leave this as it is for now.
Press OK and you should end up with a new document (see below).
Next step: save this new document (File > Save; or press the save icon). Save the file as test.Rmd (the file ending is important) on your hard disk, just as you would save an .R script.

The first things we have to know

You should now see something very similar to the screenshot below. By default, RStudio creates an .Rmd document with some “demo content”. Let us concentrate on the first few lines of the document. They differ from what we have learned about R scripts: --- and title: "Untitled" are not R commands and would result in an error if run as R code.

Important: this is not R code. It is the header (to be precise, a so-called YAML header). It defines the title of the document and how it should be rendered. In this case we would like to get an html_document at the end.
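To underline that the header is metadata rather than R code, the following sketch reads it back as a plain list. It assumes the rmarkdown package is installed and uses its yaml_front_matter() function; the temporary file and its content are made up for illustration.

```r
# Write the first lines of a (hypothetical) demo file to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c("---",
             "title: \"Untitled\"",
             "output: html_document",
             "---",
             "",
             "Some text."), rmd_file)

# Read the header as a list of key/value pairs (only if rmarkdown is installed)
if (requireNamespace("rmarkdown", quietly = TRUE)) {
    meta <- rmarkdown::yaml_front_matter(rmd_file)
    str(meta)
}
```

The list should contain one element per line of the header (here title and output), which is the information the renderer uses to decide how to build the document.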

Knit! When writing an R script we have a “Source” button to run the script. Here we have a button called “Knit” instead. Knit takes our R Markdown file and converts it; in this case it tries to create an HTML document (via the R package knitr; thus “Knit”).

Try it. As RStudio gives us some demo content, we can see what happens when we knit it. Press the “Knit” button. You will see some output in RStudio which comes from the knitr package. If everything works fine, RStudio opens the final HTML document and you should see this:

Fails? Well, it should not fail. However, if it does, there is an extra “R Markdown” output tab (bottom of your RStudio window). If errors occur, you should see the output down there.

The Output

In the example above we specified that we want to have an HTML file at the end (output: html_document). After knitting the document, RStudio will automatically open it. In addition, the HTML file is stored on your hard disk.

By default, an HTML file with the same name as your .Rmd file is stored in the same folder as your .Rmd file. In the example above:

Our .Rmd file is called test.Rmd.
In the same folder, you will now have a file test.html.
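The same naming rule can be checked from the R console. The sketch below is one possible way to do this, assuming the rmarkdown package and pandoc are available; it uses rmarkdown::render(), which is what RStudio calls when you press “Knit”. The file lives in a temporary folder and its content is made up.

```r
# Create a minimal test.Rmd in a temporary folder
rmd_file <- file.path(tempdir(), "test.Rmd")
writeLines(c("---",
             "title: \"Untitled\"",
             "output: html_document",
             "---",
             "",
             "Some text."), rmd_file)

# Expected output file: same folder, same name, but ending in .html
expected <- sub("\\.Rmd$", ".html", rmd_file)
print(basename(expected))

# Render only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
    out <- rmarkdown::render(rmd_file, quiet = TRUE)
    print(basename(out))
}
```

Calling render() from a script can be handy if a document has to be re-created regularly without opening RStudio.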

Markdown Syntax

Before we start mixing markdown and R, let’s introduce the markdown syntax. There is only a handful of commands we need to know - depending on the software you use, this list might be extended (special commands for special applications).


Element	Markdown Syntax
Heading	`# Heading 1` `## Heading 2` `### Heading 3`
Bold	`**bold text**`
Italic	`_italic text_`
Strikethrough	`~~strikethrough~~`
Blockquote	`> blockquote`
Ordered List	`1. First item` `2. Second item` `3. Third item`
Unordered List	`- First item` `- Second item` `- Third item`
Code	`` `code` ``
Horizontal rules	`---`
Links	`[title](http://discdown.org)`
Images	`![alt text](image.jpg)`
Fenced Code Blocks	``` `pow_fun <- function(x, y) {` `return(x**y)` `}` ```
Math Expressions	$x^2 = y^2 + z^2$

18.2 First Pure Markdown File

If you followed the guide above, you should have the file test.Rmd open in RStudio. Let us adjust the .Rmd file and delete everything (in the code editor: simply select everything and delete it) such that you end up with an empty document (see below).

We will now create new content for this document (pure markdown).

Define a new YAML header with title, author, date, and output format.
Below the yml header, add some content. What we want to have is a section “Introduction” and “Results” with some text content. Not very meaningful - but we will extend this further in a minute.

Write/copy the following lines of code into your test.Rmd file:

---
title: "Our First Markdown File"
author: Me
date: 2020-01-04
output: html_document
---

# Introduction

This is a simple **demo file** to demonstrate markdown.
For more information, check the [discdown](http://discdown.org/rprogramming)
website or the [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/).

# Results

## Part One

The file should be:

* Rendered.
* An HTML file should be created.
* RStudio automatically opens the HTML file if successfully knit.

## Part Two

Not a lot to say, but we could try _italic text_, **bold text**,
and ~~strikethrough~~ text. In addition, we can define `inline code`
elements (e.g., `p <- x**y`).

… such that you end up with this:

We can now render (knit) this file by pressing the “Knit” button again. If everything works well, a new window will pop up which looks as follows:

Great! First (R) markdown file rendered as HTML document. What have we done?

Header: we defined a document title, author, and date.
The content:
1. The first section (“Introduction”; heading 1 denoted by #) contains some text and two links (to discdown and the definitive R markdown guide).
2. The second section (“Results”; #) itself has two subsections (heading 2, denoted by ##) which contain an unordered list and some text with text styling (bold face, italic, strikethrough, and code).

So far, this has nothing to do with R markdown except that it is rendered via R and rmarkdown/knitr. But we could also do this without R (e.g., solely using pandoc). Let’s dive a bit deeper and combine Markdown with R.

18.3 First R Markdown File

To create ‘dynamic’ documents we can combine pure Markdown with R chunks. A chunk is nothing else than a block of R code. R chunks are “parts of R scripts” and have to contain executable R code and/or comments.

This allows us to execute R commands on the fly. Whenever the document is rendered (‘knitted’) the R commands (scripts) will be executed and embedded in the final document.

When to use?

R markdown is very handy for different tasks. This online learning resource (discdown.org) is completely written in R markdown. If you took the course “Introduction to Programming: Programming in R” at the Universität Innsbruck, everything - from the PDF slides to the exercises, and even the quizzes - is/was based on R markdown.

For yourself

Rather than solely writing an R script, you may write a small R markdown file for yourself. Instead of having a pure script file, R markdown allows you to add extra comments, thoughts, list current restrictions or problems, and even include plots and tables in a structured way.

For your friends

As for yourself, you may write R markdown documents to help out friends and colleagues, or to send a quick update to your boss or advisor. In my old days I (Reto) typically copy-pasted R code into e-mails or Skype. That works OKish for simple things with only a handful of lines of code; however, it is hard to structure more complex code/solutions properly. In addition: if the recipient's client converts everything into HTML, some parts of the code may be converted into smilies, beer mugs, or flamingos. Don’t believe me?

Try to send this message via Skype:

rain_climatology <- function(rain) {
    rain$doy <- as.POSIXlt(rain$date)$yday
    call <- "lm(rain ~ doy, data = rain)"
    mod <- eval(parse(text = call))
    return(mod)
}
n <- as.Date(c("2020-01-02", "2020-02-03"))
rain <- data.frame(date = seq(min(n), max(n), by = 1))
rain$rain <- abs(rnorm(nrow(rain)))
rain_climatology(rain)

The result is this:

Instead, we could write a short and simple .Rmd file which contains

The code.
The result of the code.
Additional explanations.
And even some results (plots, tables).

For your job

More ‘advanced’: imagine working with live data which get updated every few hours, days, or weeks, and you have to create a report for your boss. Or you create a report for yourself to monitor the data. For example:

Biology: measuring some parameters of a process in the lab (temperature, nutrition, and the corresponding biomass in a test tube).
Meteorology: live-measurements of the smog concentration in your city.
Economics: monthly reports of credit/debit balances.
Marketing: check if your spending (advertisements) led to the desired result.
Tourism: monitor the number of bookings/tickets sold for a specific event, hotel, or skiing resort.
Sales management: weekly reports of products sold, products in stock, and an analysis of whether you have to re-order some products so as not to run out of stock.
Web development: monitor the number of visits on your website and the user behaviour.

As you can see, this is not restricted to a specific field of research or industry. Nowadays, data are collected everywhere and dynamic reports (e.g., using R markdown) can be used to analyze these data - and that’s the basis of data science.

R Code Chunks

Whenever we create a new R Markdown file in RStudio, RStudio gives us a demo file with some content (shown above) which also contains some R chunks. R chunks are defined as follows:

```{r}
x <- 1:20
print(x)
plot(x, col = "red", main = "Simple Plot")
```

As in pure markdown, we open and close a code block with three backticks (see also Markdown Syntax). To turn a markdown code block into an R chunk, we additionally have to add {r} when opening the block. This tells knitr that this is code we want to execute. Within the curly braces we can set additional options if needed ({r, ...}). By default:

The input (code) will be shown in the final document.
The output (prints) will be shown in the final document.
In case there is a plot, the figure will also be included.
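These defaults can also be inspected from the R console. The sketch below assumes the knitr package is installed and queries its opts_chunk object for three options that correspond to the list above (echo for the input, results for the printed output, fig.keep for plots).

```r
if (requireNamespace("knitr", quietly = TRUE)) {
    # Ask knitr for the built-in default values of three chunk options
    defaults <- knitr::opts_chunk$get(c("echo", "results", "fig.keep"),
                                      default = TRUE)
    str(defaults)
}
```

The names used here are the same keys that can be set within the curly braces of a chunk.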

Exercise 18.1 Let’s try it out. Create a new .Rmd file. Specify at least a title and the output format in the YAML header, and copy the code chunk shown above into this new .Rmd file.

Feel free to add additional content such as text or titles (headings). Once you are done, save the .Rmd file if you have not yet done so and knit the document by pressing the “Knit” button.

If everything works as expected, you should end up with an HTML document which shows the R input (code of the R chunk) and the results - in this case the result of print() and a simple figure created by plot().

Solution.

18.3.0.1 Source Code

The source code/content of the .Rmd file:

---
title: "My First Dynamic Rmd File"
output: html_document
---

```{r}
x <- 1:20
print(x)
plot(x, col = "red", main = "Simple Plot")
```

18.3.0.2 RStudio

A screenshot of the file in RStudio:

18.3.0.3 Result

Code Chunk Options

Additional knitr options can be provided to control the chunks. These options can be set globally (check ?knitr::opts_chunk) or individually for specific R chunks.

The options are set as key = value pairs within the curly braces ({r, key = value, key = value, ...}). The following list shows some of the most important options:

include = FALSE: if set, the chunk is executed, but nothing shows up in the final document. Can be used to prepare data without including it in the output document.
echo = FALSE: do not show the input (R code). Default is TRUE.
results = "hide": do not show results (e.g., from print()).
fig.keep = "none": do not include plots.
fig.width = X: numeric, width of the resulting plot (in inches).
fig.height = X: numeric, height of the resulting plot (in inches).

An example:

```{r, echo = FALSE, results = "hide", fig.width = 8, fig.height = 3}
# Draw 1000 values from a random distribution
x <- rnorm(1000)
print(x)
# Plot the data
plot(x, type = "h", col = "steelblue",
     main = "Wide Image")
```

This should lead to the result shown below. As you can see, neither the R source code (input; echo = FALSE) nor the result of the print() call (output; results = "hide") shows up. All we get at the end is the plot (format: $8 \times 3$ inches; landscape).
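If the same options should apply to every chunk of a document, they can be set once as global defaults instead of being repeated in each chunk header. A minimal sketch, assuming the knitr package is installed; in an .Rmd file these lines would typically go into the first chunk of the document. The last line restores the previous values and is only needed when trying this out in the console.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
    # Remember the current values so that we can restore them afterwards
    old <- knitr::opts_chunk$get(c("echo", "fig.width", "fig.height"))

    # Global defaults: hide the code, create wide figures
    knitr::opts_chunk$set(echo = FALSE, fig.width = 8, fig.height = 3)
    current <- knitr::opts_chunk$get(c("echo", "fig.width", "fig.height"))
    str(current)

    # Restore the previous settings
    do.call(knitr::opts_chunk$set, old)
}
```

Options given in the header of an individual chunk still take precedence over these global defaults.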

18.3.1 Inline Code

In addition to R code chunks we can also execute R commands “inline”. Inline means “in text” and allows us to dynamically create text. As R code chunks are an extension of markdown code blocks, inline code is an extension of markdown inline code (see Markdown Syntax).

Instead of three backticks and {r}, inline code uses single backticks and starts with the letter r:

`r paste("Happy", 2020)`

The “r” tells knitr that this expression (code) must be executed when knitting the document. We can now use this in the text. As an example: the following line:

**Hi there, we wish you a `r paste("Happy", 2020)`!**

… will generate the following: Hi there, we wish you a Happy 2020! The code within the backticks is executed (paste("Happy", 2020)) and “inserted” into the text at this specific position. This can be used to print some numbers in the text (e.g., the current date, or the maximum of a specific variable), or for something more complex: with some logic we can adjust the text dynamically and generate data-dependent output.
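As a small sketch of such inline expressions: the vector temp below is made-up example data, and each of the three expressions could be placed between `r and the closing backtick in the text.

```r
# Made-up example data: daily maximum temperatures in degrees Celsius
temp <- c(4.2, 7.9, 12.4, 9.1)

# Expressions as they could be used as inline code
today   <- format(Sys.Date(), "%Y-%m-%d")  # current date
highest <- max(temp)                       # maximum of a variable
average <- round(mean(temp), 1)            # rounded mean

# The sentence as it would show up in the knitted text
print(paste("Report from", today, "- maximum:", highest, "average:", average))
```

Rounding inside the inline expression keeps the text readable, as knitr inserts whatever the expression returns.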

As an example, let’s imagine we have a weather forecast for tomorrow and, depending on the forecasted temperature and sunshine duration, the text output should be different. Therefore, we could write a small function which we can use in combination with inline code:

# Returns a short forecast text depending on temperature and sunshine duration
weather_forecast <- function(temperature, sunshine) {
    if (temperature < 0 && sunshine > 5) {
        # Cold but sunny
        res <- paste("Tomorrow will be very cold with a temperature of only",
                     temperature, "degrees Celsius. However, with", sunshine,
                     "hours of sun it is worth going outside!")
    } else if (temperature < 0) {
        # Cold with little sun
        res <- paste("Cold and only little sun expected for tomorrow.",
                     "Suggestion: stay at home!")
    } else if (sunshine < 5) {
        # Warm with little sun
        res <- paste("With", temperature, "degrees Celsius it will be warm tomorrow,",
                     "however, with only", sunshine, "hours of sun it sounds like a couch day.")
    } else {
        # Warm and sunny
        res <- paste("Warm and sunny! With", temperature, "degrees Celsius and up to",
                     sunshine, "hours of sunshine it will be a nice day.")
    }
    return(res)
}

We can now use the function inline (calling weather_forecast(-9.3, 0.2) or weather_forecast(8.3, 6.2), …). Two examples:

Tomorrow's weather forecast: Cold and only little sun expected for tomorrow. Suggestion: stay at home!

Tomorrow's weather forecast: Warm and sunny! With 8.3 degrees Celsius and up to 6.2 hours of sunshine it will be a nice day.

18.3.2 Tables

knitr comes with a function called kable() to create HTML tables in the output. The input to kable() is typically a matrix or data frame. Let’s use the Old Faithful data set to demonstrate this feature:

# Loading the data set
data("faithful")
# Simple print
head(faithful)

##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55

Instead of printing the data frame as shown above, we can call kable(head(faithful)) within the R chunk.

library("knitr")
kable(head(faithful))

eruptions	waiting
3.600	79
1.800	54
3.333	74
2.283	62
4.533	85
2.883	55

Typically you suppress the input (R chunk options; set echo = FALSE) to only show the table, but not the code. One problem with knitr::kable(): if you have dozens of observations, the resulting table will get very large. Thus, it should be used for summary tables or results, and not to show large data frames or matrices. Other contributed packages (e.g., DT) provide more functionality to create tables. Just as a teaser:

suppressPackageStartupMessages(library("DT"))
# Create table
datatable(faithful, options = list(paging = TRUE, searching = TRUE),
          caption = "Old faithful data set")
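Coming back to kable(): as it is best suited for summary tables, one option is to condense the data first and only pass the result to kable(). The sketch below is one possible way to do this with base R; the object name tab and the chosen statistics are only for illustration, and the table is printed only if knitr is installed.

```r
data("faithful")

# Condense the data set into one row per variable
tab <- data.frame(
    variable  = names(faithful),
    mean      = sapply(faithful, mean),
    min       = sapply(faithful, min),
    max       = sapply(faithful, max),
    row.names = NULL
)

# Create the table; digits controls the rounding of numeric columns
if (requireNamespace("knitr", quietly = TRUE)) {
    print(knitr::kable(tab, digits = 1,
                       caption = "Summary of the Old Faithful data set"))
}
```

Within an R chunk, the call to kable() can be used without print(); the table is then embedded in the knitted document.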