
Note

This is an .Rmd script that I put together for a workshop at the UC Davis R User’s Group.

If you want to follow along with the script and exercises, you can find all of the material (including this .Rmd file) at this GitHub repo. Enjoy!


Introduction

Welcome to the R Markdown workshop!

The goal of this workshop is to introduce you to some of the functionality of R Markdown. Afterward, you might never want to write in .R files again! The learning curve isn’t too steep, and R Markdown is a great entry point for easily making crisp PDF and HTML documents to share, resumes, and even static websites with blogdown + Hugo.

Today we’ll cover:

  • basic formatting
  • organizing code
  • tools for creating a resume
  • tools for creating a website

The block at the very top of the .Rmd file, between the two --- lines, is called the YAML header. It specifies document metadata (such as the title) plus some global options. For instance, this .Rmd file renders into an html_document (other options include pdf_document, word_document, beamer_presentation, and ioslides_presentation). We can also set global options for figure sizes and add fun things like a table of contents to our output.
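The rendered page hides the YAML header, so here is a minimal sketch of one. The snippet writes a small, hypothetical example.Rmd to a temporary folder. The title and the option values are illustrative assumptions, not the workshop file's actual header.

```r
# Write a minimal, hypothetical .Rmd file whose YAML header sits between the "---" lines.
rmd_lines <- c(
  "---",
  "title: \"R Markdown workshop\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "    fig_width: 6",
  "    fig_height: 4",
  "---",
  "",
  "Text outside the header is ordinary Markdown."
)
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Knitting it outside RStudio needs the rmarkdown package and pandoc:
# rmarkdown::render(rmd_path)
```

Options nested under html_document are passed as arguments to rmarkdown::html_document(), so that function's help page lists everything that can go there.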

In this presentation, I’ll focus on html_documents because they render quickly, are easily deployed over the web for sharing, and can contain html widgets like interactive data tables and leaflet maps.

Formatting

Header 2

Header 3

Header 4

Header 5
Header 6

Some text without a header

some bold text

italics

Some fancy math: \(A = \pi r^{2}\)

some code, for instance library(dplyr)

A useful link to generate fancy math

R Markdown cheatsheet

Another R Markdown Cheatsheet


^ horizontal line (slide break in a slideshow)

End a line with two spaces to force a line break within a paragraph.

For example, consider these two blockquotes:

“I only end lines with one space.” - Mark Twain

“I end lines with two spaces to indicate a new line.”
- Mark Twain

Obama reading “Where the Wild Things Are”


Why Chunks?

A chunk is where R code happens. Write code just as you would in a .R file, then run it line by line, run a highlighted selection, or click the arrows at the top of a chunk to run the whole chunk at once (or all of the chunks above it).

#install.packages("readr")
library(readr)
dat <- read_csv("class_dat.csv")
names(dat)
##  [1] "#"                                                          
##  [2] "What's your graduate group?"                                
##  [3] "How many years are you into your program?"                  
##  [4] "Markdown."                                                  
##  [5] "RMarkdown"                                                  
##  [6] "LaTex"                                                      
##  [7] "jupyter notebooks"                                          
##  [8] "blogdown"                                                   
##  [9] "organize and run code efficiently"                          
## [10] "create PDF and/or HTML files to share"                      
## [11] "create a resume"                                            
## [12] "create a website (e.g. - blogdown)"                         
## [13] "Other"                                                      
## [14] "You're given an ultimatum, and must choose one. You select:"
## [15] "Start Date (UTC)"                                           
## [16] "Submit Date (UTC)"                                          
## [17] "time"                                                       
## [18] "time_s"                                                     
## [19] "Network ID"
# give the columns shorter, more intuitive names
colnames(dat) <- c("#","group","years","md","rmd","latex","jupyter","blogdown","code","files","resume","website","other","potatoes","start_date","end_date","time","time_s","network")
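Assigning all 19 names by position works. However, if the survey export ever adds or reorders a question, columns get silently mislabeled. Below is a base-R alternative that renames columns by matching the original question text. The two-column survey data frame is a hypothetical stand-in, not the workshop data.

```r
# Hypothetical two-column stand-in; the question text matches two of the
# column names printed above.
survey <- data.frame(a = c("Ecology", "Psychology"), b = c("2", "4+"))
names(survey) <- c("What's your graduate group?",
                   "How many years are you into your program?")

# Lookup table: original question text -> short name.
new_names <- c("What's your graduate group?"               = "group",
               "How many years are you into your program?" = "years")

# Rename only the columns found in the lookup; anything else is left alone,
# so an extra or reordered column cannot be mislabeled.
hit <- names(survey) %in% names(new_names)
names(survey)[hit] <- new_names[names(survey)[hit]]
names(survey)
```

Inside a pipe, dplyr's rename() offers the same by-name behaviour.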

Chunk Options
Some commonly used chunk options:

  • echo = TRUE: display the code along with its results.
  • eval = TRUE: evaluate the code and include its results.
  • warning = TRUE: display warnings.
  • error = FALSE: display errors (with FALSE, an error stops the render).
  • message = TRUE: display messages.
  • tidy = FALSE: reformat the code in a tidy way when displaying it.
  • cache = FALSE: cache results for future renders.
  • comment = "##": prefix placed before each line of printed results.
  • fig.width = 7: width (inches) of plots.
  • fig.height = 7: height (inches) of plots.
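Any option in this list can also be set once for the whole document. This sketch assumes knitr is installed. In an .Rmd file, the opts_chunk$set() call normally lives in the setup chunk, but you can run it directly to see its effect. The particular values are arbitrary examples.

```r
library(knitr)

# Defaults for every later chunk; an individual chunk can still override them
# in its own header, e.g. message = TRUE.
opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE,
               fig.width = 6, fig.height = 4)

# Read the current defaults back (a list when several names are requested).
opts_chunk$get(c("warning", "message", "fig.width"))
```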


Exercise Set 1:

  1. Remove (message = FALSE) in Chunk 2, then knit. What happens in the output? Change it back.
  2. Instead of copy-pasting this into every chunk, paste ‘warning = FALSE’ (without quotes) as a global option in Chunk 1: Setup. Your line 27 should then look like: knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
  3. Rename Chunk 2 to ‘read data’, which is a better description and makes your code easier to navigate.
  4. Insert a new chunk with the shortcut Alt + Command + i (Ctrl + Alt + i on Windows).

Coding in Chunks

In Rmd, I think of chunks as paragraphs. Each chunk of code is a set of operations that run together. When I know I have one piece that works, I move on to another chunk. This makes debugging easier. When I open a script I haven’t looked at in a while, it’s easy to see my workflow, broken into chunks, and I can jump right to the places that I need to be.
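Because each chunk is a self-contained unit, knitr can also pull the chunks back out of an .Rmd file into a plain .R script with purl(). This sketch assumes knitr is installed. The file names and chunk contents are hypothetical.

```r
library(knitr)

# A hypothetical two-chunk .Rmd file written to a temporary folder.
rmd_path <- file.path(tempdir(), "chunks.Rmd")
writeLines(c(
  "Some prose that purl() will leave out.",
  "",
  "```{r read-data}",
  "x <- 1:10",
  "```",
  "",
  "```{r summarise}",
  "mean(x)",
  "```"
), rmd_path)

# documentation = 0 keeps only the code; purl() returns the path of the .R file.
r_path <- purl(rmd_path, output = file.path(tempdir(), "chunks.R"),
               documentation = 0, quiet = TRUE)
readLines(r_path)
```

Running the resulting chunks.R with source() executes the chunks in order without knitting anything.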

Viewing data in Rmd

# visualizing data is much easier in .Rmd compared to .R
#install.packages("dplyr")
library(dplyr)
glimpse(dat)
dat

What grad groups are the most thorough?

How do we feel about Potatoes?

# run all lines of code above this plot with one button, rather than highlighting them
#install.packages("ggplot2")
library(ggplot2)
potatoes_plot <- dat %>% 
  filter(!is.na(potatoes) & !is.na(group)) %>% 
  ggplot(aes(x = potatoes, fill = group)) +
  xlab(NULL) +
  geom_bar()

potatoes_plot

Ecologists unanimously select sweet potato fries if forced to choose between them and french fries.
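geom_bar() simply counts rows per category. This base-R sketch re-enters the group and potatoes values from the kable table in the Tables section, assuming that table is the full class data. It then repeats the plot's filtering and counting, which is a quick way to double-check a claim like the one above.

```r
# group/potatoes values copied from the kable() output in the Tables section
# (NA marks a skipped question).
survey <- data.frame(
  group = c("HSGG", "Ecology", "Ecology", "Ecology", "Psychology", "Ecology",
            "Food Science", "Ecology", "Postdoc", "Transportation technology policy",
            "Geography", "Ecology", "Horticulture and Agronomy", "Avian Sciences",
            "Entomology & Nematology", "Undergraduate", NA, NA),
  potatoes = c("sweet potato fries", "sweet potato fries", NA, "sweet potato fries",
               "french fries", "sweet potato fries", "sweet potato fries",
               "sweet potato fries", "sweet potato fries", "sweet potato fries",
               "sweet potato fries", "sweet potato fries", "sweet potato fries",
               "french fries", "sweet potato fries", "sweet potato fries",
               "sweet potato fries", NA)
)

# Same filter as the plot: drop rows with a missing answer or group.
answered <- subset(survey, !is.na(potatoes) & !is.na(group))

# Bar heights: one count per potatoes category.
table(answered$potatoes)  # french fries 2, sweet potato fries 13

# Only the Ecology rows, to check the "unanimous" claim.
table(answered$potatoes[answered$group == "Ecology"])  # sweet potato fries 5
```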


Tables

R Markdown has some great out-of-the-box options for dumping data into a table. kable(), from the knitr package, formats a data frame as a table. Combined with dplyr, it’s relatively easy to operate on a data frame and then pipe the result into a table.

Kable

#install.packages("knitr"); install.packages("kableExtra")
library(knitr)
library(kableExtra)
dat %>% select(group, years, potatoes) %>%
  kable(format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
group | years | potatoes
HSGG | 2 | sweet potato fries
Ecology | 2 | sweet potato fries
Ecology | NA | NA
Ecology | 4+ | sweet potato fries
Psychology | 4+ | french fries
Ecology | 4+ | sweet potato fries
Food Science | 3 | sweet potato fries
Ecology | 3 | sweet potato fries
Postdoc | 1 | sweet potato fries
Transportation technology policy | 1 | sweet potato fries
Geography | 2 | sweet potato fries
Ecology | 4+ | sweet potato fries
Horticulture and Agronomy | 1 | sweet potato fries
Avian Sciences | 1 | french fries
Entomology & Nematology | 3 | sweet potato fries
Undergraduate | NA | sweet potato fries
NA | 1 | sweet potato fries
NA | NA | NA
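kable() is not limited to HTML. With format = "pipe" it returns a plain Markdown table, which is handy for README files or for checking a table in the console. This sketch assumes only knitr is installed. The two rows are copied from the table above.

```r
library(knitr)

# Two rows copied from the table above.
few <- data.frame(group = c("Ecology", "Psychology"),
                  years = c("4+", "4+"),
                  potatoes = c("sweet potato fries", "french fries"))

# col.names relabels the header without renaming the data frame's columns.
tab <- kable(few, format = "pipe", col.names = c("Group", "Years", "Potatoes"))
tab
```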

Another option that works well on web pages is datatable() from the DT package. It creates an interactive HTML widget that you can embed in html_documents. DT tables are searchable and sortable, and they can display large data frames by paginating the rows.

DT

#install.packages("DT")
library(DT)
dat %>% select(group, years, potatoes, time_s, network) %>%
  filter(potatoes == "sweet potato fries") -> temp
DT::datatable(temp)

HTML widgets

Perhaps one of the coolest applications of R Markdown is its support for HTML widgets. You’ve already seen one in action with DT. Under the hood, the interactivity is implemented in JavaScript (often with libraries such as D3), and there are many out-of-the-box widgets available, which you can explore here.

Leaflet

#install.packages("leaflet"); install.packages("sp")
library(leaflet)
library(sp)
water <- read_csv("water_dat.csv") # water quality data from 1988
## Parsed with column specification:
## cols(
##   Well_ID = col_character(),
##   Database = col_character(),
##   Year = col_integer(),
##   Result = col_double(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )
coords <- cbind(water$Longitude, water$Latitude)
pts <- SpatialPoints(coords)
ptsdf <- SpatialPointsDataFrame(pts, data = water) # keep all columns; the popups below use Latitude and Longitude

ptsdf %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers()
# fancy leaflets are not that much more difficult to make
# define color palette to be used. blue values correspond to fresh water, red to salty water

co = c("blue","steelblue2","steelblue1","seashell1","orangered1","red3")
pal = colorBin(palette = co,
               domain = water$Result,
               bins = c(0,200,400,600,800,1000,5000,10000,50000))

ptsdf %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(color = ~pal(Result),
                   radius = 4,
                   opacity = 0.8,
                   stroke = FALSE,
                   popup = paste(ptsdf$Result, " mg/L TDS", "<br>",
                    "Database: ", ptsdf$Database, "<br>",
                    "Well ID: ", ptsdf$Well_ID, "<br>",
                    "Latitude: ", ptsdf$Latitude, "<br>",
                    "Longitude: ", ptsdf$Longitude)) %>%

    addLegend("topright", pal = pal, # use custom palette
              values = ~Result,
              title = "TDS in the Tulare Basin (1988)",
              labFormat = labelFormat(suffix = " mg/L"),
              opacity = 1
    ) %>%

    addProviderTiles(providers$Esri.WorldTerrain)
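colorBin() assigns each Result to one of the intervals defined by bins and then looks up a colour for that interval. Here is a rough base-R sketch of those two steps, using hypothetical TDS values. It mirrors the idea, not leaflet's exact code, so the colours leaflet produces may differ slightly.

```r
# Base-R sketch of colorBin()'s binning. Assumption: like colorBin()'s defaults,
# intervals are closed on the left (right = FALSE) and the top bin includes 50000.
bins <- c(0, 200, 400, 600, 800, 1000, 5000, 10000, 50000)
tds  <- c(150, 450, 1200, 20000, 60000)  # hypothetical TDS values in mg/L

# Which bin (1 to 8) each value falls into; values outside all bins become NA,
# which leaflet draws in the palette's na.color.
bin_index <- cut(tds, breaks = bins, labels = FALSE,
                 right = FALSE, include.lowest = TRUE)
bin_index  # → 1 3 6 8 NA

# 9 break points make 8 bins but only 6 colours were supplied, so the colours
# must be interpolated; colorRampPalette() does an analogous ramp in base R.
co <- c("blue", "steelblue2", "steelblue1", "seashell1", "orangered1", "red3")
bin_colours <- colorRampPalette(co)(length(bins) - 1)
bin_colours[bin_index]
```

Any well with TDS above 50000 mg/L would fall outside every bin and show up in the NA colour, so the top break is worth checking against the data.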

dygraphs

#install.packages("dygraphs")
library(dygraphs)
dygraph(nhtemp, main = "New Haven Temperatures") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
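nhtemp is a yearly time series (a ts object) that ships with base R, which is why dygraph() can plot it without any reshaping. The dateWindow argument sets the range selector's initial zoom, while the full series stays available. This base-R sketch uses window() to extract roughly the same slice.

```r
# nhtemp ships with base R (datasets package): yearly mean temperatures in New Haven.
tsp(nhtemp)  # start year, end year, observations per year

# Roughly the slice the range selector shows when the widget first loads.
zoomed <- window(nhtemp, start = 1920, end = 1960)
length(zoomed)       # one value per year
range(time(zoomed))
```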

Plotly

#install.packages("plotly")
library(plotly)
ggplotly(potatoes_plot)

networkD3

#install.packages("networkD3")
library(networkD3)
data(MisLinks, MisNodes)
forceNetwork(Links = MisLinks, Nodes = MisNodes, Source = "source",
             Target = "target", Value = "value", NodeID = "name",
             Group = "group", opacity = 0.4)
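forceNetwork() expects the Source and Target columns of Links to be 0-based row positions in Nodes (the JavaScript convention), not node names. The sketch below prepares your own data that way. The four-person network is hypothetical, and the final call only runs if networkD3 is installed.

```r
# Hypothetical toy network: nodes with a grouping variable, edges given by name.
nodes <- data.frame(name = c("Ana", "Ben", "Cho", "Dev"),
                    group = c(1, 1, 2, 2))
edges <- data.frame(from = c("Ana", "Ana", "Cho"),
                    to = c("Ben", "Cho", "Dev"),
                    value = c(1, 3, 2))

# Convert names to 0-based row positions in `nodes`: match() is 1-based, so subtract 1.
links <- data.frame(source = match(edges$from, nodes$name) - 1,
                    target = match(edges$to, nodes$name) - 1,
                    value = edges$value)

if (requireNamespace("networkD3", quietly = TRUE)) {
  # Assigned rather than printed; print `net` to view the widget.
  net <- networkD3::forceNetwork(Links = links, Nodes = nodes, Source = "source",
                                 Target = "target", Value = "value", NodeID = "name",
                                 Group = "group", opacity = 0.8)
}
```

MisLinks and MisNodes are already stored this way, which is why the example above works without any conversion.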

Exercise Set 2:

  1. Follow this link, find two bootstrap_options for kable_styling() that you want to try out, add them to the kable chunk, and view the results.

  2. Don’t forget to cite your sources.[@Land1971] Put ‘bibliography: ref.bib’ on a new line in the YAML header (without the quotation marks), knit the document, and re-examine this question and the footer of the document. What’s different? Now open ref.bib in the R Markdown Workshop folder and inspect its contents. Add two more citations somewhere in the document using their citation keys (usually in the format lastNameYear), and re-render the document. Ask someone if you’re not sure how to generate a .bib file.

  3. Create a table of contents in your HTML by changing the output field of the YAML header to:

output:
  html_document:
    toc: TRUE

  4. Now change the output field of the YAML header to:

output:
  html_document:
    code_folding: hide
    toc: TRUE
    toc_float:
      smooth_scroll: FALSE

Make sure your Viewer is wide enough, or alternatively, open the HTML document in a browser. What happened to the code? What happened to the table of contents?