Presentation 1: R, RStudio and Quarto

Want to code along?

The Quarto Way!

Quarto is an open-source publishing suite: a set of tools that supports workflows for reproducible scholarly writing and publishing.

Quarto documents often begin with a YAML header, delimited by three dashes (---) above and below, which specifies settings for the document. These include which format to render (compile) to, e.g. HTML, PDF or Word, and whether the document should be published as part of a website project. You can also add information such as title, author and default editor.
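As a rough sketch of what such a header can look like, the R code below writes a small Quarto file to a temporary location and prints it. The title and author values are placeholders invented for this example; `title`, `author`, `format` and `editor` are ordinary Quarto header fields.

```r
# Placeholder header values, made up for illustration only
header <- c(
  "---",
  'title: "My first Quarto document"',
  'author: "Your Name"',
  "format: html",      # which output format to render to
  "editor: visual",    # default editor mode
  "---"
)

# Write the header plus one line of body text to a temporary .qmd file
qmd_file <- tempfile(fileext = ".qmd")
writeLines(c(header, "", "Some normal text below the header."), qmd_file)

# Print the file so we can see the header between the two --- lines
cat(readLines(qmd_file), sep = "\n")
```

In practice you would simply type the header at the top of the document yourself; writing it from R is only done here to keep the example runnable.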

Quarto uses a markup language

Quarto works with a markup language. A markup language is a text-coding system which specifies the structure and formatting of a document and the relationships among its parts. Markup languages control how the content of a document is displayed.
Pandoc Markdown is the markup language used by Quarto. Another classic example of a markup language is LaTeX.

Let’s see how Pandoc Markdown works:

This is the third largest header (Header 3)

This is the smallest header (Header 6)

Headers are marked with hashtags (#). More hashtags mean a smaller header.

This is normal text. Yes, it is larger than the smallest header. A Quarto document works similarly to a Word document where you can select tools from the toolbar to write in bold or italic and insert things like a table:

My friends   Their favorite drink   Their favorite food
Micheal      Beer                   Burger
Jane         Wine                   Lasagne
Robert       Water                  Salad

… a picture:

Miw

This is a cute cat

Picture source

We can also make a list of things we like:

  • Coffee

  • Cake

  • Water

  • Fruit

Modes of Quarto Document

There are two modes of Quarto: Source and Visual. You can switch between the two modes in the left part of the panel.
Some features can only be added when you are in Source mode. For example, blue text is written like this in the source code: [write blue text]{style="color:blue"}.

Code chunks and structure

Code chunks are where the code is added to the document.

Click the green button +c and a grey code chunk will appear with '{r}' at the beginning. This means that it is an R code chunk. It is also possible to insert code chunks in other programming languages.


To execute the code, press the Run button in the top right of the chunk.

Some executable code in an R code chunk:

1+3
[1] 4

Below is a code chunk with a comment. A comment is a line that starts with a hashtag. Comments can be useful in longer code chunks and will often describe the code.

# This is a comment. Here I can write whatever I want because the line starts with a hashtag.

You can add comments above or to the right of the code. This will not influence the execution of the code.

# Place a comment here 
1+3 # or place a comment here
[1] 4

Output of code chunks

Control whether code is executed.

eval=FALSE will not execute the code and eval=TRUE will execute the code.

The code is shown, but the result is not shown ({r, echo=TRUE, eval=FALSE}):

1+3

Show or hide code. echo=FALSE will hide the code and echo=TRUE will show the code. Default is TRUE.

The code is not shown, but the result is shown ({r, echo=FALSE, eval=TRUE}):

[1] 4

Control messages, warnings and errors. Maybe you have a code chunk that you know will produce one of the three, and you often don’t want to see it in the compiled document. N.B.: It is not a good idea to hide these messages (especially the errors) before you know what they are.

Warning is not printed ({r, message=FALSE, warning=FALSE, error=TRUE}):

log(-1)
[1] NaN

Warning is printed ({r, message=TRUE, warning=TRUE, error=TRUE}):

log(-1)
Warning in log(-1): NaNs produced
[1] NaN
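Quarto also accepts chunk options as special comment lines starting with `#|` at the top of a chunk, written in YAML style (lowercase true/false). The sketch below shows the body of such a chunk; in a real document it would sit inside a chunk that starts with {r}. The variable name `total` is only for illustration.

```r
#| echo: true
#| eval: true
#| warning: false

# The lines above are chunk options for Quarto.
# To plain R they are ordinary comments, so this code also runs in the console.
total <- 1 + 3
total
```

Because the options are comments, they never change what the R code itself computes; they only change what ends up in the rendered document.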

Render: Making the report

In the panel there is a blue arrow and the word Render. Click it to render the document, then open the rendered HTML file in your browser and admire your work.

Let’s get to coding!


Let’s get started on R!

Working directory

The term path refers to the route you need to follow from the place you are ‘located’ on your computer to the place you want to work from. When working with Quarto, your working directory (wd) is always the same location as your Quarto document (this is not true for .R scripts!). The wd becomes important when we start loading data from other places (presentation 2).

Where am I now? getwd()

getwd()
[1] "/Users/srz223/Desktop/DataLab/FromExceltoR/Teachers/Presentations"

Set working directory, setwd()

The working directory can be changed, BUT when working with Quarto this only affects the individual code chunk. In contrast, changing the wd within an .R script affects the script globally. We will not cover .R scripts in this course.
There are two types of paths: absolute paths (from the root to the desired location) and relative paths (from the current location to the desired location):

Absolute path:

setwd("/Users/kgx936/Desktop/HeaDS/GitHub_repos/FromExceltoR")

Relative path:

setwd('./Exercises/')

Navigate up in the directory tree (..)

setwd('../Teachers')
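The paths above only exist on the presenter's computer. As a small sketch that should run anywhere, the code below builds a throwaway folder inside R's temporary directory and moves around in it. The folder names `path_demo` and `Exercises` are made up for the example.

```r
# Remember where we started so we can go back at the end
old_wd <- getwd()

# Create a small folder tree inside the temporary directory
base_dir <- file.path(tempdir(), "path_demo")
dir.create(file.path(base_dir, "Exercises"), recursive = TRUE, showWarnings = FALSE)

# Absolute path: starts from the root of the file system
setwd(base_dir)

# Relative path: starts from the current location (.)
setwd("./Exercises")
in_exercises <- basename(getwd())
in_exercises

# Two dots (..) move one level up in the directory tree
setwd("..")
back_in_base <- basename(getwd())
back_in_base

# Return to the original working directory
setwd(old_wd)
```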

Pointing to a dataset

load('/Users/kgx936/Desktop/HeaDS/GitHub_repos/FromExceltoR/Data/MyData.Rdata')

Variable assignment

In R we use an arrow (<-) for variable assignment. You may call your variables almost whatever you like. DO NOT use special characters in variable names, e.g. &, ), $, or put spaces in names.

The first two variables we create are ‘a’ and ‘b’:

a <- 1
b <- 3
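To get a feel for the naming rules, the sketch below creates two variables with valid names and then uses the base R function `make.names()` to show how R would repair names that break the rules. The example names are invented for illustration.

```r
# Valid names: letters, digits, underscores and dots, not starting with a digit
sample_count <- 10
sample.count2 <- 20

# make.names() turns invalid names into valid ones:
# spaces and special characters become dots,
# and an X is put in front of a name that starts with a digit
repaired <- make.names(c("my variable", "cost$", "2nd_try"))
repaired
```

Seeing how a name gets repaired is a quick way to check whether it was valid in the first place: a valid name comes back unchanged.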

Now we print ‘a’ and ‘b’ to see what values they have:

print(a)
[1] 1
print(b)
[1] 3

We add a and b without reassignment and get the result printed:

a + b 
[1] 4

If we want to save the result we have to assign it to a new variable:

c <- a + b
print(c)
[1] 4

A vector of numbers named num1

num1 <- c(5,1,11,6,4)
num1
[1]  5  1 11  6  4

Find the mean of the vector

(5+1+11+6+4)/5
[1] 5.4

Functions and Arguments

Functions are chunks of code wrapped in a way that makes the code inside reusable. A function takes one or more inputs (arguments) and returns an output. You can write your own functions, but in this course you will only use functions that are already available in R packages.

Let’s look at the mean() function

?mean

Taking the mean of a vector

mean(num1)
[1] 5.4

Functions make code reusable

num2 <- c(0,3,4,9,1,2,7,10,2,11) # Define new vector
mean(num2) # Print the mean of the vector 
[1] 4.9
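Many functions take more than one argument. As a small sketch, the code below uses the `na.rm` argument of `mean()` on a made-up vector with a missing value, and then defines a tiny home-made function. The function name `spread` is hypothetical and only shows the general shape of a function.

```r
# A made-up vector containing a missing value (NA)
num_with_na <- c(5, 1, NA, 6, 4)

# With default arguments the missing value makes the result NA
mean(num_with_na)

# The na.rm argument tells mean() to remove missing values first
mean_no_na <- mean(num_with_na, na.rm = TRUE)
mean_no_na

# A hypothetical home-made function: input x, output max minus min
spread <- function(x) {
  max(x) - min(x)
}
spread(c(5, 1, 11, 6, 4))
```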

Find length of vector

length(num1)
[1] 5
length(num2)
[1] 10
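With `length()` available, the manual mean calculation from earlier can be written without typing the numbers again. This is only a sketch to show how functions combine; `manual_mean` is a name chosen for the example.

```r
num1 <- c(5, 1, 11, 6, 4)

# Mean = sum of the values divided by the number of values
manual_mean <- sum(num1) / length(num1)
manual_mean

# Check that it agrees with the built-in mean() function
all.equal(manual_mean, mean(num1))
```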

Simple summary statistics

Summary statistics are values such as the number of items, mean, median, standard deviation and sum.

Summary statistics of a vector

mean(num2) # mean/average
[1] 4.9
median(num2) # median
[1] 3.5
sd(num2) # standard deviation
[1] 4.012481
sum(num2) # sum
[1] 49
min(num2) # minimum value
[1] 0
max(num2) # maximum value
[1] 11
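Several of these numbers can also be requested in one go. The sketch below uses the base R functions `summary()` and `range()` on the same vector; the variable name `stats` is only for illustration.

```r
num2 <- c(0, 3, 4, 9, 1, 2, 7, 10, 2, 11)

# summary() returns minimum, quartiles, median, mean and maximum together
stats <- summary(num2)
stats

# range() returns the minimum and the maximum as a vector of two values
range(num2)
```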

R packages

R packages are collections of functions written by R developers and super users, and they make our lives much easier. Functions used in the same type of R analysis/pipeline are bundled and organized in packages. There is a help page for each package to tell us which functions it contains and which arguments go into these. In order to use a package we need to download and install it on our computer. Most R packages are stored and maintained on the CRAN repository (https://cran.r-project.org/mirrors.html).

Install a package

# install.packages('tidyverse')

Load packages

library(tidyverse)

Query package

?tidyverse

Query function from package

?dplyr::select
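A package only needs to be installed once, so it can be handy to check first whether it is already there. The sketch below shows one way to do this, using the `stats` package as a stand-in because it ships with R and so never triggers a download. Replace it with 'tidyverse' to use the same pattern for the course package.

```r
# Stand-in package name; "stats" is always installed together with R
pkg <- "stats"

# requireNamespace() returns TRUE if the package is installed
is_installed <- requireNamespace(pkg, quietly = TRUE)
is_installed

# Only install when the package is missing
if (!is_installed) {
  install.packages(pkg)
}

# package::function calls one function without loading the whole package
middle <- stats::median(c(1, 2, 3))
middle
```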


Slideshow Intermezzo



Data structures

In the example below we will make two vectors into a tibble. Tibbles are the R object type you will mainly be working with in this course. We will also convert between data types and structures using the collection of ‘as.’ functions.

A vector of characters

people <- c("Anders", "Diana", "Tugce", "Henrike", "Chelsea", "Valentina", "Thile", "Helene")
people
[1] "Anders"    "Diana"     "Tugce"     "Henrike"   "Chelsea"   "Valentina"
[7] "Thile"     "Helene"   

A vector of numeric values

joined_year <- c(2019, 2020, 2020, 2021, 2023, 2022, 2020, 2024)
joined_year
[1] 2019 2020 2020 2021 2023 2022 2020 2024

Access data type or structure with the class() function

class(people)
[1] "character"
class(joined_year)
[1] "numeric"

Convert joined_year to character values

joined_year <- as.character(joined_year)
joined_year
[1] "2019" "2020" "2020" "2021" "2023" "2022" "2020" "2024"
class(joined_year)
[1] "character"

Convert joined_year back to numeric values

joined_year <- as.numeric(joined_year)
joined_year
[1] 2019 2020 2020 2021 2023 2022 2020 2024

Convert classes with the ‘as.’ functions

# as.numeric()
# as.integer()
# as.character()
# as.factor()
# ...
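Conversions do not always succeed for every value. The sketch below uses made-up values to show two things worth knowing: text that is not a number becomes NA, and `as.integer()` cuts off decimals instead of rounding.

```r
# Made-up character vector where one value is not a number
mixed <- c("2019", "2020", "unknown")

# R normally warns "NAs introduced by coercion" here.
# The warning is silenced only because we expect it in this example.
converted <- suppressWarnings(as.numeric(mixed))
converted

# is.na() shows which values could not be converted
is.na(converted)

# as.integer() drops the decimals
whole <- as.integer(3.9)
whole
```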

Let’s make a tibble from two vectors

my_data <- tibble(name = people, 
                  joined_year = joined_year)

my_data
# A tibble: 8 × 2
  name      joined_year
  <chr>           <dbl>
1 Anders           2019
2 Diana            2020
3 Tugce            2020
4 Henrike          2021
5 Chelsea          2023
6 Valentina        2022
7 Thile            2020
8 Helene           2024
class(my_data)
[1] "tbl_df"     "tbl"        "data.frame"

Just like you can convert between different data types, you can convert between data structures/objects.

Convert tibble to dataframe

my_data2 <- as.data.frame(my_data)
class(my_data2)
[1] "data.frame"

Convert structures with the ‘as.’ functions

# as.data.frame()
# as.matrix()
# as.list()
# as.table()
# ...
# as_tibble()
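As a sketch of what some of these conversions do, the code below uses a small base R data frame (so no extra packages are needed) built from the first two rows of the example data. The names `small_df`, `as_list` and `as_mat` are chosen for illustration.

```r
# A small data frame with one character and one numeric column
small_df <- data.frame(name = c("Anders", "Diana"),
                       joined_year = c(2019, 2020))

# As a list: one list element per column
as_list <- as.list(small_df)
names(as_list)

# As a matrix: all values must share one type,
# so the numbers are turned into text
as_mat <- as.matrix(small_df)
as_mat
class(as_mat[, "joined_year"])
```

The matrix example shows why converting between structures can change the data types inside them.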

Fundamental operations

You can inspect an R object in different ways:

1. Simply call it and it will be printed to the console.
2. With large objects it is preferable to use `head()` or `tail()` to only see the first or last part.
3. To see the data in a tabular, Excel-style format you can use `view()`.

Remove something:

rm(a)

Look at the “head” of an object:

head(my_data, n = 4)
# A tibble: 4 × 2
  name    joined_year
  <chr>         <dbl>
1 Anders         2019
2 Diana          2020
3 Tugce          2020
4 Henrike        2021

Open up tibble as a table (Excel style):

view(my_data)

dim(), short for dimensions, returns the number of rows and columns of an R object:

dim(my_data)
[1] 8 2

Look at a single column from a tibble using the ‘$’ symbol:

my_data$joined_year
[1] 2019 2020 2020 2021 2023 2022 2020 2024
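A few more inspection tools follow the same pattern. The sketch below rebuilds the example data as a base R data frame called `team` (a stand-in name, so the code runs without the tidyverse) and shows `tail()`, `nrow()`, `ncol()` and the double-bracket way of picking a column.

```r
# Stand-in for my_data, built as a plain data frame
team <- data.frame(
  name = c("Anders", "Diana", "Tugce", "Henrike",
           "Chelsea", "Valentina", "Thile", "Helene"),
  joined_year = c(2019, 2020, 2020, 2021, 2023, 2022, 2020, 2024)
)

# tail() shows the last rows, here the last two
last_two <- tail(team, n = 2)
last_two

# nrow() and ncol() give the two numbers from dim() one at a time
nrow(team)
ncol(team)

# Double brackets with the column name do the same job as $
identical(team$joined_year, team[["joined_year"]])

# A single column is a vector, so vector functions work on it
earliest <- min(team$joined_year)
earliest
```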