library(tidyverse)
Presentation 4: Scripting in R
In this section we will learn more about flow control and how to make more complex code constructs in R.
If-else statements
If-else statements are essential if you want your program to do different things depending on a condition. Here we see how to code them in R.
First define some variables.
num1 <- 8
num2 <- 5
Now that we have variables, we can test logical statements about them. Is num1 larger than num2? The result of a comparison is either TRUE or FALSE (or NA if one of the values is missing):
num1 > num2
[1] TRUE
Is num1 smaller than num2?
num1 < num2
[1] FALSE
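Besides > and <, R provides the comparison operators >=, <=, == (equal) and != (not equal). Logical values can be combined with & (and) and | (or), and negated with !. A minimal sketch reusing the values of num1 and num2 from above:

```r
num1 <- 8
num2 <- 5

num1 >= num2               # TRUE
num1 == num2               # FALSE: note the double equals sign
num1 != num2               # TRUE
(num1 > 0) & (num2 > 0)    # TRUE: both conditions hold
(num1 > 10) | (num2 > 10)  # FALSE: neither condition holds
!(num1 > num2)             # FALSE: ! negates a logical value
```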
We use logical statements inside an if statement to define a condition.
if (num1 > num2){
  statement <- paste(num1, 'is larger than', num2)
}
print(statement)
[1] "8 is larger than 5"
We can add an else if to test further conditions. The else branch applies when all previous checks were FALSE.
Now we have three possible outcomes:
#try with different values for num2
num2 <- 10
if (num1 > num2){
  statement <- paste(num1, 'is larger than', num2)
} else if (num1 < num2) {
  statement <- paste(num1, 'is smaller than', num2)
} else {
  statement <- paste(num1, 'is equal to', num2)
}
print(statement)
[1] "8 is smaller than 10"
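Following the suggestion in the code comment, here is the same chain with num2 set to the same value as num1, so that neither of the first two conditions is TRUE and only the else branch applies (a sketch; the values are chosen for illustration):

```r
num1 <- 8
num2 <- 8  # same value as num1, so neither > nor < is TRUE

if (num1 > num2) {
  statement <- paste(num1, 'is larger than', num2)
} else if (num1 < num2) {
  statement <- paste(num1, 'is smaller than', num2)
} else {
  statement <- paste(num1, 'is equal to', num2)
}
print(statement)  # "8 is equal to 8"
```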
For-loops
Defining a for loop
Many functions and operators in R are already vectorized, i.e. they are applied to every element of a vector at once:
df <- tibble(num1 = 1:10)
df
# A tibble: 10 × 1
num1
<int>
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
df$num2 <- df$num1 * 10
df
# A tibble: 10 × 2
num1 num2
<int> <dbl>
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90
10 10 100
The code above applies * 10 to each element of column num1 without us having to write a loop.
But sometimes we want to iterate over the elements manually because the situation requires it. For that, we can use a for loop.
We first define a list containing both numeric and character elements.
list1 <- list(1, 2, 6, 3, 2, 'hello', 'world', 'yes', 7, 8, 12, 15)
To loop through list1, we define a loop variable (here called element), which takes the value of each item in the list, one at a time.
for (element in list1) {
print(element)
}
[1] 1
[1] 2
[1] 6
[1] 3
[1] 2
[1] "hello"
[1] "world"
[1] "yes"
[1] 7
[1] 8
[1] 12
[1] 15
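To connect the loop back to the vectorized example above, the df$num1 * 10 calculation can also be written as an explicit for loop over positions. This sketch uses plain vectors instead of the tibble, and the names num2_loop and num2_vec are only illustrative. Both approaches give the same numbers, but the vectorized version is shorter and usually faster, so a loop is mainly worth it when the calculation cannot be vectorized.

```r
num1 <- 1:10

# Explicit loop: pre-allocate a result vector, then fill it element by element
num2_loop <- numeric(length(num1))
for (i in seq_along(num1)) {  # seq_along() gives the positions 1, 2, ..., 10
  num2_loop[i] <- num1[i] * 10
}

# Vectorized: one expression, no loop needed
num2_vec <- num1 * 10

all.equal(num2_loop, num2_vec)  # TRUE
```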
The name of the loop variable is arbitrary. For example, we can use THIS_VARIABLE and get the same result. Just avoid reusing the name of a variable your script still needs, because the loop will overwrite it.
for (THIS_VARIABLE in list1) {
print(THIS_VARIABLE)
}
[1] 1
[1] 2
[1] 6
[1] 3
[1] 2
[1] "hello"
[1] "world"
[1] "yes"
[1] 7
[1] 8
[1] 12
[1] 15
After the loop has finished, the loop variable still exists and holds the last element of the vector or list. A for loop in R does not create its own scope, so when the loop runs at the top level of a script, the loop variable is a global variable.
THIS_VARIABLE
[1] 15
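This is why the choice of loop variable name matters: if it matches an existing variable, that variable is silently overwritten. A small sketch (the variable and its value are made up for illustration):

```r
# A variable we care about, which happens to share its name with the loop variable
element <- "important value"

for (element in list(1, 2, 'hello')) {
  # the body does nothing; only the loop variable assignment matters here
}

element  # "hello": the original value has been overwritten
```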
Loop control
There are two loop control statements:
- next: jump to the next iteration
- break: end the loop before it has finished
#example for next
for (element in list1) {
if(element == 'hello'){
next
}
print(element)
}
[1] 1
[1] 2
[1] 6
[1] 3
[1] 2
[1] "world"
[1] "yes"
[1] 7
[1] 8
[1] 12
[1] 15
#example for break
for (element in list1) {
if(element == 'hello'){
break
}
print(element)
}
[1] 1
[1] 2
[1] 6
[1] 3
[1] 2
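next and break can also be combined in the same loop. As an illustrative sketch (the task itself is made up), this loop adds up the numbers in list1, skipping the text elements with next and stopping at the first number larger than 10 with break. is.numeric() tests whether an element is a number:

```r
list1 <- list(1, 2, 6, 3, 2, 'hello', 'world', 'yes', 7, 8, 12, 15)

total <- 0
for (element in list1) {
  if (!is.numeric(element)) {
    next   # skip the character elements
  }
  if (element > 10) {
    break  # stop at the first number larger than 10
  }
  total <- total + element
}
total  # 29 (1 + 2 + 6 + 3 + 2 + 7 + 8)
```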
Which data constructs are iterable in R?
Vectors:
my_vector <- c(1, 2, 3, 4, 5)
for (elem in my_vector) {
print(elem)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Lists:
my_list <- list(a = 1, b = "Hello", c = TRUE)
for (elem in my_list) {
print(elem)
}
[1] 1
[1] "Hello"
[1] TRUE
Dataframes and tibbles:
my_df <- data.frame(A = 1:3, B = c("X", "Y", "Z"))
my_df
A B
1 1 X
2 2 Y
3 3 Z
#column-wise
for (col in my_df) {
print(col)
}
[1] 1 2 3
[1] "X" "Y" "Z"
For row-wise iteration, you can, for example, loop over the row indices:
for (i in 1:nrow(my_df)) {
print(i)
#print row i
print(my_df[i,])
}
[1] 1
A B
1 1 X
[1] 2
A B
2 2 Y
[1] 3
A B
3 3 Z
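One caveat with 1:nrow(my_df): if the data frame has zero rows, 1:0 counts down and produces c(1, 0), so the loop body would still run. seq_len() avoids this. A sketch using an empty copy of my_df (the name empty_df is only for illustration):

```r
my_df <- data.frame(A = 1:3, B = c("X", "Y", "Z"))
empty_df <- my_df[0, ]   # same columns, zero rows

1:nrow(empty_df)         # 1 0: a loop would run twice on a table with no rows
seq_len(nrow(empty_df))  # integer(0): a loop would not run at all

# Row-wise loop that is safe for any number of rows
for (i in seq_len(nrow(my_df))) {
  print(my_df[i, ])
}
```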
If-else in loops
We can now use what we have learned to loop through list1 and multiply all numeric values by 10:
#to remember contents:
list1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 6
[[4]]
[1] 3
[[5]]
[1] 2
[[6]]
[1] "hello"
[[7]]
[1] "world"
[[8]]
[1] "yes"
[[9]]
[1] 7
[[10]]
[1] 8
[[11]]
[1] 12
[[12]]
[1] 15
for (element in list1) {
  if (is.numeric(element)){
    statement <- paste(element, 'times 10 is', element*10)
  } else {
    statement <- paste(element, 'is not a number!')
  }
  print(statement)
}
[1] "1 times 10 is 10"
[1] "2 times 10 is 20"
[1] "6 times 10 is 60"
[1] "3 times 10 is 30"
[1] "2 times 10 is 20"
[1] "hello is not a number!"
[1] "world is not a number!"
[1] "yes is not a number!"
[1] "7 times 10 is 70"
[1] "8 times 10 is 80"
[1] "12 times 10 is 120"
[1] "15 times 10 is 150"
Note that this does not work with a vector such as vec <- c(1, 2, 'hello'): a vector can only hold one data type, so R converts all elements of vec to character, and is.numeric() is FALSE for every element.
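A quick sketch of what happens with such a vector:

```r
vec <- c(1, 2, 'hello')
vec         # "1" "2" "hello": the numbers were converted to text
class(vec)  # "character"

for (element in vec) {
  print(is.numeric(element))  # FALSE for every element, including "1" and "2"
}
```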
User-defined Functions
User-defined functions help us reuse and structure our code.
We will use BMI calculation as an example for this part.
#measurements of one individual
weight_kg <- 70
height_m <- 1.80
We calculate BMI with this formula:
bmi <- weight_kg/height_m^2
bmi
[1] 21.60494
If we plan to calculate BMI for multiple individuals it is convenient to write the calculation into a function.
- Function name: calculate_bmi
- Function parameters: weight_kg and height_m
- Return value: bmi
The return statement specifies the value that the function will return when called.
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}
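As a side note, return() is optional in R: without it, a function returns the value of the last expression it evaluates. The sketch below defines a hypothetical calculate_bmi_implicit that behaves like calculate_bmi; writing return() explicitly, as done above, is still a common choice for readability.

```r
# Hypothetical variant of calculate_bmi without an explicit return()
calculate_bmi_implicit <- function(weight_kg, height_m){
  weight_kg/height_m^2  # the value of the last expression is returned
}

calculate_bmi_implicit(weight_kg = 70, height_m = 1.80)  # 21.60494
```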
We can now call the function on our previously defined variables.
calculate_bmi(weight_kg = weight_kg,
height_m = height_m)
[1] 21.60494
We can also pass numbers directly to the function.
calculate_bmi(weight_kg = 100,
height_m = 1.90)
[1] 27.70083
Argument Order in Function Calls
If we specify the parameter names, the order can be changed.
calculate_bmi(height_m = 1.90,
weight_kg = 100)
[1] 27.70083
If we do not name the arguments, they are matched by position, so be careful with this:
calculate_bmi(1.90,
100)
[1] 0.00019
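Named and positional arguments can also be mixed. R first matches the named arguments and then fills the remaining parameters, in order, with the unnamed values. A sketch (calculate_bmi is redefined so the example runs on its own):

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

# weight_kg is matched by name; the remaining unnamed value 1.90 fills height_m
calculate_bmi(1.90, weight_kg = 100)  # 27.70083
```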
Combining function call with if-statement
We can combine user-defined functions with if-else statements, so that the if-else will decide whether we execute the function or not.
#measurements of one individual
age <- 45
weight_kg <- 85
height_m <- 1.75
In some settings, BMI should only be calculated for adults, i.e. individuals aged 18 or older.
if (age >= 18){
calculate_bmi(weight_kg, height_m)
}
[1] 27.7551
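If the condition is FALSE and there is no else branch, the if statement does nothing and returns NULL. A sketch with made-up measurements of a 16-year-old (calculate_bmi is redefined so the example runs on its own):

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

age <- 16
weight_kg <- 65
height_m <- 1.45

result <- if (age >= 18) {
  calculate_bmi(weight_kg, height_m)
}
is.null(result)  # TRUE: the condition was FALSE and there is no else branch
```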
Combining function call with for-loops
Or we can choose to execute our function once for every element of an iterable, e.g. every row in a dataframe:
df <- data.frame(row.names = 1:5,
                 age = c(45, 16, 31, 56, 19),
                 weight_kg = c(85, 65, 100, 45, 76),
                 height_m = c(1.75, 1.45, 1.95, 1.51, 1.89))
df
age weight_kg height_m
1 45 85 1.75
2 16 65 1.45
3 31 100 1.95
4 56 45 1.51
5 19 76 1.89
Print ID, weight, and height of all individuals.
for (id in rownames(df)){
  weight <- df[id, 'weight_kg']
  height <- df[id, 'height_m']
  print(c(id, weight, height))
}
[1] "1" "85" "1.75"
[1] "2" "65" "1.45"
[1] "3" "100" "1.95"
[1] "4" "45" "1.51"
[1] "5" "76" "1.89"
Call function to calculate BMI for all individuals.
for (id in rownames(df)) {
  weight <- df[id, 'weight_kg']
  height <- df[id, 'height_m']
  bmi <- calculate_bmi(weight, height)
  print(c(id, bmi))
}
[1] "1" "27.7551020408163"
[1] "2" "30.9155766944114"
[1] "3" "26.2984878369494"
[1] "4" "19.7359764922591"
[1] "5" "21.2760001119789"
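Note that c(id, bmi) combines a character ID with a number, so c() converts the BMI to character, which is why the output above shows quotes and many digits. To keep the results as numbers for later use, one option is to store them in a pre-allocated numeric vector named by row ID. A sketch, where bmi_values is a hypothetical name and calculate_bmi and df are redefined so the block runs on its own:

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

df <- data.frame(row.names = 1:5,
                 age = c(45, 16, 31, 56, 19),
                 weight_kg = c(85, 65, 100, 45, 76),
                 height_m = c(1.75, 1.45, 1.95, 1.51, 1.89))

# Pre-allocate one numeric slot per row, named by row ID
bmi_values <- numeric(nrow(df))
names(bmi_values) <- rownames(df)

for (id in rownames(df)) {
  bmi_values[id] <- calculate_bmi(df[id, 'weight_kg'], df[id, 'height_m'])
}

round(bmi_values, 2)
```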
Combination of function call, if-statement and for-loops.
Print BMI for individuals that are 18 years old or older.
for (id in rownames(df)) {
  if (df[id, 'age'] >= 18) {
    weight <- df[id, 'weight_kg']
    height <- df[id, 'height_m']
    bmi <- calculate_bmi(weight, height)
    print(c(id, bmi))
  } else {
    print(paste(id, 'is under 18.'))
  }
}
[1] "1" "27.7551020408163"
[1] "2 is under 18."
[1] "3" "26.2984878369494"
[1] "4" "19.7359764922591"
[1] "5" "21.2760001119789"
Add BMI to the data frame.
for (id in rownames(df)){
  if (df[id, 'age'] >= 18) {
    weight <- df[id, 'weight_kg']
    height <- df[id, 'height_m']
    bmi <- calculate_bmi(weight, height)
  } else {
    bmi <- NA
  }
  df[id, 'bmi'] <- bmi
}
Have a look at the data frame.
df
age weight_kg height_m bmi
1 45 85 1.75 27.75510
2 16 65 1.45 NA
3 31 100 1.95 26.29849
4 56 45 1.51 19.73598
5 19 76 1.89 21.27600
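Because the BMI formula only uses vectorized arithmetic, the same column can also be created without a loop using base R's ifelse(), which evaluates the condition for every row at once. A sketch of this alternative (note that ifelse() computes the BMI for all rows and then keeps NA for those under 18):

```r
df <- data.frame(row.names = 1:5,
                 age = c(45, 16, 31, 56, 19),
                 weight_kg = c(85, 65, 100, 45, 76),
                 height_m = c(1.75, 1.45, 1.95, 1.51, 1.89))

# BMI where age >= 18, NA otherwise, checked row by row in one call
df$bmi <- ifelse(df$age >= 18, df$weight_kg / df$height_m^2, NA)
df
```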
Error handling in user-defined functions
Currently our BMI function accepts all kinds of inputs. However, what happens if we give a negative weight?
calculate_bmi(weight_kg = -50, height_m = 1.80)
[1] -15.4321
We should require both weight and height to be positive values:
calculate_bmi_2 <- function(weight_kg, height_m) {
  # Check if weight and height are numeric (|| is the scalar "or" used in if conditions)
  if (!is.numeric(weight_kg) || !is.numeric(height_m)) {
    stop("Both weight_kg and height_m must be numeric values.")
  }
  # Check if weight and height are positive
  if (weight_kg <= 0) {
    stop("Weight must be a positive value.")
  }
  if (height_m <= 0) {
    stop("Height must be a positive value.")
  }
  # Calculate BMI
  bmi <- weight_kg / height_m^2
  # Warn if BMI is outside a plausible range
  if (bmi < 10 || bmi > 60) {
    warning("The calculated BMI is outside the normal range. Please check your input values.")
  }
  return(bmi)
}
When we run calculate_bmi_2 with a negative weight, it now stops with the error message "Weight must be a positive value." instead of returning a result:
calculate_bmi_2(weight_kg = -50, height_m = 1.80)
We also added a check that warns us if the calculated BMI falls outside a plausible range (below 10 or above 60):
calculate_bmi_2(weight_kg = 25, height_m = 1.80)
Warning in calculate_bmi_2(weight_kg = 25, height_m = 1.8): The calculated BMI
is outside the normal range. Please check your input values.
[1] 7.716049
Running calculate_bmi_2 with appropriate inputs:
calculate_bmi_2(weight_kg = 75, height_m = 1.80)
[1] 23.14815
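An error stops the whole script. If you would rather handle it and carry on, for example when processing many individuals, you can wrap the call in tryCatch(). In this sketch a shortened copy of calculate_bmi_2 (only the positivity checks) is defined so it runs on its own, and returning NA on error is just one possible choice:

```r
# Shortened copy of calculate_bmi_2 so that this sketch runs on its own
calculate_bmi_2 <- function(weight_kg, height_m) {
  if (weight_kg <= 0) {
    stop("Weight must be a positive value.")
  }
  if (height_m <= 0) {
    stop("Height must be a positive value.")
  }
  weight_kg / height_m^2
}

# tryCatch() evaluates the call; if stop() is triggered, the error handler runs instead
bmi <- tryCatch(
  calculate_bmi_2(weight_kg = -50, height_m = 1.80),
  error = function(e) {
    message("Could not calculate BMI: ", conditionMessage(e))
    NA  # fall back to a missing value instead of halting the script
  }
)
bmi  # NA
```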
Outsourcing functions to an R script you source
It is cleaner to collect all your functions in one place, and that place should preferably not be your analysis script. You can instead save your functions in a separate R script and source it inside your analysis script to have access to all your functions without them cluttering your workflow.
We have created a file named presentation4_functions.R and copied our two function definitions, calculate_bmi and calculate_bmi_2, into it.
Now we remove our function definitions from the global environment to demonstrate how to source them from an external file.
rm(list = c("calculate_bmi", "calculate_bmi_2"))
When you source a script, R runs it, so all objects it creates at the top level (including functions) are loaded into your global environment and appear in the Environment pane (top right in RStudio's default layout). Here we source the presentation4_functions.R script. Check the environment to confirm that the two functions appeared.
source('./presentation4_functions.R')
After sourcing the functions script, calculate_bmi and calculate_bmi_2 can be used just as if they were defined in the main script. If you work on a larger project with multiple functions, it is best practice to keep them in a function script and source it in your main script.
calculate_bmi_2(weight_kg = 67,
height_m = 1.70)
[1] 23.18339
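To see the mechanism without depending on presentation4_functions.R, the sketch below writes a small functions file to a temporary location with tempfile() and sources it. The temporary file is only a stand-in for your own functions script:

```r
# Write a small functions file to a temporary location
functions_file <- tempfile(fileext = ".R")
writeLines(c(
  "calculate_bmi <- function(weight_kg, height_m){",
  "  bmi <- weight_kg/height_m^2",
  "  return(bmi)",
  "}"
), functions_file)

exists("calculate_bmi")  # FALSE if calculate_bmi has not been defined yet
source(functions_file)   # runs the file; its top-level objects land in the global environment
exists("calculate_bmi")  # TRUE
calculate_bmi(weight_kg = 67, height_m = 1.70)
```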
You can also use mapply as an alternative to calling the function in a for-loop:
mapply(FUN = calculate_bmi_2,
weight_kg = df$weight_kg,
height_m = df$height_m)
[1] 27.75510 30.91558 26.29849 19.73598 21.27600
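Since calculate_bmi only uses vectorized arithmetic, it can also be called on whole columns directly, giving the same result as mapply(). This does not work for calculate_bmi_2, because its if() checks need a single TRUE or FALSE; passing whole columns makes the condition longer than one, which is an error in R 4.2 and later. That is where mapply() is useful. A sketch comparing both calls on calculate_bmi:

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

df <- data.frame(weight_kg = c(85, 65, 100, 45, 76),
                 height_m = c(1.75, 1.45, 1.95, 1.51, 1.89))

# Vectorized call: the arithmetic is applied to whole columns at once
bmi_vectorized <- calculate_bmi(df$weight_kg, df$height_m)

# mapply() calls the function once per pair of values instead
bmi_mapply <- mapply(FUN = calculate_bmi,
                     weight_kg = df$weight_kg,
                     height_m = df$height_m)

all.equal(bmi_vectorized, bmi_mapply)  # TRUE
```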