Presentation 4: Main Script

If-else statements

Define variables.

num1 <- 8
num2 <- 5

Logical: 8 is larger than 5.

num1 > num2
[1] TRUE

Logical: 8 is not smaller than 5.

num1 < num2
[1] FALSE

A logical statement is used in an if statement to define a condition.

if (num1 > num2){
  statement <- paste(num1, 'is larger than', num2)
}

print(statement)
[1] "8 is larger than 5"

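Redefine num2. Run one of the three assignments at a time to test each branch below; if all three are run in order, num2 ends up as 8.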

num2 <- 10

num2 <- 3

num2 <- 8

Use else if and else statements to test multiple conditions.

if (num1 > num2){
  statement <- paste(num1, 'is larger than', num2)
} else if (num1 < num2) {
  statement <- paste(num1, 'is smaller than', num2)
} else {
  statement <- paste(num1, 'is equal to', num2)
} 

print(statement)
[1] "8 is equal to 8"
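To see all three branches without re-running the chunk by hand, the same if / else if / else chain can be run once for each candidate value of num2. This is only a sketch: it uses a for-loop (introduced in the next section), and the statements vector is a name made up for this illustration.

```r
num1 <- 8

# 'statements' is a hypothetical helper vector that collects one result per value of num2.
statements <- character(0)

for (num2 in c(10, 3, 8)) {
  if (num1 > num2) {
    statement <- paste(num1, 'is larger than', num2)
  } else if (num1 < num2) {
    statement <- paste(num1, 'is smaller than', num2)
  } else {
    statement <- paste(num1, 'is equal to', num2)
  }
  statements <- c(statements, statement)
}

print(statements)
# [1] "8 is smaller than 10" "8 is larger than 3"   "8 is equal to 8"
```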

For-loops

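We first define a vector from both numeric and character values. A vector can only hold one type, so R converts the numbers to character strings, which is why they are printed in quotes below.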

vector1 <- c(1, 2, 6, 3, 2, 'hello', 'world', 'yes', 7, 8, 12, 15)

To loop through vector1, we define a loop variable (here called element), which takes the value of each item in the vector, one at a time.

for (element in vector1) {
  print(element)
}
[1] "1"
[1] "2"
[1] "6"
[1] "3"
[1] "2"
[1] "hello"
[1] "world"
[1] "yes"
[1] "7"
[1] "8"
[1] "12"
[1] "15"
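The quotes in the output show that every element is a character string. A short check with class() confirms this. If the original types need to be kept, a list is one option; list1 and types below are names made up for this sketch.

```r
vector1 <- c(1, 2, 6, 3, 2, 'hello', 'world', 'yes', 7, 8, 12, 15)

# c() coerces mixed input to a single type, here character.
vector_class <- class(vector1)
print(vector_class)
# [1] "character"

# A list (hypothetical example) keeps the type of each element.
list1 <- list(1, 2, 'hello', 7)

types <- character(0)
for (element in list1) {
  types <- c(types, class(element))
}
print(types)
# [1] "numeric"   "numeric"   "character" "numeric"
```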

The name of the loop variable is arbitrary. For example, we can call it THIS_VARIABLE and get the same result. Just make sure the name does not overwrite an important variable elsewhere in your script.

for (THIS_VARIABLE in vector1) {
  print(THIS_VARIABLE)
}
[1] "1"
[1] "2"
[1] "6"
[1] "3"
[1] "2"
[1] "hello"
[1] "world"
[1] "yes"
[1] "7"
[1] "8"
[1] "12"
[1] "15"

After the loop has finished, the loop variable still exists and holds the last element of the vector. A for-loop does not create its own scope, so a loop run at the top level of a script leaves its variable in the global environment.

THIS_VARIABLE
[1] "15"
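Because the loop variable lives in the same environment as the rest of the script, reusing an existing name silently replaces its value. A minimal sketch, where important_value is a made-up variable name:

```r
# A variable we would like to keep (hypothetical name).
important_value <- 100

# Reusing the same name as the loop variable overwrites it on every iteration.
for (important_value in c(1, 2, 3)) {
  print(important_value)
}

# The original value 100 is gone; the variable now holds the last element.
print(important_value)
# [1] 3
```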

User-defined Functions

We will use BMI calculation as an example for this part.

Define variables.

weight_kg <- 70
height_m <- 1.80

Calculate BMI.

bmi <- weight_kg/height_m^2
bmi
[1] 21.60494

If we plan to calculate BMI for multiple individuals, it is convenient to wrap the calculation in a function.

  • Function name: calculate_bmi.

  • Function parameters: weight_kg and height_m.

  • The return value: bmi.

The return statement specifies the value that the function will return when called.

calculate_bmi <- function(weight_kg, height_m){
  
  bmi <- weight_kg/height_m^2
  
  return(bmi)
  
}

We can call the function using previously defined variables.

calculate_bmi(weight_kg = weight_kg, 
              height_m = height_m)
[1] 21.60494

We can also pass numbers directly to the function.

calculate_bmi(weight_kg = 100, 
              height_m = 1.90)
[1] 27.70083

Argument Order in Function Calls

If we specify the parameter names, the order can be changed.

calculate_bmi(height_m = 1.90, 
              weight_kg = 100)
[1] 27.70083

If we do not specify the parameter names, the arguments are matched by position, so be careful with the order. Here 1.90 is taken as weight_kg and 100 as height_m, which gives a meaningless result.

calculate_bmi(1.90, 
              100)
[1] 0.00019
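When in doubt about the order a function expects, the parameter names can be looked up before calling it positionally. The sketch below uses formals() from base R and repeats the function definition so it runs on its own; param_names, swapped and named are names chosen for this illustration.

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

# List the parameters in the order used for positional matching.
param_names <- names(formals(calculate_bmi))
print(param_names)
# [1] "weight_kg" "height_m"

# Positional call with the values swapped: 1.90 becomes the weight.
swapped <- calculate_bmi(1.90, 100)

# Named call: the order no longer matters.
named <- calculate_bmi(height_m = 1.90, weight_kg = 100)

print(c(swapped, named))
```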

Combining function call with if-statement

Data on a single individual.

age <- 45
weight_kg <- 85
height_m <- 1.75

BMI should only be calculated for individuals aged 18 or older.

if (age >= 18){
  calculate_bmi(weight_kg, height_m)
}
[1] 27.7551

Combining function call with for-loops

Data on 5 individuals.

df <- data.frame(row.names = 1:5, 
                 age = c(45, 16, 31, 56, 19), 
                 weight_kg = c(85, 65, 100, 45, 76), 
                 height_m = c(1.75, 1.45, 1.95, 1.51, 1.89)
                 )

Print ID, weight, and height of all individuals.

for (id in rownames(df)){
  
  weight <- df[id, 'weight_kg']
  
  height <- df[id, 'height_m']
  
  print(c(id, weight, height))
  
}
[1] "1"    "85"   "1.75"
[1] "2"    "65"   "1.45"
[1] "3"    "100"  "1.95"
[1] "4"    "45"   "1.51"
[1] "5"    "76"   "1.89"

Call function to calculate BMI for all individuals.

for (id in rownames(df)) {
  
  weight <- df[id, 'weight_kg']
  
  height <- df[id, 'height_m']
  
  bmi <- calculate_bmi(weight, height)
  
  print(c(id, bmi))
  
}
[1] "1"                "27.7551020408163"
[1] "2"                "30.9155766944114"
[1] "3"                "26.2984878369494"
[1] "4"                "19.7359764922591"
[1] "5"                "21.2760001119789"

Combination of function call, if-statement and for-loops.

Print BMI for individuals that are 18 years old or older.

for (id in rownames(df)) {
  
  if (df[id, 'age'] >= 18) {
    
    weight <- df[id, 'weight_kg']
  
    height <- df[id, 'height_m']
    
    bmi <- calculate_bmi(weight, height)
    
    print(c(id, bmi))

  } else {
    
    print(paste(id, 'is under 18.'))
    
  }
  
}
[1] "1"                "27.7551020408163"
[1] "2 is under 18."
[1] "3"                "26.2984878369494"
[1] "4"                "19.7359764922591"
[1] "5"                "21.2760001119789"

Add BMI to the data frame.

for (id in rownames(df)){
  
  if (df[id, 'age'] >= 18) {
    
    weight <- df[id, 'weight_kg']
  
    height <- df[id, 'height_m']
    
    bmi <- calculate_bmi(weight, height)

  } else {
    
    bmi <- NA
    
  }
  
  df[id, 'bmi'] <- bmi
  
}

Have a look at the data frame.

df
  age weight_kg height_m      bmi
1  45        85     1.75 27.75510
2  16        65     1.45       NA
3  31       100     1.95 26.29849
4  56        45     1.51 19.73598
5  19        76     1.89 21.27600
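The same column can also be built without a loop. The sketch below is an alternative, not the method used above: it relies on base R's vectorized ifelse() and on the fact that calculate_bmi only uses vectorized arithmetic. The function and data frame are repeated so the block runs on its own.

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

df <- data.frame(row.names = 1:5,
                 age = c(45, 16, 31, 56, 19),
                 weight_kg = c(85, 65, 100, 45, 76),
                 height_m = c(1.75, 1.45, 1.95, 1.51, 1.89))

# BMI for adults, NA for everyone under 18, computed for all rows at once.
df$bmi <- ifelse(df$age >= 18,
                 calculate_bmi(df$weight_kg, df$height_m),
                 NA)

print(df)
```

The for-loop version is easier to step through while learning; the vectorized version is shorter and avoids updating the data frame one row at a time.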

Outsourcing functions to an R script that you source

Remove calculate_bmi from the global environment.

rm(list = "calculate_bmi")

Sourcing a script runs it, so every object it defines at the top level (including functions) is loaded into the global environment and shows up in the Environment pane. Here we source the presentation4_functions.R script.

source('./presentation4_functions.R')

After sourcing the functions script, the calculate_bmi function can be used just as if it had been defined in the main script. If you work on a larger project and write multiple functions, it is good practice to keep them in a separate functions script and source it in your main script.

calculate_bmi(weight_kg = 67, 
              height_m = 1.70)
[1] 23.18339
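To try the mechanism without the course files, the sketch below writes a small functions script to a temporary file and sources it. The contents of the real presentation4_functions.R are not shown here, so the file written below is an assumption that only contains the calculate_bmi definition from earlier; script_path and result are names made up for this example.

```r
# Write a stand-in functions script to a temporary file (hypothetical contents).
script_path <- tempfile(fileext = '.R')
writeLines(c(
  'calculate_bmi <- function(weight_kg, height_m){',
  '  bmi <- weight_kg/height_m^2',
  '  return(bmi)',
  '}'
), con = script_path)

# source() runs the file; by default its objects end up in the global environment.
source(script_path)

result <- calculate_bmi(weight_kg = 67,
                        height_m = 1.70)
print(result)
```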

Use mapply as an alternative to calling the function in a for-loop.

mapply(FUN = calculate_bmi, 
       weight_kg = df$weight_kg, 
       height_m = df$height_m)
[1] 27.75510 30.91558 26.29849 19.73598 21.27600
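For this particular function, mapply is not strictly needed: / and ^ already work element by element, so the function can be called on whole vectors. The sketch below compares both approaches on the same data; bmi_mapply and bmi_vectorized are names made up here. mapply remains useful for functions that only accept single values.

```r
calculate_bmi <- function(weight_kg, height_m){
  bmi <- weight_kg/height_m^2
  return(bmi)
}

weight_kg <- c(85, 65, 100, 45, 76)
height_m <- c(1.75, 1.45, 1.95, 1.51, 1.89)

# One function call per pair of values.
bmi_mapply <- mapply(FUN = calculate_bmi,
                     weight_kg = weight_kg,
                     height_m = height_m)

# A single call on the whole vectors.
bmi_vectorized <- calculate_bmi(weight_kg, height_m)

print(all.equal(bmi_mapply, bmi_vectorized))
# [1] TRUE
```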

Functions with error handling.

The calculate_bmi_2 function is defined in the functions script.

The BMI function without error handling returns a meaningless BMI value if given a negative weight.

calculate_bmi(weight_kg = -50, height_m = 1.80)
[1] -15.4321

The BMI function with error handling stops with an error if given a negative weight.

calculate_bmi_2(weight_kg = -50, height_m = 1.80)

The BMI function with error handling gives a warning if the calculated BMI is outside the normal range.

calculate_bmi_2(weight_kg = 25, height_m = 1.80)
Warning in calculate_bmi_2(weight_kg = 25, height_m = 1.8): The calculated BMI
is outside the normal range. Please check your input values.
[1] 7.716049
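The definition of calculate_bmi_2 lives in the functions script and is not shown here. The sketch below is therefore not that function, only one way such checks could be written with base R's stop() and warning(). The name calculate_bmi_checked, the error message, and the normal_range limits of 10 and 60 are all assumptions made for this illustration; only the warning text is taken from the output above.

```r
# Hypothetical stand-in for a BMI function with input checks (single values only).
calculate_bmi_checked <- function(weight_kg, height_m, normal_range = c(10, 60)){

  # stop() aborts the function and signals an error.
  if (weight_kg <= 0 || height_m <= 0) {
    stop('weight_kg and height_m must be positive.')
  }

  bmi <- weight_kg/height_m^2

  # warning() reports a problem but still lets the function return a value.
  if (bmi < normal_range[1] || bmi > normal_range[2]) {
    warning('The calculated BMI is outside the normal range. Please check your input values.')
  }

  return(bmi)
}

# tryCatch() lets the script continue after an error and keeps the message.
result <- tryCatch(
  calculate_bmi_checked(weight_kg = -50, height_m = 1.80),
  error = function(e) conditionMessage(e)
)
print(result)
# [1] "weight_kg and height_m must be positive."
```

An error is the right choice when no sensible value can be returned; a warning fits results that are computable but suspicious.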