Exercise 4 - Scripting in R

In this exercise you will practice your scripting.

Getting started

Load libaries and data

library(tidyverse)
library(glue)
diabetes_glucose <- read_rds('../out/diabetes_glucose.rds')
diabetes_glucose
# A tibble: 490 × 12
   ID    Sex      Age BloodPressure   BMI PhysicalActivity Smoker  Diabetes
   <fct> <chr>  <dbl>         <dbl> <dbl>            <dbl> <chr>   <chr>   
 1 9046  Male      34            84  24.7               93 Unknown 0       
 2 51676 Male      25            74  22.5              102 Unknown 0       
 3 60182 Male      50            80  34.5               98 Unknown 1       
 4 1665  Female    27            60  26.3               82 Never   0       
 5 56669 Male      35            84  35                 58 Smoker  1       
 6 53882 Female    31            78  43.3               59 Smoker  1       
 7 10434 Male      52            86  33.3               58 Never   1       
 8 27419 Female    54            78  35.2               74 Former  1       
 9 60491 Female    41            90  39.8               67 Smoker  1       
10 12109 Female    36            82  30.8               81 Smoker  1       
# ℹ 480 more rows
# ℹ 4 more variables: Serum_ca2 <dbl>, Married <chr>, Work <chr>, OGTT <list>

If-else statements

In these exercises we don’t use the dataframe yet, that comes later when we have loops. For this part, just declare variables to test your statements, e.g. bp <- 120.

  1. Write an if-else statement that prints whether a person has high (more than 100), low (lower than 50) or normal blood pressure (between 50 and 100).

  2. Write an if-else statement that assigns people high, moderate or low diabetes risk based on their genetic risk score and BMI:

  • genetic Risk greater than 1 and BMI greater than 35 -> high risk

  • genetic Risk greater than 1 or BMI greater than 35 -> moderate risk

  • otherwise low risk

Verify that your statement works for different combinations of risk score and BMI

Loops

  1. Create a vector with at least 5 elements and loop over it.

  2. Loop over all column names of diabetes_glucose.

colnames(df) creates a vector of column names.

  1. Loop over all rows of diabetes_glucose and determine whether the person’s blood pressure is high, low or normal with the same conditions as in 1.

  2. Loop over all rows of diabetes_glucose and determine the risk based on genetic risk score and BMI, with the same conditions as in 2. Print the genetic risk score and BMI as well as the risk level to make it easier to see whether your code works correctly.

An easy way to printing several variables is to pass a vector into print: print(c(this,and_that,and_this_too))

User defined Functions

In this part we will write some functions that create plots.

Since we want to be able to pass the name of the column to plot as a variable we will need to use the syntax for aliased column names. We showed how to do that in the end of presentation 3 if you need a refresher.

  1. Create a variable plot_column and assign “Age” to it. Now make a boxplot of that column. Switch plot_column to a different column in diabetes_glucose. Does it work?

  2. Wrap your code for the boxplot into a function. The function should take two arguments: the dataframe to use and the name of the column to plot. Test your function. Add some customization to the plot like a theme or colors.

Functions are good at returning objects so make your plot into an object and return that.

  1. Add a check to your function whether the supplied column is numeric. Note here that you need to test the data type of the column you want to plot, not the data type of it’s name. Confirm that your check works.

  2. Write code to apply your boxplot function to each numerical column in the dataframe. There are different ways to achieve this.

  3. Create an R script file to contain your functions. Copy your functions there and remove them from your global environment with rm(list="name_of_your_function"). Now source the function R script in your quarto document and test that the functions work.


Extra exercises

First, unnest diabetes_glucose so you get back the Measurement and Glucose columns.

diabetes_glucose_unnest <- diabetes_glucose %>%
  unnest(OGTT)

e1. Calculate the mean Glucose (mmol/L) for each measuring time point (i.e. one value for 0, 60 and 120). Now stratify this mean by a second variable, Sex. You should have 6 mean values since there are 6 groups (0_female, 0_male, 60_female, ect). Now, create a variable category to which you pass the name of the column to stratify by (e.g. category <- 'Sex') and use category in your code instead of the literal variable name.

e2. We would like to make a plot that shows the means you calculated above. Again, use your category variable instead of the literal column name.

e3. Wrap the code from e1 and e2 into a function show_mean_by_catergory so that you can call: show_mean_by_catergory(diabetes_glucose_unnest, 'Sex') and it will make you the plot. Test with different columns.