Exercise 4 - Scripting in R
In this exercise you will practice your scripting.
Getting started
Load libraries and data
library(tidyverse)
library(glue)
diabetes_glucose <- read_rds('../out/diabetes_glucose.rds')
diabetes_glucose
# A tibble: 490 × 12
ID Sex Age BloodPressure BMI PhysicalActivity Smoker Diabetes
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 9046 Male 34 84 24.7 93 Unknown 0
2 51676 Male 25 74 22.5 102 Unknown 0
3 60182 Male 50 80 34.5 98 Unknown 1
4 1665 Female 27 60 26.3 82 Never 0
5 56669 Male 35 84 35 58 Smoker 1
6 53882 Female 31 78 43.3 59 Smoker 1
7 10434 Male 52 86 33.3 58 Never 1
8 27419 Female 54 78 35.2 74 Former 1
9 60491 Female 41 90 39.8 67 Smoker 1
10 12109 Female 36 82 30.8 81 Smoker 1
# ℹ 480 more rows
# ℹ 4 more variables: Serum_ca2 <dbl>, Married <chr>, Work <chr>, OGTT <list>
If-else statements
In these exercises we don’t use the dataframe yet; that comes later, when we get to loops. For this part, just declare variables to test your statements, e.g. bp <- 120.
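As a reminder of the syntax, here is a minimal sketch of an if / else if / else chain and of combining two conditions. The variables age and smoker and all cut-offs are made up for illustration and are not part of the exercise.

```r
# Made-up variables and cut-offs, used only to show the syntax
age <- 34
smoker <- "Never"

# if / else if / else: the first condition that is TRUE wins
if (age < 18) {
  age_group <- "minor"
} else if (age < 65) {
  age_group <- "adult"
} else {
  age_group <- "senior"
}
print(age_group)  # prints [1] "adult"

# && requires both conditions to be TRUE, || requires at least one
if (age > 30 && smoker == "Smoker") {
  print("over 30 and a smoker")
} else if (age > 30 || smoker == "Smoker") {
  print("over 30 or a smoker, but not both")
} else {
  print("neither")
}
```

Because the branches are checked in order, the || branch is only reached when the && branch was not taken.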
Write an if-else statement that prints whether a person has high (more than 100), low (lower than 50) or normal blood pressure (between 50 and 100).
Write an if-else statement that assigns people high, moderate or low diabetes risk based on their genetic risk score and BMI:
Genetic risk score greater than 1 and BMI greater than 35 -> high risk
Genetic risk score greater than 1 or BMI greater than 35 -> moderate risk
Otherwise -> low risk
Verify that your statement works for different combinations of risk score and BMI.
Loops
Create a vector with at least 5 elements and loop over it.
Loop over all column names of diabetes_glucose.
colnames(df) creates a vector of column names.
Loop over all rows of diabetes_glucose and determine whether the person’s blood pressure is high, low or normal, using the same conditions as in the first if-else exercise.
Loop over all rows of diabetes_glucose and determine the risk based on genetic risk score and BMI, using the same conditions as in the second if-else exercise. Print the genetic risk score and BMI as well as the risk level to make it easier to see whether your code works correctly.
An easy way to print several variables is to pass a vector to print: print(c(this, and_that, and_this_too))
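The looping patterns above can be sketched on a small made-up data frame. Here toy and its values are hypothetical stand-ins for diabetes_glucose, and the loop bodies only print, so the classification logic is left to you.

```r
# Made-up data frame, only for illustration
toy <- data.frame(
  ID = c("a", "b", "c"),
  Age = c(34, 25, 50),
  BMI = c(24.7, 22.5, 34.5)
)

# Loop over the column names
for (col in colnames(toy)) {
  print(col)
}

# Loop over the rows by index and print several values at once
for (i in seq_len(nrow(toy))) {
  age <- toy$Age[i]
  bmi <- toy$BMI[i]
  print(c(toy$ID[i], age, bmi))
}
```

Note that c() converts everything to character when you mix text and numbers, so the numbers are printed with quotes around them.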
User-defined functions
In this part we will write some functions that create plots.
Since we want to be able to pass the name of the column to plot as a variable, we will need to use the syntax for aliased column names. We showed how to do that at the end of presentation 3, if you need a refresher.
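One common way to refer to a column whose name is stored in a variable is the .data pronoun; whether this is exactly the syntax shown in presentation 3 is an assumption. A minimal sketch on a made-up data frame (toy and its values are hypothetical), using a histogram rather than a boxplot:

```r
library(ggplot2)

# Made-up data, only for illustration
toy <- data.frame(
  Age = c(34, 25, 50, 27, 35),
  BMI = c(24.7, 22.5, 34.5, 26.3, 35.0)
)

# The column name is stored as a string
plot_column <- "BMI"

# .data[[plot_column]] looks up the column called "BMI" inside toy
p <- ggplot(toy, aes(x = .data[[plot_column]])) +
  geom_histogram(bins = 3)

p
```

Changing plot_column to "Age" should plot the other column without touching the plotting code.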
Create a variable plot_column and assign “Age” to it. Now make a boxplot of that column. Switch plot_column to a different column in diabetes_glucose. Does it work?
Wrap your code for the boxplot into a function. The function should take two arguments: the dataframe to use and the name of the column to plot. Test your function. Add some customization to the plot like a theme or colors.
Functions are good at returning objects, so make your plot into an object and return that.
Add a check to your function that tests whether the supplied column is numeric. Note that you need to test the data type of the column you want to plot, not the data type of its name. Confirm that your check works.
Write code to apply your boxplot function to each numerical column in the dataframe. There are different ways to achieve this.
Create an R script file to contain your functions. Copy your functions there and remove them from your global environment with rm(list="name_of_your_function"). Now source the function R script in your Quarto document and test that the functions work.
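Two of the steps above are easy to get wrong: checking the type of the column rather than of its name, and the remove-then-source cycle. The sketch below illustrates both with a hypothetical helper, column_is_numeric, on made-up data. It uses a temporary file and dump() as stand-ins for the script file you would create and fill by hand.

```r
# Made-up data frame, only for illustration
toy <- data.frame(Age = c(34, 25, 50), Sex = c("Male", "Male", "Female"))

# Hypothetical helper: TRUE if the column called `column` in `df` is numeric
column_is_numeric <- function(df, column) {
  is.numeric(df[[column]])
}

is.numeric("Age")              # FALSE: this tests the name, a character string
column_is_numeric(toy, "Age")  # TRUE: this tests the column itself
column_is_numeric(toy, "Sex")  # FALSE: Sex is a character column

# Save the function to a script file (a temporary file in this sketch)
script_path <- tempfile(fileext = ".R")
dump("column_is_numeric", file = script_path)

# Remove the function, then bring it back by sourcing the script
rm(list = "column_is_numeric")
exists("column_is_numeric")    # FALSE
source(script_path)
column_is_numeric(toy, "Age")  # TRUE again
```

Inside your own function, a check like this could be combined with stop() to produce an informative error when the column is not numeric.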
Extra exercises
First, unnest diabetes_glucose so you get back the Measurement and Glucose columns.
diabetes_glucose_unnest <- diabetes_glucose %>%
  unnest(OGTT)
e1. Calculate the mean Glucose (mmol/L) for each measuring time point (i.e. one value each for 0, 60 and 120). Now stratify this mean by a second variable, Sex. You should have 6 mean values since there are 6 groups (0_female, 0_male, 60_female, etc.). Now, create a variable category to which you pass the name of the column to stratify by (e.g. category <- 'Sex') and use category in your code instead of the literal variable name.
e2. We would like to make a plot that shows the means you calculated above. Again, use your category variable instead of the literal column name.
e3. Wrap the code from e1 and e2 into a function show_mean_by_catergory so that you can call show_mean_by_catergory(diabetes_glucose_unnest, 'Sex') and it will make the plot for you. Test with different columns.
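For e1, grouping by a column whose name is stored in a variable can be sketched as follows. This is one possible approach, using across(all_of()) inside group_by(); it is not necessarily the one intended by the exercise. The data frame toy, its column Glucose and all values are made up, and the real glucose column has a different name.

```r
library(dplyr)

# Made-up long-format data: two people per Sex and time point
toy <- tibble(
  Sex = c("Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  Measurement = c(0, 0, 60, 60, 0, 0, 60, 60),
  Glucose = c(5, 6, 7, 9, 6, 8, 9, 11)
)

# Name of the column to stratify by, stored as a string
category <- "Sex"

# Group by the time point and by whichever column `category` names
toy_means <- toy %>%
  group_by(Measurement, across(all_of(category))) %>%
  summarise(mean_glucose = mean(Glucose), .groups = "drop")

toy_means
```

With this made-up data the result has four rows, one per combination of Measurement and Sex. Setting category to another column name changes the stratification without editing the pipeline.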