pivot_longer(cols = LIST_WITH_COLUMNS_TO_PIVOT,
names_to = "NEW_COLUMN_CONTAINING_COLUMN_NAMES",
values_to = "NEW_COLUMN_CONTAINING_COLUMN_VALUES")
Exercise 2 - solutions: Advanced Tidyveres / Pivot longer, wider, and nesting
Introduction
In this exercise you will do some more advance tidyverse operations such as pivoting and nesting, as well as create plots to brush up on your ggplot skills.
First steps
Load packages.
Load the joined diabetes data set you created in exercise 1 and the glucose dataset from the data folder.
Have a look at the glucose dataset. The
OGTT
column contains measurements from a Oral Glucose Tolerance Test where blood glucose is measured at fasting (Glucose_0), 6 hours after glucose intake (Glucose_6), and 12 hours after (Glucose_12).Restructure the glucose dataset into a long format. How many rows are there per ID? Does that make sense?
Remember the flow:
Have a look at slide 16 for a visual overview.
- Change the glucose measurements to numeric variable.
The stringr
packages is a part of tidyverse and has many functions for manipulating strings. Find a function that can split the string so you can extract the numbers on the other side of the underscore.
- Nest the glucose measurements and values such that there is only one row per ID.
Remember the flow:
group_by() %>%
nest() %>%
ungroup()
Merge the nested glucose dataset with the joined diabetes.
Pull the glucose measurements (
OGTT
) from your favorite ID.
First filter
for your favorite ID and then pull
the nested column.
Create a figure that visualizes glucose measurements at each time point (Measurement), stratified by patient ID. Give the plot a meaningful title.
Calculate the mean glucose measure for each measurement.
You will need to use unnest()
, group_by()
, and summerize()
.
- Make the same calculation and stratify on Diabetes as well.
This next exercise might be a bit more challenging. It requires multiple operations and might involve some techniques that were not explicitly shown in the presentations.
- Recreate the plot you made in Exercise 10 and include the mean value for each glucose measurement for the two Diabetes statuses (0 and 1).
There are several ways to solve this task. There is a workflow suggestion:
The line in the plot is connected by ID. Create new IDs for the mean values that do not already exist in the dataset. Use
RANDOM_ID %in% df$ID
to check if an ID is already present.Data from another dataset can be added to the plot like this:
+ geom_point(DATA, aes(x = VAR1, y = VAR2, group = VAR3))
- Export the final dataset. Since the dataset is nested, you cannot export it as an excel file. Export the dataset as an
.rds
file. Have a guess at what the function is called.