We will have you make some basic plots to get you started with ggplot again.
Make a scatter plot of Age and Blood Pressure. Do you see a trend?
diabetes_glucose %>%
  ggplot(aes(x = BloodPressure, y = Age)) +
  geom_point()
Make a scatter plot of PhysicalActivity and BMI. Do you see a trend?
diabetes_glucose %>%
  ggplot(aes(x = PhysicalActivity, y = BMI)) +
  geom_point()
Now make the same two plots as above, and color by Diabetes. Do you see a trend?
diabetes_glucose %>%
  ggplot(aes(x = BloodPressure, y = Age, color = Diabetes)) +
  geom_point()

diabetes_glucose %>%
  ggplot(aes(x = PhysicalActivity, y = BMI, color = Diabetes)) +
  geom_point()
Plot a boxplot of BMI across Diabetes. Give the plot a meaningful title.
diabetes_glucose %>%
  ggplot(aes(y = BMI, x = Diabetes, color = Diabetes)) +
  geom_boxplot() +
  labs(title = 'Distribution of BMI Across Diabetes')
Plot a boxplot of PhysicalActivity across Smoker. Give the plot a meaningful title.
diabetes_glucose %>%
  ggplot(aes(y = PhysicalActivity, x = Smoker, fill = Smoker)) +
  geom_boxplot() +
  labs(title = 'Distribution of Physical Activity Across Smoker Status')
Plotting - Part 2
To plot the data inside the nested variable OGTT (the glucose measurements), the data must first be unnested.
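As a minimal sketch of what unnesting does, the toy tibble below stores one small table per patient in OGTT. The values are invented for illustration and only the column names are borrowed from the exercise. Calling unnest() expands the list-column to one row per patient and time point, which is the shape ggplot needs.

```r
library(tidyr)
library(tibble)

# Toy data: invented values, not taken from diabetes_glucose
toy <- tibble(
  ID = c(101, 102),
  Diabetes = c("0", "1"),
  OGTT = list(
    tibble(Measurement = c(0, 60, 120), `Glucose (mmol/L)` = c(5.0, 7.0, 6.0)),
    tibble(Measurement = c(0, 60, 120), `Glucose (mmol/L)` = c(7.0, 11.0, 10.0))
  )
)

nrow(toy)        # 2 rows: one per patient

# Expand the list-column: one row per patient and time point
toy_long <- unnest(toy, OGTT)

nrow(toy_long)   # 6 rows: 2 patients x 3 time points
names(toy_long)  # ID, Diabetes, Measurement, Glucose (mmol/L)
```

The patient-level columns (ID, Diabetes) are repeated on every unnested row, so they remain available for coloring or grouping.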
Make boxplots of the glucose measurements at time 0 across Diabetes. Give the plot a meaningful title.
diabetes_glucose %>%
  unnest(OGTT) %>%
  mutate(Measurement = Measurement %>% as.factor()) %>%
  filter(Measurement == 0) %>%
  ggplot(aes(y = `Glucose (mmol/L)`, x = Diabetes, color = Diabetes)) +
  geom_boxplot() +
  labs(title = 'Glucose Measurement for Time Point 0 (fasted)')
Make boxplots of the glucose measurements across time points and Diabetes. Give the plot a meaningful title.
diabetes_glucose %>%
  unnest(OGTT) %>%
  mutate(Measurement = Measurement %>% as.factor()) %>%
  ggplot(aes(y = `Glucose (mmol/L)`, x = Diabetes, color = Diabetes)) +
  geom_boxplot() +
  facet_wrap(vars(Measurement)) +
  labs(title = 'Glucose Measurements for Time Points 0, 60, and 120')
Create a figure that visualizes glucose measurements at each time point (Measurement), stratified by patient ID. Give the plot a meaningful title.
diabetes_glucose %>%
  unnest(OGTT) %>%
  ggplot(aes(x = Measurement, y = `Glucose (mmol/L)`)) +
  geom_point(aes(color = Diabetes)) +
  geom_line(aes(group = ID, color = Diabetes)) +
  labs(title = 'Glucose Measurements Across Time Points by Diabetes Status')
Calculate the mean glucose value for each measurement time point.
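One possible approach, sketched on toy data, is to unnest, group by time point, and summarize. The values, the object names glucose_mean and glucose_mean_by_status, and the column name mean_glucose are assumptions made for this illustration. Only the column names Measurement, Diabetes, OGTT, and Glucose (mmol/L) come from the exercise.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy data: invented values, not taken from diabetes_glucose
toy <- tibble(
  ID = c(101, 102, 103, 104),
  Diabetes = c("0", "0", "1", "1"),
  OGTT = list(
    tibble(Measurement = c(0, 60, 120), `Glucose (mmol/L)` = c(5.0, 7.0, 6.0)),
    tibble(Measurement = c(0, 60, 120), `Glucose (mmol/L)` = c(5.4, 7.4, 6.2)),
    tibble(Measurement = c(0, 60, 120), `Glucose (mmol/L)` = c(7.0, 11.0, 10.0)),
    tibble(Measurement = c(0, 60, 120), `Glucose (mmol/L)` = c(7.4, 11.4, 10.4))
  )
)

# Mean glucose per time point, pooled over all patients
glucose_mean <- toy %>%
  unnest(OGTT) %>%
  group_by(Measurement) %>%
  summarize(mean_glucose = mean(`Glucose (mmol/L)`, na.rm = TRUE))

# Same summary, split by Diabetes status as well
glucose_mean_by_status <- toy %>%
  unnest(OGTT) %>%
  group_by(Measurement, Diabetes) %>%
  summarize(mean_glucose = mean(`Glucose (mmol/L)`, na.rm = TRUE),
            .groups = "drop")

glucose_mean
glucose_mean_by_status
```

Applied to the real data, the same pipeline would start from diabetes_glucose instead of toy. Grouping by Diabetes as well gives one mean per time point and status, which is the form a plot with one mean line per status would need.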
This next exercise might be a bit more challenging. It requires multiple operations and might involve some techniques that were not explicitly shown in the presentations.
Recreate the plot you made in Exercise 10 and include the mean value for each glucose measurement for the two Diabetes statuses (0 and 1).
There are several ways to solve this task. Here is a suggested workflow:
- The lines in the plot are connected by ID. Create new IDs for the mean values that do not already exist in the dataset. Use RANDOM_ID %in% df$ID to check if an ID is already present.
- Data from another dataset can be added to the plot like this: + geom_point(data = DATA, aes(x = VAR1, y = VAR2, group = VAR3)). The data argument must be named, because the first positional argument of geom_point() is the mapping.
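The two hints above can be sketched on toy data as follows. All values are invented, the objects toy_long and toy_means are hypothetical, and the glucose column is shortened to Glucose for readability. The idea is that a layer given its own data argument draws from that data frame while inheriting the aesthetics set in ggplot().

```r
library(ggplot2)
library(tibble)

# Toy long-format data: invented values, not taken from diabetes_glucose
toy_long <- tibble(
  ID = rep(c(101, 102, 103, 104), each = 3),
  Diabetes = rep(c("0", "0", "1", "1"), each = 3),
  Measurement = rep(c(0, 60, 120), times = 4),
  Glucose = c(5.0, 7.0, 6.0, 5.4, 7.4, 6.2, 7.0, 11.0, 10.0, 7.4, 11.4, 10.4)
)

# Candidate IDs for the mean lines must not collide with existing IDs
c(0, 1) %in% toy_long$ID  # FALSE FALSE

# Per-status means of the toy values above, labelled with the new IDs
toy_means <- tibble(
  ID = rep(c(0, 1), each = 3),
  Diabetes = rep(c("0", "1"), each = 3),
  Measurement = rep(c(0, 60, 120), times = 2),
  Glucose = c(5.2, 7.2, 6.1, 7.2, 11.2, 10.2)
)

p <- ggplot(toy_long,
            aes(x = Measurement, y = Glucose, color = Diabetes, group = ID)) +
  geom_line(alpha = 0.4) +
  # These layers use their own data frame, passed via the named data argument
  geom_line(data = toy_means, linewidth = 1.5) +
  geom_point(data = toy_means, size = 3)

p
```

The linewidth argument assumes ggplot2 3.4.0 or later. Because toy_means has the same column names as toy_long, the mean layers need no aes() of their own.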
The points in the plot are connected by ID. Let’s find two IDs (one for Diabetes == 0 and another for Diabetes == 1) that are not in the data. We can use the same numbers as the Diabetes status.
0 %in% diabetes_glucose$ID
[1] FALSE
1 %in% diabetes_glucose$ID
[1] FALSE
Change the class of Measurement to factor and add ID to the glucose mean data set.