#your packages here
Exercise 1: Data Cleanup (Base R and Tidyverse)
In this exercise you will practice your R skills by loading, inspecting and cleaning a dataset. You can use base R and/or tidyverse to solve the exercises, it is up to you.
Getting started
- Load the packages you think you will need. There is no need to spend too much time on this part. If you later realize you need another package just add it here and re-run the chunk.
- Load in the data set
diabetes_clinical_toy_messy.xlsx
.
Explore the data
How many missing values (NA’s) are there in each column?
Check the distribution of each of the variables. Consider that the variables are of different classes. Do any values strike you as odd?
Clean up the data
Now that we have had a look at the data, it is time to correct fixable mistakes and remove observations that cannot be corrected.
Consider the following:
What should we do with the rows that contain NAs? Do we remove them or keep them?
Which mistakes in the data can be corrected, and which cannot?
Are there zeros in the data? Are they true zeros or errors?
Do you want to change any of the data types of the variables?
- Clean the data according to your considerations.
Have a look at BloodPressure
, BMI
, Sex
, Diabetes
and ID
.
Meta Data
There is some metadata to accompany the dataset you have just cleaned in diabetes_meta_toy_messy.csv
. This is a csv file, not an excel sheet, so you need to use the read_delim
function to load it. Load in the dataset and inspect it.
- Now clean the metadata and do data exploration by repeating step 3-5 from above.
Join the datasets
We will combine both datasets together into one tibble.
- Consider what variable the datasets should be joined on.
The joining variable must be the same type in both datasets.
Join the datasets by the variable you selected above.
How many rows does the joined dataset have? Explain why.
Because we used left_join, only the IDs that are in diabetes_clinical_clean
are kept.
- Export the joined dataset. Think about which directory you want to save the file in.