Exercise 3: ggplot2 - Solutions

Getting started

Before you proceed with the exercises in this document, make sure to run the command library(tidyverse) in order to load the core tidyverse packages (including ggplot2).

library(tidyverse)
library(readxl)

The data set used in these exercises, climate.xlsx [1], was compiled from data downloaded in 2017 from the website of the UK’s national weather service, the Met Office.

The spreadsheet contains data from five UK weather stations in 2016. The following variables are included in the data set:

Variable name   Explanation
station         Location of weather station
year            Year
month           Month
af              Days of air frost
rain            Rainfall in mm
sun             Sunshine duration in hours
device          Brand of sunshine recorder / sensor

The data set is the same as the one used for the Tidyverse exercise. If you have already imported the data, there is no need to import it again, unless you have made changes to the data assigned to climate since the original data set was imported.

climate <- read_xlsx('../../Data/climate.xlsx')
head(climate)
# A tibble: 6 × 7
  station  year month    af  rain   sun device         
  <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <chr>          
1 armagh   2016     1     5 132.   44.5 Campbell Stokes
2 armagh   2016     2    10  62.6  71.3 Campbell Stokes
3 armagh   2016     3     4  43.8 117.  Campbell Stokes
4 armagh   2016     4     5  54   140.  Campbell Stokes
5 armagh   2016     5     0  41.4 210.  Campbell Stokes
6 armagh   2016     6     0  75.1 114.  Campbell Stokes

Need a little help? Consult the ggplot2 cheatsheet here: https://rstudio.github.io/cheatsheets/data-visualization.pdf

Scatter plot I

  1. Make a scatter (point) plot of rain against sun.
ggplot(climate, 
       aes(x = rain,
           y = sun)) +
  geom_point()

  1. Color the points in the scatter plot according to weather station. Save the plot in an object.
p <- ggplot(climate, 
            aes(x = rain,
                y = sun,
                color = station)) +
  geom_point()

p

  1. Add the segment + facet_wrap(vars(station)) to the saved plot object from above, and update the plot. What happens?
ggplot(climate, 
       aes(x = rain,
           y = sun,
           color = station)) +
  geom_point() + 
  facet_wrap(vars(station))

  1. Is it necessary to have a legend in the faceted plot? How can you remove this legend? Hint: try adding a theme() with legend.position = "none" inside it.
ggplot(climate, 
       aes(x = rain,
           y = sun,
           color = station)) +
  geom_point() + 
  facet_wrap(vars(station)) + 
  theme(legend.position = "none")

Graphic files

  1. Use ggsave(file="weather.jpeg") to remake the last ggplot as a JPEG file and save it. The file will be saved in your working directory. Locate this file on your computer and open it.

  2. Use ggsave(file="weather.png", width=10, height=8, units="cm") to remake the last ggplot as a PNG file and save it. What do the three other options do? Look at the help page ?ggsave to get an overview of the possible options.
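
The second call can be sketched with all arguments spelled out. This is an illustration only: demo and demo_plot are hypothetical names, the three data rows are re-typed from the str(climate) output in this document, and the file is written to a temporary path from tempfile() rather than to the working directory.

```r
library(ggplot2)

# Stand-in data: the first three rain and sun values shown by str(climate)
demo <- data.frame(rain = c(131.9, 62.6, 43.8),
                   sun  = c(44.5, 71.3, 117.3))

demo_plot <- ggplot(demo, aes(x = rain, y = sun)) +
  geom_point()

# Temporary path, used only for this sketch
out_file <- tempfile(fileext = ".png")

# width and height set the size of the image; units says how to read them
ggsave(filename = out_file, plot = demo_plot,
       width = 10, height = 8, units = "cm")

file.exists(out_file)
```

Passing plot = explicitly avoids relying on whichever plot happened to be displayed last, which is what ggsave() falls back to when plot is omitted.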

Scatter plot II: error bars

  1. Calculate the average and standard deviation for sunshine in each month and save it to a table called summary_stats. You will need group_by and summarize. Recall how to do this from the tidyverse exercise.
summary_stats <- climate %>%
  group_by(month) %>% 
  summarize(sun_avg = mean(sun), 
            sun_sd = sd(sun))

head(summary_stats)
# A tibble: 6 × 3
  month sun_avg sun_sd
  <dbl>   <dbl>  <dbl>
1     1    45.3   9.19
2     2    86.2  19.5 
3     3   113.   21.8 
4     4   160.   16.0 
5     5   193.   19.1 
6     6   130.   40.3 
  1. Make a scatter plot of the summary_stats with month on the x-axis, and the average number of sunshine hours on the y-axis.
p <- ggplot(summary_stats, 
       aes(x = month,
           y = sun_avg)) + 
  geom_point()

p

  1. Add error bars to the plot, which represent the average number of sunshine hours plus/minus the standard deviation of the observations. The relevant geom is called geom_errorbar.

Hint:

geom_errorbar(aes(ymin = sun_avg - sun_sd, ymax = sun_avg + sun_sd), width = 0.2)
p <- p + geom_errorbar(aes(ymin = sun_avg - sun_sd, ymax = sun_avg + sun_sd), width = 0.2)

p

  1. How could you make the plot with horizontal error bars instead? Tip: Think about which of the two variables, month and average sunshine hours, can meaningfully have an error.
p + coord_flip()
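
An alternative sketch, not part of the original solution: instead of flipping the coordinates, the average can be mapped to the x-axis and geom_errorbar() given the xmin and xmax aesthetics. The data frame re-types the six rows printed by head(summary_stats) above; stats_demo and p_h are hypothetical names. This assumes ggplot2 3.3.0 or later, where geom_errorbar() accepts an orientation argument.

```r
library(ggplot2)

# Stand-in for summary_stats: the six rows shown by head(summary_stats)
stats_demo <- data.frame(
  month   = 1:6,
  sun_avg = c(45.3, 86.2, 113, 160, 193, 130),
  sun_sd  = c(9.19, 19.5, 21.8, 16.0, 19.1, 40.3)
)

# The variable that carries the error (sun_avg) goes on the x-axis,
# so the bars run horizontally from xmin to xmax
p_h <- ggplot(stats_demo, aes(x = sun_avg, y = month)) +
  geom_point() +
  geom_errorbar(aes(xmin = sun_avg - sun_sd, xmax = sun_avg + sun_sd),
                orientation = "y", width = 0.2)

p_h
```

Both approaches draw the same picture; coord_flip() is shorter, while the explicit mapping keeps the axes in the order they are written in aes().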

Line plot (also known as a spaghetti plot)

  1. Make a line plot (find the correct geom_ for this) of the rainfall observations over time (month), such that observations from the same station are connected by one line. Put month on the x-axis. Color the lines according to weather station as well.
ggplot(climate, 
       aes(x = month,
           y = rain, 
           color = station)) + 
  geom_line()

  1. The month variable was read into R as a numerical variable. Convert this variable to a factor and make the line plot from the previous question again. What has changed?
climate$month <- as.factor(climate$month)
str(climate)
tibble [60 × 7] (S3: tbl_df/tbl/data.frame)
 $ station: chr [1:60] "armagh" "armagh" "armagh" "armagh" ...
 $ year   : num [1:60] 2016 2016 2016 2016 2016 ...
 $ month  : Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ af     : num [1:60] 5 10 4 5 0 0 0 0 0 0 ...
 $ rain   : num [1:60] 131.9 62.6 43.8 54 41.4 ...
 $ sun    : num [1:60] 44.5 71.3 117.3 139.7 209.6 ...
 $ device : chr [1:60] "Campbell Stokes" "Campbell Stokes" "Campbell Stokes" "Campbell Stokes" ...
p <- ggplot(climate, 
            aes(x = month,
                y = rain, 
                color = station,
                group = station)) + 
  geom_line()

p

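group = station needs to be added when month is a factor. Without it, ggplot2 groups the data by every combination of the discrete variables (here month and station), so each group holds a single observation and no lines are drawn. The x-axis now shows the twelve individual months as discrete categories instead of a continuous scale.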
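
As an optional extension (an assumption, not part of the exercise), the factor levels can be given readable labels with base R's built-in month.abb vector. The sketch uses a short stand-in vector, month_num, rather than the full climate data.

```r
# Stand-in for climate$month: numeric month codes
month_num <- c(1, 2, 3, 12)

# levels fixes the calendar order, labels replaces the codes with names
month_fct <- factor(month_num, levels = 1:12, labels = month.abb)

month_fct
levels(month_fct)[1:3]
```

Applied to the data, this would read climate$month <- factor(climate$month, levels = 1:12, labels = month.abb), and the x-axis would then show Jan, Feb, Mar and so on.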

  1. Use theme(legend.position = ???) to move the color legend to the top of the plot.
p <- p + theme(legend.position = 'top')

p

Layering

We can add several geoms to the same plot to show several things at once.

  1. (Re)Make the line plot of monthly rainfall and add geom_point() to it.
p <- p + geom_point()
p

  1. Now, add geom_hline(yintercept = mean(climate$rain), linetype = "dashed") at the end of your code for the line plot, and update the plot again. Have a look at the code again and understand what it does and how. What do you think ‘h’ in hline stands for?

hline = horizontal line.

p <- p + geom_hline(yintercept = mean(climate$rain), 
                    linetype = "dashed")
p

  1. Finally, try adding the following code and update the plot. What changed? Replace X, Y, COL, and TITLE with some more suitable (informative) text.
labs(x = "X", y = "Y", color = "COL", title = "TITLE")
p <- p + labs(x = "Month", y = "Rain", color = "Station", title = "Rainfall over month")
p

Box plot I

  1. Make a box plot of sunshine per weather station.
ggplot(climate, 
       aes(y = sun, 
           x = station)) + 
  geom_boxplot()

  1. Color the boxes according to weather station.
p <- ggplot(climate, 
            aes(y = sun, 
                x = station,
                fill = station)) + 
  geom_boxplot()

p

Box plot II - Aesthetics

There are many ways in which you can manipulate the look of your plot. For this we will use the boxplot you made in the exercise above.

  1. Add a different legend title with labs(fill = "Custom Title").
p <- p + labs(fill = "Station")
p

  1. Change the theme of the ggplot grid. Suggestions: theme_minimal(), theme_bw(), theme_dark(), theme_void().
p <- p + theme_minimal()
p

  1. Instead of automatically chosen colors, pick your own colors for fill = station by adding the scale_fill_manual() command. You will need five colors, one for each station. What happens if you choose too few colors?
p <- p + scale_fill_manual(values = c('magenta', 'pink1', 'deeppink', 'violet', 'hotpink'))
p 
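
To see what happens with too few colors, here is a sketch on a stand-in data frame. The names demo and p_few are hypothetical, the station totals are re-typed from the rain_summary output later in this document, and a bar chart is used instead of the boxplot because the fill scale behaves the same way. With only two colors for five stations, ggplot2 raises an error when the plot is built; the exact wording of the message may differ between versions.

```r
library(ggplot2)

# Stand-in data with the five station names
demo <- data.frame(
  station  = c("armagh", "camborne", "lerwick", "oxford", "sheffield"),
  rain_sum = c(737, 1147, 1218, 658, 788)
)

# Only two colors are supplied for five stations
p_few <- ggplot(demo, aes(x = station, y = rain_sum, fill = station)) +
  geom_col() +
  scale_fill_manual(values = c("magenta", "pink1"))

# The error only appears when the plot is built (or printed)
result <- tryCatch(ggplot_build(p_few), error = function(e) e)

inherits(result, "error")
conditionMessage(result)
```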

  1. Change the boxplot to a violin plot. Add the sunshine observations as scatter points to the plot. Include a boxplot inside the violin plot with geom_boxplot(width=.1).
ggplot(climate, 
       aes(y = sun, 
           x = station,
           fill = station)) + 
  geom_violin() + 
  geom_point() + 
  geom_boxplot(width=.1)

Histogram

  1. Make a histogram (find the correct geom_ for this) of rain from the climate dataset. Interpret the plot: what does it show?

The plot shows the distribution of accumulated rainfall across stations and months. For most months, the rainfall is around 50 mm.

ggplot(climate,
       aes(x = rain)) + 
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  1. R suggests that you choose a different number of bins/bin width for the histogram. Use binwidth = inside the histogram geom to experiment with different values of bin width. Look at how the histogram changes.
ggplot(climate,
       aes(x = rain)) + 
  geom_histogram(binwidth = 25)

ggplot(climate,
       aes(x = rain)) + 
  geom_histogram(binwidth = 3)

  1. Color the entire histogram. Here we are not coloring/filling according to any attribute, just the entire thing so the argument needs to be outside aes().
ggplot(climate,
       aes(x = rain)) + 
  geom_histogram(binwidth = 25, fill = 'hotpink')

Bar chart I

  1. Make a bar chart (geom_col()) which visualizes the sunshine hours per month. If you did not convert month to a factor in the line plot exercise above, do so now and remake the plot.
ggplot(climate, 
       aes(x = month, 
           y = sun)) +
  geom_col()

  1. Color the bars by weather station, i.e. divide each bar into one segment per station.
ggplot(climate, 
       aes(x = month, 
           y = sun,
           fill = station)) +
  geom_col()

  1. For better comparison, place the bars for each station next to each other instead of stacking them.
ggplot(climate, 
       aes(x = month, 
           y = sun,
           fill = station)) +
  geom_col(position = 'dodge')

  1. Make the axis labels, legend title, and title of the plot more informative by customizing them like you did for the line plot above.
ggplot(climate, 
       aes(x = month, 
           y = sun,
           fill = station)) +
  geom_col(position = 'dodge') + 
  labs(x = 'Month', 
       y = 'Sunshine', 
       fill = 'Weather station',
       title = 'Sunshine over month')

Bar chart II: Sorting bars

  1. Make a new bar chart showing the (total) annual rainfall recorded at each weather station. You will need to calculate this first. The format we need is a dataframe with summed up rain data per station.
rain_summary <- climate %>% 
  group_by(station) %>% 
  summarize(rain_sum = sum(rain))

rain_summary
# A tibble: 5 × 2
  station   rain_sum
  <chr>        <dbl>
1 armagh        737.
2 camborne     1147.
3 lerwick      1218.
4 oxford        658.
5 sheffield     788.
ggplot(rain_summary, 
       aes(x = station,
           y = rain_sum)) +
  geom_col()

  1. Sort the stations according to rainfall, either ascending or descending. This was shown in the ggplot lecture. Sort your rain dataframe from the question above by sum, then re-arrange the factor levels of ‘station’ as shown in the lecture.
# Arrange
rain_summary <- rain_summary %>% 
  arrange(desc(rain_sum)) 
  
# Change station to factor
rain_summary$station <- factor(rain_summary$station, 
                               levels = rain_summary$station)

# Plot 
p <- rain_summary %>% 
  ggplot(aes(x = station,
             y = rain_sum)) +
  geom_col()

p
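
An alternative sketch that skips the separate arrange and factor steps: base R's reorder() sorts the factor levels by another variable directly inside aes(). This is not the method from the lecture, only a shortcut shown for comparison. The data frame re-types the totals printed above; rain_demo, sorted_levels and p_sorted are hypothetical names.

```r
library(ggplot2)

# Stand-in for rain_summary: the totals shown in the output above
rain_demo <- data.frame(
  station  = c("armagh", "camborne", "lerwick", "oxford", "sheffield"),
  rain_sum = c(737, 1147, 1218, 658, 788)
)

# Negating rain_sum sorts the levels from wettest to driest
sorted_levels <- levels(reorder(rain_demo$station, -rain_demo$rain_sum))
sorted_levels

# The same reorder() call inside aes() sorts the bars in the plot
p_sorted <- ggplot(rain_demo,
                   aes(x = reorder(station, -rain_sum),
                       y = rain_sum)) +
  geom_col() +
  labs(x = "station")

p_sorted
```

Dropping the minus sign gives ascending order instead.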

  1. Add labels to each bar that state the sum of the rainfall. You can do this by adding the label keyword to the aes() and adding geom_label() to the plot. Just as geoms like geom_point look at the aes() to know what to plot on the x and y axes, geom_label looks at it to know what to use for labels.
p + geom_label(aes(label = rain_sum))

  1. Adjust the label positions so that the labels are positioned above the bars instead of inside them.
p + geom_label(aes(label = rain_sum), 
               position = position_nudge(y = 35))

  1. To alter the size of a figure in the rendered report, set the chunk options in the chunk header: {r, fig.width=10, fig.height=10}

Footnotes

  1. Contains public sector information licensed under the Open Government Licence v3.0.