Exercise 1: Introduction

Author

HeaDS Data Science Lab, University of Copenhagen

Getting started

  1. Start RStudio.

  2. Make a new Quarto document. Go to the menu bar and click FileNew file…Quarto document…. Give the Quarto document a title that makes sense to you.

  3. Save your document. Go to the menu bar and click FileSave as….

  4. Check the working directory. Let’s check which working directory we are in with the getwd() function. When working with Quarto documents the working directory is always the same as where the file is located (where you saved it).

Write commands at the prompt

  1. Go to the R console (lower left window) and write a few commands, one at a time. You could for example try these commands:
6*12
x <- 100
x + 7

Notice how a new object, x, appears in the Global Environment window (upper right window) and can be used for new computations.

Commands written at the prompt are not saved for later use! It is fine to write commands that should never be used again at the prompt, and it is fine “to play at the prompt”, but in general you must organize your commands in R scripts (or Quarto files, which will be introduced in Lesson IV).

Working with Quarto Documents

To save your code, you need to store it in a file. There are several file formats available for this purpose, and in this course, we are using Quarto, the latest format in the code reproducibility domain.

  1. Create code chunk. In the Quarto document you just made, create a code chuck by pressing the green box with the white ‘C’ in the right of the panel.

  2. Write code. Write a command in the code chunk, similar to one of those from above.

  3. Run code. Click on the green button in the top right of the code chunk - or simply type Ctrl + Enter/option + Enter. Then the command is transferred to the prompt in the console and executed, just as if you had written the command directly at the prompt. Try with a new command in a new line.

  4. Create a comment. Put a hashtag (#) before one of the commands in the code chunk and run it. Nothing happens! Hence, you can use hashtags for writing comments in your code chunks.

  5. Save your files often. It happens that you ask R for something so weird that it shuts down, and then it is a pity to have lost your work.

R Packages

R is born with a lot of functionalities, but the enormous community of R users also contributes to R all the time by developing code and sharing it in R packages. An R package is simply a collection of R functions and/or datasets (including documentation). As of September 20, 2024, there are 21,361 packaged available at the CRAN repository (and there are many other repositories).

An R package needs be installed and loaded before you can use its functionalities. You only have to install a package once (until you re-install R), whereas you have to load it in every R session you want to use it. As an example, let’s install the package ContaminatedMixt.

  1. Install. Choose one of the two installation methods:
  1. Using the command line:
install.packages("ContaminatedMixt")

Note that for this approach you need to know the name of the package and spell it correctly, including capitalization!

  1. Using the graphical interface:

    Look at lower right of your Rstudio window where you have a window with several tabs. Click on Packages. You will see a list of your currently installed packages and their versions. To install ContaminatedMixt, click on Install and start to type the name. You will notice a drop down list appears from which you can select the correct package. This is useful if you are not quite sure of the correct spelling.

A lot of red text will be written in the console while the installation goes on. This usually does not mean there was a problem, unless the text reads ‘error’ or ‘exit’. In the end, the package is installed, or you will see an explanation of what went wrong.

  1. Loading. Load the package you just installed with the command library(ContaminatedMixt). If everything went well, you should now able to run the command data(package = 'ContaminatedMixt') which will show you an overview of the datasets included in the package.

Basic R commands

We will now have a look at basic commands in R that you just learned about. We will use an in-build dataset of R called trees. Copy-paste the commands into your document or write them yourself to get a hang of the coding! Start thinking about the structure of you document and when to create new code chunks.

  1. To start with, load the tidyverse package.
library(tidyverse)
  1. Now, load the trees dataset by copying and executing the following commands:
data("trees")
view(trees)
?trees
  1. Check what information is contained in the trees dataset by calling the summary function on it:
summary(trees)
  1. How many trees do we have data for?

  2. Now, extract the column called Volume from the dataset and assign it to a new variable called volume_col.

  3. Display the volume of the first 10 trees.

  4. Find the minimum, maximum and mean volume from the column you just extracted. Does it match what was stated in the summary?

  5. What class of data structure is trees? Make it into a tibble.

Making some graphics

Next, let’s also try to make some plots using the base graphics system in R (in Presentation III you will learn to make graphics using the ggplot2-package). We will continue to use the trees dataset.

  1. The dataset trees contains 31 observations of 3 variables (diameter, height and volume of black cherry trees). Insert the following two commands in a code chunk, and execute them to make two plots.
plot(Volume~Height, data=trees)
plot(Volume~Height, data=trees, log="y")

Shut down R

  1. Close RStudio. If changes were made either to the Quarto document (shown in the upper-left window) or to the workspace (called ‘Global Environment’ in the upper-right window) you will be prompted if you like to save those. Answer Yes to that; in particular it is important that you save your documents as they contain all relevant commands to reproduce your output and plots.

  2. Reopen file. Locate the Quarto document that you saved in question 3 (and question 10!). Start RStudio again, and open the file (via the File menu). Check that you can run it again.

Remark - You can also start RStudio by double clicking on a file with the .qmd extension (or other R-readable formats). One advantage of this is that the workspace and the R history will be saved to the same folder as the Quarto document when you later close RStudio, since the working directory will be set to the map where the file is located.

Getting help in R

Every R function comes with a help page, that gives a brief description of the function and describes its usage (input/arguments and output/value). Let’s use the function median() as example. It is, no surprise, computing the median from a vector of numbers.

  1. Try these commands:
x <- c(1, 3, 8, 9, 100, NA)
x
median(x)

The first command defines a vector with six elements, but where the last number is missing (NA = Not Available). Since the last number is missing, median returns NA. However, could we make median find the median of the remaining numbers. Perhaps the help page can help out!

  1. Look at the help for the median function:
?median

The help page for median appears in the lower right window. If we read it carefully, then we realize that the extra argument (input) na.rm may help us. We therefore try this:

median(x, na.rm=TRUE)

Admittedly, R help pages are often quite difficult to read, but be aware that there are examples of commands in the bottom of each help page. For more complicated functions, these examples can be very useful while trying to get to know the function and its functionalities.

In order to use the help pages as above, you need to know the name of the function, which obviously may not be the case: You want to compute the median but have no idea what function to use. The best way to proceed: Google! Use “R whatever-you-want-to-search-for”, and you often get exactly what you need.

While working with R, you will get a lot of error messages. Some are easy to understand, and you will readily be able to fix the problems, while others… Again, the best answer is: Google and ChatGPT! Copy the error message into Google or ChatGPT, and you will often find help.

Wrapping up

  1. Imagine you need to send your code to a collaborator. Review your code to ensure it is clear and well-structured, so your collaborator can easily understand and follow your work. You might consider emphasizing important points by making text bold, underlined or blue.
  2. Render your Quarto document by clicking the Render button with the blue arrow in the toolbar. Once the HTML file is generated, open it in a web browser. Does the document look as you expected? Iterate on your document until it meets your desired format and appearance.

Lessons learnt

  • You must write you code in a document, like the Quarto document, such that you can save the work and return to it some other day.

  • Save your files often.

  • If you forget to save your files before you close RStudio, then RStudio will prompt you if you want to save your work.

  • Have structure in your document by using headers, text, and code chunks (maybe with comments).