Using R

Author

Mark Andrews

Abstract

This guide covers the fundamentals of using R: running commands in the console, working with functions, storing results with assignment, chaining operations with the pipe operator, writing and running R scripts, and reading data from files. These are the building blocks for everything else in the course.

The console

The Console pane is R’s direct interface. Click in it, type a command after the > prompt, and press Enter. R evaluates the command and prints the result.

Start with arithmetic, which works exactly as you would expect:

2 + 2

[1] 4

5 - 10

[1] -5

2 * 3

[1] 6

4 / 6

[1] 0.6666667

10 ^ 2

[1] 100

Spacing is flexible — 2+2 and 2 + 2 give the same result. Brackets group expressions and R follows the standard order of operations:

(10 + 4) * (2 ^ 3) / (8 - 1)

[1] 16
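As a short illustration of the order of operations (the numbers are chosen only for this sketch): exponents are evaluated before multiplication and division, which are evaluated before addition and subtraction. One case that often surprises people is that ^ binds more tightly than a leading minus sign.

```r
2 + 3 * 4      # multiplication first: 14
(2 + 3) * 4    # brackets first: 20
2 * 3 ^ 2      # exponent first: 18
-2 ^ 2         # exponent before the minus sign: -4
(-2) ^ 2       # brackets make the base negative: 4
```

When in doubt, add brackets; they cost nothing and make the intent explicit.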

Functions

Most of R’s capability comes from functions. A function takes input, does something with it, and returns a result. You call a function by writing its name followed by parentheses containing the input.

log(10)    # natural logarithm

[1] 2.302585

sqrt(25)   # square root

[1] 5

abs(-3.4)  # absolute value

[1] 3.4

Some functions take more than one argument:

log(10, 2)  # logarithm of 10 to base 2

[1] 3.321928
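Arguments can also be given by name, which makes a call easier to read and means the order no longer matters. The sketch below shows several ways of writing the same calculation; log2() is a built-in shortcut for base-2 logarithms.

```r
log(10, 2)             # arguments matched by position
log(10, base = 2)      # second argument given by name
log(x = 10, base = 2)  # both arguments given by name
log(base = 2, x = 10)  # named arguments can appear in any order
log2(10)               # dedicated base-2 function
```

All five lines return the same value, 3.321928.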

Some return more than one value. Because rnorm() draws random numbers, your output will differ from the values shown here:

rnorm(3)    # three random draws from a normal distribution

[1] -1.4107604 -0.2606666 -0.4340136
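If you want random draws that can be repeated exactly, one option is to set the random seed first with set.seed(). This is a minimal sketch; the seed value 101 and the object names are arbitrary choices for illustration.

```r
set.seed(101)            # fix the starting point of the random number generator
first_draw <- rnorm(3)

set.seed(101)            # reset to the same starting point
second_draw <- rnorm(3)

identical(first_draw, second_draw)  # TRUE: the same three values both times
```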

To find out what a function does, type ? followed by its name in the console — for example, ?log — and the Help pane opens the documentation.

Assignment

A result that is only printed to the console is not saved under a name, so you cannot reuse it in later commands. To keep a result, assign it to a name using the arrow operator <-:

x <- 2 + 2

The result is stored silently. Type the name to print it:

x

[1] 4

Use the stored value in further calculations:

y <- x * 3
y

[1] 12
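One point worth seeing in action: an assignment stores the value computed at that moment, not a live link to the names used to compute it. A minimal sketch, reusing x and y from above:

```r
x <- 2 + 2
y <- x * 3   # y is 12

x <- 10      # reassigning x replaces its old value
y            # still 12: y was computed from the old x

y <- x * 3   # re-run the assignment to update y
y            # now 30
```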

Any R expression can appear on the right of <-:

my_number  <- 10
log_result <- log(100)

Names can contain letters, digits, underscores, and periods. They must start with a letter (or with a period that is not followed by a digit), and they are case-sensitive. A common convention in R is snake_case (all lowercase, words separated by underscores). Avoid names that match existing functions such as mean or sum.
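The following sketch illustrates case sensitivity and uses the base R function make.names() to show how R would repair names that break the rules. The names themselves are made up for this example.

```r
score <- 10
Score <- 20      # a different object: names are case-sensitive
score + Score    # 30

# make.names() converts arbitrary text into syntactically valid names
make.names(c("my_number", "2nd_try", "total score"))
# "my_number" is already valid; the other two are altered to make them valid
```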

The pipe

The native pipe operator |>, available in R 4.1.0 and later, passes the result of one expression into the next function as its first argument. These two lines are equivalent:

log(42)

[1] 3.73767

42 |> log()

[1] 3.73767

The pipe becomes useful when several steps are chained together. Compare the nested form with the piped form for the same calculation:

log(sqrt(abs(log(42))))

[1] 0.6592312

42 |> log() |> abs() |> sqrt() |> log()

[1] 0.6592312

Read the piped version left to right: take 42, then log, then absolute value, then square root, then log again. The piped form is easier to write, read, and extend. It will appear throughout this course when chaining data operations together.
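Functions in a pipeline can still take additional arguments: the piped value fills the first argument and anything else is written inside the parentheses as usual. A small sketch with made-up numbers, assuming R 4.1.0 or later for |>:

```r
# 16 is passed as the first argument of sqrt(), and that result
# becomes the first argument of log(); base = 2 is supplied normally
16 |> sqrt() |> log(base = 2)    # same as log(sqrt(16), base = 2), which is 2

# a vector can be piped in the same way
c(1, 4, 9) |> sqrt() |> sum()    # same as sum(sqrt(c(1, 4, 9))), which is 6
```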

If you set up the native pipe shortcut in Global Options (see the R and RStudio guide), the shortcut Ctrl+Shift+M or Cmd+Shift+M inserts |>.

R scripts

The console is useful for quick experiments but results are not saved anywhere. An R script is a plain-text file that stores a sequence of commands you want to keep.

Creating and saving a script

Choose File > New File > R Script. A blank tab opens in the Editor pane. Type some commands:

x <- rnorm(50)
mean(x)
sd(x)

Save with Ctrl+S (or Cmd+S) and give the file a name ending in .R, for example my_analysis.R. If you are working inside an RStudio Project, the save dialog opens in the project folder by default.

Running code from a script

Ctrl+Enter (Cmd+Enter): runs the line the cursor is on, or the selected block, and moves to the next line.
Source (Ctrl+Shift+Enter, or Cmd+Shift+Enter on macOS): runs the entire file from top to bottom in one pass.

The typical workflow is to use Ctrl+Enter while writing and experimenting, and Source occasionally to confirm the whole script still runs cleanly from top to bottom.
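A whole script can also be run from R code with the base function source(), which evaluates every line of a file in order. The sketch below is only an illustration: it writes a two-line script to a temporary file (a stand-in for something like my_analysis.R) and then runs it.

```r
# create a small throwaway script; the path and contents are hypothetical
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)"),
           script_path)

source(script_path)  # run every line of the file, top to bottom
x_mean               # objects created by the script are now available: 4
```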

Comments

Comments are notes to anyone reading the code, including your future self. In R, everything from a # to the end of the line is ignored when the code runs (unless the # is inside a quoted string).

# generate 50 random values from a standard normal distribution
x <- rnorm(50)

mean(x)  # average of the sample
sd(x)    # standard deviation

Use comments to explain why you did something, not just to restate what the code does.

Sections

You can divide a script into named sections, which appear in the Document Outline and can be collapsed in the editor. Create a section by ending a comment line with at least four dashes:

# Load packages -----------------------------------------------------------

library(tidyverse)

# Read data ---------------------------------------------------------------

weight_df <- read_csv("weight.csv")

# Analysis ----------------------------------------------------------------

lm(weight ~ height, data = weight_df)

Insert a section quickly with Ctrl+Shift+R (Cmd+Shift+R).

The console versus the script

Think of the console as a scratch pad: good for trying things out, not for keeping results. Once a line of code proves useful in the console, paste it into the script, add a comment, and save. The script is the permanent record. Run the whole thing with Source any time you want to verify it works end to end.

Importing data

Most real analyses start by reading data from a file. The readr package, which loads as part of tidyverse, provides read_csv() for comma-separated files — the most common format.

First, load the packages:

library(tidyverse)

Place the data file in your project folder. Then read it:

weight_df <- read_csv("weight.csv")

read_csv() prints a short message showing how many rows were read and what column types were detected. This is normal feedback, not a warning.
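To try read_csv() without the weight.csv file, the following minimal sketch writes a tiny CSV file to a temporary location and reads it back. The column names and values are invented for illustration, and show_col_types = FALSE (which silences the column-type message) assumes readr 2.0 or later.

```r
library(readr)

# write a three-row CSV file to a temporary path (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("height,weight",
             "170,65.2",
             "182,80.1",
             "165,58.7"),
           csv_path)

# read it back; the first line of the file supplies the column names
demo_df <- read_csv(csv_path, show_col_types = FALSE)
demo_df
```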

To look at the data, just type its name:

weight_df

R shows the first ten rows and the column types. This rectangular object, with rows of observations and columns of variables, is a data frame, or more precisely a tibble. It is the standard way to hold data in R, and almost everything in this course revolves around working with data frames.
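A few base R functions are handy for a first look at any data frame. The sketch below builds a small data frame by hand, with made-up values, so that it runs without a data file; the same functions work on a tibble returned by read_csv().

```r
# a hand-built data frame standing in for imported data (hypothetical values)
demo_df <- data.frame(height = c(170, 182, 165),
                      weight = c(65.2, 80.1, 58.7))

nrow(demo_df)          # number of rows (observations): 3
ncol(demo_df)          # number of columns (variables): 2
names(demo_df)         # the column names
demo_df$weight         # extract one column as a vector
mean(demo_df$weight)   # use that column in a calculation
```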

The glimpse() function gives a compact summary of the data frame, with one line per column:

glimpse(weight_df)

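Other file formats work similarly. For Excel, use read_excel() from the readxl package; for SPSS, read_sav() from the haven package; for Stata, read_dta(), also from haven. Both packages are installed along with tidyverse but are not loaded by library(tidyverse), so load them first with library(readxl) or library(haven). Once the data is in a data frame, you work with it the same way regardless of its original format.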