2  Fundamental R Objects and Functions

See the video r.mp4 for a visual explanation of this document.

R is a powerful statistical computing language designed for efficient data analysis. It is free, open-source, and operates primarily on objects, which makes it flexible for data manipulation.

R can be downloaded from CRAN, and RStudio, the recommended integrated development environment (IDE), can be obtained from RStudio. Once installed, open RStudio to explore its interface.

2.1 Getting Help

To view the help documentation for a function, use either the ? operator or the help() function:

?setwd     # Opens help for setwd()
help("lm")  # Alternative way to get help for lm()

2.2 Libraries

Libraries in R provide additional functions and features. For example:

  • Syntax: library(packageName)

  • Example:

    library(ggplot2)  # Loads the ggplot2 package for data visualization

2.3 Reading in Data

There are many ways to read data into R. For example, to read data from a CSV file:

data <- read.csv("data/yourfile.csv")

Other packages such as readr, readxl, and openxlsx offer additional methods and options for reading various data formats.

2.4 Assigning Values to Objects

In R, you assign values to objects using <- (preferred) or =:

x <- 1               # Assigns the value 1 to x
msg <- "Hello, world!"  # Assigns a character string to msg
print(x)            # Prints the value of x

To check an object’s type, use the class() function:

class(x)   # Returns "numeric"
class(msg) # Returns "character"

2.5 Basic Computations

R supports basic arithmetic following standard mathematical precedence:

2 + 3 * 5      # Multiplication happens before addition
log(10)        # Natural logarithm of 10
4^2            # Exponentiation: 4 squared
sqrt(16)       # Square root of 16
abs(-7)        # Absolute value of -7

For integer division and modulus operations:

15 %/% 4  # Integer division: returns 3
15 %% 4   # Modulus: returns 3

2.6 Working with Vectors

Vectors store multiple values of the same type and are a fundamental data structure in R:

x <- c(1, 3, 2, 10, 5)  # Creates a numeric vector
y <- 1:5                # Creates a sequence from 1 to 5

Mathematical operations can be applied directly to vectors:

y + 2   # Adds 2 to each element
2 * y   # Multiplies each element by 2
x + y   # Adds corresponding elements of x and y
x * y   # Multiplies corresponding elements of x and y
x^y     # Raises each element of x to the power given in y

To extract specific elements from a vector:

x[2]      # Returns the second element
x[3:5]    # Returns elements from index 3 to 5
x[-2]     # Returns all elements except the second one
x[x > 3]  # Returns elements greater than 3

2.7 Working with Character Vectors

Character vectors store text data:

colours <- c("green", "blue", "orange", "yellow", "red")
colours[2]  # Returns "blue"
colours[5]  # Returns "red"

Note that character vectors do not support arithmetic operations.

2.8 Matrices

Matrices are two-dimensional arrays where all elements are of the same type:

x <- c(1, 3, 2, 10, 5)
y <- 1:5

m1 <- cbind(x, y)  # Column-binding creates a matrix
m2 <- rbind(x, y)  # Row-binding creates a matrix

Matrix operations include:

t(m1)       # Transposes the matrix
m1 + m2     # Adds corresponding elements
m1 %*% m2   # Matrix multiplication
solve(m1)   # Computes the inverse of a square matrix (if possible)

2.9 Lists

Lists can store elements of different types, making them flexible for various data:

my_list <- list(num = 1, vec = c(1, 2, 3), mat = matrix(1:6, nrow = 2))

Access elements of a list using the $ operator:

my_list$num  # Retrieves the numeric element
my_list$vec  # Retrieves the vector
my_list$mat  # Retrieves the matrix

2.10 Data Frames

Data frames store tabular data, similar to spreadsheets:

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

Extract data from a data frame:

df$name           # Extracts the "name" column
df[1, ]           # Returns the first row
df[df$age > 25, ] # Filters rows where age is greater than 25

2.11 Working with File Paths

R provides several functions to work with file paths, allowing you to build, explore, and manipulate directories and files in a platform-independent way. Here are some common functions along with examples and explanations:

2.11.1 Constructing File Paths with file.path()

  • Purpose:
    file.path() constructs file paths by combining directory and file names. It automatically uses the correct file separator for the operating system.

  • Example:

    # Combine folder, subfolder, and file name
    full_path <- file.path("data", "raw", "myfile.txt")
    print(full_path)

    This will produce a path like "data/raw/myfile.txt" on Unix-like systems or "data\\raw\\myfile.txt" on Windows.

2.11.2 Listing Files with list.files()

  • Purpose:
    list.files() returns a character vector of the file names in a specified directory. It can also be used to filter files by pattern.

  • Example:

    # List all files in the "data" directory
    files <- list.files(path = "data")
    print(files)
    
    # List only CSV files in the "data" directory
    csv_files <- list.files(path = "data", pattern = "\\.csv$")
    print(csv_files)

    This function is useful for dynamically accessing the contents of a directory.

2.11.2.1 Listing Directories with list.dirs()

  • Purpose:
    list.dirs() provides a recursive list of directories (subdirectories included) within a specified path.

  • Example:

    # List all directories within the "data" folder, including subdirectories
    directories <- list.dirs(path = "data")
    print(directories)

    This function helps you understand the structure of your file system under a particular directory.

2.11.2.2 Creating Directories with dir.create()

  • Purpose:
    dir.create() is used to create new directories. When creating nested directories (sub-directories that do not yet exist), you must set the recursive argument to TRUE.

  • Example:

    # Create a nested directory structure "data/raw"
    dir.create(path = file.path("data", "raw"), recursive = TRUE)

    Setting recursive = TRUE ensures that if the parent directory "data" does not exist, it will be created along with "raw".

2.11.2.3 Reading and Writing Text Files with readLines() and writeLines()

  • readLines():
    • Purpose:
      Reads text files line by line into a character vector, which is useful for processing or analyzing text.

    • Example:

      # Read the content of a text file into a vector of lines
      lines <- readLines("data/raw/myfile.txt")
      print(lines)
  • writeLines():
    • Purpose:
      Writes a character vector to a text file, with each element being written as a separate line.

    • Example:

      # Create a vector of text lines
      output_lines <- c("This is the first line.", "This is the second line.")
      
      # Write the vector to a file called "output.txt"
      writeLines(output_lines, con = "data/output/output.txt")

      These functions are ideal for simple text processing, such as reading logs, configuration files, or exporting simple reports.

2.12 Writing Functions

Functions in R allow you to encapsulate operations and reuse code. Define a function using function():

increment <- function(x) {
  x + 1  # Adds 1 to the input value and returns it
}

increment(3)  # Returns 4

Note that functions always return the last value computed:

increment <- function(x) {
  x + 2
  x + 1
}

increment(3)  # Returns 4

Functions can also take multiple arguments:

add_numbers <- function(x, y) {
  return(x + y)
}

add_numbers(2, 5)  # Returns 7

A common example is a function to calculate the hypotenuse of a right triangle (you will use this daily):

hypotenuse <- function(a, b) {
  sqrt(a^2 + b^2)  # Uses the Pythagorean theorem
}

hypotenuse(3, 4)  # Returns 5

2.13 Basic Statistical Functions

R includes built-in functions for summary statistics:

mean(df$age)     # Computes the mean
sd(df$age)       # Computes the standard deviation
summary(df)      # Provides a summary of the data frame

To create frequency tables:

table(df$name)  # Counts occurrences of each name