R Basics

Modified

May 27, 2024

R (R Core Team 2024) is a programming language oriented to statistical computing. R has become the de facto programming language in the social network analysis community thanks to the large number of packages available for network analysis. R packages are collections of functions, data, and documentation that extend R. A good reference book for both novice and advanced users is “The Art of R Programming” (Matloff 2011) [1].

Getting R

You can get R from the Comprehensive R Archive Network (CRAN) website. CRAN is a network of servers worldwide that store identical, up-to-date versions of code and documentation for R. The CRAN website also has a lot of information about R, including manuals, FAQs, and mailing lists.

Although R comes with a Graphical User Interface (GUI), I recommend getting an alternative like RStudio or VSCode. RStudio and VSCode are excellent companions for programming in R. While RStudio is more common among R users, VSCode is a more general-purpose IDE that can be used for many other programming languages, including Python and C++.

How to install packages

Nowadays, there are two ways of installing R packages (that I’m aware of): using install.packages, which is a function shipped with R, or using the devtools R package to install a package from a remote repository other than CRAN. Here are a few examples:

# This will install the netdiffuseR package from CRAN
install.packages("netdiffuseR")

# This will install the bleeding-edge version from the project's GitHub repo!
devtools::install_github("USCCANA/netdiffuseR")

The first one, using install.packages, installs the CRAN version of netdiffuseR, whereas the second line of code installs whatever version is published on https://github.com/USCCANA/netdiffuseR, which is usually called the development version.
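
A common pattern is to install a package only when it is missing. The sketch below assumes a hypothetical helper, install_if_missing (it is not part of R or of netdiffuseR), built on requireNamespace, which returns FALSE when a package cannot be loaded:

```r
# Hypothetical helper: install a CRAN package only if it is not yet available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded now
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so nothing is downloaded here
has_stats <- install_if_missing("stats")
```

Calling install_if_missing("netdiffuseR") would work the same way, downloading the package from CRAN only the first time.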

In some cases, users may want or need to install packages from the command line, as some packages need extra configuration to be installed. We won’t need that for now.

A gentle Quick n’ Dirty Introduction to R

Some common tasks in R

  1. Getting help (and reading the manual) is THE MOST IMPORTANT thing you should know about. For example, if you want to read the manual (help file) of the read.csv function, you can type either of these:

    ?read.csv
    ?"read.csv"
    help(read.csv)
    help("read.csv")

    If you are not sure of the name of the function, you can always use the fuzzy search:

    help.search("linear regression")
    ??"linear regression"
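
    When you only remember part of a name, apropos() lists the objects on the search path whose names match a pattern, and example() runs the examples from a help page. A small sketch (the pattern is only an illustration):

    ```r
    # Functions whose names start with "read." among the attached packages
    read_funs <- apropos("^read\\.")
    print(read_funs)

    # Run the examples included in the help page of mean()
    example(mean)
    ```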
  2. In R, you can create new objects using either the assignment operator (<-) or the equal sign (=). For example, the following two are equivalent:

    a <- 1
    a = 1

    Historically, the assignment operator is the most commonly used.
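
    The two operators differ inside a function call: there, = only names an argument, while <- still creates an object. The sketch below uses a hypothetical helper, check_assign, written only to make that difference visible:

    ```r
    # Hypothetical helper used only to observe what each operator does
    check_assign <- function() {
      mean(x = c(2, 4, 6))     # `=` here only names the argument
      before <- exists("x", inherits = FALSE)
      mean(x <- c(2, 4, 6))    # `<-` creates x, then passes it to mean()
      after <- exists("x", inherits = FALSE)
      c(before = before, after = after)
    }

    check_assign()
    ```

    The result shows that no object called x exists after the first call, and that one does after the second.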

  3. R has several types of objects. The most basic structures in R are vectors, matrices, lists, and data frames. Here is an example of creating several of these (each line is enclosed in parentheses so that R prints the resulting object):

    (a_vector     <- 1:9)
    [1] 1 2 3 4 5 6 7 8 9
    (another_vect <- c(1, 2, 3, 4, 5, 6, 7, 8, 9))
    [1] 1 2 3 4 5 6 7 8 9
    (a_string_vec <- c("I", "like", "netdiffuseR"))
    [1] "I"           "like"        "netdiffuseR"
    (a_matrix     <- matrix(a_vector, ncol = 3))
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    # Matrices can be of strings too
    (a_string_mat <- matrix(letters[1:9], ncol=3)) 
         [,1] [,2] [,3]
    [1,] "a"  "d"  "g" 
    [2,] "b"  "e"  "h" 
    [3,] "c"  "f"  "i" 
    # The `cbind` function does "column bind"
    (another_mat  <- cbind(1:4, 11:14)) 
         [,1] [,2]
    [1,]    1   11
    [2,]    2   12
    [3,]    3   13
    [4,]    4   14
    # The `rbind` function does "row bind"
    (another_mat2 <- rbind(1:4, 11:14))
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]   11   12   13   14
    (a_list       <- list(a_vector, a_matrix))
    [[1]]
    [1] 1 2 3 4 5 6 7 8 9
    
    [[2]]
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    # same but with names!
    (another_list <- list(my_vec = a_vector, my_mat = a_matrix)) 
    $my_vec
    [1] 1 2 3 4 5 6 7 8 9
    
    $my_mat
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    # Data frames can hold columns of different types; a data
    # frame is a list of vectors of equal length
    (a_data_frame <- data.frame(x = 1:10, y = letters[1:10]))
        x y
    1   1 a
    2   2 b
    3   3 c
    4   4 d
    5   5 e
    6   6 f
    7   7 g
    8   8 h
    9   9 i
    10 10 j
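
    To check what kind of object you are working with, functions such as class(), dim(), length(), and str() are handy. A minimal sketch that recreates some of the objects above so it can run on its own:

    ```r
    a_vector     <- 1:9
    a_matrix     <- matrix(a_vector, ncol = 3)
    a_data_frame <- data.frame(x = 1:10, y = letters[1:10])

    class(a_vector)      # "integer"
    class(a_matrix)      # "matrix" "array" (in R >= 4.0.0)
    dim(a_matrix)        # 3 3
    length(a_data_frame) # 2, the number of columns
    str(a_data_frame)    # compact summary of the structure

    # Mixing types in a vector coerces everything to the most flexible type
    mixed <- c(1, "a", TRUE)
    class(mixed)         # "character"
    ```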
  4. Depending on the type of object, we can access its components using indexing:

    # First 3 elements
    a_vector[1:3]
    [1] 1 2 3
    # Third element
    a_string_vec[3]
    [1] "netdiffuseR"
    # A sub matrix
    a_matrix[1:2, 1:2]
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    # Third column
    a_matrix[,3]
    [1] 7 8 9
    # Third row
    a_matrix[3,]
    [1] 3 6 9
    # First 6 elements of the matrix. R stores matrices
    # by column.
    a_string_mat[1:6]
    [1] "a" "b" "c" "d" "e" "f"
    # These three are equivalent
    another_list[[1]]
    [1] 1 2 3 4 5 6 7 8 9
    another_list$my_vec
    [1] 1 2 3 4 5 6 7 8 9
    another_list[["my_vec"]]
    [1] 1 2 3 4 5 6 7 8 9
    # Data frames are just like lists
    a_data_frame[[1]]
     [1]  1  2  3  4  5  6  7  8  9 10
    a_data_frame[,1]
     [1]  1  2  3  4  5  6  7  8  9 10
    a_data_frame[["x"]]
     [1]  1  2  3  4  5  6  7  8  9 10
    a_data_frame$x
     [1]  1  2  3  4  5  6  7  8  9 10
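
    Besides positions and names, R also accepts logical vectors and negative numbers as indices. The following sketch recreates two of the objects above so it is self-contained:

    ```r
    a_vector     <- 1:9
    a_data_frame <- data.frame(x = 1:10, y = letters[1:10])

    # Logical indexing: keep the elements for which the condition is TRUE
    big <- a_vector[a_vector > 6]

    # Negative indexing: drop the first two elements
    tail_part <- a_vector[-(1:2)]

    # Rows of a data frame that satisfy a condition, keeping all columns
    even_rows <- a_data_frame[a_data_frame$x %% 2 == 0, ]
    ```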
  5. Control-flow statements

    # The old-fashioned for loop
    for (i in 1:10) {
      print(paste("I'm step", i, "/", 10))
    }
    [1] "I'm step 1 / 10"
    [1] "I'm step 2 / 10"
    [1] "I'm step 3 / 10"
    [1] "I'm step 4 / 10"
    [1] "I'm step 5 / 10"
    [1] "I'm step 6 / 10"
    [1] "I'm step 7 / 10"
    [1] "I'm step 8 / 10"
    [1] "I'm step 9 / 10"
    [1] "I'm step 10 / 10"
    # A nice if/else
    
    for (i in 1:10) {
    
      if (i %% 2) # Modulo operator; a remainder of 1 (odd) is treated as TRUE
        print(paste("I'm step", i, "/", 10, "(and I'm odd)"))
      else
        print(paste("I'm step", i, "/", 10, "(and I'm even)"))
    
    }
    [1] "I'm step 1 / 10 (and I'm odd)"
    [1] "I'm step 2 / 10 (and I'm even)"
    [1] "I'm step 3 / 10 (and I'm odd)"
    [1] "I'm step 4 / 10 (and I'm even)"
    [1] "I'm step 5 / 10 (and I'm odd)"
    [1] "I'm step 6 / 10 (and I'm even)"
    [1] "I'm step 7 / 10 (and I'm odd)"
    [1] "I'm step 8 / 10 (and I'm even)"
    [1] "I'm step 9 / 10 (and I'm odd)"
    [1] "I'm step 10 / 10 (and I'm even)"
    # A while
    i <- 10
    while (i > 0) {
      print(paste("I'm step", i, "/", 10))
      i <- i - 1
    }
    [1] "I'm step 10 / 10"
    [1] "I'm step 9 / 10"
    [1] "I'm step 8 / 10"
    [1] "I'm step 7 / 10"
    [1] "I'm step 6 / 10"
    [1] "I'm step 5 / 10"
    [1] "I'm step 4 / 10"
    [1] "I'm step 3 / 10"
    [1] "I'm step 2 / 10"
    [1] "I'm step 1 / 10"
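
    Many loops like the ones above can also be written in vectorized form. The sketch below is one possible alternative, not a replacement for the examples above; it uses ifelse() for the odd/even labels and vapply() to apply a function to each element:

    ```r
    steps <- 1:10

    # Vectorized if/else: one label per element, no explicit loop
    parity <- ifelse(steps %% 2 == 1, "odd", "even")
    msgs   <- paste("I'm step", steps, "/", 10, paste0("(and I'm ", parity, ")"))
    print(msgs)

    # vapply applies a function to each element and checks the return type
    squares <- vapply(steps, function(i) i^2, numeric(1))
    ```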
  6. R has a rich set of pseudo-random number generation functions. In general, distribution functions follow this naming structure:

    1. Random number generation: r[name-of-the-distribution], e.g., rnorm for normal, runif for uniform.
    2. Density function: d[name-of-the-distribution], e.g., dnorm for normal, dunif for uniform.
    3. Cumulative Distribution Function (CDF): p[name-of-the-distribution], e.g., pnorm for normal, punif for uniform.
    4. Quantile function (inverse CDF): q[name-of-the-distribution], e.g., qnorm for normal, qunif for uniform.

    Here are some examples:

    # To ensure reproducibility
    set.seed(1231)
    
    # 100,000 Unif(0,1) numbers
    x <- runif(1e5)
    hist(x)

    # 100,000 N(0,1) numbers
    x <- rnorm(1e5)
    hist(x)

    # 100,000 N(10,25) numbers
    x <- rnorm(1e5, mean = 10, sd = 5)
    hist(x)

    # 100,000 Poisson(5) numbers
    x <- rpois(1e5, lambda = 5)
    hist(x)

    # 100,000 Exp(5) numbers (rate = 5)
    x <- rexp(1e5, rate = 5)
    hist(x)

    More distributions are listed in the help page ?Distributions.
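
    The d, p, and q functions can be checked against each other and against simulated draws. A minimal sketch for the standard normal distribution:

    ```r
    # Density of N(0, 1) at zero equals 1 / sqrt(2 * pi)
    d0 <- dnorm(0)

    # The CDF and the quantile function are inverses of each other
    q975 <- qnorm(0.975)    # about 1.96
    p975 <- pnorm(q975)     # back to 0.975

    # Empirical check with simulated draws
    set.seed(1231)
    x <- rnorm(1e5)
    emp <- mean(x <= q975)  # should be close to 0.975
    ```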

For a nice intro to R, take a look at “The Art of R Programming” by Norman Matloff. For more advanced users, take a look at “Advanced R” by Hadley Wickham.


  1. A free PDF version is distributed by the author.