• 1 The Art of Benchmarking
    • 1.1 Introduction
    • 1.2 Benchmarking
      • 1.2.1 Comparing read times of CSV and RDS files
      • 1.2.2 Operational differences: “<-” and “=”
      • 1.2.3 Elapsed time
      • 1.2.4 Relative time
    • 1.3 How good is your machine?
  • 2 Fine Tuning: Efficient Base R
  • 3 Diagnosing Problems: Code Profiling
  • 4 Turbo Charged Code: Parallel Programming

1 The Art of Benchmarking

1.1 Introduction

1.1.1 R version

One of the relatively easy optimizations available is to use an up-to-date version of R. In general, R is very conservative, so upgrading doesn’t break existing code. However, a new version will often provide free speed boosts for key functions.

The version command returns a list that contains (among other things) the major and minor version of R currently being used.

# Print the R version details using version
version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          0.0                         
year           2020                        
month          04                          
day            24                          
svn rev        78286                       
language       R                           
version.string R version 4.0.0 (2020-04-24)
nickname       Arbor Day                   
# Assign the variable major to the major component
major <- version$major

# Assign the variable minor to the minor component
minor <- version$minor

Keeping R up to date also makes it easier to get help when you are stuck.

1.2 Benchmarking

1.2.1 Comparing read times of CSV and RDS files

One of the most common tasks we perform is reading in data from CSV files. However, for large CSV files this can be slow. One neat trick is to read in the data and save as an R binary file (rds) using saveRDS(). To read in the rds file, we use readRDS().

Note: Since rds is R’s native format for storing single objects, you have not introduced any third-party dependencies that may change in the future.

To benchmark the two approaches, you can use system.time(). This function returns the time taken to evaluate any R expression. For example, to time how long it takes to calculate the square root of the numbers from one to ten million, you would write the following:

system.time(sqrt(1:1e7))
   user  system elapsed 
  0.154   0.047   0.202 
# How long does it take to read movies from CSV?
system.time(read.csv("movies.csv"))
   user  system elapsed 
  0.284   0.004   0.289 
# How long does it take to read movies from RDS?
system.time(readRDS("movies.rds"))
   user  system elapsed 
  0.051   0.000   0.052 

Reading in RDS files is much quicker than reading in CSV files.

1.2.2 Operational differences: “<-” and “=”

There are a number of ways to assign values to variables. The two standard ways are the = and <- operators. Using the <- operator inside a function call will create a new (or overwrite an existing) object.

This makes ‘<-’ useful inside system.time() since we can store the result.

1.2.3 Elapsed time

Using system.time() is convenient, but it does have its drawbacks when comparing multiple function calls. The microbenchmark package solves this problem with the microbenchmark() function.

# Load the microbenchmark package
library(microbenchmark)

# Compare the two functions
compare <- microbenchmark(read.csv("movies.csv"), 
                          readRDS("movies.rds"), 
                          times = 10)

# Print compare
compare
Unit: milliseconds
                   expr       min        lq      mean    median        uq       max neval
 read.csv("movies.csv") 283.70181 287.35011 299.52388 290.25690 297.31405 376.59949    10
  readRDS("movies.rds")  50.72125  50.81986  52.10398  51.16195  52.31819  58.71417    10

The microbenchmark() function makes it easier to compare multiple function calls at once by compiling all the relevant information in a data frame. It does this by running each function call multiple times, recording the time it took for the function to run each time, then computing summary statistics for each expression as you can see here.

1.2.4 Relative time

When benchmarking, it's important to consider both absolute and relative times. With example timings of 720 milliseconds for read.csv() and 80 milliseconds for readRDS(), a single call to read.csv() is on average 720 - 80 = 640 milliseconds slower in absolute terms, and 720 / 80 = 9 times slower in relative terms.

1.3 How good is your machine?

1.3.1 DataCamp hardware

For many problems your time is the expensive part. If having a faster computer makes you more productive, it can be cost effective to buy one. However, before you splash out on new toys for yourself, your boss/partner may want to see some numbers to justify the expense. Measuring the performance of your computer is called benchmarking, and you can do that with the benchmarkme package.

# Load the benchmarkme package
library(benchmarkme)

# Assign the variable ram to the amount of RAM on this machine
ram <- get_ram()
ram
32.6 GB
# Assign the variable cpu to the cpu specs
cpu <- get_cpu()
cpu
$vendor_id
[1] "GenuineIntel"

$model_name
[1] "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"

$no_of_cores
[1] 16

1.3.2 Benchmark DataCamp’s machine

The benchmarkme package allows you to run a set of standardized benchmarks and compare your results to other users. One set of benchmark tests measures reading and writing speeds.

The function call

res = benchmark_io(runs = 1, size = 5)

records the length of time it takes to read and write a 5MB file.

# Run the io benchmark
res <- benchmark_io(runs = 1, size = 5)
Preparing read/write io
# IO benchmarks (2 tests) for size 5 MB:
     Writing a csv with 625000 values: 0.57 (sec).
     Reading a csv with 625000 values: 0.228 (sec).
# Plot the results
plot(res)
You are ranked 15 out of 135 machines.

# Run each benchmark 3 times
res <- benchmark_std(runs = 3)
# Programming benchmarks (5 tests):
    3,500,000 Fibonacci numbers calculation (vector calc): 0.603 (sec).
    Grand common divisors of 1,000,000 pairs (recursion): 0.772 (sec).
    Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.424 (sec).
    Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.25 (sec).
    Escoufier's method on a 60 x 60 matrix (mixed): 0.924 (sec).
# Matrix calculation benchmarks (5 tests):
    Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.673 (sec).
    2,500 x 2,500 normal distributed random matrix^1,000: 0.501 (sec).
    Sorting of 7,000,000 random values: 0.793 (sec).
    2,500 x 2,500 cross-product matrix (b = a' * a): 1.27 (sec).
    Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.124 (sec).
# Matrix function benchmarks (5 tests):
    Cholesky decomposition of a 3,000 x 3,000 matrix: 0.887 (sec).
    Determinant of a 2,500 x 2,500 random matrix: 0.919 (sec).
    Eigenvalues of a 640 x 640 random matrix: 0.468 (sec).
    FFT over 2,500,000 random values: 0.351 (sec).
    Inverse of a 1,600 x 1,600 random matrix: 0.806 (sec).
plot(res)
You are ranked 258 out of 749 machines.

2 Fine Tuning: Efficient Base R

2.1 Memory allocation

2.1.1 Timings - growing a vector

Growing a vector is one of the deadly sins in R; you should always avoid it.

The growing() function defined below generates n random standard normal numbers, but grows the size of the vector each time an element is added!

Note: Standard normal numbers are numbers drawn from a normal distribution with mean 0 and standard deviation 1.

n <- 30000
# Slow code
growing <- function(n) {
    x <- NULL
    for(i in 1:n)
        x <- c(x, rnorm(1))
    x
}
# Use <- with system.time() to store the result as res_grow
system.time(res_grow <- growing(n))
   user  system elapsed 
  1.444   0.000   1.449 

That was really slow for such a small vector. Next you’ll see ways to speed up that code.

2.1.2 Timings - pre-allocation

In the previous exercise, growing the vector took around 1.5 seconds. How long does it take when we pre-allocate the vector? The pre_allocate() function is defined below.

# Fast code
pre_allocate <- function(n) {
    x <- numeric(n) # Pre-allocate
    for(i in 1:n) 
        x[i] <- rnorm(1)
    x
}
# Use <- with system.time() to store the result as res_allocate
system.time(res_allocate <- pre_allocate(n))
   user  system elapsed 
  0.051   0.024   0.074 

Pre-allocating the vector is significantly faster than growing the vector!

2.2 Importance of vectorizing your code

2.2.1 Vectorized code: multiplication

The following piece of code is written like traditional C or Fortran code. Instead of using the vectorized version of multiplication, it uses a for loop.

x <- rnorm(10)
x2 <- numeric(length(x))
for(i in 1:10)
    x2[i] <- x[i] * x[i]

Your job is to make this code more “R-like” by vectorizing it.

# Store your answer as x2_imp
x2_imp <- x * x

The new version of the code is quicker to type, easier to understand, and it runs faster!

2.2.2 Vectorized code: calculating a log-sum

A common operation in statistics is to calculate the sum of log probabilities. The following code calculates the log-sum (the sum of the logs).

# x is a vector of probabilities
total <- 0
for(i in seq_along(x)) 
    total <- total + log(x[i])
# Initial code
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n) 
    total <- total + log(x[i])

# Rewrite in a single line. Store the result in log_sum
log_sum <- sum(log(x))

log() takes a vector as input and returns a vector of the same length; sum() takes a vector and returns the sum of all its values.

2.3 Data frames and matrices

2.3.1 Data frames and matrices - column selection

All values in a matrix must have the same data type, which has efficiency implications when selecting rows and columns.

# Which is faster, mat[, 1] or df[, 1]? 
microbenchmark(mat[, 1], df[, 1])
     expr   min    lq    mean median    uq    max neval
 mat[, 1] 1.098 1.286 1.49122 1.3695 1.556  6.631   100
  df[, 1] 6.064 6.321 7.11863 6.5265 6.791 55.580   100

Because all values in a matrix object must be the same data type, it is much faster to access the first column of a matrix than it is to access that of a data frame.

2.3.2 Row timings

Using microbenchmark(), how long does it take to select the first row from each of these objects?

To make the comparison fair, just use mat[1, ] and df[1, ].

# Which is faster, mat[1, ] or df[1, ]? 
microbenchmark(mat[1, ], df[1, ])
Unit: microseconds
    expr      min        lq       mean   median        uq      max neval
mat[1, ]    4.122    4.5695   11.69504    9.305   18.2145   36.885   100
 df[1, ] 4397.701 4530.5430 4837.12408 4589.192 4700.0570 7809.431   100

Accessing a row of a data frame is much slower than accessing a row of a matrix, and the gap is larger than for column access. This is because all values in a data frame column must have the same data type, whereas the values across a row can differ, so extracting a row forces R to coerce them.

3 Diagnosing Problems: Code Profiling

3.1 What is code profiling

3.1.1 Profvis in action

# Load the data set
data(movies, package = "ggplot2movies") 

# Load the profvis package
library(profvis)
# Profile the following code with the profvis function
profvis({
  # Load and select data
  comedies <- movies[movies$Comedy == 1, ]

  # Plot data of interest
  plot(comedies$year, comedies$rating)

  # Loess regression line
  model <- loess(rating ~ year, data = comedies)
  j <- order(comedies$year)
  
  # Add fitted line to the plot
  lines(comedies$year[j], model$fitted[j], col = "red")
})     ## Remember the closing brackets!

3.2 Profvis: Larger example

3.2.1 Change the data frame to a matrix

One of the parts of the code that profvis highlighted was the line where we generated the possible dice rolls and stored the results in a data frame:

df <- data.frame(d1 = sample(1:6, 3, replace = TRUE),
            d2 = sample(1:6, 3, replace = TRUE))

We can optimize this code by making two improvements:

  • Switching from a data frame to a matrix
  • Generating the 6 dice rolls in a single step

This gives:

m <- matrix(sample(1:6, 6, replace = TRUE), ncol = 2)
# Use microbenchmark to time m() and d()
microbenchmark(
 data.frame_solution = d(),
 matrix_solution     = m()
)
Unit: microseconds
                expr     min       lq      mean   median       uq     max neval
 data.frame_solution 121.415 123.4675 129.08917 125.2945 127.2285 291.692   100
     matrix_solution   4.887   5.7210   7.41577   7.2775   8.1565  27.494   100

3.2.2 Calculating row sums

The second bottleneck identified was calculating the row sums.

total <- apply(d, 1, sum)

In the previous example we switched the underlying object to a matrix. This makes the above apply operation three times faster. But there's one further optimization we can use: replace apply() with rowSums().

# Compare the methods
microbenchmark(
    app_sol = app(rolls),
    r_sum_sol = r_sum(rolls)
)
Unit: microseconds
      expr    min      lq     mean  median     uq    max neval
   app_sol 14.951 15.3215 17.10399 15.5905 16.503 59.484   100
 r_sum_sol  3.168  3.4220  4.00829  3.7045  3.935 22.644   100

3.2.3 Use && instead of &

To determine if both dice are the same, the move_square() function uses if statements.

if (is_double[1] & is_double[2] & is_double[3]) {
current <- 11 # Go To Jail - square 11 == Jail
}

The & operator will always evaluate both its arguments. That is, if you type x & y, R will always try to work out what x and y are. There are some cases where this is inefficient. For example, if x is FALSE, then x & y will always be FALSE, regardless of the value of y. Thus, you can save a little processing time by not calculating it. The && operator takes advantage of this trick, and doesn’t bother to calculate y if it doesn’t make a difference to the overall result.

In this code, if is_double[1] is FALSE we don’t need to evaluate is_double[2] or is_double[3], so we can get a speedup by swapping & for &&.

One thing to note is that && only works on single logical values, i.e., logical vectors of length 1 (like you would pass into an if condition), but & also works on vectors of length greater than 1.

is_double = c(FALSE, TRUE, TRUE)
is_double
[1] FALSE  TRUE  TRUE
# Define the previous solution
move <- function(is_double) {
    if (is_double[1] & is_double[2] & is_double[3]) {
        current <- 11 # Go To Jail
    }
}

# Define the improved solution
improved_move <- function(is_double) {
    if (is_double[1] && is_double[2] && is_double[3]) {
        current <- 11 # Go To Jail
    }
}
# microbenchmark both solutions
# Very occasionally the improved solution is actually a little slower
# This is just random chance
microbenchmark(move(is_double), improved_move(is_double), times = 1e5)
Unit: nanoseconds
          expr min lq     mean median uq  max neval
          move  22 25 26.03437     25 26 5319 1e+05
 improved_move  21 23 24.05335     24 24 5890 1e+05

4 Turbo Charged Code: Parallel Programming

4.1 CPUs - why do we have more than one

4.1.1 How many cores does this machine have?

The parallel package has a function detectCores() that determines the number of cores in a machine.

# Load the parallel package
library(parallel)

# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()

# Print no_of_cores
no_of_cores
[1] 16

4.2 What sort of problems benefit from parallel computing?

4.2.1 Can this loop run in parallel?

The following piece of code implements a simple dice game. The game is as follows:

  • Initialize: total <- 0.
  • Roll a single die and add it to total.
  • If total is even, reset total to zero.
  • If total is greater than 10, the game finishes.
total <- no_of_rolls <- 0 # Initialise
while(total < 10) {
  total <- total + sample(1:6, 1)

  if(total %% 2 == 0) total <- 0  # If even. Reset to 0

  no_of_rolls <- no_of_rolls + 1
}
no_of_rolls
[1] 3

Do you think this algorithm can be (easily) run in parallel? No: it's a sequential algorithm; the ith value depends on the previous value.

If we wrap the game in a function:

play <- function() {
  total <- no_of_rolls <- 0
  while(total < 10) {
    total <- total + sample(1:6, 1)

    # If even. Reset to 0
    if(total %% 2 == 0) total <- 0 
    no_of_rolls <- no_of_rolls + 1
  }
  no_of_rolls
}

And construct a loop to play the game

results <- numeric(100)
for(i in seq_along(results)) 
    results[i] <- play()
results
  [1] 16  5  8  8 69  8  5 10  8  6 17 21 16 14  3  5  5  6  9  2 10  2  8
 [24] 17  3 29  4 21 10  7 10 15  6 28 19 17  9  8 55  7  8 16 29  7  9  6
 [47]  4 25 12 29 40 54 21  7 14  4 17  3 10 13 16  7  5 30 12 10 15 36 10
 [70] 20 11  7 11 11 12 61 13 27 68 28  9 19  4  3  3  4  5 32 15 20  8  2
 [93] 17 17  6 18 10  3 14  5
results <- numeric(100)
# Play the games again, this time filling the results vector in reverse order
for(i in rev(seq_along(results))) 
    results[i] <- play()
results
  [1]  4 12 16 21  5 59 63 10  3 11 10  9 43  3 21 51  4 13 15 67  3 49 18
 [24] 14 22  8 17 29 13  5 10 21 22  3 11 20 10 11 14 14 16 13 11  4 21  8
 [47] 18 26  4 16 37 39 31 15 11 14  4 17  6 28  7  7  6  6 36  2  9  8 18
 [70]  2 23  6  4 34 11  7 12 19 20 18 20  3  4  6 10 27 34 31  4  5 13  5
 [93] 18 14  5  3 16 18  7  2

Yes, this is embarrassingly parallel: we can simulate the games in any order. Independent Monte Carlo simulations are ideal for parallel computation.

4.3 The parallel package parApply

parApply() is not always faster than apply(); whether parallelization pays off depends on the problem, because the speed-up from extra cores has to outweigh the overhead of communicating with the cluster.

4.3.1 Moving to parApply

To run code in parallel using the parallel package, the basic workflow has three steps.

  1. Create a cluster using makeCluster().
  2. Do some work.
  3. Stop the cluster using stopCluster().

The simplest way to make a cluster is to pass a number to makeCluster(). This creates a cluster of the default type, running the code on that many cores.

dd <- readRDS("dd.rds")

The object dd is a data frame with 10 columns and 100 rows. The following code uses apply() to calculate the column medians:

apply(dd, 2, median)

To run this in parallel, you swap apply() for parApply(). The arguments to this function are the same, except that it takes a cluster argument before the usual apply() arguments.

# Determine the number of available cores
detectCores()
[1] 16
# Create a cluster via makeCluster
cl <- makeCluster(2)

# Parallelize this code
parApply(cl, dd, 2, median)
 [1]  0.01375112 -0.07879187  0.02337513 -0.02880553 -0.05578824
 [6] -0.07316231 -0.14109027 -0.25877556 -0.01040861  0.05407892
# Stop the cluster
stopCluster(cl)

4.4 The parallel package - parSapply

4.4.1 Using parSapply()

We previously played the following game:

  • Initialize: total = 0.
  • Roll a single die and add it to total.
  • If total is even, reset total to zero.
  • If total is greater than 10, the game finishes.

The game could be simulated using the play() function:

play <- function() {
  total <- no_of_rolls <- 0
  while(total < 10) {
    total <- total + sample(1:6, 1)

    # If even. Reset to 0
    if(total %% 2 == 0) total <- 0 
    no_of_rolls <- no_of_rolls + 1
  }
  no_of_rolls
}

To simulate the game 100 times, we could use a for loop or sapply():

res <- sapply(1:100, function(i) play())

This is perfect for running in parallel!

To make functions available on a cluster, you use the clusterExport() function. This takes a cluster and a string naming the function.

clusterExport(cl, "some_function")
library("parallel")
# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)

# Export the play() function to the cluster
clusterExport(cl, "play")

# Re-write sapply as parSapply
res <- parSapply(cl, 1:100, function(i) play())

# Stop the cluster
stopCluster(cl)

clusterExport() makes your functions available to the worker processes, so they can be run inside the cluster.

4.4.2 Timings parSapply()

Running the dice game is embarrassingly parallel. These types of simulations usually (but not always) produce a good speed-up. As before, we can use microbenchmark() or system.time(). For simplicity, we’ll use system.time() in this exercise.

# Set the number of games to play
no_of_games <- 1e5

## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))
   user  system elapsed 
  7.008   0.014   7.048 
## Set up cluster
cl <- makeCluster(2)
clusterExport(cl, "play")

## Time parallel version
system.time(par <- parSapply(cl, 1:no_of_games, function(i) play()))
   user  system elapsed 
  0.039   0.009   7.066 
## Stop cluster
stopCluster(cl)
---
title: "Writing Efficient R Code"
output:
  html_notebook:
    toc: true
    toc_float: true
    toc_collapsed: false
    number_sections: true
    toc_depth: 3
---
# The Art of Benchmarking


## Introduction

### R version

One of the relatively easy optimizations available is to use an up-to-date version of R. In general, R is very conservative, so upgrading doesn't break existing code. However, a new version will often provide free speed boosts for key functions.

The version command returns a list that contains (among other things) the major and minor version of R currently being used.
```{r}
# Print the R version details using version
version

# Assign the variable major to the major component
major <- version$major

# Assign the variable minor to the minor component
minor <- version$minor
```
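Both components are returned as character strings, so convert them before making numeric comparisons; a quick illustration:
```{r}
# major is a character string, so convert it before comparing
as.numeric(major) >= 4

# getRversion() returns a version object that compares directly with strings
getRversion() >= "4.0.0"
```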
Keeping R up to date also makes it easier to get help when you are stuck.

## Benchmarking

### Comparing read times of CSV and RDS files

One of the most common tasks we perform is reading in data from CSV files. However, for large CSV files this can be slow. One neat trick is to read in the data and save as an R binary file (rds) using saveRDS(). To read in the rds file, we use readRDS().

Note: Since rds is R's native format for storing single objects, you have not introduced any third-party dependencies that may change in the future.
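
The one-time conversion is just a saveRDS() call after the initial read; a minimal sketch, assuming movies.csv sits in the working directory (movies_df is only an illustrative name):
```{r}
# Read the CSV once...
movies_df <- read.csv("movies.csv")

# ...and store it in R's binary RDS format for faster loading later
saveRDS(movies_df, file = "movies.rds")
```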

To benchmark the two approaches, you can use system.time(). This function returns the time taken to evaluate any R expression. For example, to time how long it takes to calculate the square root of the numbers from one to ten million, you would write the following:
```{r}
system.time(sqrt(1:1e7))
```


```{r}
# How long does it take to read movies from CSV?
system.time(read.csv("movies.csv"))

# How long does it take to read movies from RDS?
system.time(readRDS("movies.rds"))
```
Reading in RDS files is much quicker than reading in CSV files.

### Operational differences: "<-" and "="

There are a number of ways to assign values to variables. The two standard ways are the = and <- operators. Using the <- operator inside a function call will create a new (or overwrite an existing) object.

This makes '<-' useful inside system.time() since we can store the result.
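
As a short illustration, assigning with <- inside system.time() keeps the result around, whereas = in that position would be treated as an argument name:
```{r}
# <- assigns in the calling environment, so x_sqrt is still available afterwards
system.time(x_sqrt <- sqrt(1:1e7))
head(x_sqrt)

# system.time(x_sqrt = sqrt(1:1e7)) would instead fail with an
# "unused argument" error, because = is parsed as an argument name there
```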

### Elapsed time

Using system.time() is convenient, but it does have its drawbacks when comparing multiple function calls. The microbenchmark package solves this problem with the microbenchmark() function.
```{r}
# Load the microbenchmark package
library(microbenchmark)

# Compare the two functions
compare <- microbenchmark(read.csv("movies.csv"), 
                          readRDS("movies.rds"), 
                          times = 10)

# Print compare
compare
```
The microbenchmark() function makes it easier to compare multiple function calls at once by compiling all the relevant information in a data frame. It does this by running each function call multiple times, recording the time it took for the function to run each time, then computing summary statistics for each expression as you can see here.

### Relative time

When benchmarking, it's important to consider both absolute and relative times. With example timings of 720 milliseconds for read.csv() and 80 milliseconds for readRDS(), a single call to read.csv() is on average 720 - 80 = 640 milliseconds slower in absolute terms, and 720 / 80 = 9 times slower in relative terms.
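
You can pull both figures straight out of the compare object created earlier; a small sketch:
```{r}
# Summarise the benchmark, then compare the medians directly
timings <- summary(compare)
timings$median[1] - timings$median[2]  # absolute difference (in the unit shown)
timings$median[1] / timings$median[2]  # how many times slower read.csv() is
```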

## How good is your machine?

### DataCamp hardware

For many problems your time is the expensive part. If having a faster computer makes you more productive, it can be cost effective to buy one. However, before you splash out on new toys for yourself, your boss/partner may want to see some numbers to justify the expense. Measuring the performance of your computer is called benchmarking, and you can do that with the benchmarkme package.
```{r}
# Load the benchmarkme package
library(benchmarkme)

# Assign the variable ram to the amount of RAM on this machine
ram <- get_ram()
ram

# Assign the variable cpu to the cpu specs
cpu <- get_cpu()
cpu
```
### Benchmark DataCamp's machine


The benchmarkme package allows you to run a set of standardized benchmarks and compare your results to other users. One set of benchmark tests measures reading and writing speeds.

The function call

    res = benchmark_io(runs = 1, size = 5)

records the length of time it takes to read and write a 5MB file.

```{r}
# Run the io benchmark
res <- benchmark_io(runs = 1, size = 5)

# Plot the results
plot(res)
```
```{r}
# Run each benchmark 3 times
res <- benchmark_std(runs = 3)
plot(res)
```
# Fine Tuning: Efficient Base R

## Memory allocation

### Timings - growing a vector

Growing a vector is one of the deadly sins in R; you should always avoid it.

The growing() function defined below generates n random standard normal numbers, but grows the size of the vector each time an element is added!

Note: Standard normal numbers are numbers drawn from a normal distribution with mean 0 and standard deviation 1.
```{r}
n <- 30000
# Slow code
growing <- function(n) {
    x <- NULL
    for(i in 1:n)
        x <- c(x, rnorm(1))
    x
}
```
```{r}
# Use <- with system.time() to store the result as res_grow
system.time(res_grow <- growing(n))
```
That was really slow for such a small vector. Next you'll see ways to speed up that code.

### Timings - pre-allocation

In the previous exercise, growing the vector took around 1.5 seconds. How long does it take when we pre-allocate the vector? The pre_allocate() function is defined below.
```{r}
# Fast code
pre_allocate <- function(n) {
    x <- numeric(n) # Pre-allocate
    for(i in 1:n) 
        x[i] <- rnorm(1)
    x
}
```
```{r}
# Use <- with system.time() to store the result as res_allocate
system.time(res_allocate <- pre_allocate(n))
```
Pre-allocating the vector is significantly faster than growing the vector!
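
Since rnorm() is itself vectorized, an even simpler option (not part of the exercise) is to generate all n values in a single call, which is faster still:
```{r}
# Fully vectorized: generate all n values at once
system.time(res_vec <- rnorm(n))
```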

## Importance of vectorizing your code

### Vectorized code: multiplication

The following piece of code is written like traditional C or Fortran code. Instead of using the vectorized version of multiplication, it uses a for loop.
```{r}
x <- rnorm(10)
x2 <- numeric(length(x))
for(i in 1:10)
    x2[i] <- x[i] * x[i]
```
Your job is to make this code more "R-like" by vectorizing it.
```{r}
# Store your answer as x2_imp
x2_imp <- x * x
```
The new version of the code is quicker to type, easier to understand, and it runs faster!

### Vectorized code: calculating a log-sum

A common operation in statistics is to calculate the sum of log probabilities. The following code calculates the log-sum (the sum of the logs).
```{r}
# x is a vector of probabilities
total <- 0
for(i in seq_along(x)) 
    total <- total + log(x[i])
```
```{r}
# Initial code
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n) 
    total <- total + log(x[i])

# Rewrite in a single line. Store the result in log_sum
log_sum <- sum(log(x))
```
log() takes a vector as input and returns a vector of the same length; sum() takes a vector and returns the sum of all its values.
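
If you want to confirm the speed difference yourself, here is a quick benchmark sketch using the x vector and the microbenchmark package loaded above:
```{r}
# Compare the explicit loop with the vectorized one-liner
microbenchmark(
  loop = {
    total <- 0
    for (i in seq_along(x)) total <- total + log(x[i])
    total
  },
  vectorized = sum(log(x))
)
```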

## Data frames and matrices

### Data frames and matrices - column selection

All values in a matrix must have the same data type, which has efficiency implications when selecting rows and columns.
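
The mat and df objects used in the timings below are not defined in this notebook; a minimal setup you could use to reproduce the comparison (shapes assumed: 100 rows, 10 numeric columns):
```{r}
# Hypothetical example objects: a 100 x 10 numeric matrix and its data frame twin
mat <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
df  <- as.data.frame(mat)
```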

    # Which is faster, mat[, 1] or df[, 1]? 
    microbenchmark(mat[, 1], df[, 1])
         expr   min    lq    mean median    uq    max neval
     mat[, 1] 1.098 1.286 1.49122 1.3695 1.556  6.631   100
      df[, 1] 6.064 6.321 7.11863 6.5265 6.791 55.580   100

Because all values in a matrix object must be the same data type, it is much faster to access the first column of a matrix than it is to access that of a data frame.

### Row timings

Using microbenchmark(), how long does it take to select the first row from each of these objects?

To make the comparison fair, just use mat[1, ] and df[1, ].

    # Which is faster, mat[1, ] or df[1, ]? 
    microbenchmark(mat[1, ], df[1, ])
    Unit: microseconds
        expr      min        lq       mean   median        uq      max neval
    mat[1, ]    4.122    4.5695   11.69504    9.305   18.2145   36.885   100
     df[1, ] 4397.701 4530.5430 4837.12408 4589.192 4700.0570 7809.431   100

Accessing a row of a data frame is much slower than accessing a row of a matrix, and the gap is larger than for column access. This is because all values in a data frame column must have the same data type, whereas the values across a row can differ, so extracting a row forces R to coerce them.
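
To see the difference in what comes back, compare the structures (this reuses the mat and df sketched above):
```{r}
# A matrix row is returned as a plain numeric vector
str(mat[1, ])

# A data frame row stays a one-row data frame, because each column may
# hold a different type
str(df[1, ])
```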

# Diagnosing Problems: Code Profiling

## What is code profiling

### Profvis in action

```{r}
# Load the data set
data(movies, package = "ggplot2movies") 

# Load the profvis package
library(profvis)

# Profile the following code with the profvis function
profvis({
  # Load and select data
  comedies <- movies[movies$Comedy == 1, ]

  # Plot data of interest
  plot(comedies$year, comedies$rating)

  # Loess regression line
  model <- loess(rating ~ year, data = comedies)
  j <- order(comedies$year)
  
  # Add fitted line to the plot
  lines(comedies$year[j], model$fitted[j], col = "red")
})     ## Remember the closing brackets!
```
## Profvis: Larger example

### Change the data frame to a matrix

One of the parts of the code that profvis highlighted was the line where we generated the possible dice rolls and stored the results in a data frame:

    df <- data.frame(d1 = sample(1:6, 3, replace = TRUE),
                d2 = sample(1:6, 3, replace = TRUE))

We can optimize this code by making two improvements:

- Switching from a data frame to a matrix
- Generating the 6 dice rolls in a single step

This gives:

    m <- matrix(sample(1:6, 6, replace = TRUE), ncol = 2)

```{r}
# Load the microbenchmark package
library(microbenchmark)

# The previous data frame solution is defined
# d() simulates 6 dice rolls
d <- function() {
  data.frame(
    d1 = sample(1:6, 3, replace = TRUE),
    d2 = sample(1:6, 3, replace = TRUE)
  )
}

# Complete the matrix solution
m <- function() {
  matrix(sample(1:6, 6, replace = TRUE), ncol = 2)
}

# Use microbenchmark to time m() and d()
microbenchmark(
 data.frame_solution = d(),
 matrix_solution     = m()
)
```
### Calculating row sums

The second bottleneck identified was calculating the row sums.

    total <- apply(d, 1, sum)

In the previous example we switched the underlying object to a matrix. This makes the above apply operation three times faster. But there's one further optimization we can use: replace apply() with rowSums().

```{r}
# Example data
rolls <- matrix(c(3, 6, 5, 1, 5, 4), ncol = 2)
rolls
# Define the previous solution 
app <- function(x) {
    apply(x, 1, sum)
}

# Define the new solution
r_sum <- function(x) {
    rowSums(x)
}

# Compare the methods
microbenchmark(
    app_sol = app(rolls),
    r_sum_sol = r_sum(rolls)
)
```
### Use && instead of &

To determine if both dice are the same, the move_square() function uses if statements.

    if (is_double[1] & is_double[2] & is_double[3]) {
    current <- 11 # Go To Jail - square 11 == Jail
    }

The & operator will always evaluate both its arguments. That is, if you type x & y, R will always try to work out what x and y are. There are some cases where this is inefficient. For example, if x is FALSE, then x & y will always be FALSE, regardless of the value of y. Thus, you can save a little processing time by not calculating it. The && operator takes advantage of this trick, and doesn't bother to calculate y if it doesn't make a difference to the overall result.

In this code, if is_double[1] is FALSE we don't need to evaluate is_double[2] or is_double[3], so we can get a speedup by swapping & for &&.

One thing to note is that && only works on single logical values, i.e., logical vectors of length 1 (like you would pass into an if condition), but & also works on vectors of length greater than 1.
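
A small illustration of the short-circuit behavior (not part of the original exercise):
```{r}
# && stops as soon as the overall result is known, so stop() never runs
FALSE && stop("never evaluated")

# & is vectorized and always evaluates both sides, element by element
c(TRUE, FALSE) & c(TRUE, TRUE)
```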

```{r}
is_double = c(FALSE, TRUE, TRUE)
```


```{r}
is_double

# Define the previous solution
move <- function(is_double) {
    if (is_double[1] & is_double[2] & is_double[3]) {
        current <- 11 # Go To Jail
    }
}

# Define the improved solution
improved_move <- function(is_double) {
    if (is_double[1] && is_double[2] && is_double[3]) {
        current <- 11 # Go To Jail
    }
}
```


```{r}
# microbenchmark both solutions
# Very occasionally the improved solution is actually a little slower
# This is just random chance
microbenchmark(move(is_double), improved_move(is_double), times = 1e5)
```
# Turbo Charged Code: Parallel Programming

## CPUs - why do we have more than one

### How many cores does this machine have?

The parallel package has a function detectCores() that determines the number of cores in a machine.

```{r}
# Load the parallel package
library(parallel)

# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()

# Print no_of_cores
no_of_cores
```
## What sort of problems benefit from parallel computing?

### Can this loop run in parallel?


The following piece of code implements a simple dice game. The game is as follows:

- Initialize: total <- 0.
- Roll a single die and add it to total.
- If total is even, reset total to zero.
- If total is greater than 10, the game finishes.

```{r}
total <- no_of_rolls <- 0 # Initialise
while(total < 10) {
  total <- total + sample(1:6, 1)

  if(total %% 2 == 0) total <- 0  # If even. Reset to 0

  no_of_rolls <- no_of_rolls + 1
}
no_of_rolls
```
Do you think this algorithm can be (easily) run in parallel?
No: it's a sequential algorithm. The ith value depends on the previous value.

If we wrap the game in a function:
```{r}
play <- function() {
  total <- no_of_rolls <- 0
  while(total < 10) {
    total <- total + sample(1:6, 1)

    # If even. Reset to 0
    if(total %% 2 == 0) total <- 0 
    no_of_rolls <- no_of_rolls + 1
  }
  no_of_rolls
}
```


And construct a loop to play the game
```{r}
results <- numeric(100)
for(i in seq_along(results)) 
    results[i] <- play()
results
```
```{r}
# Play the games again, this time filling the results vector in reverse order
results <- numeric(100)
for(i in rev(seq_along(results))) 
    results[i] <- play()
results
```
Yes, this is embarrassingly parallel. We can simulate the games in any order.
Independent Monte Carlo simulations are ideal for parallel computation.

## The parallel package parApply

parApply() is not always faster than apply(); whether parallelization pays off depends on the problem, because the speed-up from extra cores has to outweigh the cluster communication overhead (see the short comparison after the parApply example below).

### Moving to parApply

To run code in parallel using the parallel package, the basic workflow has three steps.

1. Create a cluster using makeCluster().
2. Do some work.
3. Stop the cluster using stopCluster().

The simplest way to make a cluster is to pass a number to makeCluster(). This creates a cluster of the default type, running the code on that many cores.
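
The default type is a socket ("PSOCK") cluster, which works on every platform; making it explicit is optional, as in this small sketch:
```{r}
# Equivalent to makeCluster(2): a socket cluster is the default type
cl <- makeCluster(2, type = "PSOCK")
stopCluster(cl)
```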

```{r}
dd <- readRDS("dd.rds")
```
The object dd is a data frame with 10 columns and 100 rows. The following code uses apply() to calculate the column medians:

    apply(dd, 2, median)
    
To run this in parallel, you swap apply() for parApply(). The arguments to this function are the same, except that it takes a cluster argument before the usual apply() arguments.
```{r}
# Determine the number of available cores
detectCores()

# Create a cluster via makeCluster
cl <- makeCluster(2)

# Parallelize this code
parApply(cl, dd, 2, median)

# Stop the cluster
stopCluster(cl)
```
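To see why parApply() is not automatically a win, here is a rough sketch that times both approaches on the small dd data frame; with only 100 rows the cluster communication overhead typically dominates:
```{r}
# Compare serial and parallel column medians on the small dd data frame
cl <- makeCluster(2)
microbenchmark(
  serial   = apply(dd, 2, median),
  parallel = parApply(cl, dd, 2, median),
  times = 10
)
stopCluster(cl)
```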
## The parallel package - parSapply

### Using parSapply()

We previously played the following game:

- Initialize: total = 0.
- Roll a single die and add it to total.
- If total is even, reset total to zero.
- If total is greater than 10, the game finishes.

The game could be simulated using the play() function:
```{r}
play <- function() {
  total <- no_of_rolls <- 0
  while(total < 10) {
    total <- total + sample(1:6, 1)

    # If even. Reset to 0
    if(total %% 2 == 0) total <- 0 
    no_of_rolls <- no_of_rolls + 1
  }
  no_of_rolls
}
```
To simulate the game 100 times, we could use a for loop or sapply():
```{r}
res <- sapply(1:100, function(i) play())
```
This is perfect for running in parallel!

To make functions available on a cluster, you use the clusterExport() function. This takes a cluster and a string naming the function.

    clusterExport(cl, "some_function")

```{r}
library("parallel")
# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)

# Export the play() function to the cluster
clusterExport(cl, "play")

# Re-write sapply as parSapply
res <- parSapply(cl, 1:100, function(i) play())

# Stop the cluster
stopCluster(cl)
```
clusterExport() makes your functions available to the worker processes, so they can be run inside the cluster.

### Timings parSapply()

Running the dice game is embarrassingly parallel. These types of simulations usually (but not always) produce a good speed-up. As before, we can use microbenchmark() or system.time(). For simplicity, we'll use system.time() in this exercise.
```{r}
# Set the number of games to play
no_of_games <- 1e5

## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))

## Set up cluster
cl <- makeCluster(2)
clusterExport(cl, "play")

## Time parallel version
system.time(par <- parSapply(cl, 1:no_of_games, function(i) play()))

## Stop cluster
stopCluster(cl)
```

