What is Object-Oriented Programming?

When is OOP a good idea?
Nine Systems
How does R Distinguish Variables?
Assigning Classes

Using S3

Using R6

R6 Inheritance

Advanced R6

What is Object-Oriented Programming?

Most of the time when you use R, you use a functional programming style: you start with some data and apply a function to manipulate it. That returns new data, to which you apply another function, and you repeat this until you get an answer.

With a functional mindset, you typically start by thinking about what you want a function to do. Then you worry about the objects that get passed to the function (its arguments). Finally, you worry about the objects that come out the other end (its return values).

Object-oriented programming (OOP) takes a different approach. You start by thinking about the objects you have to work with (for example, a teapot). Then you think about what data you need to describe the object (for example, the total capacity of the teapot and how much liquid it currently holds). Next you think about the functionality of that object (for example, the main purpose of a teapot is to pour tea, so you add a pour function).

In OOP, functions are known as methods; a method is simply a function in an object-oriented context. Two variable types are especially important in OOP:

Lists
Environments

Because these variable types can contain many other variables (of any type), you can use them to build more complex types. For data analysis itself, the functional programming approach is usually preferred.
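As a rough illustration of the teapot idea, the sketch below keeps the data and a pour function together in an environment. The names new_teapot, capacity_ml, volume_ml and pour are hypothetical, chosen only for this example, and this is just one of several ways the idea could be expressed in base R.

```r
# Hypothetical constructor: returns an environment that holds the teapot's
# data (capacity and current volume) alongside a function that acts on it
new_teapot <- function(capacity_ml, volume_ml = 0) {
  stopifnot(volume_ml <= capacity_ml)
  teapot <- new.env()
  teapot$capacity_ml <- capacity_ml
  teapot$volume_ml <- volume_ml
  # "Method": pour some tea, never more than is currently in the pot
  teapot$pour <- function(amount_ml) {
    poured <- min(amount_ml, teapot$volume_ml)
    teapot$volume_ml <- teapot$volume_ml - poured
    poured
  }
  teapot
}

pot <- new_teapot(capacity_ml = 1000, volume_ml = 600)
first_cup  <- pot$pour(250)  # 250 ml poured, 350 ml left
second_cup <- pot$pour(500)  # only 350 ml were left, so 350 ml poured
```

Because environments are modified in place, each call to pour() changes the same pot rather than a copy of it.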

When is OOP a good idea?

OOP works best when you have a limited number of objects whose behavior you completely understand. It is therefore a good fit for building the tools used in data analysis, but a poor fit for the data analysis itself. One of the principles of OOP is that functions can behave differently for different kinds of object; summary() is an example.

# Create these variables
a_numeric_vector <- rlnorm(50)
a_factor <- factor(
  sample(c(LETTERS[1:5], NA), 50, replace = TRUE)
)
a_data_frame <- data.frame(
  n = a_numeric_vector,
  f = a_factor
)
a_linear_model <- lm(dist ~ speed, cars)

# Call summary() on the numeric vector
summary(a_numeric_vector)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1103  0.5207  0.9501  1.3263  2.0918  4.6469

# Do the same for the other three objects
summary(a_factor)

##    A    B    C    D    E NA's 
##   10    7    9    6    7   11

summary(a_data_frame)

##        n             f     
##  Min.   :0.1103   A   :10  
##  1st Qu.:0.5207   B   : 7  
##  Median :0.9501   C   : 9  
##  Mean   :1.3263   D   : 6  
##  3rd Qu.:2.0918   E   : 7  
##  Max.   :4.6469   NA's:11

summary(a_linear_model)

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Nine Systems

R has nine OOP systems, but because of various limitations not all of them are widely used. The two of most practical use are S3 and R6.

Knowing how to use S3 is a fundamental R skill.
R6 and ReferenceClasses are powerful OOP frameworks.
S4 is useful for working with Bioconductor.

How does R Distinguish Variables?

str() and class() are used to examine the structure and the class of a variable, respectively. Sometimes, however, you need to dig deeper, and there are other functions to consider.

typeof() - Returns the type of the variable as used in R's internal C code. This lets you check the contents of a variable regardless of its class.
mode()
storage.mode()

Both mode() and storage.mode() exist solely for compatibility with older S code, so while you need to know that they exist, you should never really use them.

(int_mat <- matrix(1:12,3))

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

(num_mat <- matrix(rnorm(12),3))

##            [,1]       [,2]       [,3]       [,4]
## [1,] 0.74181895  1.0518520 -0.9652027 -1.0421293
## [2,] 0.04219933 -1.1993195  1.5072984  0.2748198
## [3,] 2.37485335 -0.2832735  0.8662924  0.6323595

class(int_mat)

## [1] "matrix"

class(num_mat)

## [1] "matrix"

typeof(int_mat)

## [1] "integer"

typeof(num_mat)

## [1] "double"

There are some rarer variable types that you may not have come across yet.

array: Generalization of a matrix with an arbitrary number of dimensions.
formula: Used by modeling and plotting functions to define relationships between variables.

Also note that there are three kinds of functions in R.

Most of the functions that you come across are called closures. A few important functions, like length(), are known as builtin functions, which use a special evaluation mechanism to make them go faster. Language constructs, like if and while, are also functions! They are known as special functions.

# Create a function
type_info <- function(x)
{
  c(
    class = class(x), 
    typeof = typeof(x), 
    mode = mode(x), 
    storage.mode = storage.mode(x)
  )
}

# Create list of example variables
some_vars <- list(
  an_integer_vector = rpois(24, lambda = 5),
  a_numeric_vector = rbeta(24, shape1 = 1, shape2 = 1),
  an_integer_array = array(rbinom(24, size = 8, prob = 0.5), dim = c(2, 3, 4)),
  a_numeric_array = array(rweibull(24, shape = 1, scale = 1), dim = c(2, 3, 4)),
  a_data_frame = data.frame(int = rgeom(24, prob = 0.5), num = runif(24)),
  a_factor = factor(month.abb),
  a_formula = y ~ x,
  a_closure_function = mean,
  a_builtin_function = length,
  a_special_function = `if`
)

# Loop over some_vars calling type_info() on each element to explore them
lapply(some_vars,type_info)

## $an_integer_vector
##        class       typeof         mode storage.mode 
##    "integer"    "integer"    "numeric"    "integer" 
## 
## $a_numeric_vector
##        class       typeof         mode storage.mode 
##    "numeric"     "double"    "numeric"     "double" 
## 
## $an_integer_array
##        class       typeof         mode storage.mode 
##      "array"    "integer"    "numeric"    "integer" 
## 
## $a_numeric_array
##        class       typeof         mode storage.mode 
##      "array"     "double"    "numeric"     "double" 
## 
## $a_data_frame
##        class       typeof         mode storage.mode 
## "data.frame"       "list"       "list"       "list" 
## 
## $a_factor
##        class       typeof         mode storage.mode 
##     "factor"    "integer"    "numeric"    "integer" 
## 
## $a_formula
##        class       typeof         mode storage.mode 
##    "formula"   "language"       "call"   "language" 
## 
## $a_closure_function
##        class       typeof         mode storage.mode 
##   "function"    "closure"   "function"   "function" 
## 
## $a_builtin_function
##        class       typeof         mode storage.mode 
##   "function"    "builtin"   "function"   "function" 
## 
## $a_special_function
##        class       typeof         mode storage.mode 
##   "function"    "special"   "function"   "function"

Assigning Classes

The class() function can be used not only to retrieve the class of an object but also to override it, without breaking existing functionality.

Note: Overriding the class doesn’t change the typeof(), mode(), or storage.mode() of the object, because these are fundamental properties of an R object.

In the example below you can see that assigning to class() overrides the class of the object but not its type.

(x <- rexp(10))

##  [1] 0.4714675 1.8163304 0.3964557 0.5707266 0.1212334 3.4097801 0.5029485
##  [8] 0.8229876 0.2082152 1.1661194

class(x) <- "random_numbers"
typeof(x)

## [1] "double"

class(x)

## [1] "random_numbers"
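To make the "without breaking existing functionality" point concrete, here is a minimal sketch using a small hand-made vector (the values are arbitrary and only for illustration). It assumes nothing beyond base R: the type is unchanged, mean() still works, and unclass() removes the custom class again.

```r
x <- c(2, 4, 6)
class(x) <- "random_numbers"

type_after <- typeof(x)   # still "double"
mean_after <- mean(x)     # mean() falls back to its default method: 4
x_plain    <- unclass(x)  # removes the class attribute again
class_plain <- class(x_plain)  # back to the implicit class "numeric"
```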

Using S3

Generics and Methods or Function Overload

Previously we saw how the summary() function behaved differently depending on the type of its argument. Having a function behave differently for different kinds of input is called function overloading (input-dependent function behavior). Its main purpose is to simplify your code: without it, you would have to learn many more functions.

The S3 system exists entirely to solve this problem. It does this by splitting a function into two parts:

generic function (ex. summary, print)
method functions for each class

print

## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x0000000011f81c38>
## <environment: namespace:base>

As you can see above, the print() function is very simple: its body is only one line long. This is typical of an S3 generic. All the function needs to do is call UseMethod() with its own name; that is, "print" is passed to UseMethod() as a string.

There are two conditions you must follow for S3 methods:

The name of each method must be [name of the generic].[class of the variable], for example print.Date (the print method for Date objects), summary.factor (the summary method for factors), or unique.array.
The arguments to the method must include all the arguments to the generic.

In the example below, the arguments to print are x and the ellipsis, whereas the arguments to print.Date are those of the generic plus an extra max argument.

args(print)

## function (x, ...) 
## NULL

args(print.Date)

## function (x, max = NULL, ...) 
## NULL

The ellipsis argument allows arguments to be passed from one method to another. It is good practice to include an ellipsis argument in both the generic and its methods.

All the methods belonging to a generic are completely independent of one another. In the example below you can see that print.function and print.Date are completely unrelated.

Because S3 requires a dot to separate the name of the generic from the class of the input, it is a bad idea to include a dot in the names of your variables. Names with words separated by dots are sometimes called leopard.case. Don’t use this naming convention. Better conventions are lower_snake_case, where lowercase words are separated by underscores, or lowerCamelCase, where the first word is lowercase and each subsequent word starts with a capital letter.

print.function

## function (x, useSource = TRUE, ...) 
## .Internal(print.function(x, useSource, ...))
## <bytecode: 0x0000000018673f40>
## <environment: namespace:base>

print.Date

## function (x, max = NULL, ...) 
## {
##     if (is.null(max)) 
##         max <- getOption("max.print", 9999L)
##     if (max < length(x)) {
##         print(format(x[seq_len(max)]), max = max + 1, ...)
##         cat(" [ reached 'max' / getOption(\"max.print\") -- omitted", 
##             length(x) - max, "entries ]\n")
##     }
##     else if (length(x)) 
##         print(format(x), max = max, ...)
##     else cat(class(x)[1L], "of length 0\n")
##     invisible(x)
## }
## <bytecode: 0x0000000018889a60>
## <environment: namespace:base>

What’s in a Name?

S3 uses a strict naming convention: all S3 methods have a name of the form generic.class.

The converse is not true: a function can have a name containing a dot without being an S3 method. This is the case with many of the functions that have been around since the early days of the S language. For example, all.equal() is actually an S3 generic, not a method. (This is an example of how leopard.case can be confusing.)

You can check if a function is an S3 generic by calling is_s3_generic() from the pryr package. You can also print it (by typing its name in the console), then looking to see if it calls UseMethod().

Similarly, you can check if a function is an S3 method by calling is_s3_method() from pryr. For example,

library(pryr)
is_s3_generic("t")           # generic transpose function

## [1] TRUE

is_s3_method("t.data.frame") # transpose method for data.frames

## [1] TRUE

is_s3_method("t.test")       # a function for Student's t-tests

## [1] FALSE
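The "print it and look for UseMethod()" check can also be done programmatically. The helper below, calls_use_method, is a hypothetical name invented for this sketch and is not part of pryr or base R. It is only a heuristic: it searches the deparsed body for the text UseMethod, so it will not recognise primitive generics such as length(), which have no R-level body.

```r
# Hypothetical helper: TRUE if the body of `fn` mentions UseMethod
calls_use_method <- function(fn) {
  stopifnot(is.function(fn))
  any(grepl("UseMethod", deparse(body(fn)), fixed = TRUE))
}

# A generic and an ordinary function, defined here for the comparison
my_generic  <- function(x, ...) UseMethod("my_generic")
my_function <- function(x, ...) x + 1

generic_found   <- calls_use_method(my_generic)   # TRUE
ordinary_found  <- calls_use_method(my_function)  # FALSE
print_found     <- calls_use_method(print)        # TRUE
primitive_found <- calls_use_method(length)       # FALSE: no R-level body
```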

Creating a Generic Function

You can create your own S3 functions. The first step is to write the generic. This is typically a single line function that calls UseMethod(), passing its name as a string.

The first argument to an S3 generic is usually called x, though this isn’t compulsory. It is also good practice to include a … (“ellipsis”, or “dot-dot-dot”) argument, in case arguments need to be passed from one method to another.

Overall, the structure of an S3 generic looks like this.

an_s3_generic <- function(x, maybe = "some", other = "arguments", ...) {
  UseMethod("an_s3_generic")
}

# Create get_n_elements
get_n_elements <- function(x, ...)
{
  UseMethod("get_n_elements")
}

Creating an S3 Method

By itself, the generic function doesn’t do anything. For that, you need to create methods, which are just regular functions with two conditions:

The name of the method must be of the form generic.class.
The method signature - that is, the arguments that are passed in to the method - must contain the signature of the generic.

The syntax is:

generic.class <- function(some, arguments, ...) {
  # Do something
}

# View get_n_elements
get_n_elements

## function(x, ...)
## {
##   UseMethod("get_n_elements")
## }

# Create a data.frame method for get_n_elements
get_n_elements.data.frame <- function(x, ...) 
{
  nrow(x) * ncol(x) # or prod(dim(x))
}

# Call the method on the sleep dataset
n_elements_sleep <- get_n_elements(sleep)

# View the result
n_elements_sleep

## [1] 60

Creating an S3 method (2)

If no suitable method is found for a generic, then an error is thrown. For example, at the moment, get_n_elements() only has a method available for data.frames. If you pass a matrix to get_n_elements() instead, you’ll see an error.

get_n_elements(matrix())

## Error in UseMethod("get_n_elements"): no applicable method for 'get_n_elements' applied to an object of class "c('matrix', 'logical')"

Rather than having to write dozens of methods for every kind of input, you can create a method that handles all types that don’t have a specific method. This is called the default method; it always has the name generic.default. For example, print.default() will print any type of object that doesn’t have its own print() method.

# View predefined objects
ls.str()

## a_data_frame : 'data.frame': 50 obs. of  2 variables:
##  $ n: num  0.896 0.199 2.845 0.508 4.474 ...
##  $ f: Factor w/ 5 levels "A","B","C","D",..: 2 NA NA 4 1 4 NA 5 2 4 ...
## a_factor :  Factor w/ 5 levels "A","B","C","D",..: 2 NA NA 4 1 4 NA 5 2 4 ...
## a_linear_model : List of 12
##  $ coefficients : Named num [1:2] -17.58 3.93
##  $ residuals    : Named num [1:50] 3.85 11.85 -5.95 12.05 2.12 ...
##  $ effects      : Named num [1:50] -303.914 145.552 -8.115 9.885 0.194 ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:50] -1.85 -1.85 9.95 9.95 13.88 ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##  $ df.residual  : int 48
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = dist ~ speed, data = cars)
##  $ terms        :Classes 'terms', 'formula'  language dist ~ speed
##  $ model        :'data.frame':   50 obs. of  2 variables:
## a_numeric_vector :  num [1:50] 0.896 0.199 2.845 0.508 4.474 ...
## get_n_elements : function (x, ...)  
## get_n_elements.data.frame : function (x, ...)  
## int_mat :  int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
## n_elements_sleep :  int 60
## num_mat :  num [1:3, 1:4] 0.7418 0.0422 2.3749 1.0519 -1.1993 ...
## some_vars : List of 10
##  $ an_integer_vector : int [1:24] 1 8 3 9 6 7 2 3 6 8 ...
##  $ a_numeric_vector  : num [1:24] 0.2128 0.0198 0.9113 0.9216 0.2297 ...
##  $ an_integer_array  : int [1:2, 1:3, 1:4] 5 4 5 5 3 6 6 5 0 4 ...
##  $ a_numeric_array   : num [1:2, 1:3, 1:4] 1.2425 0.0126 1.6071 0.299 1.1228 ...
##  $ a_data_frame      :'data.frame':  24 obs. of  2 variables:
##  $ a_factor          : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 4 8 1 9 7 6 2 12 11 ...
##  $ a_formula         :Class 'formula'  language y ~ x
##  $ a_closure_function:function (x, ...)  
##  $ a_builtin_function:function (x)  
##  $ a_special_function:.Primitive("if") 
## type_info : function (x)  
## x :  'random_numbers' num [1:10] 0.471 1.816 0.396 0.571 0.121 ...

# Create a default method for get_n_elements
get_n_elements.default <- function(x, ...){
  
  length(unlist(x))
  
}

# Call the method on the ability.cov dataset
n_elements_ability.cov <- get_n_elements(ability.cov)
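To show the fallback in action, the sketch below redefines the generic and both methods so that it runs on its own, then calls them on small made-up inputs (the inputs are illustrative, not from the datasets above). The data frame goes to the data.frame method; the matrix and the list have no method of their own, so they go to the default.

```r
get_n_elements <- function(x, ...) {
  UseMethod("get_n_elements")
}

get_n_elements.data.frame <- function(x, ...) {
  nrow(x) * ncol(x)
}

get_n_elements.default <- function(x, ...) {
  length(unlist(x))
}

# 3 rows x 2 columns, handled by the data.frame method
n_df  <- get_n_elements(data.frame(a = 1:3, b = letters[1:3]))  # 6
# No matrix method exists, so the default method is used
n_mat <- get_n_elements(matrix(1:6, nrow = 2))                  # 6
# Lists also fall through to the default method
n_lst <- get_n_elements(list(a = 1:3, b = c("x", "y")))         # 5
```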

Methodical Thinking

There are a lot of S3 functions in R, and now you are going to learn how to find out what is available. When you have a generic function, it is often useful to know which methods are available for it. To answer this you can use the methods() function, passing either the function itself or a string naming it.

methods(mean) # or methods("mean")

## [1] mean.Date     mean.default  mean.difftime mean.POSIXct  mean.POSIXlt 
## [6] mean.quosure*
## see '?methods' for accessing help and source code

Which methods are available for a given class of object? You can find this out too with methods(), using the class argument (with or without quotes).

methods(class = "glm") # or methods(class = glm)

##  [1] add1           anova          coerce         confint        cooks.distance
##  [6] deviance       drop1          effects        extractAIC     family        
## [11] formula        influence      initialize     logLik         model.frame   
## [16] nobs           predict        print          residuals      rstandard     
## [21] rstudent       show           slotsFromS3    summary        vcov          
## [26] weights       
## see '?methods' for accessing help and source code

In fact, methods() is more generous with its return value than just the S3 methods for a given generic or class: it returns both S3 and S4 methods. To find only the S3 methods for a given generic or class, use .S3methods(); for S4 methods only, use .S4methods().

.S3methods(class = glm)

##  [1] add1           anova          confint        cooks.distance deviance      
##  [6] drop1          effects        extractAIC     family         formula       
## [11] influence      logLik         model.frame    nobs           predict       
## [16] print          residuals      rstandard      rstudent       summary       
## [21] vcov           weights       
## see '?methods' for accessing help and source code

.S4methods(class = "glm")

## [1] coerce      initialize  show        slotsFromS3
## see '?methods' for accessing help and source code
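Once you know that a method exists, one way to retrieve it is getS3method() from the utils package, which is attached by default in a normal R session. This is a small sketch rather than the only approach; the class name no_such_class is made up to show what happens when nothing matches.

```r
# Fetch the print method for Date objects
date_printer <- getS3method("print", "Date")

# With optional = TRUE, a missing method gives NULL instead of an error
missing_method <- getS3method("print", "no_such_class", optional = TRUE)

found_date_method  <- is.function(date_printer)  # TRUE
found_missing_one  <- is.null(missing_method)    # TRUE
```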

Method Lookup for Primitive Generics

For many data analyses, the time-consuming tasks are

Writing code
Debugging code
Maintaining code

This means that R is optimized to make these tasks as quick as possible. In some cases, however, what matters more is the speed of running the code.

Functions for which speed is critical aren’t actually written in R; instead, they are written in C. C code typically runs faster than R code, so writing in C increases performance. The trade-off is that C code takes longer to write and is harder to debug.

R has several interfaces to the C language, and the highest-performance of these is known as the primitive interface. It is reserved for a few fundamental features in base R. Functions that use the primitive interface are called primitive functions (for example exp, sin, +, -, for, if).

Primitive functions can also be generic, and it is important to note that these behave slightly differently from other generic functions. You can see the complete list of primitive S3 generics using .S3PrimitiveGenerics (29 functions in the output below). The big difference between a primitive generic and a regular generic is what happens when a suitable method can’t be found.

.S3PrimitiveGenerics

##  [1] "anyNA"          "as.character"   "as.complex"     "as.double"     
##  [5] "as.environment" "as.integer"     "as.logical"     "as.numeric"    
##  [9] "as.raw"         "c"              "dim"            "dim<-"         
## [13] "dimnames"       "dimnames<-"     "is.array"       "is.finite"     
## [17] "is.infinite"    "is.matrix"      "is.na"          "is.nan"        
## [21] "is.numeric"     "length"         "length<-"       "levels<-"      
## [25] "names"          "names<-"        "rep"            "seq.int"       
## [29] "xtfrm"

all_of_time <- c("1970-01-01","2012-12-21")
as.Date(all_of_time)

## [1] "1970-01-01" "2012-12-21"

class(all_of_time) <- "date_strings"
as.Date(all_of_time)

## Error in as.Date.default(all_of_time): do not know how to convert 'all_of_time' to class "Date"

length(all_of_time)

## [1] 2

Because as.Date() is not a primitive generic, when you override the class with date_strings no date_strings method can be found, so dispatch falls to as.Date.default(), which throws an error. By contrast, look at what happens with the length() function. length() is a primitive generic; it is so important that it shouldn’t break just because the class has changed.

For primitive generics, rather than throwing an error when no suitable method is found, the call goes directly to the C code, which uses typeof() to determine the type of the input.
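The sketch below illustrates both halves of this behaviour for length(). The method length.date_strings and the counter dispatch_count are hypothetical, defined only to make the dispatch visible; this is an illustration, not a recommendation to override length() in practice.

```r
all_of_time <- c("1970-01-01", "2012-12-21")
class(all_of_time) <- "date_strings"

# No length.date_strings() exists yet, so length() goes straight to C code
n_fallback <- length(all_of_time)  # 2

# Hypothetical method: counts how often it is called, then defers to the
# internal code by stripping the class first (which also avoids recursion)
dispatch_count <- 0L
length.date_strings <- function(x) {
  dispatch_count <<- dispatch_count + 1L
  length(unclass(x))
}

# Now that a method exists, the primitive generic dispatches to it
n_dispatched <- length(all_of_time)  # 2, and dispatch_count is now 1
```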

Too Much Class

Variables can have more than one class. In that case, rather than being a single string, the class is a character vector. In the example below, a vector of numbers is described using three classes. The order of the classes is important: the most specific class comes first, and the classes get gradually less specific as you move from left to right. It is good practice to keep the original class (here, numeric) as the final class.

To test for arbitrary classes you can use the general-purpose inherits() function. As you can see in the example below, x inherits from triangular_numbers, from natural_numbers, and from numeric.

x <- c(1, 3, 6, 10, 15)

class(x) <- c("triangular_numbers","natural_numbers","numeric")

is.numeric(x)

## [1] TRUE

is.triangular_numbers(x)

## Error in is.triangular_numbers(x): could not find function "is.triangular_numbers"

inherits(x,"triangular_numbers")

## [1] TRUE

inherits(x,"natural_numbers")

## [1] TRUE

# This returns the same thing as is.numeric(), but the more general function is much slower.
# For this reason, use the specific function if one is available.

inherits(x,"numeric")

## [1] TRUE

If your object has multiple classes, you can call the methods for several of them in turn using the NextMethod() function.

what_am_i <- function(x, ...){
  UseMethod("what_am_i")
}

what_am_i.triangular_numbers <- function(x, ...){
  message("I'm triangular numbers")
  NextMethod("what_am_i")
}

what_am_i.natural_numbers <- function(x, ...){
  message("I'm natural numbers")
  NextMethod("what_am_i")
}

what_am_i.numeric <- function(x, ...){
  message("I'm numeric")
}

what_am_i(x)

## I'm triangular numbers

## I'm natural numbers

## I'm numeric
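NextMethod() also returns whatever the next method returns, so a chain of methods can build up a result instead of only printing messages. The generic describe and its methods below are hypothetical names used for this sketch; the class vector is the same as in the example above.

```r
describe <- function(x, ...) {
  UseMethod("describe")
}

# Each method adds its own label, then appends what the next method returns
describe.triangular_numbers <- function(x, ...) {
  c("triangular", NextMethod())
}

describe.natural_numbers <- function(x, ...) {
  c("natural", NextMethod())
}

# Last in the chain: no further call to NextMethod()
describe.numeric <- function(x, ...) {
  "numeric"
}

x <- c(1, 3, 6, 10, 15)
class(x) <- c("triangular_numbers", "natural_numbers", "numeric")

labels <- describe(x)  # "triangular" "natural" "numeric"
```

The labels come out in the same left-to-right order as the class vector, which mirrors the order of the messages above.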

Using R6

Object Factory

The R6 system provides a way of storing data and functions within the same variable.

The first step in working with R6 is to create a class generator for each of your objects. A class generator is a template that describes what data can be stored in the object and what functions can be applied to the object. It is also used to create the objects themselves. For this reason, class generators are also called factories.

Factories are defined using the R6Class() function. The first argument to R6Class() is the name of the class; by convention this should be in UpperCamelCase. Another argument, private, stores the object’s data. It is always a list, and each element of the list must be named. There are two more arguments, public and active, which are discussed later.

The second step to working with R6 is to create some objects. You can do this by calling the new() method of the factory. Since it is a factory you can churn out as many objects as you like.

library(R6)
thing_factory <- R6Class(
  "Thing",
   private = list(
    a_field = "a value"
  , another_field = 123  
  )
)

a_thing <- thing_factory$new()
another_thing <- thing_factory$new()
yet_another_thing <- thing_factory$new()
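A quick way to see what the factory produces is to inspect the objects it returns. The sketch below assumes the R6 package is installed and repeats the factory definition so that it runs on its own. It suggests that each object carries the class name plus "R6", that separate calls to new() give separate objects, and that private fields are not reachable from outside with $.

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  private = list(
    a_field = "a value",
    another_field = 123
  )
)

a_thing <- thing_factory$new()
another_thing <- thing_factory$new()

thing_classes <- class(a_thing)                    # "Thing" "R6"
are_distinct  <- !identical(a_thing, another_thing)  # TRUE: two separate objects
outside_view  <- a_thing$a_field                   # NULL: private data is hidden
```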

Hiding Complexity with Encapsulation

In OOP, separating the implementation of an object from its user interface is called encapsulation. In R6, all the implementation details are stored in the private element of the class. By contrast, the user interface is stored in the public element.

The public element is also specified as a named list, and its contents are mostly functions.

The data fields in the private element can be accessed using the prefix private$.

In the example below, the private field door_is_open is accessed in the function open_door using private$door_is_open.

It is also possible to access other public elements of a class using the self$ prefix or (…).

# Define microwave_oven_factory
microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    power_rating_watts = 800,
    door_is_open = FALSE
  ),
  public = list(
    open_door = function() {
      private$door_is_open <- TRUE
    },
    close_door = function() {
      private$door_is_open <- FALSE
    },
    cook = function(time_seconds) {
      Sys.sleep(time_seconds)
      print("Your food is cooked!")
    }
  )
)

# Create microwave oven object
a_microwave_oven <- microwave_oven_factory$new()

# Call cook method for 1 second
a_microwave_oven$cook(1)

## [1] "Your food is cooked!"
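The example above only uses private$. As a sketch of the self$ prefix, the variant below has cook() call another public method, close_door(), before cooking. The is_door_open() method is a hypothetical addition made here only so that the private state can be inspected; the sketch assumes the R6 package is installed.

```r
library(R6)

microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    door_is_open = TRUE
  ),
  public = list(
    close_door = function() {
      private$door_is_open <- FALSE
      invisible(self)
    },
    # Hypothetical accessor, added only to inspect the private field
    is_door_open = function() {
      private$door_is_open
    },
    cook = function(time_seconds) {
      # self$ reaches another public method of the same object
      self$close_door()
      Sys.sleep(time_seconds)
      "Your food is cooked!"
    }
  )
)

a_microwave_oven <- microwave_oven_factory$new()
open_before <- a_microwave_oven$is_door_open()  # TRUE
cook_result <- a_microwave_oven$cook(0)
open_after  <- a_microwave_oven$is_door_open()  # FALSE: cook() closed the door
```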

Initialize()

There is one special public method named initialize() (note the American English spelling). This is not called directly by the user. Instead, it is called automatically when an object is created; that is, when the user calls new().

initialize() lets you set the values of the private fields when you create an R6 object. The pattern for an initialize() function is as follows:

thing_factory <- R6Class(
  "Thing",
  private = list(
    a_field = "a value",
    another_field = 123
  ),
  public = list(
    initialize = function(a_field, another_field) {
      if(!missing(a_field)) {
        private$a_field <- a_field
      }
      if(!missing(another_field)) {
        private$another_field <- another_field
      }
    }
  )
)

Notice the use of missing(). This returns TRUE if an argument wasn’t passed in the function call.

Arguments to the factory’s new() method are passed to initialize().

a_thing <- thing_factory$new(
  a_field = "a different value", 
  another_field = 456
)

# Add an initialize method
microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    power_rating_watts = 800,
    door_is_open = FALSE
  ),
  public = list(
    cook = function(time_seconds) {
      Sys.sleep(time_seconds)
      print("Your food is cooked!")
    },
    open_door = function() {
      private$door_is_open <- TRUE
    },
    close_door = function() {
      private$door_is_open <- FALSE
    },
    # Add initialize() method here
    initialize = function(power_rating_watts, door_is_open) {
      if(!missing(power_rating_watts)) {
        private$power_rating_watts <- power_rating_watts
      }
      if(!missing(door_is_open)) {
        private$door_is_open <- door_is_open
      }
      
      
    }
  )
)

# Make a microwave
a_microwave_oven <- microwave_oven_factory$new(
    power_rating_watts = 650,
    door_is_open = TRUE
)

Getting and Setting with Active Bindings

Data values stored in the private element of an R6 class are not directly accessible by the user. Sometimes, however, you may wish to provide controlled access to these data fields. There are two cases: you may want to retrieve a data field, or you may want to change it. In OOP these are known as getting and setting the data.

In R6 this controlled access to private fields is achieved through Active Bindings. Active Bindings are defined like functions but are accessed like data variables.

Active Bindings are added to the active element of a class. The active element must be a named list. One of the R6 restrictions is that elements of private, public and active must all have different names.

A useful convention to distinguish private and active elements is to start all private fields with a double dot. For you as a programmer this makes the private field stand out so you have a quick visual way of signifying that these variables are not available for consumption by user.

The simplest case is a read-only active binding, where you only want to retrieve a data field rather than being able to change it. In this case the function takes no argument and simply returns the corresponding private field. In the example below, the active binding a_field returns the private field ..a_field.

Since the a_field binding is a function, you can include custom logic. For example, if the data field is missing you can return a default value.

library(assertive)

## 
## Attaching package: 'assertive'

## The following objects are masked from 'package:pryr':
## 
##     is_s3_generic, is_s3_method

thing_factory <- R6Class(
  "Thing",
  private = list(
    ..a_field = "a value",
    ..another_field = 123
  ),
  active = list(
    a_field = function() {
      if(is.na(private$..a_field)) {
        return("a missing value")
      }
      private$..a_field
    },
    another_field = function(value) {
      if(missing(value)) {
        private$..another_field
      } else {
        assert_is_a_number(value)
        private$..another_field <- value
      }
    }
  )
)

A more complex case is when you want users to be able to change the value of a data field as well. In this case the binding function should take a single argument, by convention named value. If value is missing, the function just returns the private data field as before. However, when a value is passed to the active binding, you need some logic to set the private field.

The purpose of active bindings is to allow controlled access to the private fields. This means that you can add custom logic to check the value before you assign it. For example, if another_field should only contain a single number, you can use assert_is_a_number() from the assertive package to check this condition and throw an error if the value is something else. Notice that although an active binding is defined as a function, it is accessed like a data variable, with no parentheses at the end. Since a_field was defined as a read-only binding, you will get an error if you try to change it.

By contrast, you can set another_field, but the logic in the binding requires the value to be a single number.

a_thing <- thing_factory$new()
a_thing$a_field

## [1] "a value"

a_thing$a_field <- "a new value"

## Error in (function () : unused argument (.Primitive("quote")("a new value"))

a_thing$another_field <- 756
a_thing$another_field <- "756"

## Error in (function (value) : is_a_number : value is not of class 'numeric'; it has class 'character'.
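The "unused argument" error for a read-only binding is correct but not very friendly. One possible alternative, sketched here as an assumption rather than the approach used above, is to give the binding a value argument and raise a clearer error when someone tries to set it. The error text is made up for this example, and the sketch assumes the R6 package is installed.

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  private = list(
    ..a_field = "a value"
  ),
  active = list(
    a_field = function(value) {
      if (missing(value)) {
        # Getting: return the private field
        private$..a_field
      } else {
        # Setting: refuse with a descriptive message
        stop("a_field is read-only", call. = FALSE)
      }
    }
  )
)

a_thing <- thing_factory$new()
current_value <- a_thing$a_field  # "a value"

# Capture the error message instead of stopping the script
set_attempt <- tryCatch(
  {
    a_thing$a_field <- "a new value"
    "no error"
  },
  error = function(e) conditionMessage(e)
)

value_afterwards <- a_thing$a_field  # unchanged
```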

# Add a binding for power rating
microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    ..power_rating_watts = 800
  ),
  active = list(
    # Add the binding here
    power_rating_watts = function(){
      
      private$..power_rating_watts
    }

    
  )
)

# Make a microwave 
a_microwave_oven <- microwave_oven_factory$new()

# Get the power rating
a_microwave_oven$power_rating_watts

## [1] 800

# Add a binding for power rating
microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    ..power_rating_watts = 800,
    ..power_level_watts = 800
  ),
  # Add active list containing an active binding
  active = list(
    power_level_watts = function(value) {
      if(missing(value)) {
        # Return the private value
        private$..power_level_watts
      } else {
        # Assert that value is a number
        assert_is_a_number(value)
        # Assert that value is in a closed range from 0 to power rating
        assert_all_are_in_closed_range(value,0,private$..power_rating_watts)
        # Set the private power level to value
        private$..power_level_watts <- value
      }
    }
  )
)

# Make a microwave 
a_microwave_oven <- microwave_oven_factory$new()

# Get the power level
a_microwave_oven$power_level_watts

## [1] 800

# Try to set the power level to "400"
a_microwave_oven$power_level_watts <- "400"

## Error in (function (value) : is_a_number : value is not of class 'numeric'; it has class 'character'.

# Try to set the power level to 1600 watts
a_microwave_oven$power_level_watts <- 1600

## Error in (function (value) : is_in_closed_range : value are not all in the range [0,800].
## There was 1 failure:
##   Position Value    Cause
## 1        1  1600 too high

# Set the power level to 400 watts
a_microwave_oven$power_level_watts <- 400

R6 Inheritance

Propagating Functionality with Inheritance

Copying and pasting code is a major source of bugs and usually a sign of poorly structured code. If you change a parent class, you want those changes to be mirrored in any class derived from it. To implement inheritance, R6Class() takes an inherit argument. Classes that inherit from the original class (the parent class) are called child classes. All the data and functionality of the parent class is passed to the child class, that is, everything in its private, public, and active elements. You can also add additional functionality to the child. The important thing to remember is that inheritance only works in one direction: the parent class does not inherit the traits of its child.

Inheritance means that the child class starts out with exact copies of the methods in the parent class, and you can add further methods in the child class.

child_thing_factory <- R6Class(
  "ChildThing",
  inherit = thing_factory
  )

a_thing <- thing_factory$new()
class(a_thing)

## [1] "Thing" "R6"

inherits(a_thing, "Thing")

## [1] TRUE

inherits(a_thing, "R6")

## [1] TRUE

a_child_thing <- child_thing_factory$new()
class(a_child_thing)

## [1] "ChildThing" "Thing"      "R6"

inherits(a_child_thing, "ChildThing")

## [1] TRUE

inherits(a_child_thing, "Thing")

## [1] TRUE

inherits(a_child_thing, "R6")

## [1] TRUE

# Explore the microwave oven class
microwave_oven_factory

# Define a fancy microwave class inheriting from microwave oven
fancy_microwave_oven_factory <- R6Class(
  "FancyMicrowaveOven",
  inherit = microwave_oven_factory
)

# Explore microwave oven classes
microwave_oven_factory
fancy_microwave_oven_factory

# Instantiate both types of microwave
a_microwave_oven <- microwave_oven_factory$new()
a_fancy_microwave <- fancy_microwave_oven_factory$new()

# Get power rating for each microwave
microwave_power_rating <- a_microwave_oven$power_rating_watts
fancy_microwave_power_rating <- a_fancy_microwave$power_rating_watts

# Verify that these are the same
identical(microwave_power_rating, fancy_microwave_power_rating)

# Cook with each microwave
a_microwave_oven$cook(1)
a_fancy_microwave$cook(1)

Embrace, Extend, Override

Simply creating a new class that inherits from another class isn’t useful by itself. What you really want the child class to do is add new functionality.

This can be done in two ways:

Override existing functionality inherited from the parent
Extend the class to add brand new functionality

To override functionality, you define elements with the same names as those in the parent. To extend functionality, you simply define new public methods or private data fields.

Public methods can call other public methods by prefixing their name with self$.

# Explore microwave oven class
microwave_oven_factory

# Extend the class definition
fancy_microwave_oven_factory <- R6Class(
  "FancyMicrowaveOven",
  inherit = microwave_oven_factory,
  # Add a public list with a cook baked potato method
  public = list(
    cook_baked_potato = function() {
      self$cook(3)
    }
  )
)

# Instantiate a fancy microwave
a_fancy_microwave <- fancy_microwave_oven_factory$new()

# Call the cook_baked_potato() method
a_fancy_microwave$cook_baked_potato()

Child classes can access public methods from their parent class by prefixing the name with super$.

# Explore microwave oven class
microwave_oven_factory

# Update the class definition
fancy_microwave_oven_factory <- R6Class(
  "FancyMicrowaveOven",
  inherit = microwave_oven_factory,
  # Add a public list with a cook method
  public = list(
    cook = function(time_seconds) {
      super$cook(time_seconds)
      message("Enjoy your dinner!")
    }
  )
)

# Instantiate a fancy microwave
a_fancy_microwave <- fancy_microwave_oven_factory$new()

# Call the cook() method
a_fancy_microwave$cook(1)
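
The microwave examples above rely on a cook() method that the parent class defines elsewhere. As a self-contained sketch of overriding and extending together, the classes below use a hypothetical oven_factory parent whose cook() method simply prints a message. The class names and the message text are assumptions made for illustration.

```r
library(R6)

# Hypothetical parent class with a simple cook() method
oven_factory <- R6Class(
  "Oven",
  public = list(
    cook = function(time_seconds) {
      message("Cooking for ", time_seconds, " second(s).")
      invisible(time_seconds)
    }
  )
)

fancy_oven_factory <- R6Class(
  "FancyOven",
  inherit = oven_factory,
  public = list(
    # Override: same name as the parent method; super$ reaches the original
    cook = function(time_seconds) {
      super$cook(time_seconds)
      message("Enjoy your dinner!")
      invisible(time_seconds)
    },
    # Extend: a brand new method that reuses cook() through self$
    cook_baked_potato = function() {
      self$cook(3)
    }
  )
)

an_oven <- oven_factory$new()
a_fancy_oven <- fancy_oven_factory$new()

# Runs the overriding cook(), which in turn calls the parent cook()
a_fancy_oven$cook_baked_potato()

# Inheritance is one-directional: the parent has no cook_baked_potato()
is.null(an_oven$cook_baked_potato)
```

Because cook_baked_potato() goes through self$cook(), it picks up the child's overriding version rather than the parent's.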

Multiple Levels of Inheritance

R6 allows multiple levels of inheritance. However, by default R6 objects only have access to the functionality of their direct parent class. To access functionality across multiple generations, the intermediate classes must expose their parents using an active binding. This active binding is conventionally named super_ and simply returns the super object.

thing_factory <- R6Class(
  "Thing",
  public = list(
    do_something = function() {
      message("the parent do_something method")
    }
  )
)

child_thing_factory <- R6Class(
  "ChildThing",
  inherit = thing_factory,
  public = list(
    do_something = function() {
      message("the child do_something method")
    }
  ),
  active = list(
    super_ = function() super
  )
)

grand_child_thing_factory <- R6Class(
  "GrandChildThing",
  inherit = child_thing_factory,
  public = list(
    do_something = function() {
      message("the grand-child do_something method")
      super$do_something()
      super$super_$do_something()
    }
  )
)

a_grand_child_thing <- grand_child_thing_factory$new()
a_grand_child_thing$do_something()

## the grand-child do_something method

## the child do_something method

## the parent do_something method

# Expose the parent functionality
fancy_microwave_oven_factory <- R6Class(
  "FancyMicrowaveOven",
  inherit = microwave_oven_factory,
  public = list(
    cook_baked_potato = function() {
      self$cook(3)
    },
    cook = function(time_seconds) {
      super$cook(time_seconds)
      message("Enjoy your dinner!")
    }
  ),
  # Add an active element with a super_ binding
  active = list(
    super_ = function() super
  )
)

# Instantiate a fancy microwave
a_fancy_microwave <- fancy_microwave_oven_factory$new()

# Call the super_ binding
a_fancy_microwave$super_

# Explore other microwaves
microwave_oven_factory
fancy_microwave_oven_factory

# Define a high-end microwave oven class
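high_end_microwave_oven_factory <- R6Class(
  "HighEndMicrowaveOven",
  inherit = fancy_microwave_oven_factory,
  public = list(
    cook = function(time_seconds) {
      # Skip the fancy cook() and call the original microwave cook() directly
      super$super_$cook(time_seconds)
      # ascii_pizza_slice is assumed to be a character string defined elsewhere
      message(ascii_pizza_slice)
    }
  )
)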

# Instantiate a high-end microwave oven
a_high_end_microwave <- high_end_microwave_oven_factory$new()

# Use it to cook for one second
a_high_end_microwave$cook(1)

Advanced R6

Environments, Reference Behavior, & Shared Fields

To create a new environment, call the new.env() function. Unlike lists, which are commonly filled with elements when you create them, new.env() always creates an empty environment and you add its contents afterwards. The syntax for adding variables to an environment is the same as for a list: you can use the $ operator or the double-square-brackets operator.

env <- new.env()
env$x <- pi ^ (1:5)
env[["y"]] <- matrix(month.abb, 3)
ls.str(env)

## x :  num [1:5] 3.14 9.87 31.01 97.41 306.02
## y :  chr [1:3, 1:4] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" ...

Environments behave differently from lists in one way that becomes important when working with R6 classes. Most R variables use a copying strategy called copy by value: when you copy them, each version of the variable has its own copy of the values. By contrast, environments use copy by reference: when you copy them, each version refers to the same copy of the values. R6 classes can take advantage of copying by reference to share data between all instances of a class.

# Assign lst
lst <- list(
  perfect = c(6, 28, 496),
  bases = c("A", "C", "G", "T")
)

# Copy lst
lst2 <- lst
  
# Change lst's bases element
lst$bases[[4]] <- "U"
  
# Test lst and lst2 identical
identical(lst,lst2)

## [1] FALSE

# Assign lst and env
lst <- list(
  perfect = c(6, 28, 496),
  bases = c("A", "C", "G", "T")
)
env <- list2env(lst)

# Copy env
env2 <- env
  
# Change env's bases element
env$bases[[4]] <- "U" 
  
# Test env and env2 identical
identical(env,env2)

## [1] TRUE
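
The same difference shows up when a variable is passed to a function. In the short sketch below, the helper set_flag() and the variable names are made up for illustration: the list is modified only inside the function, while the environment is modified for the caller too.

```r
# Hypothetical helper that sets a flag on whatever it is given
set_flag <- function(x) {
  x$flag <- TRUE
  invisible(x)
}

a_list <- list(flag = FALSE)
an_env <- new.env()
an_env$flag <- FALSE

set_flag(a_list)  # changes a copy that is local to the function
set_flag(an_env)  # changes the one and only environment

a_list$flag
an_env$flag
```

After both calls, a_list$flag is still FALSE while an_env$flag is TRUE.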

The trick for sharing fields between instances is to define a private element that is, by convention, named shared. The shared element takes several lines of code to define, so it needs braces: it creates an environment, assigns the shared fields into it, and returns it. To access the shared fields you again use active bindings, but this time with the private$shared$ prefix.

thing_factory <- R6Class(
  "Thing",
  private = list(
    shared = {
      e <- new.env()
      e$a_shared_field <- 123
      e
    }
  ),
  active = list(
    a_shared_field = function(value) {
      if (missing(value)) {
        private$shared$a_shared_field
      } else {
        private$shared$a_shared_field <- value
      }
    }
  )
)

a_thing <- thing_factory$new()
another_thing <- thing_factory$new()

a_thing$a_shared_field

## [1] 123

another_thing$a_shared_field

## [1] 123

a_thing$a_shared_field <- 456

another_thing$a_shared_field

## [1] 456

# Complete the class definition
microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    shared = {
      # Create a new environment named e
      e <- new.env()
      # Assign safety_warning into e
      e$safety_warning <- "Warning. Do not try to cook metal objects."
      # Return e
      e
    }
  ),
  active = list(
    # Add the safety_warning binding
    safety_warning = function(value) {
      if (missing(value)) {
        private$shared$safety_warning
      } else {
        private$shared$safety_warning <- value
      }
    }
  )
)

# Create two microwave ovens
a_microwave_oven <- microwave_oven_factory$new()
another_microwave_oven <- microwave_oven_factory$new()
  
# Change the safety warning for a_microwave_oven
a_microwave_oven$safety_warning <- "Warning. If the food is too hot you may scald yourself."
  
# Verify that the warning has changed for another_microwave_oven
identical(a_microwave_oven$safety_warning, another_microwave_oven$safety_warning)

## [1] TRUE

another_microwave_oven$safety_warning

## [1] "Warning. If the food is too hot you may scald yourself."
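
Another use for a shared environment is keeping a tally across all instances. The sketch below is an illustration rather than part of the original material: counter_factory and n_instances are hypothetical names, and the count is bumped in initialize() each time an object is created.

```r
library(R6)

# Hypothetical class that counts how many instances have been created
counter_factory <- R6Class(
  "Counter",
  private = list(
    shared = {
      # Evaluated once, when the class is defined, so every instance
      # receives a reference to the same environment
      e <- new.env()
      e$n_instances <- 0
      e
    }
  ),
  public = list(
    initialize = function() {
      private$shared$n_instances <- private$shared$n_instances + 1
    }
  ),
  active = list(
    # Read-only view of the shared count
    n_instances = function(value) {
      if (missing(value)) {
        private$shared$n_instances
      } else {
        stop("n_instances is read-only.", call. = FALSE)
      }
    }
  )
)

first_counter <- counter_factory$new()
second_counter <- counter_factory$new()

# Both objects report the same total
first_counter$n_instances
second_counter$n_instances
```

Both bindings return 2, since each call to new() incremented the single shared count.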

Cloning R6 Objects

As you saw earlier, environments have special copy by reference behavior. Since R6 objects are built using environments, they also use copy by reference. If you create an object and then use assignment to copy it, changing a field through one variable changes it for all of them, because they all refer to the same object.

thing_factory <- R6Class(
  "Thing",
  private = list(
    ..a_field = 123
  ),
  active = list(
    a_field = function(value) {
      if (missing(value)) {
        private$..a_field
      } else {
        private$..a_field <- value
      }
    }
  )
)

a_thing <- thing_factory$new()

a_copy <- a_thing

a_thing$a_field <- 456

a_copy$a_field

## [1] 456

Sometimes this isn’t the behavior that you want, so all R6 objects have a method named clone() that makes independent copies (copy by value). You don’t need to define this method yourself; it is generated automatically. To copy the object using the more standard copy by value behavior, just call clone() without any arguments.

a_clone <- a_thing$clone()

a_thing$a_field <- 789

a_clone$a_field

## [1] 456

One special case is when an R6 object contains another R6 object in one of its fields. A plain clone() still copies that inner object by reference.

container_factory <- R6Class(
  "Container",
  private = list(
    ..thing = thing_factory$new()
  ),
  active = list(
    thing = function(value) {
      if (missing(value)) {
        private$..thing
      } else {
        private$..thing <- value
      }
    }
  )
)

a_container <- container_factory$new()

a_clone <- a_container$clone()

a_container$thing$a_field <- "a new value"

a_clone$thing$a_field

## [1] "a new value"

To use copy by value for the internal R6 object, you need to call clone() with the argument deep = TRUE. Because of this, changes to thing$a_field aren’t propagated to a_deep_clone. So if an R6 object contains other R6 objects, you have to pass deep = TRUE to get copy by value behavior for those fields.

a_deep_clone <- a_container$clone(deep = TRUE)

a_container$thing$a_field <- "a different value"

a_deep_clone$thing$a_field

## [1] "a new value"

# Create a microwave oven
a_microwave_oven <- microwave_oven_factory$new()

# Copy a_microwave_oven using <-
assigned_microwave_oven <- a_microwave_oven
  
# Copy a_microwave_oven using clone()
cloned_microwave_oven <- a_microwave_oven$clone()
  
# Change a_microwave_oven's power level  
a_microwave_oven$power_level_watts <- 400
  
# Check a_microwave_oven & assigned_microwave_oven same 
identical(a_microwave_oven$power_level_watts, assigned_microwave_oven$power_level_watts)

# Check a_microwave_oven & cloned_microwave_oven different 
identical(a_microwave_oven$power_level_watts, cloned_microwave_oven$power_level_watts)

If an R6 object contains another R6 object in one or more of its fields, then by default clone() will copy the R6 fields by reference. To copy those R6 fields by value, the clone() method must be called with the argument deep = TRUE.

# Create a microwave oven
a_microwave_oven <- microwave_oven_factory$new()

# Look at its power plug
a_microwave_oven$power_plug

# Copy a_microwave_oven using clone(), no args
cloned_microwave_oven <- a_microwave_oven$clone()
  
# Copy a_microwave_oven using clone(), deep = TRUE
deep_cloned_microwave_oven <- a_microwave_oven$clone(deep = TRUE)
  
# Change a_microwave_oven's power plug type  
a_microwave_oven$power_plug$type <- "British"
  
# Check a_microwave_oven & cloned_microwave power plug types same 
identical(a_microwave_oven$power_plug$type, cloned_microwave_oven$power_plug$type)

# Check a_microwave_oven & deep_cloned_microwave power plug types different 
identical(a_microwave_oven$power_plug$type, deep_cloned_microwave_oven$power_plug$type)
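
The power_plug field used above belongs to a version of the microwave class that is defined elsewhere. As a self-contained sketch of shallow versus deep cloning, the example below uses two hypothetical classes, plug_factory and plugged_oven_factory, with the inner object created in initialize() so that each new oven gets its own plug.

```r
library(R6)

# Hypothetical inner class, defined only so the example runs on its own
plug_factory <- R6Class(
  "Plug",
  public = list(type = "American")
)

# Hypothetical outer class that holds an R6 object in one of its fields
plugged_oven_factory <- R6Class(
  "PluggedOven",
  public = list(
    plug = NULL,
    initialize = function() {
      self$plug <- plug_factory$new()
    }
  )
)

an_oven <- plugged_oven_factory$new()
shallow_clone <- an_oven$clone()
deep_clone <- an_oven$clone(deep = TRUE)

# Change the plug on the original only
an_oven$plug$type <- "British"

shallow_clone$plug$type  # shares the original's plug object
deep_clone$plug$type     # has its own copy of the plug
```

The shallow clone reports "British" because its plug field still points at the original plug, while the deep clone keeps "American".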

Shut it down

If an R6 object connects to a database or a file, it can be dangerous to delete it without first making sure that the connections are closed. Similarly, if the R6 object has side effects, such as changing global options or global plotting parameters, it is good practice to return those settings to their previous state.

The initialize() method customizes behavior when an object is created (custom startup). Its counterpart is a method named finalize(), which allows custom behavior when an R6 object is destroyed (custom cleanup).

Here, finalize() is a function with no arguments, defined in the public element of the R6 class (newer R6 releases also accept it in private). When you delete an R6 object, its finalize() method isn’t called immediately. It is called when the object is garbage collected by R’s memory management system. You can force this to occur by calling the gc() function.

In summary, finalize() is used for cleanup when an object is destroyed. It is especially useful for R6 classes that connect to databases or files, since it is important that these connections eventually get closed. finalize() gets called when the object is garbage collected by R.

thing_factory <- R6Class(
  "Thing",
  private = list(
    ..a_field = 123
  ),
  public = list(
    initialize = function(a_field) {
      if (!missing(a_field)) {
        # Store the value in the private ..a_field element
        private$..a_field <- a_field
      }
    },
    finalize = function() {
      message("Finalize this thing")
    }
  )
)

a_thing <- thing_factory$new()

rm(a_thing)

gc()

##           used (Mb) gc trigger (Mb) max used (Mb)
## Ncells  626831 33.5    1244284 66.5  1244284 66.5
## Vcells 1221428  9.4    8388608 64.0  2140529 16.4

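library(RSQLite)
database_manager_factory <- R6Class(
  "DatabaseManager",
  private = list(
    conn = NULL
  ),
  public = list(
    initialize = function() {
      # Open the connection when the object is created
      private$conn <- dbConnect(SQLite(), "some-database.sqlite")
    },
    finalize = function() {
      # Close the connection when the object is garbage collected
      dbDisconnect(private$conn)
    }
  )
)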

# Define a smart microwave that disconnects from its database when finalized
smart_microwave_oven_factory <- R6Class(
  "SmartMicrowaveOven",
  inherit = microwave_oven_factory, 
  private = list(
    conn = NULL
  ),
  public = list(
    initialize = function() {
      private$conn <- dbConnect(SQLite(), "cooking-times.sqlite")
    },
    get_cooking_time = function(food) {
      dbGetQuery(
        private$conn,
        sprintf("SELECT time_seconds FROM cooking_times WHERE food = '%s'", food)
      )
    },
    finalize = function() {
      message("Disconnecting from the cooking times database.")
      dbDisconnect(private$conn)
    }
  )
)
a_smart_microwave <- smart_microwave_oven_factory$new()

# Remove the smart microwave
rm(a_smart_microwave)

# Force garbage collection
gc()
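
The same cleanup pattern can be tried without a database. The sketch below is an illustration under stated assumptions: log_writer_factory is a hypothetical class that writes to a temporary file, and finalize() is placed in private, which assumes R6 2.4.0 or later. With older R6 versions it would go in public, as in the examples above.

```r
library(R6)

# Hypothetical class that owns a file connection and closes it on cleanup
log_writer_factory <- R6Class(
  "LogWriter",
  public = list(
    initialize = function(path) {
      # Open the connection when the object is created
      private$conn <- file(path, open = "w")
    },
    write_entry = function(text) {
      writeLines(text, private$conn)
      invisible(NULL)
    }
  ),
  private = list(
    conn = NULL,
    # Assumes R6 >= 2.4.0, where a private finalize() is supported
    finalize = function() {
      message("Closing the log file.")
      close(private$conn)
    }
  )
)

log_path <- tempfile(fileext = ".log")
a_log_writer <- log_writer_factory$new(log_path)
a_log_writer$write_entry("first entry")

# Remove the object, then force garbage collection so finalize() can run
rm(a_log_writer)
invisible(gc())
```

Keeping the connection in private means the only way it gets closed is through finalize(), so the cleanup cannot be skipped by accident.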

Object Oriented Programming with S3 and R6