• What is Object-Oriented Programming?
    • When is OOP a good idea?
    • Nine Systems
    • How does R Distinguish Variables?
    • Assigning Classes
  • Using S3
  • Using R6
  • R6 Inheritance
  • Advanced R6

What is Object-Oriented Programming?

Most of the time when you use R, you use a functional programming style: you start with some data and apply a function to manipulate it. This returns some new data, to which you apply another function, and you repeat this until you get an answer.

With a functional mindset, you typically start by thinking about what you want a function to do. Then you worry about the objects that get passed to the function (its arguments). Finally, you worry about the objects that come out the other end (its return values).

Object-oriented programming (OOP) takes a different approach. You start by thinking about the objects you have to work with (for example, a teapot). Then you think about what data you need to describe the object (for example, the total capacity of the teapot and how much liquid it currently holds). Next, you think about the functionality of that object (for example, the main purpose of a teapot is to pour tea, so you add a pour function).

In OOP, functions are known as methods; a method is simply a function in an object-oriented context. Two variable types are important in OOP:

  1. Lists
  2. Environments

Because these variable types can contain many other variables (of many types), you can use them to build more complex types. For data analysis itself, the functional programming approach is usually preferred.

When is OOP a good idea?

OOP works best when you have a limited number of objects whose behavior you completely understand. This makes OOP a good fit for building the tools used in data analysis, but a poor fit for data analysis itself. One of the principles of OOP is that functions can behave differently for different kinds of object. The summary() function is an example: the output below shows it applied to a numeric vector, a factor, a data frame, and a linear model.
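The calls behind this kind of output can be sketched as follows. The data here is made up for illustration (only the cars model matches the output in this section), so the numbers will differ.

```r
# Hypothetical data; values will not match the output shown in this section.
set.seed(1)
a_numeric_vector <- rlnorm(50)
a_factor <- factor(sample(c(LETTERS[1:5], NA), 50, replace = TRUE))
a_data_frame <- data.frame(n = a_numeric_vector, f = a_factor)
a_linear_model <- lm(dist ~ speed, data = cars)

# The same generic gives a different kind of summary for each input.
summary(a_numeric_vector)  # quartiles and mean
summary(a_factor)          # counts per level
summary(a_data_frame)      # one summary per column
summary(a_linear_model)    # coefficients and fit statistics
```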

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1103  0.5207  0.9501  1.3263  2.0918  4.6469
##    A    B    C    D    E NA's 
##   10    7    9    6    7   11
##        n             f     
##  Min.   :0.1103   A   :10  
##  1st Qu.:0.5207   B   : 7  
##  Median :0.9501   C   : 9  
##  Mean   :1.3263   D   : 6  
##  3rd Qu.:2.0918   E   : 7  
##  Max.   :4.6469   NA's:11
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Nine Systems

R has nine systems available for OOP, but not all of them are widely used because of their limitations. The two that matter most in practice are S3 and R6.

  1. Knowing how to use S3 is a fundamental R skill.
  2. R6 and ReferenceClasses are powerful OOP frameworks.
  3. S4 is useful for working with Bioconductor.

How does R Distinguish Variables?

str() and class() examine the structure and class of a variable, respectively. Sometimes, however, you need to dig deeper, and there are other functions to consider.

  1. typeof() - returns the variable's internal type, as used by R's underlying C code. This lets you check the contents of your variable regardless of its class.
  2. mode()
  3. storage.mode()

mode() and storage.mode() exist solely for compatibility with older S code, so while you need to know they exist, you should never really need to use them.
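As a small sketch of the difference between class and type, two matrices can share a class while differing in their internal type. The variable names follow the output in this section; the random values are an assumption and will differ.

```r
int_mat <- matrix(1:12, nrow = 3)
num_mat <- matrix(rnorm(12), nrow = 3)

# Both are matrices (R >= 4.0.0 also reports "array" in the class).
class(int_mat)
class(num_mat)

# The internal types differ.
typeof(int_mat)  # "integer"
typeof(num_mat)  # "double"
```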

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
##            [,1]       [,2]       [,3]       [,4]
## [1,] 0.74181895  1.0518520 -0.9652027 -1.0421293
## [2,] 0.04219933 -1.1993195  1.5072984  0.2748198
## [3,] 2.37485335 -0.2832735  0.8662924  0.6323595
## [1] "matrix"
## [1] "matrix"
## [1] "integer"
## [1] "double"

There are some rarer variable types that you may not have come across yet.

  1. array: Generalization of a matrix with an arbitrary number of dimensions.
  2. formula: Used by modeling and plotting functions to define relationships between variables.

Also note that there are three kinds of functions in R.

Most of the functions that you come across are called closures. A few important functions, like length() are known as builtin functions, which use a special evaluation mechanism to make them go faster. Language constructs, like if and while are also functions! They are known as special functions.
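A quick way to see the three kinds is to ask typeof() about one function of each kind. The choice of mean() as the closure is just an illustration.

```r
typeof(mean)    # "closure"
typeof(length)  # "builtin"
typeof(`if`)    # "special"

# Builtin and special functions are both primitives; closures are not.
is.primitive(mean)    # FALSE
is.primitive(length)  # TRUE
```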

## $an_integer_vector
##        class       typeof         mode storage.mode 
##    "integer"    "integer"    "numeric"    "integer" 
## 
## $a_numeric_vector
##        class       typeof         mode storage.mode 
##    "numeric"     "double"    "numeric"     "double" 
## 
## $an_integer_array
##        class       typeof         mode storage.mode 
##      "array"    "integer"    "numeric"    "integer" 
## 
## $a_numeric_array
##        class       typeof         mode storage.mode 
##      "array"     "double"    "numeric"     "double" 
## 
## $a_data_frame
##        class       typeof         mode storage.mode 
## "data.frame"       "list"       "list"       "list" 
## 
## $a_factor
##        class       typeof         mode storage.mode 
##     "factor"    "integer"    "numeric"    "integer" 
## 
## $a_formula
##        class       typeof         mode storage.mode 
##    "formula"   "language"       "call"   "language" 
## 
## $a_closure_function
##        class       typeof         mode storage.mode 
##   "function"    "closure"   "function"   "function" 
## 
## $a_builtin_function
##        class       typeof         mode storage.mode 
##   "function"    "builtin"   "function"   "function" 
## 
## $a_special_function
##        class       typeof         mode storage.mode 
##   "function"    "special"   "function"   "function"

Assigning Classes

As well as retrieving the class of an object, the class() function can be used to override it, without breaking existing functionality.

Note: Overriding the class doesn't change the typeof(), mode(), or storage.mode() of the object, because these are fundamental properties of an R object.

In the example below, you can see that assigning to class() overrides the class of the object but not its type.
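A minimal sketch of this, assuming a vector of random numbers (the use of rexp() is a guess; the values will differ from the output below):

```r
x <- rexp(10)

# Override the class with a made-up name.
class(x) <- "random_numbers"

typeof(x)  # still "double"
class(x)   # "random_numbers"
```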

##  [1] 0.4714675 1.8163304 0.3964557 0.5707266 0.1212334 3.4097801 0.5029485
##  [8] 0.8229876 0.2082152 1.1661194
## [1] "double"
## [1] "random_numbers"

Using S3

Generics and Methods or Function Overload

Previously we saw how the summary() function behaved differently depending on the type of its input. Having a function behave differently for different kinds of input is called function overloading (input-dependent function behavior). Its main purpose is to simplify your code: without it, you would have to learn many more functions.

The S3 system exists entirely to solve this problem. It does so by splitting a function into two parts:

  1. generic function (ex. summary, print)
  2. method functions for each class

## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x0000000011f81c38>
## <environment: namespace:base>

As you can see above, the print() function is very simple: its body is a single line. This is typical of an S3 generic. All the function needs to do is call UseMethod() with its own name; that is, "print" is passed to UseMethod() as a string.

There are two conditions you must follow for S3 methods:

  1. The name of each method must be [name of the generic].[class of the variable], for example print.Date (the print method for Date objects), summary.factor (the summary method for factors) or unique.array.
  2. The arguments to the method must include all the arguments to the generic.

In the example below, the arguments to print are x and the ellipsis, whereas the arguments to print.Date are those of the generic plus an extra max argument.

## function (x, ...) 
## NULL
## function (x, max = NULL, ...) 
## NULL

The ellipsis argument allows arguments to be passed from one method to another. It is good practice to include an ellipsis argument in both the generic and the methods.

Because S3 uses a dot to separate the name of the generic from the class of the input, it is a bad idea to include dots in the names of your variables. Names with words separated by dots are sometimes called leopard.case. Don't use this naming convention. Better conventions are lower_snake_case, where lowercase words are separated by underscores, and lowerCamelCase, where the first word is lowercase and subsequent words start with a capital letter.

All the methods belonging to a generic are completely independent. In the example below, you can see that print.function and print.Date are completely unrelated.

## function (x, useSource = TRUE, ...) 
## .Internal(print.function(x, useSource, ...))
## <bytecode: 0x0000000018673f40>
## <environment: namespace:base>
## function (x, max = NULL, ...) 
## {
##     if (is.null(max)) 
##         max <- getOption("max.print", 9999L)
##     if (max < length(x)) {
##         print(format(x[seq_len(max)]), max = max + 1, ...)
##         cat(" [ reached 'max' / getOption(\"max.print\") -- omitted", 
##             length(x) - max, "entries ]\n")
##     }
##     else if (length(x)) 
##         print(format(x), max = max, ...)
##     else cat(class(x)[1L], "of length 0\n")
##     invisible(x)
## }
## <bytecode: 0x0000000018889a60>
## <environment: namespace:base>

What's in a Name?

S3 uses a strict naming convention: all S3 methods have a name of the form generic.class.

The converse is not true: a function can have a name containing a dot without being an S3 method. This is the case with many of the functions that have been around since the early days of the S language. For example, all.equal() is actually an S3 generic, not a method. (This is an example of how leopard.case can be confusing.)

You can check if a function is an S3 generic by calling is_s3_generic() from the pryr package. You can also print it (by typing its name in the console), then looking to see if it calls UseMethod().

Similarly, you can check if a function is an S3 method by calling is_s3_method() from pryr. For example,

## [1] TRUE
## [1] TRUE
## [1] FALSE

Creating a Generic Function

You can create your own S3 functions. The first step is to write the generic. This is typically a single line function that calls UseMethod(), passing its name as a string.

The first argument to an S3 generic is usually called x, though this isn’t compulsory. It is also good practice to include a … (“ellipsis”, or “dot-dot-dot”) argument, in case arguments need to be passed from one method to another.

Overall, the structure of an S3 generic looks like this.
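A minimal sketch of such a generic, using the get_n_elements name that appears in the output later in this section:

```r
# The generic only dispatches; the real work happens in its methods.
get_n_elements <- function(x, ...) {
  UseMethod("get_n_elements")
}
```

Calling the generic at this point would fail for any input, because no methods exist yet.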

Creating an S3 Method

By itself, the generic function doesn’t do anything. For that, you need to create methods, which are just regular functions with two conditions:

  1. The name of the method must be of the form generic.class.
  2. The method signature - that is, the arguments that are passed in to the method - must contain the signature of the generic.

The syntax is:
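As a sketch, here is a data.frame method for a get_n_elements generic. The method body (rows times columns) is an assumption about what "number of elements" means; the built-in sleep dataset has 20 rows and 3 columns.

```r
get_n_elements <- function(x, ...) {
  UseMethod("get_n_elements")
}

# Method name is generic.class; its arguments include those of the generic.
get_n_elements.data.frame <- function(x, ...) {
  nrow(x) * ncol(x)
}

n_elements_sleep <- get_n_elements(sleep)
n_elements_sleep  # 60
```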

## function(x, ...)
## {
##   UseMethod("get_n_elements")
## }
## [1] 60

Creating an S3 Method (2)

If no suitable method is found for a generic, then an error is thrown. For example, at the moment, get_n_elements() only has a method available for data.frames. If you pass a matrix to get_n_elements() instead, you'll see an error.

## Error in UseMethod("get_n_elements"): no applicable method for 'get_n_elements' applied to an object of class "c('matrix', 'logical')"

Rather than having to write dozens of methods for every kind of input, you can create a method that handles all types that don’t have a specific method. This is called the default method; it always has the name generic.default. For example, print.default() will print any type of object that doesn’t have its own print() method.
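A default method for get_n_elements might look like the sketch below. Counting elements with length(unlist(x)) is one plausible choice, not necessarily the one used originally.

```r
get_n_elements <- function(x, ...) {
  UseMethod("get_n_elements")
}
get_n_elements.data.frame <- function(x, ...) {
  nrow(x) * ncol(x)
}

# Fallback for every class without a method of its own.
get_n_elements.default <- function(x, ...) {
  length(unlist(x))
}

a_logical_matrix <- matrix(c(TRUE, FALSE, NA, TRUE), nrow = 2)
get_n_elements(a_logical_matrix)  # 4, handled by the default method
get_n_elements(sleep)             # 60, still handled by the data.frame method
```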

## a_data_frame : 'data.frame': 50 obs. of  2 variables:
##  $ n: num  0.896 0.199 2.845 0.508 4.474 ...
##  $ f: Factor w/ 5 levels "A","B","C","D",..: 2 NA NA 4 1 4 NA 5 2 4 ...
## a_factor :  Factor w/ 5 levels "A","B","C","D",..: 2 NA NA 4 1 4 NA 5 2 4 ...
## a_linear_model : List of 12
##  $ coefficients : Named num [1:2] -17.58 3.93
##  $ residuals    : Named num [1:50] 3.85 11.85 -5.95 12.05 2.12 ...
##  $ effects      : Named num [1:50] -303.914 145.552 -8.115 9.885 0.194 ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:50] -1.85 -1.85 9.95 9.95 13.88 ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##  $ df.residual  : int 48
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = dist ~ speed, data = cars)
##  $ terms        :Classes 'terms', 'formula'  language dist ~ speed
##  $ model        :'data.frame':   50 obs. of  2 variables:
## a_numeric_vector :  num [1:50] 0.896 0.199 2.845 0.508 4.474 ...
## get_n_elements : function (x, ...)  
## get_n_elements.data.frame : function (x, ...)  
## int_mat :  int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
## n_elements_sleep :  int 60
## num_mat :  num [1:3, 1:4] 0.7418 0.0422 2.3749 1.0519 -1.1993 ...
## some_vars : List of 10
##  $ an_integer_vector : int [1:24] 1 8 3 9 6 7 2 3 6 8 ...
##  $ a_numeric_vector  : num [1:24] 0.2128 0.0198 0.9113 0.9216 0.2297 ...
##  $ an_integer_array  : int [1:2, 1:3, 1:4] 5 4 5 5 3 6 6 5 0 4 ...
##  $ a_numeric_array   : num [1:2, 1:3, 1:4] 1.2425 0.0126 1.6071 0.299 1.1228 ...
##  $ a_data_frame      :'data.frame':  24 obs. of  2 variables:
##  $ a_factor          : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 4 8 1 9 7 6 2 12 11 ...
##  $ a_formula         :Class 'formula'  language y ~ x
##  $ a_closure_function:function (x, ...)  
##  $ a_builtin_function:function (x)  
##  $ a_special_function:.Primitive("if") 
## type_info : function (x)  
## x :  'random_numbers' num [1:10] 0.471 1.816 0.396 0.571 0.121 ...

Methodical Thinking

There are a lot of S3 functions in R, and now you are going to learn how to find out what is available. When you have a generic function, it is often useful to know which methods exist for it. To find out, use the methods() function, passing either the function itself or a string naming it.

## [1] mean.Date     mean.default  mean.difftime mean.POSIXct  mean.POSIXlt 
## [6] mean.quosure*
## see '?methods' for accessing help and source code

Which methods are available for a given class of object? You can find this out with methods() too, using its class argument (with or without quotes).

##  [1] add1           anova          coerce         confint        cooks.distance
##  [6] deviance       drop1          effects        extractAIC     family        
## [11] formula        influence      initialize     logLik         model.frame   
## [16] nobs           predict        print          residuals      rstandard     
## [21] rstudent       show           slotsFromS3    summary        vcov          
## [26] weights       
## see '?methods' for accessing help and source code

In fact, methods() is more generous with its return value than just listing S3 methods for a given generic or class: it returns both S3 and S4 methods. To find only the S3 methods use .S3methods(), and to find only the S4 methods use .S4methods().
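The lookups described above can be sketched like this. The exact lists returned depend on your R version and on which packages are loaded, so no output is shown.

```r
# Methods available for a generic: pass the function or its name.
mean_methods <- methods("mean")

# Methods available for a class.
glm_methods <- methods(class = "glm")

# Restrict the search to one system.
glm_s3_methods <- .S3methods(class = "glm")
glm_s4_methods <- methods::.S4methods(class = "glm")

mean_methods
glm_s3_methods
```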

##  [1] add1           anova          confint        cooks.distance deviance      
##  [6] drop1          effects        extractAIC     family         formula       
## [11] influence      logLik         model.frame    nobs           predict       
## [16] print          residuals      rstandard      rstudent       summary       
## [21] vcov           weights       
## see '?methods' for accessing help and source code
## [1] coerce      initialize  show        slotsFromS3
## see '?methods' for accessing help and source code

Method Lookup for Primitive Generics

In many data analyses, the time-consuming tasks are:

  1. Writing code
  2. Debugging code
  3. Maintaining code

R is therefore optimized to make these tasks as quick as possible. In some cases, however, the speed of running the code matters more.

Functions for which speed is critical aren't written in R; they are written in C. C code typically runs faster than R code, so writing in C increases performance. The tradeoff is that C code takes longer to write and is harder to debug.

R has several interfaces to the C language, and the highest-performance one is the primitive interface. It is reserved for a few fundamental features of base R. Functions that use it are called primitive functions (for example exp, sin, +, -, for, if).

Primitive functions can also be generic, and these behave slightly differently from other generic functions. You can see the complete list of primitive S3 generics using .S3PrimitiveGenerics (around 30 functions). The big difference between a primitive generic and a regular generic is what happens when a suitable method can't be found.

##  [1] "anyNA"          "as.character"   "as.complex"     "as.double"     
##  [5] "as.environment" "as.integer"     "as.logical"     "as.numeric"    
##  [9] "as.raw"         "c"              "dim"            "dim<-"         
## [13] "dimnames"       "dimnames<-"     "is.array"       "is.finite"     
## [17] "is.infinite"    "is.matrix"      "is.na"          "is.nan"        
## [21] "is.numeric"     "length"         "length<-"       "levels<-"      
## [25] "names"          "names<-"        "rep"            "seq.int"       
## [29] "xtfrm"
## [1] "1970-01-01" "2012-12-21"
## Error in as.Date.default(all_of_time): do not know how to convert 'all_of_time' to class "Date"
## [1] 2

Because as.Date() is not a primitive generic, when you override the class to date_strings no method can be found and an error is thrown. By contrast, look at what happens with length(). It is a primitive generic, and it is so important that it shouldn't break just because the class has changed.

When no suitable method is found for a primitive generic, rather than throwing an error it goes directly to the C code, using typeof() to determine the type of the input.
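The comparison can be sketched as follows. The variable name and class name come from the output above; wrapping the failing call in tryCatch() is an addition so the script keeps running.

```r
all_of_time <- c("1970-01-01", "2012-12-21")
as.Date(all_of_time)  # works: there is a method for character input

# Override the class with one that has no methods at all.
class(all_of_time) <- "date_strings"

# Regular generic: no method for "date_strings", so an error is raised.
result <- tryCatch(
  as.Date(all_of_time),
  error = function(e) conditionMessage(e)
)
result

# Primitive generic: falls through to the C code and still works.
length(all_of_time)  # 2
```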

Too Much Class

Variables can have more than one class. In this case, rather than being a single string, the class is a character vector. In the example below, a vector of numbers is described using several classes. The order of the classes is important: the most specific class comes first, and they become less specific from left to right. It is good practice to keep the original class as the final class (here, numeric).

To test for arbitrary classes, you can use the general-purpose inherits() function. As the example below shows, x inherits from triangular_numbers, from natural_numbers and from numeric.

## [1] TRUE
## Error in is.triangular_numbers(x): could not find function "is.triangular_numbers"
## [1] TRUE
## [1] TRUE
## [1] TRUE

If your object has multiple classes then you can call multiple S3 methods using NextMethod function.
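Here is a sketch of chaining methods with NextMethod(), using the three classes from the example above. The generic name what_am_i and the specific numbers are assumptions.

```r
what_am_i <- function(x, ...) {
  UseMethod("what_am_i")
}
what_am_i.triangular_numbers <- function(x, ...) {
  message("I'm triangular numbers")
  NextMethod("what_am_i")  # hand over to the next class in the vector
}
what_am_i.natural_numbers <- function(x, ...) {
  message("I'm natural numbers")
  NextMethod("what_am_i")
}
what_am_i.numeric <- function(x, ...) {
  message("I'm numeric")  # last class, so the chain stops here
}

x <- c(1, 3, 6, 10, 15)
class(x) <- c("triangular_numbers", "natural_numbers", "numeric")

inherits(x, "natural_numbers")  # TRUE
what_am_i(x)
```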

## I'm triangular numbers
## I'm natural numbers
## I'm numeric

Using R6

Object Factory

The R6 system provides a way of storing data and functions within the same variable.

The first step in working with R6 is to create a class generator for each of your objects. A class generator is a template that describes what data can be stored in the object and what functions can be applied to the object. It is also used to create the specified objects. For this reason, class generators are also called factories.

Factories are defined using the R6Class() function. Its first argument is the name of the class; by convention this should be in UpperCamelCase. Another argument, private, stores the object's data. It is always a list, and each element of the list must be named. There are two more arguments, public and active, which are discussed later.

The second step to working with R6 is to create some objects. You can do this by calling the new() method of the factory. Since it is a factory you can churn out as many objects as you like.
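Both steps can be sketched as below, assuming the R6 package is installed. The class name Thing matches the output later in this document; the field names and values are illustrative.

```r
library(R6)

# Step 1: define the factory (class generator).
thing_factory <- R6Class(
  "Thing",
  private = list(
    a_field = "a value",
    another_field = 123
  )
)

# Step 2: create as many objects as you like.
a_thing <- thing_factory$new()
another_thing <- thing_factory$new()

class(a_thing)  # "Thing" "R6"
```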

Hiding Complexity with Encapsulation

In OOP, separating the implementation of an object from its user interface is called encapsulation. In R6, the implementation details are stored in the private element of the class. By contrast, the user interface is stored in the public element.

The public element is also a named list, and its contents are mostly functions.

The data fields in the private elements can be accessed using the prefix private$.

In example below private field door_is_open is accessed in the function open_door using private$door_is_open.

It is also possible to access other public elements of a class using the self$ prefix or (…).
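A sketch of a class with both elements follows. Only door_is_open and open_door come from the text above; the class name, the other field, and the cook() method are assumptions made for illustration.

```r
library(R6)

microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    # Implementation details, hidden from the user.
    power_rating_watts = 800,
    door_is_open = FALSE
  ),
  public = list(
    # User interface.
    open_door = function() {
      private$door_is_open <- TRUE   # private field via private$
      invisible(self)
    },
    close_door = function() {
      private$door_is_open <- FALSE
      invisible(self)
    },
    cook = function() {
      self$close_door()              # other public method via self$
      print("Your food is cooked!")
      invisible(self)
    }
  )
)

a_microwave_oven <- microwave_oven_factory$new()
a_microwave_oven$cook()
```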

## [1] "Your food is cooked!"

Initialize()

There is one special public method named initialize() (note the American English spelling). This is not called directly by the user. Instead, it is called automatically when an object is created; that is, when the user calls new().

initialize() lets you set the values of the private fields when you create an R6 object. The pattern for an initialize() function is as follows:
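A minimal sketch of that pattern, with a hypothetical MicrowaveOven class and a hypothetical accessor method added only so the result can be inspected:

```r
library(R6)

microwave_oven_factory <- R6Class(
  "MicrowaveOven",
  private = list(
    power_rating_watts = 800
  ),
  public = list(
    # Called automatically by new(); arguments to new() arrive here.
    initialize = function(power_rating_watts) {
      if (!missing(power_rating_watts)) {
        private$power_rating_watts <- power_rating_watts
      }
    },
    # Hypothetical accessor, not part of the pattern itself.
    get_power_rating_watts = function() {
      private$power_rating_watts
    }
  )
)

default_oven <- microwave_oven_factory$new()
custom_oven <- microwave_oven_factory$new(power_rating_watts = 650)

default_oven$get_power_rating_watts()  # 800
custom_oven$get_power_rating_watts()   # 650
```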

Notice the use of missing(). This returns TRUE if an argument wasn’t passed in the function call.

Arguments to the factory’s new() method are passed to initialize().

Getting and Setting with Active Bindings

Data values stored in the private element of an R6 class are not directly accessible by the user. Sometimes, however, you may wish to provide controlled access to these data fields. There are two cases: you may want to retrieve a data field, or you may want to change it. In OOP these are known as getting and setting the data.

In R6 this controlled access to private fields is achieved through Active Bindings. Active Bindings are defined like functions but are accessed like data variables.

Active Bindings are added to the active element of a class. The active element must be a named list. One of the R6 restrictions is that elements of private, public and active must all have different names.

A useful convention to distinguish private and active elements is to start all private fields with a double dot. For you as a programmer this makes the private field stand out so you have a quick visual way of signifying that these variables are not available for consumption by user.

The simplest case is a read-only active binding, where you only want to retrieve a data field, not change it. In this case the function takes no arguments and simply returns the corresponding private field. In the example below, the active binding a_field returns the private field ..a_field.

Since the a_field binding is a function you can apply/include custom logic. For example if the data field was missing you can return a default value.

## 
## Attaching package: 'assertive'
## The following objects are masked from 'package:pryr':
## 
##     is_s3_generic, is_s3_method

A more complex case is when you want users to be able to change the value of a data field as well. In this case the binding function should take a single argument, by convention named value. If value is missing, the function just returns the private data field as before. When value is passed to the active binding, however, you need some logic to set the private field.

The purpose of active bindings is to allow controlled access to the private fields, so you can add custom logic to check a value before assigning it. For example, if another_field should only contain a single number, you can use assert_is_a_number() from the assertive package to check this condition and throw an error if the value is anything else. Notice that although an active binding is defined as a function, it is accessed like a data variable, with no parentheses at the end. Since a_field was defined as read-only, trying to change it produces an error.

By contrast, you can set another_field, but the logic in the binding requires the value to be a single number.
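Both kinds of binding can be sketched as below. To keep the example free of extra dependencies, the validation uses base R's stopifnot() in place of the assertive package; that substitution is an assumption, not the original code.

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  private = list(
    ..a_field = "a value",
    ..another_field = 123
  ),
  active = list(
    # Read-only: no arguments, so assignment fails.
    a_field = function() {
      private$..a_field
    },
    # Read-write: `value` is missing when getting, supplied when setting.
    another_field = function(value) {
      if (missing(value)) {
        private$..another_field
      } else {
        stopifnot(is.numeric(value), length(value) == 1)
        private$..another_field <- value
      }
    }
  )
)

a_thing <- thing_factory$new()
a_thing$a_field                # "a value"
a_thing$another_field <- 456   # accessed like data, no parentheses
a_thing$another_field          # 456
```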

## [1] "a value"
## Error in (function () : unused argument (.Primitive("quote")("a new value"))
## Error in (function (value) : is_a_number : value is not of class 'numeric'; it has class 'character'.
## [1] 800
## [1] 800
## Error in (function (value) : is_a_number : value is not of class 'numeric'; it has class 'character'.
## Error in (function (value) : is_in_closed_range : value are not all in the range [0,800].
## There was 1 failure:
##   Position Value    Cause
## 1        1  1600 too high

R6 Inheritance

Propagating Functionality with Inheritance

Copying and pasting code is a big source of bugs and usually a sign that you are writing bad code. If you make a change in a parent class, you want that change to be mirrored in the child class. Inheritance achieves this by letting one class reuse the definition of another. To implement inheritance, R6 uses the inherit argument of R6Class().

Classes that inherit from an original class (the parent class) are called child classes. All the data and functionality of the parent is passed to the child, that is, all the elements of private, public and active. You can also add further functionality to the child.

The important thing to remember is that inheritance only works in one direction: the parent class does not inherit the traits of its child.

Inheritance means that the methods of the child class are exact copies of those in the parent class and you can add additional methods in the child class.
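A minimal sketch of a parent and child factory, using the Thing and ChildThing class names from the output below (the do_something method is illustrative):

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  public = list(
    do_something = function() "the parent do_something method"
  )
)

# The child names its parent factory in the inherit argument.
child_thing_factory <- R6Class(
  "ChildThing",
  inherit = thing_factory
)

a_thing <- thing_factory$new()
a_child_thing <- child_thing_factory$new()

class(a_child_thing)              # "ChildThing" "Thing" "R6"
inherits(a_child_thing, "Thing")  # TRUE
inherits(a_thing, "ChildThing")   # FALSE: inheritance is one-directional
a_child_thing$do_something()      # inherited from the parent
```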

## [1] "Thing" "R6"
## [1] TRUE
## [1] TRUE
## [1] "ChildThing" "Thing"      "R6"
## [1] TRUE
## [1] TRUE
## [1] TRUE

Embrace, Extend, Override

Simply creating a new class that inherits from another class isn’t useful by itself. What you really want the child class to do is add new functionality.

This can be done in two ways:

  1. Override existing functionality inherited from the parent
  2. Extend the class to add brand new functionality

To override functionality, you define elements with the same names as those in the parent. To extend functionality, you simply define new public methods or private data fields.

Public methods can call other public methods by prefixing their name with self$.

Child classes can access public methods from their parent class by prefixing the name with super$.

Multiple Levels of Inheritance

R6 allows multiple levels of inheritance. However, R6 objects only have access to the functionality of their direct parent class. To access functionality across multiple generations, the intermediate generations must expose their parents using an active binding. This active binding is conventionally named super_ and simply returns the super object.
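The following sketch puts overriding, super$ and the super_ binding together across three generations. The class and method names are chosen to match the output below, but the code itself is a reconstruction.

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  public = list(
    do_something = function() {
      message("the parent do_something method")
    }
  )
)

child_thing_factory <- R6Class(
  "ChildThing",
  inherit = thing_factory,
  public = list(
    # Same name as in the parent, so this overrides it.
    do_something = function() {
      message("the child do_something method")
    }
  ),
  active = list(
    # Expose this class's parent to later generations.
    super_ = function() super
  )
)

grand_child_thing_factory <- R6Class(
  "GrandChildThing",
  inherit = child_thing_factory,
  public = list(
    do_something = function() {
      message("the grand-child do_something method")
      super$do_something()         # direct parent
      super$super_$do_something()  # grandparent, via the binding
    }
  )
)

a_grand_child_thing <- grand_child_thing_factory$new()
a_grand_child_thing$do_something()
```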

## the grand-child do_something method
## the child do_something method
## the parent do_something method

Advanced R6

Environments, Reference Behavior, & Shared Fields

To create a new environment, call the new.env() function. Unlike lists, which are commonly filled with elements when you create them, environments are always created empty and you add their contents afterwards. The syntax for adding variables to an environment is the same as for a list: you can use the $ operator or the double-square-brackets operator.
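For instance, the sketch below fills an environment with two variables. The expressions used for x and y are guesses chosen to resemble the output that follows.

```r
env <- new.env()

# Add contents after creation, using either list-style syntax.
env$x <- pi ^ (1:5)
env[["y"]] <- matrix(month.abb, nrow = 3)

ls.str(env)
```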

## x :  num [1:5] 3.14 9.87 31.01 97.41 306.02
## y :  chr [1:3, 1:4] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" ...

There is one way in which environments behave differently from lists, and it becomes important when working with R6 classes. Most R variables are copied by value: when you copy a variable, each version has its own copy of the values. By contrast, environments are copied by reference: when you copy one, each version refers to the same copy of the values. R6 classes can take advantage of copying by reference to share data between all instances of a class.
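The difference is easy to see side by side. This is a small sketch with made-up variable names:

```r
# Lists: copy by value.
lst <- list(x = 1)
lst2 <- lst
lst$x <- 2
identical(lst$x, lst2$x)  # FALSE: lst2 kept its own copy

# Environments: copy by reference.
env <- new.env()
env$x <- 1
env2 <- env
env$x <- 2
identical(env$x, env2$x)  # TRUE: both names refer to the same environment
```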

## [1] FALSE
## [1] TRUE

There is one simple trick to this: define a private element, by convention named shared, that holds an environment. Defining the shared element takes several lines of code, so it needs braces. To access the shared fields you use active bindings as before, but this time with the private$shared$ prefix.
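A sketch of the shared-field pattern follows. The field name a_shared_field and its values are assumptions based on the output below.

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  private = list(
    # Built once, when the factory is defined; every object then
    # holds a reference to this same environment.
    shared = {
      e <- new.env()
      e$a_shared_field <- 123
      e
    }
  ),
  active = list(
    a_shared_field = function(value) {
      if (missing(value)) {
        private$shared$a_shared_field
      } else {
        private$shared$a_shared_field <- value
      }
    }
  )
)

a_thing <- thing_factory$new()
another_thing <- thing_factory$new()

a_thing$a_shared_field           # 123
another_thing$a_shared_field     # 123
a_thing$a_shared_field <- 456
another_thing$a_shared_field     # 456: the change is visible everywhere
```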

## [1] 123
## [1] 123
## [1] 456
## [1] TRUE
## [1] "Warning. If the food is too hot you may scald yourself."

Cloning R6 Objects

As you saw earlier, environments have special copy-by-reference behavior. Since R6 objects are built using environments, they also use copy by reference. If you create an object and then use assignment to copy it, changing a field through one variable changes it for the other too, because both refer to the same object.

## [1] 456

Sometimes this isn't the behavior that you want, so all R6 objects have a method named clone() that makes independent copies (copy by value). You don't need to define this method yourself; it is generated automatically. To copy an object using the more standard copy-by-value behavior, just call clone() without any arguments.

## [1] 456

One special case is when R6 objects contain other R6 objects.

## [1] "a new value"

To get copy-by-value behavior for the internal R6 object, you need to call clone() with the argument deep = TRUE. With a deep copy, changes to thing$a_field are not propagated to deep_copy.

## [1] "a new value"

If an R6 object contains another R6 object in one or more of its fields, then by default clone() will copy the R6 fields by reference. To copy those R6 fields by value, the clone() method must be called with the argument deep = TRUE.
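The three kinds of copy can be compared in one sketch. The Inner and Outer classes and their fields are hypothetical, chosen only to show an R6 object stored inside another.

```r
library(R6)

inner_factory <- R6Class(
  "Inner",
  public = list(a_field = "a value")
)
outer_factory <- R6Class(
  "Outer",
  public = list(
    n = 1,
    inner = NULL,
    initialize = function() {
      self$inner <- inner_factory$new()  # an R6 object inside an R6 object
    }
  )
)

thing <- outer_factory$new()
ref_copy <- thing                       # assignment: same object
shallow_copy <- thing$clone()           # copies n, shares inner
deep_copy <- thing$clone(deep = TRUE)   # copies n and inner

thing$n <- 2
thing$inner$a_field <- "a new value"

ref_copy$n                  # 2
shallow_copy$n              # 1
shallow_copy$inner$a_field  # "a new value"
deep_copy$inner$a_field     # "a value"
```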

Shut it down

If an R6 object connects to a database or a file, it can be dangerous to delete it without first making sure that the connections are closed. Similarly, if the R6 object has any side effects, such as changing global options or global plotting parameters, it is good practice to return those settings to their previous state.

The initialize() method customizes behavior when an object is created (custom startup). Its counterpart, finalize(), allows custom behavior when an R6 object is destroyed (custom cleanup).

finalize() is always a function with no arguments. The original convention was to define it in the public element of an R6 class; R6 2.4.0 and later expect it in private instead. When you delete an R6 object, finalize() isn't called immediately; it is called when the object is garbage collected by R's memory management system. You can force this to occur by calling the gc() function.
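A sketch of a finalizer is shown below. It places finalize() in private, which is where R6 2.4.0 and later expect it; on older R6 releases it would need to be in public instead. The cleanup_log environment is a hypothetical stand-in for a real resource such as a database connection.

```r
library(R6)

# Hypothetical stand-in for an external resource.
cleanup_log <- new.env()
cleanup_log$finalized <- FALSE

thing_factory <- R6Class(
  "Thing",
  private = list(
    # Runs when the object is garbage collected, not when rm() is called.
    finalize = function() {
      cleanup_log$finalized <- TRUE
    }
  )
)

a_thing <- thing_factory$new()
rm(a_thing)     # removes the name; the finalizer has not run yet
invisible(gc()) # request a collection, which triggers finalizers

cleanup_log$finalized
```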

In summary, finalize() is used for cleanup when an object is destroyed. It is especially useful for R6 classes that connect to databases or files, since it is important that those connections eventually get closed. finalize() is called when the object is garbage collected by R.

##           used (Mb) gc trigger (Mb) max used (Mb)
## Ncells  626831 33.5    1244284 66.5  1244284 66.5
## Vcells 1221428  9.4    8388608 64.0  2140529 16.4