Introduction to R for Finance

Lore Dirick - DataCamp

Course Description

In this finance-oriented introduction to R, you will learn essential data structures such as lists and data frames and have the chance to apply that knowledge to real-world financial examples. By the end of the course, you will be comfortable with the basics of manipulating your data to perform financial analysis in R.

1 The Basics

Get comfortable with the very basics of R and learn how to use it as a calculator. Also, create your first variables in R and explore some of the base data types such as numerics and characters.

1.1 Welcome to Introduction to R for Finance!

1.1.1 Your first R script

Welcome! In the script to the right you will type R code to solve the exercises. When you hit the Submit Answer button, every line of code in the script is executed by R and you get a message that indicates whether or not your code was correct. The output of your submission is shown in the R console.

You can also execute code directly in the R Console. When you type in the console, your submission will not be checked for correctness! Try, for example, to type in 3 + 4 and hit Enter. R should return [1] 7.

An addition example has already been created for you.

# Addition!
3 + 5

## [1] 8

Add another line of code in the script to calculate the difference of 6 and 4.

# Subtraction!
6 - 4

## [1] 2

Note: Check out the # symbol in the script! This denotes a comment in your code. Comments are a great way to document your code, and are not run when you submit your answer.

One down, many more to go! Great job!

1.1.2 Arithmetic in R (1)

Let’s play around with your new calculator. First, check out these arithmetic operators. Most of them should look familiar:

Addition: +

Subtraction: -

Multiplication: *

Division: /

Exponentiation: ^ or **

Modulo: %%

You might be unfamiliar with the last two. The ^ operator raises the number to its left to the power of the number to its right. For example, 3^2 is 9. The modulo returns the remainder of the division of the number to the left by the number on the right, for example 5 modulo 3 or 5 %% 3 is 2.
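As a quick check of the two less familiar operators, the sketch below evaluates each of them on small numbers. The integer-division operator %/% is not covered in the text above; it is included here only as an assumption that it helps show where the remainder comes from.

```r
# Exponentiation: ^ and ** are equivalent in R
power_caret <- 3^2       # 9
power_stars <- 3**2      # 9

# Modulo: the remainder left over after division
remainder <- 5 %% 3      # 2

# Integer division (not covered above): the whole-number part of 5 / 3
quotient <- 5 %/% 3      # 1

# The quotient and remainder together rebuild the original number
rebuilt <- quotient * 3 + remainder   # 5
```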

Lastly, there is another useful way to execute your code besides typing in the R Console or pressing Submit Answer. Clicking on a line of code in the script, and then pressing Command + Enter will execute just that line in the R Console. Try it out with the 2 + 2 line already in the script!

Some examples for addition, subtraction, and multiplication are shown for you.

# Addition 
2 + 2

## [1] 4

# Subtraction
4 - 1

## [1] 3

# Multiplication
3 * 4

## [1] 12

Type 4 / 2 in the script to perform division.

# Division
4 / 2

## [1] 2

Type 2^4 to raise 2 to the power of 4.

# Exponentiation
2^4

## [1] 16

Type 7 %% 3 to calculate 7 modulo 3.

# Modulo
7 %% 3

## [1] 1

Don’t forget to press Submit Answer when you finish!

You’re crushing it!

1.1.3 Arithmetic in R (2)

The order in which you perform your mathematical operations is critical to getting the correct answer. The correct “order of operations” is:

Parentheses, Exponentiation, Multiplication and Division, Addition and Subtraction

Or PEMDAS for short!

This means that when you come across the expression 20 - 8 * 2, you know to do the multiplication first, then the subtraction, to get the correct answer of 4.
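A small sketch of how parentheses change the result of that same expression. The variable names are hypothetical, and the example with a leading minus sign goes slightly beyond the text: it shows that exponentiation is applied before negation.

```r
# Multiplication binds tighter than subtraction
no_parens <- 20 - 8 * 2         # 20 - 16 = 4

# Parentheses force the subtraction to happen first
with_parens <- (20 - 8) * 2     # 12 * 2 = 24

# Exponentiation binds tighter than a leading minus sign
neg_square <- -2^2              # -(2^2) = -4
pos_square <- (-2)^2            # 4
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.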

Which of these expressions would evaluate to 6?

4 + 8 / 2
(14 - 2) / 2
(2^3 * 2) / 4
6 - 3 * 2

That’s it! Isn’t PEMDAS great?

1.1.4 Assignment and variables (1)

It looks like you’re becoming an expert at using R as a calculator! Time to take it one step further. These numbers you are calculating haven’t been very descriptive. 5? 5 what? 5 apples? 5 monkeys? What if you could assign that 5 a descriptive name like number_of_apples, and then simply type that name whenever you want to use 5? Enter, variables.

A variable allows you to store a value or an object in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable. You use <- to assign a variable:

my_money <- 100

Assign a value of 200 to the savings variable in the script.

# Assign 200 to savings
savings <- 200

Press Submit Answer and note how simply typing savings in the script asks R to print the value to the console!

# Print the value of savings to the console
savings

## [1] 200

Excellent job creating a meaningfully named variable and printing it to the console (which, in R, you can do without any call to a print function!)

1.1.5 Assignment and variables (2)

Suppose you have $100 stored in my_money, and your friend Dan has $200. To be clear, you decide to give Dan’s money a variable name too. You want to know how much money the two of you have together. Now that each variable has a descriptive name, this is easy using the arithmetic you learned earlier:

my_money + dans_money

my_money has been defined for you.

# Assign 100 to my_money
my_money <- 100

Assign 200 to Dan’s money.

# Assign 200 to dans_money
dans_money <- 200

Follow the example in the exercise text and add your money to Dan’s money.

# Add my_money and dans_money
my_money + dans_money

## [1] 300

Add your money to Dan’s money again, but this time save the result to our_money!

# Add my_money and dans_money again, save the result to our_money
our_money <- my_money + dans_money

Now you’re getting it!!

1.2 Financial returns

1.2.1 Financial returns (1)

Time for some application! Earlier, Lore taught you about financial returns. Now, it’s time for you to put that knowledge to work! But first, a quick review.

Assume you have $100. During January, you make a 5% return on that money. How much do you have at the end of January? Well, you have 100% of your starting money, plus another 5%: 100% + 5% = 105%. In decimals, this is 1 + .05 = 1.05. This 1.05 is the return multiplier for January, and you multiply your original $100 by it to get the amount you have at the end of January.

105 = 100 * 1.05

Or in terms of variables:

post_jan_cash <- starting_cash * jan_mult

A quick way to get the multiplier is:

multiplier = 1 + (return / 100)

Your new starting cash, January’s return, and January’s return multiplier have been defined for you.

# Variables for starting_cash and 5% return during January
starting_cash <- 200
jan_ret <- 5
jan_mult <- 1 + (jan_ret / 100)

Use them to calculate post_jan_cash.

# How much money do you have at the end of January?
post_jan_cash <- starting_cash * jan_mult

Print post_jan_cash.

# Print post_jan_cash
post_jan_cash

## [1] 210

What if the return for January was 10%? Calculate the new jan_mult_10.

# January 10% return multiplier
jan_ret_10 <- 10
jan_mult_10 <- 1 + (jan_ret_10 / 100)

Calculate post_jan_cash_10 using the new multiplier!

# How much money do you have at the end of January now?
post_jan_cash_10 <- starting_cash * jan_mult_10

Print post_jan_cash_10 to see the impact of a different return!

# Print post_jan_cash_10
post_jan_cash_10

## [1] 220

Great! Wouldn’t it be nice to always have 10% returns?

1.2.2 Financial returns (2)

Let’s make you some more money. If, in February, you earn another 2% on your cash, how would you calculate the total amount at the end of February? You already know that the amount at the end of January is $100 * 1.05 = $105. To get from the end of January to the end of February, just use another multiplier!

$105 * 1.02 = $107.1

Which is equivalent to:

$100 * 1.05 * 1.02 = $107.1

In this last form, you see the effect of both multipliers on your original $100. In fact, this form can help you find the total return over both months. The correct way to do this is by multiplying the two multipliers together: 1.05 * 1.02 = 1.071. This means you earned 7.1% in total over the 2 month period.
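The compounding step above can be sketched in a few lines. This is a minimal illustration using the 5% and 2% figures from the text; the variable names total_mult, total_ret, and naive_ret are hypothetical.

```r
# Return multipliers for January (5%) and February (2%)
jan_mult <- 1 + (5 / 100)
feb_mult <- 1 + (2 / 100)

# Compounding: multiply the multipliers together
total_mult <- jan_mult * feb_mult        # 1.071

# Convert the total multiplier back to a percentage return
total_ret <- (total_mult - 1) * 100      # about 7.1

# Simply adding the two returns ignores the return earned on January's gain
naive_ret <- 5 + 2                       # 7
total_ret > naive_ret                    # TRUE
```

The gap between 7.1 and 7 is the return earned in February on the money gained in January.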

Your starting cash, and the returns for January and February have been given.

# Starting cash and returns 
starting_cash <- 200
jan_ret <- 4
feb_ret <- 5

Use them to calculate the January and February return multipliers: jan_mult and feb_mult.

# Multipliers
jan_mult <- 1 + (jan_ret / 100)
feb_mult <- 1 + (feb_ret / 100)

Use those multipliers and starting_cash to find your total_cash at the end of the two months.

# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult

Print total_cash to see how your money has grown!

# Print total_cash
total_cash

## [1] 218.4

Fantastic! It feels good to make some money.

1.3 Basic data types

1.3.1 Data type exploration

To get started, here are some of R’s most basic data types:

Numerics are decimal numbers like 4.5. A special type of numeric is an integer, which is a numeric without a decimal piece. Integers must be specified like 4L.

Logicals are the boolean values TRUE and FALSE. Capital letters are important here; true and false are not valid.

Characters are text values like “hello world”.

Assign the numeric 150.45 to apple_stock.

# Apple's stock price is a numeric
apple_stock <- 150.45

Assign the character “AAA” to credit_rating.

# Bond credit ratings are characters
credit_rating <- "AAA"

Answer the final question with either TRUE or FALSE, we won’t judge!

# You like the stock market. TRUE or FALSE?
my_answer <- TRUE

Print my_answer!

# Print my_answer
my_answer

## [1] TRUE

Great job!

1.3.2 What’s that data type?

Up until now, you have been determining what data type a variable is just by looks. There is actually a better way to check this.

class(my_var)

This will return the data type (or class) of whatever variable you pass in.
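A short sketch of class() applied to one value of each basic type. The variable names and the "AAPL" string are hypothetical and are not the a, b, and c used in the exercise.

```r
price  <- 4.5       # a decimal number
shares <- 4L        # the L suffix makes this an integer
whole  <- 4         # no L suffix, so this is still a numeric
liked  <- TRUE      # a logical
ticker <- "AAPL"    # a character

class(price)    # "numeric"
class(shares)   # "integer"
class(whole)    # "numeric"
class(liked)    # "logical"
class(ticker)   # "character"
```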

The variables a, b, and c have already been defined for you. You can type ls() in the console at any time to “list” the variables currently available to you. Use the console, and class() to decide which statement below is correct.

a is a numeric, b is a character, c is a logical
a is a logical, b is a numeric, c is a character
a is a numeric, b is a numeric, c is a logical
a is a character, b is a character, c is a numeric

2 Vectors and Matrices

In this chapter, you will learn all about vectors and matrices using historical stock prices for companies like Apple and IBM. You will then be able to feel confident creating, naming, manipulating, and selecting from vectors and matrices.

2.1 What is a vector?

2.1.1 c()ombine

Now is where things get fun! It is time to create your first vector. Since this is a finance-oriented course, it is only appropriate that your first vector be a numeric vector of stock prices. Remember, you create a vector using the combine function, c(), and each element you add is separated by a comma.

For example, this is a vector of Apple’s stock prices from December, 2016:

apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)

And this is a character vector of bond credit ratings:

credit_rating <- c("AAA", "AA", "BBB", "BB", "B")

Another example of a numeric vector for IBM stock prices is shown for you.

# Another numeric vector
ibm_stock <- c(159.82, 160.02, 159.84)

Create a character vector of the finance related words “stocks”, “bonds”, and “investments”, in that order.

# Another character vector
finance <- c("stocks", "bonds", "investments")

Create a logical vector of TRUE, FALSE, TRUE in that order.

# A logical vector
logic <- c(TRUE, FALSE, TRUE)

Great job! You will use c() in almost all of the exercises!

2.1.2 Coerce it

It is important to remember that a vector can only be composed of one data type. This means that you cannot have both a numeric and a character in the same vector. If you attempt to do this, the lower ranking type will be coerced into the higher ranking type.

For example: c(1.5, "hello") results in c("1.5", "hello") where the numeric 1.5 has been coerced into the character data type.

The hierarchy for coercion is:

logical < integer < numeric < character

Logicals are coerced a bit differently depending on what the highest data type is. c(TRUE, 1.5) will return c(1, 1.5) where TRUE is coerced to the numeric 1 (FALSE would be converted to a 0). On the other hand, c(TRUE, "this_char") is converted to c("TRUE", "this_char").
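The coercion rules described above can be checked directly with class(). This sketch reuses the three example vectors from the text; the variable names are hypothetical.

```r
num_chr <- c(1.5, "hello")         # numeric + character -> character
lgl_num <- c(TRUE, 1.5)            # logical + numeric   -> numeric
lgl_chr <- c(TRUE, "this_char")    # logical + character -> character

class(num_chr)   # "character"
class(lgl_num)   # "numeric"
class(lgl_chr)   # "character"

# The coerced values themselves
num_chr          # "1.5" "hello"
lgl_num          # 1.0 1.5
lgl_chr          # "TRUE" "this_char"
```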

The vectors a, b, and c have been defined for you from the following commands:

a <- c(1L, "I am a character")

b <- c(TRUE, "Hello")

c <- c(FALSE, 2)

Which statement is correct about type conversion?

a is a character vector, b is a logical vector, c is a numeric vector.
a is an integer vector, b is a character vector, c is a logical vector.
a is a character vector, b is a character vector, c is a numeric vector.

Awesome! Just remember, one type per vector!

2.1.3 Vector names()

Let’s return to the example about January and February’s returns. As a refresher, in January you earned a 5% return, and in February, an extra 2% return. Being the savvy data scientist you are, you realize that you can put these returns into a vector! That would look something like this:

ret <- c(5, 2)

This is great! Now all of the returns are in one place. However, you could go one step further by adding names to each return in your vector. You do this using names(). Check this out:

names(ret) <- c("Jan", "Feb")

Printing ret now returns:

Jan Feb 
5   2

Pretty cool, right?

Defined for you are a vector of 12 monthly returns, and a vector of month names.

# Vectors of 12 months of returns, and month names
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

Add months as names to ret to create a more descriptive vector.

# Add names to ret
names(ret) <- months

Print out ret to see the newly named vector!

# Print out ret to see the new names!
ret

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   5   2   3   7   8   3   5   9   1   4   6   3

Aren’t vectors fun?

2.1.4 Visualize your vector

Time to try something a bit different. So far, you have been programming in the script, and looking at your data by printing it out. For a more informative visualization, try a plot!

For this exercise, you will again be working with some Apple stock data. This time it contains the prices for all of December, 2016.

The plot() function is one of the many ways to create a graph from your data in R. Passing in a vector will add its values to the y-axis of the graph, and on the x-axis will be an index created from the order that your vector is in.

Inside of plot(), you can change the type of your graph using type =. The default is "p" for points, but you can also change it to "l" for line.

apple_stock has already been defined, and everything has been set up for you. Try running the script line-by-line using Command + Enter on Mac or Control + Enter on Windows while clicked on each line.

# Look at the data
apple_stock

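##  [1] 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19 115.19
## [11] 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26 116.76 116.73
## [21] 115.82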

# Plot the data as a scatter plot
plot(apple_stock)

# Plot the data as a line graph
plot(apple_stock, type = "l")

Well done!

2.2 Vector manipulation

2.2.1 Weighted average (1)

As a finance professional, there are a number of important calculations that you will have to know. One of these is the weighted average. The weighted average allows you to calculate your portfolio return over a time period. Consider the following example:

Assume you have 40% of your cash in Apple stock, and 60% of your cash in IBM stock. If, in January, Apple earned 5% and IBM earned 7%, what was your total portfolio return?

To calculate this, take the return of each stock in your portfolio, and multiply it by the weight of that stock. Then sum up all of the results. For this example, you would do:

6.2 = 5 * .4 + 7 * .6

Or, in variable terms:

portf_ret <- apple_ret * apple_weight + ibm_ret * ibm_weight

Weights and returns for Microsoft and Sony have been defined for you.

# Weights and returns
micr_ret <- 7
sony_ret <- 9
micr_weight <- .2
sony_weight <- .8

Calculate the portf_ret for this portfolio.

# Portfolio return
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight

Finance + R = The greatest thing the universe has ever invented.

2.2.2 Weighted average (2)

Wait a minute, Lore taught us a much better way to do this! Remember, R does arithmetic with vectors! Can you take advantage of this fact to calculate the portfolio return more efficiently? Think carefully about the following code:

ret <- c(5, 7)
weight <- c(.4, .6)

ret_X_weight <- ret * weight

sum(ret_X_weight)

[1] 6.2

First, calculate ret * weight, which multiplies each element in the vectors together to create a new vector ret_X_weight. All you need to do then is add up the pieces, so you use sum() to sum up each element in the vector.
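As a cross-check on the vectorized calculation, base R also provides weighted.mean(), which computes sum(x * w) / sum(w). It is not part of the course text; the sketch below only shows that, when the weights sum to 1, it agrees with the by-hand result. The names by_hand and built_in are hypothetical.

```r
ret    <- c(5, 7)       # returns in percent
weight <- c(.4, .6)     # portfolio weights, summing to 1

# Element-wise product, then sum
by_hand <- sum(ret * weight)

# Base R's weighted.mean() gives the same answer when weights sum to 1
built_in <- weighted.mean(ret, weight)

all.equal(by_hand, built_in)   # TRUE
```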

Now it’s your turn!

ret and weight for Microsoft and Sony are defined for you again, but this time, in vector form!

# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")

Add company names to your ret and weight vectors.

# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies

Use vectorized arithmetic to multiply ret and weight together.

# Multiply the returns and weights together 
ret_X_weight <- ret * weight

Print ret_X_weight to see the results.

# Print ret_X_weight
ret_X_weight

## Microsoft      Sony 
##       1.4       7.2

Use sum() to get the total portf_ret.

# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)

Print portf_ret and compare to the last exercise!

# Print portf_ret
portf_ret

## [1] 8.6

See! Financial math isn’t that hard!

2.2.3 Weighted average (3)

Let’s look at an example of recycling. What if you wanted to give equal weight to your Microsoft and Sony stock returns? That is, you want to be invested 50% in Microsoft and 50% in Sony.

ret <- c(7, 9)

weight <- .5

ret_X_weight <- ret * weight

ret_X_weight

[1] 3.5 4.5

ret is a vector of length 2, and weight is a vector of length 1. R reuses the .5 in weight twice to make it the same length as ret, then performs the element-wise arithmetic.

A named vector, ret, containing the returns of 3 stocks is in your workspace.

Print ret to see the returns of your 3 stocks.

# Print ret
ret

## Microsoft      Sony 
##         7         9

Assign the value of 1/3 to weight. This will be the weight that each stock receives.

# Assign 1/3 to weight
weight <- 1/3

Create ret_X_weight by multiplying ret and weight. See how R recycles weight?

# Create ret_X_weight
ret_X_weight <- ret * weight

sum() the ret_X_weight variable to create your equally weighted portf_ret.

# Calculate your portfolio return
portf_ret <- sum(ret_X_weight)

Run the last line of code multiplying a vector of length 3 by a vector of length 2. R reuses the 1st value of the vector of length 2, but notice the warning!

# Vector of length 3 * Vector of length 2?
ret * c(.2, .6)

## Microsoft      Sony 
##       1.4       5.4

Awesome! Recycling makes multiplying vectors by numbers like .5 easy to understand!
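The length-3 by length-2 multiplication described in this exercise can be sketched as follows. The three return values are hypothetical (chosen only for illustration), and withCallingHandlers() is used here simply to record that a warning was raised while still keeping the result.

```r
# Hypothetical returns for three stocks (illustrative values only)
three_ret <- c(7, 9, 11)

# Length 1: the single weight is reused for every element, with no warning
equal_weighted <- three_ret * 0.5      # 3.5 4.5 5.5

# Length 2 against length 3: the weights are recycled as .2, .6, .2,
# and R warns because 3 is not a multiple of 2
warned <- FALSE
result <- withCallingHandlers(
  three_ret * c(.2, .6),
  warning = function(w) {
    warned <<- TRUE                    # remember that a warning occurred
    invokeRestart("muffleWarning")     # keep going and return the result
  }
)

result   # 1.4 5.4 2.2
warned   # TRUE
```

Recycling a length-1 vector is almost always intended; a warning like this one usually signals a mistake in the vector lengths.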

2.2.4 Vector subsetting

Sometimes, you will only want to use specific pieces of your vectors, and you’ll need some way to access just those parts. For example, what if you only wanted the first month of returns from the vector of 12 months of returns? To solve this, you can subset the vector using [ ].

Here is the 12 month return vector:

ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)

Select the first month: ret[1].

Select the first month by name: ret["Jan"].

Select the first three months: ret[1:3] or ret[c(1, 2, 3)].
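A self-contained sketch of these subsetting forms, using the 12-month returns and month names from earlier in the chapter. The result variable names are hypothetical, and the last line shows what happens when a name does not exist in the vector.

```r
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
names(ret) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

first_by_index <- ret[1]                  # Jan, by position
first_by_name  <- ret["Jan"]              # Jan, by name
first_quarter  <- ret[1:3]                # Jan, Feb, Mar
last_two       <- ret[c("Nov", "Dec")]    # several elements by name
drop_first_two <- ret[-(1:2)]             # negative indices omit elements

# Asking for a name that is not in the vector returns NA, not an error
missing_name <- ret["Foo"]
```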

The named vector ret is defined in your workspace.

Subset the first 6 months of returns.

# First 6 months of returns
ret[1:6]

## Jan Feb Mar Apr May Jun 
##   5   2   3   7   8   3

Subset only March and May’s returns using c() and “Mar”, “May”.

# Just March and May
ret[c("Mar","May")]

## Mar May 
##   3   8

Run the last line of code to perform a subset that omits the first month of returns.

# Omit the first month of returns
ret[-1]

## Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   2   3   7   8   3   5   9   1   4   6   3

Well done!

2.3 Matrix - a 2D vector

2.3.1 Create a matrix!

Matrices are similar to vectors, except they are in 2 dimensions! Let’s create a 2x2 matrix “by hand” using matrix().

matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)

     [,1] [,2]
[1,]    2    4
[2,]    3    5

Notice that the actual data for the matrix is passed in as a vector using c(), and is then converted to a matrix by specifying the number of rows and columns (also known as the dimensions).

Because the matrix is just created from a vector, the following is equivalent to the above code.

my_vector <- c(2, 3, 4, 5)

matrix(data = my_vector, nrow = 2, ncol = 2)

my_vector has been defined for you.

# A vector of 9 numbers
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

Replace the ___ to create a 3x3 matrix from my_vector.

# 3x3 matrix
my_matrix <- matrix(data = my_vector, nrow = 3, ncol = 3)

Print my_matrix.

# Print my_matrix
my_matrix

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

By default, matrices fill down each column. Run the code in the last example and note how the matrix fills across each row when you use byrow = TRUE. Compare this to the example given above.

# Filling across using byrow = TRUE
matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)

##      [,1] [,2]
## [1,]    2    3
## [2,]    4    5
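To make the difference between the two fill orders concrete, the sketch below builds both matrices from the same four values. The transpose function t() is base R but is not covered in the text; it is used here only to show that the two results mirror each other. The names by_col and by_row are hypothetical.

```r
values <- c(2, 3, 4, 5)

# Default: fill the first column top to bottom, then the second column
by_col <- matrix(data = values, nrow = 2, ncol = 2)

# byrow = TRUE: fill the first row left to right, then the second row
by_row <- matrix(data = values, nrow = 2, ncol = 2, byrow = TRUE)

dim(by_col)                     # 2 2 (rows, then columns)
by_col[1, 2]                    # 4: third value lands in column 2
by_row[1, 2]                    # 3: second value lands in column 2
identical(by_row, t(by_col))    # TRUE: each is the transpose of the other
```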

Awesome! You just created your first matrix! You’re becoming a true R wizard.

2.3.2 Matrix <- bind vectors

Often, you won’t be creating matrices like we did in the last example. Instead, you will create them from multiple vectors that you want to combine. For this, it is easiest to use the functions cbind() and rbind() (column bind and row bind respectively). To see these in action, let’s combine two vectors of Apple and IBM stock prices:

apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

cbind(apple, ibm)

      apple    ibm
[1,] 109.49 159.82
[2,] 109.90 160.02
[3,] 109.11 159.84
[4,] 109.95 160.35
[5,] 111.03 164.79

rbind(apple, ibm)

        [,1]   [,2]   [,3]   [,4]   [,5]
apple 109.49 109.90 109.11 109.95 111.03
ibm   159.82 160.02 159.84 160.35 164.79
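A quick way to confirm which way the vectors were bound is to inspect the dimensions and names of the result. This sketch reuses the two five-day vectors above; by_col and by_row are hypothetical names.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm   <- c(159.82, 160.02, 159.84, 160.35, 164.79)

by_col <- cbind(apple, ibm)   # each vector becomes a column
by_row <- rbind(apple, ibm)   # each vector becomes a row

dim(by_col)         # 5 2: five days, two stocks
dim(by_row)         # 2 5: two stocks, five days
colnames(by_col)    # "apple" "ibm"
rownames(by_row)    # "apple" "ibm"
```

The variable names passed to cbind() or rbind() are carried over as column or row names automatically.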

Now it’s your turn!

# edited by cliex159
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 168.02, 166.73, 166.68, 167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 165.99)
micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)

The apple, ibm, and micr stock price vectors from December, 2016 are in your workspace.

Use cbind() to column bind apple, ibm, and micr together, in that order, as cbind_stocks.

# cbind the vectors together
cbind_stocks <- cbind(apple, ibm, micr)

Print cbind_stocks.

# Print cbind_stocks
cbind_stocks

##        apple    ibm  micr
##  [1,] 109.49 159.82 59.20
##  [2,] 109.90 160.02 59.25
##  [3,] 109.11 159.84 60.22
##  [4,] 109.95 160.35 59.95
##  [5,] 111.03 164.79 61.37
##  [6,] 112.12 165.36 61.01
##  [7,] 113.95 166.52 61.97
##  [8,] 113.30 165.50 62.17
##  [9,] 115.19 168.29 62.98
## [10,] 115.19 168.51 62.68
## [11,] 115.82 168.02 62.58
## [12,] 115.97 166.73 62.30
## [13,] 116.64 166.68 63.62
## [14,] 116.95 167.60 63.54
## [15,] 117.06 167.33 63.54
## [16,] 116.29 167.06 63.55
## [17,] 116.52 166.71 63.24
## [18,] 117.26 167.14 63.28
## [19,] 116.76 166.19 62.99
## [20,] 116.73 166.60 62.90
## [21,] 115.82 165.99 62.14

Use rbind() to row bind the three vectors together, in the same order, as rbind_stocks.

# rbind the vectors together
rbind_stocks <- rbind(apple, ibm, micr)

Print rbind_stocks.

# Print rbind_stocks
rbind_stocks

##         [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
## apple 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19 115.19
## ibm   159.82 160.02 159.84 160.35 164.79 165.36 166.52 165.50 168.29 168.51
## micr   59.20  59.25  60.22  59.95  61.37  61.01  61.97  62.17  62.98  62.68
##        [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]
## apple 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26 116.76 116.73
## ibm   168.02 166.73 166.68 167.60 167.33 167.06 166.71 167.14 166.19 166.60
## micr   62.58  62.30  63.62  63.54  63.54  63.55  63.24  63.28  62.99  62.90
##        [,21]
## apple 115.82
## ibm   165.99
## micr   62.14

The functions cbind() and rbind() are pretty common. They also work with data frames, which you’ll learn about in the next chapter!

2.3.3 Visualize your matrix

Similar to vectors, we can visualize our matrix to gain some insights about the relationships in the data.

In this exercise, you will plot the matrix of Apple and Microsoft stock prices to see the relationship between the two companies’ stock prices during December, 2016.

# edited by cliex159
apple_micr_matrix <- cbind(apple, micr)

The matrix apple_micr_matrix is available in your workspace.

First, print out apple_micr_matrix to get a look at the data.

# View the data
apple_micr_matrix

##        apple  micr
##  [1,] 109.49 59.20
##  [2,] 109.90 59.25
##  [3,] 109.11 60.22
##  [4,] 109.95 59.95
##  [5,] 111.03 61.37
##  [6,] 112.12 61.01
##  [7,] 113.95 61.97
##  [8,] 113.30 62.17
##  [9,] 115.19 62.98
## [10,] 115.19 62.68
## [11,] 115.82 62.58
## [12,] 115.97 62.30
## [13,] 116.64 63.62
## [14,] 116.95 63.54
## [15,] 117.06 63.54
## [16,] 116.29 63.55
## [17,] 116.52 63.24
## [18,] 117.26 63.28
## [19,] 116.76 62.99
## [20,] 116.73 62.90
## [21,] 115.82 62.14

Use plot() to create a scatter plot of Microsoft vs. Apple stock prices.

# Scatter plot of Microsoft vs Apple
plot(apple_micr_matrix)

Visualizations are critical to help you understand your data!

2.3.4 cor()relation

Did you notice the relationship between the two stocks? It seems that when Apple’s stock moves up, Microsoft’s does as well. One way to capture this kind of relationship is by finding the correlation between the two stocks. Correlation is a measure of association between two things, here, stock prices, and is represented by a number from -1 to 1. A 1 represents perfect positive correlation, a -1 represents perfect negative correlation, and a correlation of 0 means there is no linear relationship between the movements of the two stocks. Correlation is a common metric in finance, and it is useful to know how to calculate it in R.

The cor() function will calculate the correlation between two vectors, or will create a correlation matrix when given a matrix.

cor(apple, micr)
[1] 0.9477011

cor(apple_micr_matrix)

          apple      micr
apple 1.0000000 0.9477011
micr  0.9477011 1.0000000

cor(apple, micr) simply returned the correlation between the two stocks. A large correlation of .9477 hints that Apple and Microsoft’s stock prices move closely together. cor(apple_micr_matrix) returned a matrix that shows all of the possible pairwise correlations. The top left correlation of 1 is the correlation of Apple with itself, which makes sense!
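For readers curious about what cor() computes, the sketch below rebuilds the default (Pearson) correlation by hand and compares it with cor(). To keep it short it uses only the first five December prices of each stock, so the value differs from the full-month figure quoted above. The names apple_dev, micr_dev, and manual_cor are hypothetical.

```r
# First five December 2016 prices only (a shortened sample)
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
micr  <- c(59.20, 59.25, 60.22, 59.95, 61.37)

# Deviations of each price from its own mean
apple_dev <- apple - mean(apple)
micr_dev  <- micr - mean(micr)

# Pearson correlation: co-movement scaled by the size of each series' movements
manual_cor <- sum(apple_dev * micr_dev) /
  sqrt(sum(apple_dev^2) * sum(micr_dev^2))

all.equal(manual_cor, cor(apple, micr))   # TRUE
```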

The vectors of stock prices for apple, micr, and ibm are in your workspace.

Calculate the correlation between apple and ibm.

# Correlation of Apple and IBM
cor(apple, ibm)

## [1] 0.8872467

Create a matrix of apple, micr, and ibm, in that order, named stocks using cbind().

# stock matrix
stocks <- cbind(apple, micr, ibm)

Try to run the code for the correlation of all three stocks. Notice how it fails when using more than 2 vectors!

Rewrite the failing code to use the stocks matrix instead. Correlation matrices are very powerful when you have many stocks!
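The failing call is not shown in these notes; the sketch below assumes it was cor(apple, micr, ibm). It fails because the third positional argument of cor() is use, an option for handling missing values, not a third data series. Only the first five prices of each stock are used to keep the example short, and the names attempt and cor_matrix are hypothetical.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
micr  <- c(59.20, 59.25, 60.22, 59.95, 61.37)
ibm   <- c(159.82, 160.02, 159.84, 160.35, 164.79)

# Passing three vectors: ibm is matched to the `use` argument and R stops
attempt <- tryCatch(cor(apple, micr, ibm), error = function(e) e)
inherits(attempt, "error")    # TRUE

# Binding the vectors into a matrix gives every pairwise correlation at once
stocks <- cbind(apple, micr, ibm)
cor_matrix <- cor(stocks)
dim(cor_matrix)               # 3 3
```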

# cor() of all three
cor(stocks)

##           apple      micr       ibm
## apple 1.0000000 0.9477010 0.8872467
## micr  0.9477010 1.0000000 0.9126597
## ibm   0.8872467 0.9126597 1.0000000

Great job! Correlations are a popular topic in finance. Ask Google for more information if you are interested.

2.3.5 Matrix subsetting

Just like vectors, matrices can be selected from and subsetted! To do this, you will again use [ ], but this time it will have two inputs. The basic structure is:

my_matrix[row, col]

To select the element in the first row and first column of stocks from the last example: stocks[1,1]

To select the entire first row, leave the col empty: stocks[1, ]

To select the first two rows: stocks[1:2, ] or stocks[c(1,2), ]

To select an entire column, leave the row empty: stocks[, 1]

You can also select an entire column by name: stocks[, "apple"]
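The selection forms above can be tried on a small matrix. This sketch uses three days of Apple and IBM prices; the matrix name small is hypothetical. The drop = FALSE argument is not covered in the text and is shown only as an aside: selecting a single column normally returns a plain vector, and drop = FALSE keeps the result as a matrix.

```r
apple <- c(109.49, 109.90, 109.11)
ibm   <- c(159.82, 160.02, 159.84)
small <- cbind(apple, ibm)

one_cell  <- small[1, 1]         # first row, first column
first_row <- small[1, ]          # entire first row
two_rows  <- small[1:2, ]        # first two rows, all columns
ibm_col   <- small[, "ibm"]      # entire ibm column, as a plain vector

# drop = FALSE keeps a single selected column as a 3 x 1 matrix
ibm_matrix <- small[, "ibm", drop = FALSE]
is.matrix(ibm_col)               # FALSE
is.matrix(ibm_matrix)            # TRUE
```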

stocks is in your workspace.

Select the third row of stocks.

# Third row
stocks[3, ]

##  apple   micr    ibm 
## 109.11  60.22 159.84

Select the fourth and fifth row of the ibm column of stocks.

# Fourth and fifth row of the ibm column
stocks[4:5, "ibm"]

## [1] 160.35 164.79

Select the apple and micr columns from stocks using c() inside the brackets.

# apple and micr columns
stocks[ , c("apple","micr")]

##        apple  micr
##  [1,] 109.49 59.20
##  [2,] 109.90 59.25
##  [3,] 109.11 60.22
##  [4,] 109.95 59.95
##  [5,] 111.03 61.37
##  [6,] 112.12 61.01
##  [7,] 113.95 61.97
##  [8,] 113.30 62.17
##  [9,] 115.19 62.98
## [10,] 115.19 62.68
## [11,] 115.82 62.58
## [12,] 115.97 62.30
## [13,] 116.64 63.62
## [14,] 116.95 63.54
## [15,] 117.06 63.54
## [16,] 116.29 63.55
## [17,] 116.52 63.24
## [18,] 117.26 63.28
## [19,] 116.76 62.99
## [20,] 116.73 62.90
## [21,] 115.82 62.14

3 Data Frames

Arguably the most important data structure in R, the data frame is what most of your data will take the form of. It combines the structure of a matrix with the flexibility of having different types of data in each column.

3.1 What is a data frame?

3.1.1 Create your first data.frame()

Data frames are great because of their ability to hold a different type of data in each column. To get started, let’s use the data.frame() function to create a data frame of your business’s future cash flows. Here are the variables that will be in the data frame:

company - The company that is paying you the cash flow (A or B).

cash_flow - The amount of money you will receive.

year - The number of years from now that you receive the cash flow.

To create the data frame, you do the following:

data.frame(company = c("A", "A", "B"), cash_flow = c(100, 200, 300), year = c(1, 3, 2))

  company cash_flow year
1       A       100    1
2       A       200    3
3       B       300    2

Like matrices, data frames are created from vectors, so this code would have also worked:

company <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year <- c(1, 3, 2)

data.frame(company, cash_flow, year)
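To see that each column really keeps its own type, the sketch below rebuilds the small data frame and checks the class of every column. sapply() is base R but has not been introduced in the course, and stringsAsFactors = FALSE is set explicitly as an assumption so that the result is the same on older and newer versions of R. The name cash_small is hypothetical.

```r
company   <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year      <- c(1, 3, 2)

# stringsAsFactors = FALSE keeps company as a character column on any R version
cash_small <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

# Apply class() to each column in turn
col_types <- sapply(cash_small, class)
col_types          # company: "character", cash_flow and year: "numeric"

dim(cash_small)    # 3 3: three rows, three columns
```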

New company, cash_flow, and year variables have been defined for you.

# Variables
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)

Create another data frame containing company, cash_flow, and year in that order. Assign it to cash. You will use this data frame throughout the rest of the chapter!

# Data frame
cash <- data.frame(company, cash_flow, year)

Print out cash to get a look at your shiny new data frame.

# Print cash
cash

##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

Great job creating your first data frame!

3.1.2 What goes in a data frame?

Knowledge test! What kind of vectors can you not create a data frame from?

Awesome! Data frames can be built from vectors of any of the base types!

3.1.3 Making head()s and tail()s of your data with some str()ucture

Time to introduce a few simple, but very useful functions.

head() - Returns the first few rows of a data frame. By default, 6. To change this, use head(cash, n = ___)

tail() - Returns the last few rows of a data frame. By default, 6. To change this, use tail(cash, n = ___)

str() - Check the structure of an object. This fantastic function will show you the data type of the object you pass in (here, data.frame), and will list each column variable along with its data type.

With a small data set such as yours, head() and tail() are not incredibly useful, but imagine if you had a data frame of hundreds or thousands of rows!

Call head() on cash to see the first 4 rows.

# Call head() for the first 4 rows
head(cash, n = 4)

##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1

Call tail() on cash to see the last 3 rows.

# Call tail() for the last 3 rows
tail(cash, n = 3)

##   company cash_flow year
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

Call str() on cash to check out the structure of your data frame. (In versions of R before 4.0.0, the class of company is reported as a Factor and not a character. From R 4.0.0 onward it stays a character, as in the output below. Do not fear! Factors will be covered in Chapter 4. For now, don’t worry about it.)

# Call str()
str(cash)

## 'data.frame':    7 obs. of  3 variables:
##  $ company  : chr  "A" "A" "A" "B" ...
##  $ cash_flow: num  1000 4000 550 1500 1100 750 6000
##  $ year     : num  1 3 4 1 2 4 5

Success!!

3.1.4 Naming your columns / rows

Let’s look at cash again:

cash

  comp cash yr
1    A 1000  1
2    A 4000  3
3    A  550  4
4    B 1500  1
5    B 1100  2
6    B  750  4
7    B 6000  5

Wait, that’s not right! It looks like someone has changed your column names! Don’t worry, you can change them back using colnames() just like you did with names() back with vectors.

Similarly, you can change the row names using rownames(), but this is less common.
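For completeness, here is a small sketch of rownames() in action. The data frame cash_small and the labels "first", "second", and "third" are invented for this illustration.

```r
# Small example data frame (same values as the first example in this chapter)
cash_small <- data.frame(company = c("A", "A", "B"),
                         cash_flow = c(100, 200, 300),
                         year = c(1, 3, 2))

# Assign hypothetical row labels; a data frame needs one unique label per row
rownames(cash_small) <- c("first", "second", "third")

# Rows can now be selected by label as well as by position
cash_small["second", ]
```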

The altered data frame cash is in your workspace.

Fix your column names by using colnames() and assigning a character vector of “company”, “cash_flow”, and “year” in that order.

# Fix your column names
colnames(cash) <- c("company", "cash_flow", "year")

Print out the fixed colnames() of cash.

# Print out the column names of cash
colnames(cash)

## [1] "company"   "cash_flow" "year"

Fantastic! Your column names are much better.

3.2 Data frame manipulation

3.2.1 Accessing and subsetting data frames (1)

Even more often than with vectors, you are going to want to subset your data frame or access certain columns. Again, one of the ways to do this is to use [ ]. The notation is just like matrices! Here are some examples:

Select the first row: cash[1, ]

Select the first column: cash[ ,1]

Select the first column by name: cash[ ,"company"]
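The same notation also accepts vectors of positions or names, so you can pull out several rows or columns at once. The sketch below rebuilds the cash data frame so that it runs on its own.

```r
# Rebuild the cash data frame from this chapter
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5))

# Rows 1 to 3, all columns
cash[1:3, ]

# Rows 4 and 7, only the cash_flow and year columns
cash[c(4, 7), c("cash_flow", "year")]
```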

Select the third row and second column of cash.

# Third row, second column
cash[3, 2]

## [1] 550

Select the fifth row of the “year” column of cash.

# Fifth row of the "year" column
cash[5, "year"]

## [1] 2

Great job! Subsetting data frames is a great skill to learn!

3.2.2 Accessing and subsetting data frames (2)

As you might imagine, selecting a specific column from a data frame is a common manipulation. So common, in fact, that it was given its own shortcut, the $. The following return the same answer:

cash$cash_flow

[1] 1000 4000  550 1500 1100  750 6000

cash[, "cash_flow"]

[1] 1000 4000  550 1500 1100  750 6000

Useful right? Try it out!

Select the “year” column from cash using $.

# Select the year column
cash$year

## [1] 1 3 4 1 2 4 5

Select the “cash_flow” column from cash using $ and multiply it by 2.

# Select the cash_flow column and multiply by 2
cash$cash_flow * 2

## [1]  2000  8000  1100  3000  2200  1500 12000

You can delete a column by assigning it NULL. Run the code that deletes “company”.

# Delete the company column
cash$company <- NULL

Now print out cash again.

# Print cash again
cash

##   cash_flow year
## 1      1000    1
## 2      4000    3
## 3       550    4
## 4      1500    1
## 5      1100    2
## 6       750    4
## 7      6000    5

The $ is a great shortcut to use with data frames! Learn to love it!

3.2.3 Accessing and subsetting data frames (3)

Often, just simply selecting a column from a data frame is not all you want to do. What if you are only interested in the cash flows from company A? For more flexibility, try subset()!

subset(cash, company == "A")

  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

There are a few important things happening here:

The first argument you pass to subset() is the name of your data frame, cash.

Notice that you shouldn’t put company in quotes!

The == is the equality operator. It tests to find where two things are equal, and returns a logical vector. There is a lot more to learn about these relational operators, and you can learn all about them in the second finance course, Intermediate R for Finance!
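To see what the logical vector looks like, and how subset() relates to the [ ] notation from earlier, here is a self-contained sketch. The names by_subset and by_bracket are only used for this comparison.

```r
# Rebuild the cash data frame from this chapter
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5))

# The comparison on its own returns one TRUE/FALSE per row
is_A <- cash$company == "A"
is_A

# subset() keeps the rows where the condition is TRUE
by_subset <- subset(cash, company == "A")

# The same rows, selected with [ ] and the logical vector
by_bracket <- cash[is_A, ]

# Both approaches select the same cash flows
by_subset$cash_flow
by_bracket$cash_flow
```

subset() is mostly a convenience: it lets you write company instead of cash$company inside the condition.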

Use subset() to select only the rows of cash corresponding to company B.

# Rows about company B
subset(cash, company == "B")

##   cash_flow year
## 4      1500    1
## 5      1100    2
## 6       750    4
## 7      6000    5

Now subset() rows that have cash flows due in 1 year.

# Rows with cash flows due in 1 year
subset(cash, year == 1)

##   cash_flow year
## 1      1000    1
## 4      1500    1

Great! subset() allows you to create more powerful ways to select groups from your data.

3.2.4 Adding new columns

In a perfect world, you could be 100% certain that you will receive all of your cash flows. But, since these are predictions about the future, there is always a chance that someone won’t be able to pay! You decide to run some analysis about a worst case scenario where you only receive half of your expected cash flow. To save the worst case scenario for later analysis, you decide to add it as a new column to the data frame!

cash$half_cash <- cash$cash_flow * .5

cash

  company cash_flow year half_cash
1       A      1000    1       500
2       A      4000    3      2000
3       A       550    4       275
4       B      1500    1       750
5       B      1100    2       550
6       B       750    4       375
7       B      6000    5      3000

And that’s it! Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column. Often, the newly created column is some transformation of existing columns, so the $ operator really comes in handy here!

Create a new worst case scenario where you only receive 25% of your expected cash flow, add it to the data frame as quarter_cash.

# Quarter cash flow scenario
cash$quarter_cash <- cash$cash_flow * .25

What if it took twice as long (in terms of year) to receive your money? Add a new column double_year with this scenario.

# Double year scenario
cash$double_year <- cash$year * 2

Great! See how useful the $ is for readability?

3.3 Present value

3.3.1 Present value of projected cash flows (1)

Time for some analysis! Earlier, Lore introduced the idea of present value. You will use that idea in the next two exercises, so here is another example.

If you expect a cash flow of $100 to be received 1 year from now, what is the present value of that cash flow at a 5% interest rate? To calculate this, you discount the cash flow to get it in terms of today’s dollars. The general formula for this is:

present_value <- cash_flow * (1 + interest / 100) ^ -year

95.238 = 100 * (1.05) ^ -1

Another way to think about this is to reverse the problem. If you have $95.238 today, and it earns 5% over the next year, how much money do you have at the end of the year? We know how to do this problem from way back in chapter 1! Find the multiplier that corresponds to 5% and multiply by $95.238!

100 = 95.238 * (1.05)

Aha! To discount your money, just do the reverse of what you did with stock returns in chapter 1.
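The general formula can be wrapped in a small function to check both directions of the calculation. The function name discount is hypothetical and chosen just for this sketch. As in the formula above, interest is given in percent.

```r
# Hypothetical helper: discount a cash flow back to today's dollars
# interest is in percent (5 means 5%), year is the number of years from now
discount <- function(cash_flow, interest, year) {
  cash_flow * (1 + interest / 100) ^ -year
}

# $100 received 1 year from now, at 5% interest
pv <- discount(100, 5, 1)
pv

# Reverse the problem: let the present value earn 5% for 1 year
pv * 1.05
```

Because the function is built from vectorized arithmetic, it would also accept whole vectors of cash flows and years.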

If you expect to receive $4000 in 3 years, at a 5% interest rate, what is the present value of that money? Follow the general formula above and assign the result to present_value_4k.

# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * (1.05) ^ -3

Using vectors, you can calculate the present value of the entire column of cash_flow at once! Use cash$cash_flow, cash$year and the general formula to calculate the present value of all of your cash flows at 5% interest. Add it to cash as the column present_value.

# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1.05) ^ -cash$year

Print out cash to see your new column.

# Print out cash
cash

##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570

Great! Learning to calculate present values is useful for any finance calculation.

3.3.2 Present value of projected cash flows (2)

Amazing! You are almost done with this chapter, and you are becoming a true wizard of data frames and finance. Before you move on, let’s answer a few more questions.

You now have a column for present_value, but you want to report the total amount of that column to your board members. Calculating this part is easy, use the sum() function you learned earlier to add up the elements of cash$present_value.

However, you also want to know how much company A and company B individually contribute to the total present value. Do you remember how to separate the rows of your data frame to only include company A or B?

cash_A <- subset(cash, company == "A")

sum(cash_A$present_value)

[1] 4860.218

Use the sum() function to calculate the total present_value of cash. Assign it to total_pv.

# Total present value of cash
total_pv <- sum(cash$present_value)

Subset cash to only include rows about company B to create cash_B.

# Company B information
cash_B <- subset(cash, company == "B")

Use sum() and cash_B to calculate the total present_value from company B. Assign it to total_pv_B.

# Total present value of cash_B
total_pv_B <- sum(cash_B$present_value)
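A useful sanity check is that the two company totals add back up to the overall total. The sketch below rebuilds cash with its company column so that it runs on its own. The final tapply() call is a base R alternative that is not covered in this course; it is shown only as a cross-check.

```r
# Rebuild cash, including the company column, and add present values at 5%
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5))
cash$present_value <- cash$cash_flow * (1.05) ^ -cash$year

# Totals overall and per company
total_pv   <- sum(cash$present_value)
total_pv_A <- sum(subset(cash, company == "A")$present_value)
total_pv_B <- sum(subset(cash, company == "B")$present_value)

# The company totals should add back up to the overall total
all.equal(total_pv, total_pv_A + total_pv_B)

# Cross-check: per-company sums in a single call
tapply(cash$present_value, cash$company, sum)
```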

4 Factors

Questions with answers that fall into a limited number of categories can be classified as factors. In this chapter, you will use bond credit ratings to learn all about creating, ordering, and subsetting factors.

4.1 What is a factor?

4.1.1 Create a factor

Bond credit ratings are common in the fixed income side of the finance world as a simple measure of how “risky” a certain bond might be. Here, riskiness can be defined as the probability of default, which means an inability to pay back your debts. The Standard and Poor’s and Fitch credit rating agencies have defined the following ratings, from least likely to default to most likely:

AAA, AA, A, BBB, BB, B, CCC, CC, C, D

This is a perfect example of a factor! It is a categorical variable that takes on a limited number of levels.

To create a factor in R, use the factor() function, and pass in a vector that you want to be converted into a factor.

Suppose you have a portfolio of 7 bonds with these credit ratings:

credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

To create a factor from this:

factor(credit_rating)

[1] AAA AA  A   BBB AA  BBB A  
Levels: A AA AAA BBB
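It can help to peek at how a factor is stored. The sketch below, using the same 7 bonds, shows that a factor is a set of sorted unique levels plus an integer code for each element that points into those levels.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
credit_factor <- factor(credit_rating)

# The unique categories, sorted alphabetically by default
levels(credit_factor)

# The integer code of each element: its position in levels()
as.integer(credit_factor)

# The number of distinct levels
nlevels(credit_factor)
```

For example, the first bond is AAA, which is the third level, so its code is 3.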

A new character vector, credit_rating has been created for you in the code for this exercise.

Turn credit_rating into a factor using factor(). Assign it to credit_factor.

# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Create a factor from credit_rating
credit_factor <- factor(credit_rating)

Print out credit_factor.

# Print out your new factor
credit_factor

## [1] BB  AAA AA  CCC AA  AAA B   BB 
## Levels: AA AAA B BB CCC

Call str() on credit_rating to note the structure.

# Call str() on credit_rating
str(credit_rating)

##  chr [1:8] "BB" "AAA" "AA" "CCC" "AA" "AAA" "B" "BB"

Call str() on credit_factor and compare the structure to credit_rating.

# Call str() on credit_factor
str(credit_factor)

##  Factor w/ 5 levels "AA","AAA","B",..: 4 2 1 5 1 2 3 4

Fantastic! That wasn’t too bad, right?

4.1.2 Factor levels

Accessing the unique levels of your factor is simple enough by using the levels() function. You can also use this to rename your factor levels!

credit_factor

[1] AAA AA  A   BBB AA  BBB A  
Levels: A AA AAA BBB

levels(credit_factor)

[1] "A"   "AA"  "AAA" "BBB"

levels(credit_factor) <- c("1A", "2A", "3A", "3B")

credit_factor

[1] 3A 2A 1A 3B 2A 3B 1A
Levels: 1A 2A 3A 3B

The credit_factor variable you created in the last exercise is available in your workspace.

Use levels() on credit_factor to identify the unique levels.

# Identify unique levels
levels(credit_factor)

## [1] "AA"  "AAA" "B"   "BB"  "CCC"

Using the same “1A”, “2A” notation as in the example, rename the levels of credit_factor. Pay close attention to the level order!

# Rename the levels of credit_factor
levels(credit_factor) <- c("2A", "3A", "1B", "2B", "3C")

Print the renamed credit_factor.

# Print credit_factor
credit_factor

## [1] 2B 3A 2A 3C 2A 3A 1B 2B
## Levels: 2A 3A 1B 2B 3C

Great job!

4.1.3 Factor summary

As any good bond investor would do, you would like to keep track of how many bonds you are holding of each credit rating. A way to present a table of the counts of each bond credit rating would be great! Luckily for you, the summary() function for factors can help you with that.

The character vector credit_rating and the factor credit_factor are both in your workspace.

First call summary() on credit_rating. Does this seem useful?

# Summarize the character vector, credit_rating
summary(credit_rating)

##    Length     Class      Mode 
##         8 character character

Now try summary() again, but this time on credit_factor.

# Summarize the factor, credit_factor
summary(credit_factor)

## 2A 3A 1B 2B 3C 
##  2  2  1  2  1

Factor summaries are much more useful for tabulating data!
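If all you have is the character vector, base R's table() function produces the same counts. This is a small sketch using the original, not yet renamed, ratings from this exercise.

```r
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
credit_factor <- factor(credit_rating)

# Counts per level from the factor
summary(credit_factor)

# table() counts the unique values of the character vector directly
table(credit_rating)
```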

4.1.4 Visualize your factor

You can also visualize the table that you created in the last example by using a bar chart. A bar chart is a type of graph that displays groups of data using rectangular bars where the height of each bar represents the number of counts in that group.

The plot() function can again take care of all of the magic for you, check it out!

Note that in the example below, you are creating the plot from a factor and not a character vector. R will throw an error if you try to plot a character vector!

The factor credit_factor is in your workspace.

Plot credit_factor to create your first bar chart!

# Visualize your factor!
plot(credit_factor)

Awesome bar chart!

4.1.5 Bucketing a numeric variable into a factor

Your old friend Dan sent you a list of 50 AAA rated bonds called AAA_rank, with each bond having an additional number from 1-100 describing how profitable he thinks that bond will be (100 being the most profitable). You are interested in doing further analysis on his suggestions, but first it would be nice if the bonds were bucketed by their ranking somehow. This would help you create groups of bonds, from least profitable to most profitable, to more easily analyze them.

This is a great example of creating a factor from a numeric vector. The easiest way to do this is to use cut(). Below, Dan’s 1-100 ranking is bucketed into 5 evenly spaced groups. Note that the ( in the factor levels means we do not include the number beside it in that group, and the ] means that we do include that number in the group.

head(AAA_rank)

[1]  31  48 100  53  85  73

AAA_factor <- cut(x = AAA_rank, breaks = c(0, 20, 40, 60, 80, 100))

head(AAA_factor)

[1] (20,40]  (40,60]  (80,100] (40,60]  (80,100] (60,80] 
Levels: (0,20] (20,40] (40,60] (60,80] (80,100]

In the cut() function, using breaks = allows you to specify the groups that you want R to bucket your data by!
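To make the ( and ] notation concrete, the sketch below buckets a few values that sit exactly on or next to the break points. The vector ranks and the label names are invented for this illustration. The labels argument of cut() is an alternative to renaming the levels afterwards.

```r
# Hypothetical rankings chosen to sit on and around the break points
ranks <- c(1, 20, 21, 40, 100)

# By default each bucket excludes its left end and includes its right end
bucketed <- cut(x = ranks, breaks = c(0, 20, 40, 60, 80, 100))
bucketed

# labels = names the buckets at creation time
named <- cut(x = ranks, breaks = c(0, 20, 40, 60, 80, 100),
             labels = c("lowest", "low", "mid", "high", "highest"))
named
```

Here 20 lands in (0,20] while 21 lands in (20,40]. One consequence of the open left end is that a value equal to the smallest break (0 in this case) falls outside every bucket and becomes NA.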

# edited by cliex159
AAA_rank <- c(9, 88, 74, 94, 44, 59, 81, 67, 48, 16, 58, 72, 62, 31, 65, 93, 49, 21, 68, 33, 32, 56, 51, 56, 38, 85, 9, 23, 91, 25, 11, 95, 84, 31, 33, 1, 13, 38, 34, 15, 29, 50, 51, 53, 20, 75, 83, 52, 39, 11)

Instead of 5 buckets, can you create just 4? In breaks = use a vector from 0 to 100 where each element is 25 numbers apart. Assign it to AAA_factor.

# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

The 4 buckets do not have very descriptive names. Use levels() to rename the levels to “low”, “medium”, “high”, and “very_high”, in that order.

# Rename the levels 
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

Print the newly named AAA_factor.

# Print AAA_factor
AAA_factor

##  [1] low       very_high high      very_high medium    high      very_high
##  [8] high      medium    low       high      high      high      medium   
## [15] high      very_high medium    low       high      medium    medium   
## [22] high      high      high      medium    very_high low       low      
## [29] very_high low       low       very_high very_high medium    medium   
## [36] low       low       medium    medium    low       medium    medium   
## [43] high      high      low       high      very_high high      medium   
## [50] low      
## Levels: low medium high very_high

Plot the AAA_factor to visualize your work!

# Plot AAA_factor
plot(AAA_factor)

Great! Sometimes factors are easier to plot than numerics.

4.2 Ordering and subsetting factors

4.2.1 Create an ordered factor

Look at the plot created over on the right. It looks great, but look at the order of the bars! No order was specified when you created the factor, so, when R tried to plot it, it just placed the levels in alphabetical order. By now, you know that there is an order to credit ratings, and your plots should reflect that!

As a reminder, the order of credit ratings from least risky to most risky is:

AAA, AA, A, BBB, BB, B, CCC, CC, C, D

To order your factor, there are two options.

When creating a factor, specify ordered = TRUE and add unique levels in order from least to greatest:

credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

credit_factor_ordered <- factor(credit_rating, ordered = TRUE, 
                                levels = c("AAA", "AA", "A", "BBB"))

For an existing unordered factor like credit_factor, use the ordered() function:

ordered(credit_factor, levels = c("AAA", "AA", "A", "BBB"))

Both ways result in:

credit_factor_ordered

[1] AAA AA  A   BBB AA  BBB A  
Levels: AAA < AA < A < BBB

Notice the < specifying the order of the levels that was not there before!
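Beyond plotting, an ordered factor also understands comparisons. The sketch below uses the ordered factor from the example above. Remember that the levels here run from least risky to most risky, so "greater than" means "riskier than".

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
credit_factor_ordered <- factor(credit_rating, ordered = TRUE,
                                levels = c("AAA", "AA", "A", "BBB"))

# Which bonds come after "AA" in the level order (that is, are riskier)?
riskier_than_AA <- credit_factor_ordered > "AA"
riskier_than_AA

# The least and most risky ratings held
min(credit_factor_ordered)
max(credit_factor_ordered)
```

The same comparison on an unordered factor is not meaningful, which is one more reason to specify the order when it exists.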

The character vector credit_rating is in your workspace.

Use the unique() function with credit_rating to print only the unique words in the character vector. These will be your levels.

# Use unique() to find unique words
unique(credit_rating)

## [1] "BB"  "AAA" "AA"  "CCC" "B"

Use factor() to create an ordered factor for credit_rating and store it as credit_factor_ordered. Make sure to list the levels from least to greatest in terms of risk!

# Create an ordered factor
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("AAA", "AA", "BB", "B", "CCC"))

Plot credit_factor_ordered and note the new order of the bars.

# Plot credit_factor_ordered
plot(credit_factor_ordered)

Awesome! Ordered factors are great for plotting or creating tables with a predefined order.

4.2.2 Subsetting a factor

You can subset factors in much the same way as you subset vectors. As usual, [ ] is the key! However, R has some interesting behavior when you want to remove a factor level from your analysis. For example, what if you wanted to remove the AAA bond from your portfolio?

credit_factor

[1] AAA AA  A   BBB AA  BBB A  
Levels: BBB < A < AA < AAA

credit_factor[-1]

[1] AA  A   BBB AA  BBB A  
Levels: BBB < A < AA < AAA

R removed the AAA bond at the first position, but left the AAA level behind! If you were to plot this, you would end up with the bar chart over to the right. A better plan would have been to tell R to drop the AAA level entirely. To do that, add drop = TRUE:

credit_factor[-1, drop = TRUE]

[1] AA  A   BBB AA  BBB A  
Levels: BBB < A < AA

That’s what you wanted!
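If the factor has already been subset without drop = TRUE, base R's droplevels() function removes the unused levels after the fact. The sketch below rebuilds the ordered credit_factor from the example. The names kept and dropped are only used here.

```r
credit_factor <- factor(c("AAA", "AA", "A", "BBB", "AA", "BBB", "A"),
                        ordered = TRUE,
                        levels = c("BBB", "A", "AA", "AAA"))

# Subsetting alone keeps the now-unused AAA level
kept <- credit_factor[-1]
levels(kept)

# drop = TRUE removes it while subsetting
dropped <- credit_factor[-1, drop = TRUE]
levels(dropped)

# droplevels() removes unused levels from an existing factor
levels(droplevels(kept))
```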

Using the same data, remove the “A” bonds from positions 3 and 7 of credit_factor. For now, do not use drop = TRUE. Assign this to keep_level.

# Remove the A bonds at positions 3 and 7. Don't drop the A level.
keep_level <- credit_factor[-c(3,7)]

Plot keep_level.

# Plot keep_level
plot(keep_level)

Now, remove “A” from credit_factor again, but this time use drop = TRUE. Assign this to drop_level.

# Remove the A bonds at positions 3 and 7. Drop the A level.
drop_level <- credit_factor[-c(3,7), drop = TRUE]

Plot drop_level.

# Plot drop_level
plot(drop_level)

Great! The drop argument will help you get rid of those pesky factor levels that stick around.

4.2.3 stringsAsFactors

Do you remember back in the data frame chapter when you used str() on your cash data frame? This was the output:

str(cash)

'data.frame':    3 obs. of  3 variables:
 $ company  : Factor w/ 2 levels "A","B": 1 1 2
 $ cash_flow: num  100 200 300
 $ year     : num  1 3 2

See how the company column has been converted to a factor? In versions of R before 4.0.0, the default behavior when creating data frames is to convert all characters into factors. (Since R 4.0.0, the default is stringsAsFactors = FALSE, so characters are left as they are.) This has caused countless novice R users a headache trying to figure out why their character columns are not working properly, but not you! You will be prepared!

To turn off this behavior:

cash <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

str(cash)

'data.frame':    3 obs. of  3 variables:
 $ company  : chr  "A" "A" "B"
 $ cash_flow: num  100 200 300
 $ year     : num  1 3 2
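Setting the argument explicitly gives the same result on every version of R, whatever the default happens to be. This short sketch compares both settings; the names as_factor and as_char are made up for the comparison.

```r
company <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year <- c(1, 3, 2)

# Explicitly convert character columns to factors
as_factor <- data.frame(company, cash_flow, year, stringsAsFactors = TRUE)

# Explicitly keep character columns as characters
as_char <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

class(as_factor$company)
class(as_char$company)
```

Only character columns are affected; the numeric cash_flow and year columns are the same in both data frames.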

Two variables, credit_rating and bond_owners have been defined for you. bond_owners is a character vector of the names of some of your friends.

# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

Create a data frame named bonds from credit_rating and bond_owners, in that order, and use stringsAsFactors = FALSE.

# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)

Use str() to confirm that both columns are characters.

# Use str() on bonds
str(bonds)

## 'data.frame':    3 obs. of  2 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"

bond_owners would not be a useful factor, but credit_rating could be! Create a new column in bonds called credit_factor using $ which is created from credit_rating as a correctly ordered factor.

# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("AAA","A","BB"))

Use str() again to confirm that credit_factor is an ordered factor.

# Use str() on bonds again
str(bonds)

## 'data.frame':    3 obs. of  3 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"
##  $ credit_factor: Ord.factor w/ 3 levels "AAA"<"A"<"BB": 1 2 3

5 Lists

Wouldn’t it be nice if there was a way to hold related vectors, matrices, or data frames together in R? In this final chapter, you will explore lists and many of their interesting features by building a small portfolio of stocks.

5.1 What is a list?

5.1.1 Create a list

Just like a grocery list, lists in R can be used to hold together items of different data types. Creating a list is, you guessed it, as simple as using the list() function. You could say that a list is a kind of super data type: you can store practically any piece of information in it! Create a list like so:

words <- c("I <3 R")
numbers <- c(42, 24)

my_list <- list(words, numbers)

my_list

[[1]]
[1] "I <3 R"

[[2]]
[1] 42 24

Below, you will create your first list from some of the data you have already worked with!

The 4 components for your list have been created for you.

# List components
name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
cor_matrix <- cor(cbind(apple, ibm))

Use list() to create a list of name, apple, ibm, and cor_matrix, in that order, and assign it to portfolio.

# Create a list
portfolio <- list(name, apple, ibm, cor_matrix)

Print your portfolio.

# View your first list
portfolio

## [[1]]
## [1] "Apple and IBM"
## 
## [[2]]
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## [[3]]
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## [[4]]
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Awesome! Lists are great for holding groups of related data structures together.

5.1.2 Named lists

Knowing how forgetful you are, you decide it would be important to add names to your list so you can remember what each element is describing. There are two ways to do this!

You could name the elements as you create the list with the form name = value:

my_list <- list(my_words = words, my_numbers = numbers)

Or, if the list was already created, you could use names():

my_list <- list(words, numbers)
names(my_list) <- c("my_words", "my_numbers")

Both would result in:

my_list

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

The portfolio list is available to work with.

Use names() to add the following names to your list: “portfolio_name”, “apple”, “ibm”, “correlation”, in that order.

# Add names to your portfolio
names(portfolio) <- c("portfolio_name", "apple", "ibm", "correlation")

Print portfolio to see your newly named list.

# Print the named portfolio
portfolio

## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Adding names to your list will make them much easier to understand!

5.1.3 Access elements in a list

Subsetting a list is similar to subsetting a vector or data frame, with one extra useful operation.

To access the elements in the list, use [ ]. This will always return another list.

my_list[1]

$my_words
[1] "I <3 R"

my_list[c(1,2)]

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

To pull out the data inside each element of your list, use [[ ]].

my_list[[1]]

[1] "I <3 R"

If your list is named, you can use the $ operator: my_list$my_words. This is the same as using [[ ]] to return the inner data.
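The difference between [ ] and [[ ]] is easiest to see by checking what each one returns. Here is a minimal sketch using the my_list example.

```r
my_list <- list(my_words = "I <3 R", my_numbers = c(42, 24))

# Single brackets return a smaller list
class(my_list[2])

# Double brackets return the numeric vector stored inside
class(my_list[[2]])

# The inner data can be used directly in calculations
sum(my_list[["my_numbers"]])

# $ and [[ ]] give back the same object
identical(my_list$my_numbers, my_list[[2]])
```

Trying sum(my_list[2]) instead would fail, because the result of [ ] is still a list rather than a numeric vector.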

The portfolio named list is available for use.

Access the second and third elements of portfolio using [ ] and c().

# Second and third elements of portfolio
portfolio[c(2,3)]

## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79

Use $ to access the correlation data.

# Use $ to get the correlation data
portfolio$correlation

##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Notice how the use of $ in lists is similar to data frames!

5.1.4 Adding to a list

Once you create a list, you aren’t stuck with it forever. You can add new elements to it whenever you want! Say you want to add your friend Dan’s favorite movie to your list. You can do so using $ like you did when adding new columns to data frames.

my_list$dans_movie <- "StaR Wars"

my_list

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

$dans_movie
[1] "StaR Wars"

You could have also used c() to add another element to the list: c(my_list, dans_movie = "StaR Wars"). This can be useful if you want to add multiple elements to your list at once.
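Here is a sketch of adding two elements in one call with c(). The dans_rating and weights elements are invented for the illustration.

```r
my_list <- list(my_words = "I <3 R", my_numbers = c(42, 24))

# c() returns a new list, so assign the result back to keep it
bigger_list <- c(my_list, dans_movie = "StaR Wars", dans_rating = 5)
names(bigger_list)

# To add a whole vector as ONE element, wrap it in list() first
with_weights <- c(my_list, list(weights = c(0.2, 0.8)))
length(with_weights)
```

The list() wrapper matters because c() would otherwise add each value of a longer vector as its own list element.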

Another useful piece of information for your portfolio is the variable weight describing how invested you are in Apple and IBM. Fill in the ___ correctly so that you are invested 20% in Apple and 80% in IBM. Remember to use decimal numbers, not percentages!

# Add weight: 20% Apple, 80% IBM
portfolio$weight <- c(apple = .2, ibm = .8)

Print portfolio to see the weight element.

# Print portfolio
portfolio

## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.2   0.8

You can change the data in a list in the same way as you add to it, using $. Change weight so that you are invested 30% in Apple and 70% in IBM.

# Change the weight variable: 30% Apple, 70% IBM
portfolio$weight <- c(apple = .3, ibm = .7)

Print portfolio again to see your changes.

# Print portfolio to see the changes
portfolio

## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7

Great job!

5.1.5 Removing from a list

The natural next step is to learn how to remove elements from a list. You decide that even though Dan is your best friend, you don’t want his info in your list. To remove dans_movie:

my_list$dans_movie <- NULL

my_list

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

Using NULL is the easiest way to remove an element from your list! If your list is not named, you can also remove elements by position using my_list[1] <- NULL or my_list[[1]] <- NULL.

Take a look at your portfolio. It seems that someone has added microsoft stock that you did not buy!

# Take a look at portfolio
portfolio

## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7

Remove the microsoft element of portfolio using NULL.

# Remove the microsoft stock prices from your portfolio
portfolio$microsoft <- NULL

Awesome! Now the list only has your information again. Sorry, Dan!

5.2 A few list creating functions

5.2.1 Split it

Often, you will have data for multiple groups together in one data frame. The cash data frame was an example of this back in Chapter 3. There were cash_flow and year columns for two groups (companies A and B). What if you wanted to split up this data frame into two separate data frames divided by company? In the next exercise, you will explore why you might want to do this, but first let’s explore how to make this happen using the split() function.

Create a grouping to split on, and use split() to create a list of two data frames.

grouping <- cash$company
split_cash <- split(cash, grouping)

split_cash 

$A
  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

$B
  company cash_flow year
4       B      1500    1
5       B      1100    2
6       B       750    4
7       B      6000    5

To get your original data frame back, use unsplit(split_cash, grouping).

The cash data frame is available in your workspace.

Create a new grouping from the year column.

# Define grouping from year
grouping <- cash$year

Use split() to split cash into a list of 5 data frames separated by year. Assign this to split_cash.

# Split cash on your new grouping
split_cash <- split(cash, grouping)

Print split_cash.

# Look at your split_cash list
split_cash

## $`1`
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1          250           2       952.381
## 4      1500    1          375           2      1428.571
## 
## $`2`
##   cash_flow year quarter_cash double_year present_value
## 5      1100    2          275           4      997.7324
## 
## $`3`
##   cash_flow year quarter_cash double_year present_value
## 2      4000    3         1000           6       3455.35
## 
## $`4`
##   cash_flow year quarter_cash double_year present_value
## 3       550    4        137.5           8      452.4864
## 6       750    4        187.5           8      617.0269
## 
## $`5`
##   cash_flow year quarter_cash double_year present_value
## 7      6000    5         1500          10      4701.157

Use unsplit() to combine the data frames again. Assign this to original_cash.

# Unsplit split_cash to get the original data back.
original_cash <- unsplit(split_cash, grouping)

Print original_cash to compare to the first cash data frame.

# Print original_cash
original_cash

##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570

Great job! This is a very important concept for more advanced data wrangling.

5.2.2 Split-Apply-Combine

A common data science problem is to split your data frame by a grouping, apply some transformation to each group, and then recombine those pieces back into one data frame. This is such a common class of problems in R that it has been given the name split-apply-combine. In Intermediate R for Finance, you will explore a number of these problems and functions that are useful when solving them, but, for now, let’s do a simple example.

Suppose, for the cash data frame, you are interested in doubling the cash_flow for company A, and tripling it for company B:

grouping <- cash$company
split_cash <- split(cash, grouping)

# We can access each list element's cash_flow column by:
split_cash$A$cash_flow
[1] 1000 4000  550

split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3

new_cash <- unsplit(split_cash, grouping)

Take a look again at how you access the cash_flow column. The first $ is to access the A element of the split_cash list. The second $ is to access the cash_flow column of the data frame in A.
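
To make the two-step access concrete, here is a small sketch that performs each step separately. It assumes the three-column cash data frame from the example, rebuilt here from its printed values; company_a is a hypothetical helper variable used only for illustration.

```r
# Rebuild the example data frame from the printed values (assumed construction)
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)
split_cash <- split(cash, cash$company)

# Step 1: the first $ pulls the data frame for company A out of the list
company_a <- split_cash$A
class(company_a)

# Step 2: the second $ pulls the cash_flow column out of that data frame
company_a$cash_flow

# Chaining both steps, or using [[ ]] with names, returns the same vector
identical(split_cash$A$cash_flow, split_cash[["A"]][["cash_flow"]])
```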

The split_cash list is available for you. The grouping that was used to split cash is also available.

Print split_cash to get a look at the list.

# Print split_cash
split_cash

## $`1`
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1          250           2       952.381
## 4      1500    1          375           2      1428.571
## 
## $`2`
##   cash_flow year quarter_cash double_year present_value
## 5      1100    2          275           4      997.7324
## 
## $`3`
##   cash_flow year quarter_cash double_year present_value
## 2      4000    3         1000           6       3455.35
## 
## $`4`
##   cash_flow year quarter_cash double_year present_value
## 3       550    4        137.5           8      452.4864
## 6       750    4        187.5           8      617.0269
## 
## $`5`
##   cash_flow year quarter_cash double_year present_value
## 7      6000    5         1500          10      4701.157

Print the cash_flow column for company B in split_cash.

# Print the cash_flow column of B in split_cash
split_cash$B$cash_flow

## NULL

Tragically, you have learned that company A went out of business. Set the cash_flow for company A to 0.

# Set the cash_flow column of company A in split_cash to 0
split_cash$A$cash_flow <- 0

Use grouping to unsplit() the split_cash list. Assign this to cash_no_A.

# Use the grouping to unsplit split_cash
cash_no_A <- unsplit(split_cash, grouping)

Finally, print cash_no_A to see the modified data frame.

# Print cash_no_A
cash_no_A

##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570
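
For reference, the sketch below runs the same steps on the three-column cash data frame (company, cash_flow, year) from the earlier example, split by company. The data frame is rebuilt from its printed values, which is an assumption about how it was constructed.

```r
# Rebuild the example data frame from the printed values (assumed construction)
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)
grouping   <- cash$company
split_cash <- split(cash, grouping)

# A single value is recycled across every row of company A
split_cash$A$cash_flow <- 0

# Recombine using the same grouping that produced the split
cash_no_A <- unsplit(split_cash, grouping)
cash_no_A$cash_flow
```

With a company-based grouping, only the rows for company A change, while the rows for company B keep their original cash_flow values.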

Great job! You will learn much more about this, and about the apply() functions, in the second course.

5.2.3 Attributes

You have made it to the last exercise in the course! Congrats! Let’s finish up with an easy one.

Attributes are a bit of extra metadata about your data structure. Some of the most common attributes are: row names and column names, dimensions, and class. You can use the attributes() function to return a list of attributes about the object you pass in. To access a specific attribute, you can use the attr() function.

Exploring the attributes of cash:

attributes(cash)

$names
[1] "company"   "cash_flow" "year"     

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

attr(cash, which = "names")

[1] "company"   "cash_flow" "year"
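
Many common attributes also have dedicated accessor functions. As a small sketch, assuming the three-column cash data frame rebuilt from its printed values, the comparisons below check that attr() and the matching accessor return the same thing.

```r
# Rebuild the example data frame from the printed values (assumed construction)
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# The "names" attribute is what names() and colnames() report
identical(attr(cash, which = "names"), names(cash))
identical(attributes(cash)$names, colnames(cash))

# The "class" attribute is what class() reports
identical(attr(cash, which = "class"), class(cash))
```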

The matrix my_matrix and the factor my_factor are defined for you.

# my_matrix and my_factor
my_matrix <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
rownames(my_matrix) <- c("Row1", "Row2")
colnames(my_matrix) <- c("Col1", "Col2", "Col3")

my_factor <- factor(c("A", "A", "B"), ordered = TRUE, levels = c("A", "B"))

Use attributes() on my_matrix.

# attributes of my_matrix
attributes(my_matrix)

## $dim
## [1] 2 3
## 
## $dimnames
## $dimnames[[1]]
## [1] "Row1" "Row2"
## 
## $dimnames[[2]]
## [1] "Col1" "Col2" "Col3"

Use attr() on my_matrix to return the “dim” attribute.

# Just the dim attribute of my_matrix
attr(my_matrix, which = "dim")

## [1] 2 3

Use attributes() on my_factor.

# attributes of my_factor
attributes(my_factor)

## $levels
## [1] "A" "B"
## 
## $class
## [1] "ordered" "factor"
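
The attributes printed above line up with familiar helper functions. The following sketch, which repeats the definitions of my_matrix and my_factor so that it runs on its own, compares each attribute with its accessor.

```r
# Same objects as defined for the exercise
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
rownames(my_matrix) <- c("Row1", "Row2")
colnames(my_matrix) <- c("Col1", "Col2", "Col3")
my_factor <- factor(c("A", "A", "B"), ordered = TRUE, levels = c("A", "B"))

# Matrix: "dim" and "dimnames" back dim(), rownames(), and colnames()
identical(attr(my_matrix, which = "dim"), dim(my_matrix))
identical(attr(my_matrix, which = "dimnames")[[1]], rownames(my_matrix))
identical(attr(my_matrix, which = "dimnames")[[2]], colnames(my_matrix))

# Factor: "levels" and "class" back levels() and class()
identical(attr(my_factor, which = "levels"), levels(my_factor))
identical(attr(my_factor, which = "class"), class(my_factor))
```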

5.3 Congratulations!

5.3.1 Congratulations!

From quantitative finance, to machine learning, to geospatial data, the possibilities of what you can do with R are just about endless. My hope is that you take what you learned in this course, and use that knowledge to explore a data set that interests you.

5.3.2 More to learn

5.3.3 Keep learning!

We have only just scratched the surface of what R can do, and I hope you will check out some of our other courses to learn much more about it. If you are interested in continuing the Finance curriculum, check out Intermediate R for Finance. I’m happy that you were able to take this course with me. Thanks for attending!