Intermediate R for Finance

Lore Dirick - DataCamp

Course Description

If you enjoyed the Introduction to R for Finance course, then you will love Intermediate R for Finance. Here, you will first learn the basics about how dates work in R, an important skill for the rest of the course. Your next step will be to explore the world of if statements, loops, and functions. These are powerful ideas that are essential to any financial data scientist’s toolkit. Finally, we will spend some time working with the family of apply functions as a vectorized alternative to loops. And of course, all examples will be finance related! Enjoy!

1 Dates

Welcome! Before we go deeper into the world of R, it will be nice to have an understanding of how dates and times are created. This chapter will teach you enough to begin working with dates, but only scratches the surface of what you can do with them.

1.1 An introduction to dates in R

1.1.1 What day is it?

R has a lot to offer in terms of dates and times. The two main classes of data for this are Date and POSIXct. Date is used for calendar date objects like “2015-01-22”. POSIXct is a way to represent datetime objects like “2015-01-22 08:39:40 EST”, meaning that it is 40 seconds after 8:39 AM Eastern Standard Time.

In practice, the best strategy is to use the simplest class that you need. Often, Date will be the simplest choice. This course will use the Date class almost exclusively, but it is important to be aware of POSIXct as well for storing intraday financial data.
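To make the difference between the two classes concrete, here is a minimal sketch that creates one object of each. The time zone name "America/New_York" is an assumption, chosen to match the EST example above; any valid time zone name would work.

```r
# A calendar date: no time of day, no time zone
d <- as.Date("2015-01-22")

# A date-time: 08:39:40 on the same day, New York time (EST in January)
dt <- as.POSIXct("2015-01-22 08:39:40", tz = "America/New_York")

class(d)   # "Date"
class(dt)  # "POSIXct" "POSIXt"

# Extract just the clock time from the date-time
format(dt, "%H:%M:%S")  # "08:39:40"
```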

In the exercise below, you will explore your first date and time objects by asking R to return the current date and the current time.

# edited by cliex159

Type Sys.Date() to have R return the current date.

# What is the current date?
Sys.Date()

## [1] "2022-10-22"

Type Sys.time() to have R return the current date and time. Notice the difference in capitalization of Date vs time.

# What is the current date and time?
Sys.time()

## [1] "2022-10-22 14:59:27 +07"

Store Sys.Date() in the variable today.

# Create the variable today
today <- Sys.Date()

Use class() on today to confirm its class.

# Confirm the class of today
class(today)

## [1] "Date"

Awesome! What else can you do with dates?

1.1.2 From char to date

You will often have to create dates yourself from character strings. The as.Date() function is the best way to do this:

# The Great Crash of 1929 (Black Tuesday, October 29)
great_crash <- as.Date("1929-10-29")

great_crash
[1] "1929-10-29"

class(great_crash)
[1] "Date"

Notice that the date is given in the format “yyyy-mm-dd”. This is known as ISO 8601 format (ISO = International Organization for Standardization). It is the format R uses to display dates, and the one as.Date() expects by default (it also accepts “yyyy/mm/dd”).

Internally, dates are stored as the number of days since January 1, 1970, and datetimes are stored as the number of seconds since then. You will confirm this in the exercises below.
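A quick way to see the origin directly is to convert the origin itself. This small sketch also shows that POSIXct seconds are counted from midnight UTC, which is why the time zone is pinned to "UTC" here.

```r
# The origin itself is day 0, and each later day adds 1
as.numeric(as.Date("1970-01-01"))  # 0
as.numeric(as.Date("1970-01-02"))  # 1

# POSIXct counts seconds from midnight UTC on the origin date,
# so midnight UTC one day later is 24 * 60 * 60 = 86400 seconds
as.numeric(as.POSIXct("1970-01-02 00:00:00", tz = "UTC"))  # 86400
```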

Create a date variable named crash for “2008-09-29”, the date of the largest single-day point drop in the Dow up to that time.

# Create crash
crash <- as.Date("2008-09-29")

Print crash.

# Print crash
crash

## [1] "2008-09-29"

Use as.numeric() on crash to convert it to the number of days since January 1, 1970.

# crash as a numeric
as.numeric(crash)

## [1] 14151

Wrap as.numeric() around Sys.time() to see the current time in number of seconds since January 1, 1970.

# Current time as a numeric
as.numeric(Sys.time())

## [1] 1666425568

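Attempt to create a date from “09/29/2008”. What happens? Without a format argument, as.Date("09/29/2008") stops with the error “character string is not in a standard unambiguous format”, because the string is not in yyyy-mm-dd (or yyyy/mm/dd) form. Supplying the format argument tells R how to read it:

# Non-standard date format, read with an explicit format
as.Date("09/29/2008", format = "%m/%d/%Y")

## [1] "2008-09-29"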

Nice job! You’ll learn how to deal with non-standard date formats later on!

1.1.3 Many dates

Creating a single date is a good start, but with financial data you will often have a large number of dates to work with. In that case, you need to convert many dates from character to Date at once, which you can do with vectors. In fact, a single character string is really a character vector of length 1, so you have been doing this all along!

# Create a vector of daily character dates
dates <- c("2017-01-01", "2017-01-02",
           "2017-01-03", "2017-01-04") 

as.Date(dates)
[1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04"

Like before, this might look like another character vector, but internally each date is stored as a number (the days since January 1, 1970) with the class “Date” attached, which gives it special properties that only dates have.
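To peek at that internal storage, unclass() removes the “Date” class and shows the underlying day counts. This is a small illustrative sketch reusing the dates from the example above.

```r
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"))

class(dates)    # "Date"
unclass(dates)  # 17167 17168 17169 17170: days since 1970-01-01

# Because they are numbers underneath, consecutive dates differ by 1 day
diff(dates)     # Time differences in days: 1 1 1
```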

Create another vector of dates containing the 4 days from “2017-02-05” to “2017-02-08” inclusive. Call this dates.

# Create dates from "2017-02-05" to "2017-02-08" inclusive
dates <- c("2017-02-05", "2017-02-06", "2017-02-07", "2017-02-08")

Assign the days of the week “Sunday”, “Monday”, “Tuesday”, “Wednesday”, in that order, as names() of the vector dates.

# Add names to dates
names(dates) <- c("Sunday", "Monday", "Tuesday", "Wednesday")

Subset dates using [ ] to retrieve only the date for “Monday”.

# Subset dates to only return the date for Monday
dates["Monday"]

##       Monday 
## "2017-02-06"

Nice job! Subsetting by name is very useful!

1.2 Date formats and extractor functions

1.2.1 Date formats (1)

As you saw earlier, R is picky about how it reads dates. To remind you, as.Date(“09/29/2008”) threw an error because it was not in the correct format. The fix for this is to specify the format you are using through the format argument:

as.Date("09/29/2008", format = "%m/%d/%Y")
[1] "2008-09-29"

This might look strange, but the basic idea is that you are passing a character string that tells R your date is in the form mm/dd/yyyy. R then knows how to extract the components and convert the date to yyyy-mm-dd.

There are a number of different format codes you can specify; here are a few of them:

%Y: 4-digit year (1982)

%y: 2-digit year (82)

%m: 2-digit month (01)

%d: 2-digit day of the month (13)

%A: weekday (Wednesday)

%a: abbreviated weekday (Wed)

%B: month (January)

%b: abbreviated month (Jan)
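The examples in the list all describe a single date, January 13, 1982, which was a Wednesday. The sketch below applies each code to that date with format(), then uses the same codes in reverse to parse a string. Month and weekday names depend on your locale; the outputs shown assume an English locale.

```r
d <- as.Date("1982-01-13")

format(d, "%Y")  # "1982"
format(d, "%y")  # "82"
format(d, "%m")  # "01"
format(d, "%d")  # "13"
format(d, "%A")  # "Wednesday"
format(d, "%a")  # "Wed"
format(d, "%B")  # "January"
format(d, "%b")  # "Jan"

# The same codes work in reverse when parsing with as.Date()
as.Date("Wednesday, January 13, 1982", format = "%A, %B %d, %Y")  # "1982-01-13"
```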

In this exercise you will work with the date, “1930-08-30”, Warren Buffett’s birth date!

Use as.Date() and an appropriate format to convert “08,30,1930” to a date (it is in the form of “month,day,year”).

# "08,30,1930"
as.Date("08,30,1930", format = "%m,%d,%Y")

## [1] "1930-08-30"

Use as.Date() and an appropriate format to convert “Aug 30,1930” to a date.

# "Aug 30,1930"
as.Date("Aug 30,1930", format = "%b %d, %Y")

## [1] "1930-08-30"

Use as.Date() and an appropriate format to convert “30aug1930” to a date.

# "30aug1930"
as.Date("30aug1930", format = "%d%b%Y")

## [1] "1930-08-30"

Nice! Now you can work with all kinds of date formats.

1.2.2 Date formats (2)

Not only can you convert characters to dates, but you can convert objects that are already dates to differently formatted dates using format():

# The best point move in stock market history at the time: a +936 point change in the Dow!
best_date <- as.Date("2008-10-13")
best_date
[1] "2008-10-13"

format(best_date, format = "%Y/%m/%d")
[1] "2008/10/13"

format(best_date, format = "%B %d, %Y")
[1] "October 13, 2008"

As a reminder, here are the formats:

%Y: 4-digit year (1982)

%y: 2-digit year (82)

%m: 2-digit month (01)

%d: 2-digit day of the month (13)

%A: weekday (Wednesday)

%a: abbreviated weekday (Wed)

%B: month (January)

%b: abbreviated month (Jan)

Create the vector dates from char_dates, specifying the format so that R reads them correctly.

char_dates <- c("1jan17", "2jan17", "3jan17", "4jan17", "5jan17")

# Create dates using as.Date() and the correct format 
dates <- as.Date(char_dates, format = "%d%b%y")

Use format() on dates so that each date looks like “Jan 04, 17”.

# Use format() to go from "2017-01-04" -> "Jan 04, 17"
format(dates, format = "%b %d, %y")

## [1] "Jan 01, 17" "Jan 02, 17" "Jan 03, 17" "Jan 04, 17" "Jan 05, 17"

Use format() on dates so that each date looks like “01,04,2017”.

# Use format() to go from "2017-01-04" -> "01,04,2017"
format(dates, format = "%m,%d,%Y")

## [1] "01,01,2017" "01,02,2017" "01,03,2017" "01,04,2017" "01,05,2017"

Nice Job! This can be useful when reporting or exporting dates.

1.2.3 Subtraction of dates

Just like with numerics, arithmetic can be done on dates. In particular, you can find the difference between two dates, in days, by using subtraction:

today <- as.Date("2017-01-02")
tomorrow <- as.Date("2017-01-03")
one_year_away <- as.Date("2018-01-02")

tomorrow - today
Time difference of 1 days

one_year_away - today
Time difference of 365 days

Equivalently, you could use the difftime() function to find the time interval instead.

difftime(tomorrow, today)
Time difference of 1 days

# With some extra options!
difftime(tomorrow, today, units = "secs")
Time difference of 86400 secs
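Arithmetic also works the other way, and difftime() objects can be converted between units. A small sketch reusing the dates above; the variable name gap is illustrative.

```r
today <- as.Date("2017-01-02")
one_year_away <- as.Date("2018-01-02")

# Adding a number to a Date moves it forward by that many days
today + 365  # "2018-01-02"

# difftime() can report the same gap in other units
gap <- difftime(one_year_away, today, units = "weeks")
gap  # Time difference of 52.14286 weeks

# as.numeric() drops the difftime class; its units argument converts first
as.numeric(gap, units = "days")  # 365
```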

A vector of dates has been created for you.

# Dates
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03"))

You can use subtraction to confirm that January 1, 1970 is the first date that R counts from. First, create a variable called origin containing “1970-01-01” as a date.

# Create the origin
origin <- as.Date("1970-01-01")

Now, use as.numeric() on dates to see how many days from January 1, 1970 it has been.

# Use as.numeric() on dates
as.numeric(dates)

## [1] 17167 17168 17169

Finally, subtract origin from dates to confirm the results! (Notice how recycling is used here!)

# Find the difference between dates and origin
dates - origin

## Time differences in days
## [1] 17167 17168 17169

Great work!

1.2.4 months() and weekdays() and quarters(), oh my!

As a final lesson on dates, there are a few functions that are useful for extracting date components. One of those is months().

my_date <- as.Date("2017-01-02")

months(my_date)
[1] "January"

Two other useful functions are weekdays() to extract the day of the week that your date falls on, and quarters() to determine which quarter of the year (Q1-Q4) that your date falls in.
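Here is a brief sketch of all three extractors on the date above. months() and weekdays() also take an abbreviate argument. Their output depends on your locale (English is assumed here), so format() with "%m" is shown as a locale-independent way to get the month number.

```r
my_date <- as.Date("2017-01-02")

months(my_date)                     # "January"
months(my_date, abbreviate = TRUE)  # "Jan"
weekdays(my_date)                   # "Monday"
quarters(my_date)                   # "Q1"

# Locale-independent alternative: the month as a number
as.integer(format(my_date, "%m"))   # 1
```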

A vector of dates has been created for you.

# dates
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))

Extract the months() from these dates.

# Extract the months
months(dates)

## [1] "January" "May"     "August"  "October"

Extract the quarters() from these dates.

# Extract the quarters
quarters(dates)

## [1] "Q1" "Q2" "Q3" "Q4"

Another vector, dates2 has also been created for you.

# dates2
dates2 <- as.Date(c("2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"))

Use weekdays() to determine what day of the week the dates fell on, and assign them to the names of dates2 using names().

# Assign the weekdays() of dates2 as the names()
names(dates2) <- weekdays(dates2)

Print dates2.

# Print dates2
dates2

##       Monday      Tuesday    Wednesday     Thursday 
## "2017-01-02" "2017-01-03" "2017-01-04" "2017-01-05"

Nice work! These functions, and a number of other ones, are useful for extracting information from dates.

2 If Statements and Operators

Imagine you own stock in a company. If the stock goes above a certain price, you might want to sell. If the stock drops below a certain price, you might want to buy it while it’s cheap! This kind of thinking can be implemented using operators and if statements. In this chapter, you will learn all about them, and create a program that tells you to buy or sell a stock.

2.1 Relational operators

2.1.1 Relational practice

In the video, Lore taught you all about different types of relational operators. For reference, here they are again:

> : Greater than

>=: Greater than or equal to

< : Less than

<=: Less than or equal to

==: Equality

!=: Not equal

These relational operators let us make comparisons in our data. If the comparison is true, the operator returns TRUE; otherwise it returns FALSE.

apple <- 45.46
microsoft <- 67.88

apple <= microsoft
[1] TRUE

hello <- "Hello world"

# Case sensitive!
hello == "hello world"
[1] FALSE
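One case worth knowing before working with real price data: a comparison involving a missing value returns NA rather than TRUE or FALSE. The sketch below uses a made-up prices vector with one gap; is.na() is the right way to test for missingness.

```r
prices <- c(120.00, NA, 121.88)

prices > 120   # FALSE    NA  TRUE
prices == NA   # NA NA NA: == cannot detect missing values
is.na(prices)  # FALSE  TRUE FALSE
```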

micr and apple stock prices and two dates, today and tomorrow, have been created for you.

Is apple larger than micr?

# Stock prices
apple <- 48.99
micr <- 77.93

# Apple vs. Microsoft
apple > micr

## [1] FALSE

Check to see if apple and micr are not equal using !=.

# Not equals
apple != micr

## [1] TRUE

Is tomorrow less than today?

# Dates - today and tomorrow
today <- Sys.Date()
tomorrow <- Sys.Date() + 1

# Today vs. Tomorrow
tomorrow < today

## [1] FALSE

Amazing! Relational operators will be used throughout the course!

2.1.2 Vectorized operations

You can extend the concept of relational operators to vectors of any arbitrary length. Compare two vectors using > to get a logical vector back of the same length, holding TRUE when the first is greater than the second, and FALSE otherwise.

apple <- c(120.00, 120.08, 119.97, 121.88)
datacamp  <- c(118.5, 124.21, 125.20, 120.22)

apple > datacamp
[1]  TRUE FALSE FALSE  TRUE

Comparing a vector and a single number works as well. R will recycle the number to be the same length as the vector:

apple > 120
[1] FALSE  TRUE FALSE  TRUE
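A logical vector like this is easy to summarize, which is handy when you want to count or locate signal days. In this sketch, sum() treats TRUE as 1 and FALSE as 0, which() returns the positions of the TRUE values, and the logical vector can also be used to pull out the matching prices. The name signal is illustrative.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)
signal <- apple > 120

sum(signal)    # 2: number of days above 120
which(signal)  # 2 4: positions where the condition holds
apple[signal]  # 120.08 121.88
```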

Imagine how this could be used as a buy/sell signal in stock analysis! A data frame, stocks, is available for you to use.

# Setup: IBM and Panera prices for four trading days in January 2017
date <- as.Date(c("2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25"))
ibm <- c(170.55, 171.03, 175.90, 178.29)
panera <- c(216.65, 216.06, 213.55, 212.22)
stocks <- data.frame(date = date, ibm = ibm, panera = panera)

Print stocks.

# Print stocks
stocks

##         date    ibm panera
## 1 2017-01-20 170.55 216.65
## 2 2017-01-23 171.03 216.06
## 3 2017-01-24 175.90 213.55
## 4 2017-01-25 178.29 212.22

You want to buy ibm when it crosses below 175. Use $ to select the ibm column and a logical operator to know when this happens. Add it to stocks as the column, ibm_buy.

# IBM range
stocks$ibm_buy <- stocks$ibm < 175

If panera crosses above 213, sell. Use a logical operator to know when this happens. Add it to stocks as the column, panera_sell.

# Panera range
stocks$panera_sell <- stocks$panera > 213

Is ibm ever above panera? Add the result to stocks as the column, ibm_vs_panera.

# IBM vs Panera
stocks$ibm_vs_panera <- stocks$ibm > stocks$panera

Print stocks.

# Print stocks
stocks

##         date    ibm panera ibm_buy panera_sell ibm_vs_panera
## 1 2017-01-20 170.55 216.65    TRUE        TRUE         FALSE
## 2 2017-01-23 171.03 216.06    TRUE        TRUE         FALSE
## 3 2017-01-24 175.90 213.55   FALSE        TRUE         FALSE
## 4 2017-01-25 178.29 212.22   FALSE       FALSE         FALSE

Nice! More complex logic can always be created for useful buy and sell signals.

2.2 Logical operators

2.2.1 And / Or

You might want to check multiple relational conditions at once. What if you wanted to know if Apple stock was above 120, but below 121? Simple relational operators are not enough! For multiple conditions, you need the And operator &, and the Or operator |.

& (And): An intersection. a & b is true only if both a and b are true.

| (Or): A union. a | b is true if either a or b is true.

apple <- c(120.00, 120.08, 119.97, 121.88)

# Both conditions must hold
(apple > 120) & (apple < 121)
[1] FALSE  TRUE FALSE FALSE

# Only one condition has to hold
(apple <= 120) | (apple > 121)
[1]  TRUE FALSE  TRUE  TRUE

The stocks data frame is available for you to use.

When is ibm between 171 and 176? Add the logical vector to stocks as ibm_buy_range.

# IBM buy range 
stocks$ibm_buy_range <- (stocks$ibm > 171) & (stocks$ibm < 176)

Check if panera drops below 213.20 or rises above 216.50, then add it to stocks as the column panera_spike.

# Panera spikes 
stocks$panera_spike <- (stocks$panera < 213.20) | (stocks$panera > 216.50)

Suppose you are interested in dates after 2017-01-21 but before 2017-01-25, exclusive. Use as.Date() and & for this. Add the result to stocks as good_dates.

# Date range    
stocks$good_dates <- (stocks$date > as.Date("2017-01-21")) & (stocks$date < as.Date("2017-01-25"))

Print stocks.

# Print stocks  
stocks

##         date    ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65    TRUE        TRUE         FALSE         FALSE
## 2 2017-01-23 171.03 216.06    TRUE        TRUE         FALSE          TRUE
## 3 2017-01-24 175.90 213.55   FALSE        TRUE         FALSE          TRUE
## 4 2017-01-25 178.29 212.22   FALSE       FALSE         FALSE         FALSE
##   panera_spike good_dates
## 1         TRUE      FALSE
## 2        FALSE       TRUE
## 3        FALSE       TRUE
## 4         TRUE      FALSE

Awesome! Combining logical and relational operators makes for powerful logic!

2.2.2 Not!

One last operator to introduce is !, or Not. You have already seen a similar operator, !=, so you might be able to guess what it does. Add ! in front of a logical expression, and it will flip that expression from TRUE to FALSE (and vice versa).

!TRUE
[1] FALSE

apple <- c(120.00, 120.08, 119.97, 121.88)

!(apple < 121)
[1] FALSE FALSE FALSE  TRUE

The stocks data frame is available for you to use.

Use ! and a relational operator to know when ibm is not above 176.

# IBM range
!(stocks$ibm > 176)

## [1]  TRUE  TRUE  TRUE FALSE

A new vector, missing, has been created, which contains missing data.

# Missing data
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)

The function is.na() checks for missing data. Use is.na() on missing.

# Is missing?
is.na(missing)

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

Suppose you are more interested in where you are not missing data. ! can show you this. Use ! in front of is.na() to show positions where you do have data.

# Not missing?
!is.na(missing)

## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE

Nice! This can help you remove NA values from your data easily.
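As a short sketch of that idea, the logical vector from !is.na() can index the vector directly to keep only the observed values. Many summary functions also accept na.rm = TRUE, which skips missing values for you.

```r
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)

missing[!is.na(missing)]     # 24.5 25.7 28.0 28.6
mean(missing)                # NA: one missing value makes the result NA
mean(missing, na.rm = TRUE)  # 26.7
```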

2.2.3 Logicals and subset()

Here’s a fun problem. You know how to create logical vectors that tell you when a certain condition is true, but can you subset a data frame to only contain rows where that condition is true?

If you took Introduction to R for Finance, you might remember the subset() function. subset() takes as arguments a data frame (or vector/matrix) and a logical condition that selects which rows to return. For a data frame, the condition can refer to columns by name:

stocks
        date    ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
3 2017-01-24 175.90 213.55
4 2017-01-25 178.29 212.22

subset(stocks, ibm < 175)
        date    ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
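For comparison, the same rows can be selected with logical indexing inside [ ]. This sketch rebuilds the small stocks data frame so it runs on its own. It also shows one real difference: subset() drops rows where the condition is NA, while [ ] keeps them as rows of NA. The prices data frame is a made-up example.

```r
stocks <- data.frame(
  date   = as.Date(c("2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25")),
  ibm    = c(170.55, 171.03, 175.90, 178.29),
  panera = c(216.65, 216.06, 213.55, 212.22)
)

# subset() looks up ibm inside stocks for you ...
via_subset <- subset(stocks, ibm < 175)

# ... while [ ] needs the full stocks$ibm reference (note the trailing comma)
via_brackets <- stocks[stocks$ibm < 175, ]

# With a missing value, subset() drops the NA row but [ ] keeps it
prices <- data.frame(p = c(101, NA, 103))
nrow(subset(prices, p > 102))                 # 1
nrow(prices[prices$p > 102, , drop = FALSE])  # 2
```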

Useful, right? The stocks data frame is available for you to use.

Subset stocks to include rows where panera is greater than 216.

# Panera range
subset(stocks, panera > 216)

##         date    ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65    TRUE        TRUE         FALSE         FALSE
## 2 2017-01-23 171.03 216.06    TRUE        TRUE         FALSE          TRUE
##   panera_spike good_dates
## 1         TRUE      FALSE
## 2        FALSE       TRUE

Subset stocks to retrieve the row where date is equal to “2017-01-23”. Don’t forget as.Date()!

# Specific date
subset(stocks, date == as.Date("2017-01-23"))

##         date    ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 2 2017-01-23 171.03 216.06    TRUE        TRUE         FALSE          TRUE
##   panera_spike good_dates
## 2        FALSE       TRUE

Subset stocks to retrieve rows where ibm is less than 175 and panera is less than 216.50.

# IBM and Panera joint range
subset(stocks, ibm < 175 & panera < 216.50)

##         date    ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 2 2017-01-23 171.03 216.06    TRUE        TRUE         FALSE          TRUE
##   panera_spike good_dates
## 2        FALSE       TRUE

Awesome! This is a great function for interactively looking at different pieces of your data frame.

2.2.4 All together now!

Great! You have learned a lot about operators and subsetting. This will serve you well in future data analysis projects. Let’s do one last exercise that combines a number of operators together.

A new version of the stocks data frame is used here: Apple and Microsoft closing prices for December 2016, with one row per calendar day. It is not created anywhere else in these notes, so the setup code below reconstructs it from the December 2016 prices used in the ifelse() exercise later in this chapter, merging them onto the full calendar so that days without a price get NA.

# Setup: closing prices on the December 2016 trading days
trading <- data.frame(
  date  = as.Date(c("2016-12-01", "2016-12-02", "2016-12-05", "2016-12-06",
                    "2016-12-07", "2016-12-08", "2016-12-09", "2016-12-12",
                    "2016-12-13", "2016-12-14", "2016-12-15", "2016-12-16",
                    "2016-12-19", "2016-12-20", "2016-12-21", "2016-12-22",
                    "2016-12-23", "2016-12-27", "2016-12-28", "2016-12-29",
                    "2016-12-30")),
  apple = c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30,
            115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29,
            116.52, 117.26, 116.76, 116.73, 115.82),
  micr  = c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17,
            62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55,
            63.24, 63.28, 62.99, 62.90, 62.14)
)

# One row per calendar day; days without a price get NA
all_days <- data.frame(date = seq(as.Date("2016-12-01"), as.Date("2016-12-30"), by = "day"))
stocks <- merge(all_days, trading, all.x = TRUE)

First, look at the first few rows of stocks with head(). It contains Apple and Microsoft prices for December 2016.

# View the start of stocks
head(stocks)

##         date  apple  micr
## 1 2016-12-01 109.49 59.20
## 2 2016-12-02 109.90 59.25
## 3 2016-12-03     NA    NA
## 4 2016-12-04     NA    NA
## 5 2016-12-05 109.11 60.22
## 6 2016-12-06 109.95 59.95

It seems like you have missing data. Let’s investigate further. Use weekdays() on the date column, and assign it to stocks as the column, weekday.

# Weekday investigation
stocks$weekday <- weekdays(stocks$date)

Now view only the rows with a missing price. They fall on weekends, plus Monday, December 26, when US stock markets were closed for the Christmas holiday. This makes sense: the stock market is not open on those days.

# View the rows with missing prices
subset(stocks, is.na(apple))

##          date apple micr  weekday
## 3  2016-12-03    NA   NA Saturday
## 4  2016-12-04    NA   NA   Sunday
## 10 2016-12-10    NA   NA Saturday
## 11 2016-12-11    NA   NA   Sunday
## 17 2016-12-17    NA   NA Saturday
## 18 2016-12-18    NA   NA   Sunday
## 24 2016-12-24    NA   NA Saturday
## 25 2016-12-25    NA   NA   Sunday
## 26 2016-12-26    NA   NA   Monday

Remove the missing rows using subset(). Use !is.na() on apple as your condition. Assign this new data frame to stocks_no_NA.

# Remove missing data
stocks_no_NA <- subset(stocks, !is.na(apple))

Now, you are interested in days where apple was above 117, or when micr was above 63. Use relational operators, |, and subset() to accomplish this with stocks_no_NA.

# Apple and Microsoft joint range
subset(stocks_no_NA, apple > 117 | micr > 63)

##          date  apple  micr   weekday
## 19 2016-12-19 116.64 63.62    Monday
## 20 2016-12-20 116.95 63.54   Tuesday
## 21 2016-12-21 117.06 63.54 Wednesday
## 22 2016-12-22 116.29 63.55  Thursday
## 23 2016-12-23 116.52 63.24    Friday
## 27 2016-12-27 117.26 63.28   Tuesday

Woo! Hopefully you can see how useful these operators can be in a financial data science workflow.

2.3 If statements

2.3.1 If this

If statements are great for adding extra logical flow to your code. First, let’s look at the basic structure of an if statement:

if(condition) {
    code
}

The condition is anything that returns a single TRUE or FALSE. If the condition is TRUE, then the code inside gets executed. Otherwise, the code gets skipped and the program continues. Here is an example:

apple <- 54.3

if(apple < 70) {
    print("Apple is less than 70")
}
[1] "Apple is less than 70"
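Because the condition must be a single TRUE or FALSE, a comparison on a whole price vector needs to be collapsed first. A minimal sketch using any() and all(); the three prices are illustrative.

```r
apple <- c(109.49, 109.90, 111.03)

# apple > 110 has three values; any() collapses it to a single TRUE/FALSE
if (any(apple > 110)) {
  print("At least one price is above 110")
}

# all() is TRUE only if every element passes
all(apple > 110)  # FALSE
```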

Relational operators are a common way to create the condition in the if statement! The variable, micr, has been created for you.

Fill in the if statement that first tests if micr is less than 55, and if it is, then prints “Buy!”.

micr <- 48.55

# Print "Buy!" if micr is less than 55
if( micr < 55 ) {
    print("Buy!")
}

## [1] "Buy!"

Great! Since micr was less than 55, the statement was printed.

2.3.2 If this, Else that

An extension of the if statement is to perform a different action if the condition is false. You can do this by adding else after your if statement:

if(condition) {
    code if true
} else {
    code if false 
}

Extend the last exercise by adding an else statement that prints “Do nothing!”.

micr <- 57.44

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else {
    print("Do nothing!")
}

## [1] "Do nothing!"

Great! Since micr was not less than 55, ‘Do nothing!’ was printed.

2.3.3 If this, Else If that, Else that other thing

To add even more logic, you can follow the pattern of if, else if, else. You can add as many else if branches as you need for your control logic.

if(condition1) {
    code if condition1 is true
} else if(condition2) {
    code if condition2 is true
} else {
    code if both are false
}

Extend the last example by filling in the blanks to complete the following logic:

if micr is less than 55, print “Buy!”

else if micr is greater than or equal to 55 and less than 75, print “Do nothing!”

else print “Sell!”

micr <- 105.67

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else if( micr >= 55 & micr < 75 ){
    print("Do nothing!")
} else { 
    print("Sell!")
}

## [1] "Sell!"

Great! Since micr (105.67) was neither less than 55 nor between 55 and 75, the final else statement was run.
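Because the branches are checked in order, by the time R reaches the else if, micr is already known to be at least 55, so the micr >= 55 test can be dropped. The sketch below wraps the same logic in a small helper; the name trade_signal() is hypothetical. When you do need two conditions on a single value inside if(), && is the usual choice: it returns one TRUE or FALSE and skips the second test when the first is already FALSE.

```r
# Hypothetical helper wrapping the buy / hold / sell logic above
trade_signal <- function(price) {
  if (price < 55) {
    "Buy!"
  } else if (price < 75) {  # price >= 55 is already guaranteed here
    "Do nothing!"
  } else {
    "Sell!"
  }
}

trade_signal(48.55)   # "Buy!"
trade_signal(57.44)   # "Do nothing!"
trade_signal(105.67)  # "Sell!"
```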

2.3.4 Can you If inside an If?

Sometimes it makes sense to have nested if statements to add even more control. In the following exercise, you will add an if statement that checks if you are holding a share of the Microsoft stock before you attempt to sell it.

Here is the structure of nested if statements. It should look somewhat familiar:

if(condition1) {        
    if(condition2) {     
        code if both pass
    } else {            
        code if 1 passes, 2 fails
    }
} else {            
    code if 1 fails
}

The variables, micr and shares, have been created for you.

Fill in the nested if statement to check if shares is greater than or equal to 1 before you decide to sell.

If this is true, then print “Sell!”.

Else, print “Not enough shares to sell!”.

micr <- 105.67
shares <- 1

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else if( micr >= 55 & micr < 75 ) {
    print("Do nothing!")
} else { 
    if( shares >= 1 ) {
        print("Sell!")
    } else {
        print("Not enough shares to sell!")
    }
}

## [1] "Sell!"

Great! Since micr failed both price checks, the final else branch ran, and because shares was at least 1, the nested if printed “Sell!”.

2.3.5 ifelse()

A powerful function to know about is ifelse(). It writes an if/else in one line of code and, more than that, it works on entire vectors!

Suppose you have a vector of stock prices. What if you want to return “Buy!” each time apple > 110, and “Do nothing!” otherwise? A simple if statement would not be enough to solve this problem, because if() expects a single TRUE or FALSE (since R 4.2.0, a condition of length greater than 1 is an error). However, with ifelse() you can do:

apple
[1] 109.49 109.90 109.11 109.95 111.03 112.12

ifelse(test = apple > 110, yes = "Buy!", no = "Do nothing!")
[1] "Do nothing!" "Do nothing!" "Do nothing!" "Do nothing!" "Buy!"       
[6] "Buy!"

ifelse() evaluates the test to get a logical vector, and where the logical vector is TRUE it replaces TRUE with whatever is in yes. Similarly, FALSE is replaced by no.
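One detail the description implies: where the test itself is NA, ifelse() returns NA instead of either yes or no. A short sketch with a made-up price vector that has one gap:

```r
prices <- c(109.49, NA, 111.03)

ifelse(test = prices > 110, yes = "Buy!", no = "Do nothing!")
# "Do nothing!" NA "Buy!"
```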

The stocks data frame is available for you to use.

library(tidyverse)
stocks = tribble(~date,      ~apple, ~micr,
"2016-12-01", 109.49, 59.20,
"2016-12-02", 109.90, 59.25,
"2016-12-05", 109.11, 60.22,
"2016-12-06", 109.95, 59.95,
"2016-12-07", 111.03, 61.37,
"2016-12-08", 112.12, 61.01,
"2016-12-09", 113.95, 61.97,
"2016-12-12", 113.30, 62.17,
"2016-12-13", 115.19, 62.98,
"2016-12-14", 115.19, 62.68,
"2016-12-15", 115.82, 62.58,
"2016-12-16", 115.97, 62.30,
"2016-12-19", 116.64, 63.62,
"2016-12-20", 116.95, 63.54,
"2016-12-21", 117.06, 63.54,
"2016-12-22", 116.29, 63.55,
"2016-12-23", 116.52, 63.24,
"2016-12-27", 117.26, 63.28,
"2016-12-28", 116.76, 62.99,
"2016-12-29", 116.73, 62.90,
"2016-12-30", 115.82, 62.14) %>% mutate(date = as.Date(date))

Use ifelse() to test if micr is above 60 but below 62. When true, return a 1 and when false return a 0. Add the result to stocks as the column, micr_buy.

# Microsoft test
stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)

Use ifelse() to test if apple is greater than 117. The returned value should be the date column if TRUE, and NA otherwise.

# Apple test
stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)

Print stocks. apple_date became a numeric! ifelse() drops attributes such as the Date class from the values it returns, so the dates come back as plain numbers (days since January 1, 1970).

# Print stocks
stocks

## # A tibble: 21 × 5
##    date       apple  micr micr_buy apple_date
##    <date>     <dbl> <dbl>    <dbl>      <dbl>
##  1 2016-12-01  109.  59.2        0         NA
##  2 2016-12-02  110.  59.2        0         NA
##  3 2016-12-05  109.  60.2        1         NA
##  4 2016-12-06  110.  60.0        0         NA
##  5 2016-12-07  111.  61.4        1         NA
##  6 2016-12-08  112.  61.0        1         NA
##  7 2016-12-09  114.  62.0        1         NA
##  8 2016-12-12  113.  62.2        0         NA
##  9 2016-12-13  115.  63.0        0         NA
## 10 2016-12-14  115.  62.7        0         NA
## # … with 11 more rows

Assign the class() “Date” to the apple_date column.

# Change the class() of apple_date.
class(stocks$apple_date) <- "Date"

Print stocks again.

# Print stocks again
stocks

## # A tibble: 21 × 5
##    date       apple  micr micr_buy apple_date
##    <date>     <dbl> <dbl>    <dbl> <date>    
##  1 2016-12-01  109.  59.2        0 NA        
##  2 2016-12-02  110.  59.2        0 NA        
##  3 2016-12-05  109.  60.2        1 NA        
##  4 2016-12-06  110.  60.0        0 NA        
##  5 2016-12-07  111.  61.4        1 NA        
##  6 2016-12-08  112.  61.0        1 NA        
##  7 2016-12-09  114.  62.0        1 NA        
##  8 2016-12-12  113.  62.2        0 NA        
##  9 2016-12-13  115.  63.0        0 NA        
## 10 2016-12-14  115.  62.7        0 NA        
## # … with 11 more rows

Nice job! ifelse() is certainly powerful, just make sure it is working like you expect!
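Here is a minimal base R sketch of two ways to end up with a proper Date column, using three illustrative rows from the December data. Option 1 restores the class after ifelse(), as in the exercise. Option 2 starts from the dates and overwrites the non-matching days with NA, so the class is never lost. If you use the tidyverse (loaded above), dplyr::if_else() is another option that keeps the class of its inputs.

```r
dates <- as.Date(c("2016-12-20", "2016-12-21", "2016-12-27"))
apple <- c(116.95, 117.06, 117.26)

# Option 1: ifelse() returns plain numbers, so restore the class afterwards
apple_date1 <- ifelse(apple > 117, dates, NA)
class(apple_date1) <- "Date"

# Option 2: copy the Date vector and blank out the days that fail the test
apple_date2 <- dates
apple_date2[!(apple > 117)] <- NA

apple_date2  # NA, "2016-12-21", "2016-12-27"
```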

3 Loops

Loops can be useful for doing the same operation to each element of your data structure. In this chapter you will learn all about repeat, while, and for loops!

3.1 Repeat loops

3.1.1 Repeat, repeat, repeat

Loops are a core concept in programming. They are used in almost every language. In R, there is another way of performing repeated actions using apply functions, but we will save those until chapter 5. For now, let’s look at the repeat loop!

This is the simplest loop. You use repeat, and inside the curly braces perform some action. You must specify when you want to break out of the loop. Otherwise it runs for eternity!

repeat {
    code
    if(condition) {
        break
    }
}

Do not do the following. This is an infinite loop! In words, you are telling R to repeat your code for eternity.

repeat {
    code
}

A repeat loop has been created for you. Run the script and see what happens.

Change the condition in the if statement to break when stock_price is below 125.

Update the stock price value in the print statement to be consistent with the change.

Rerun the script.

# Stock price
stock_price <- 126.34

repeat {
  # New stock price
  stock_price <- stock_price * runif(1, .985, 1.01)
  print(stock_price)
  
  # Check
  if(stock_price < 125) {
    print("Stock price is below 125! Buy it while it's cheap!")
    break
  }
}

## [1] 124.6735
## [1] "Stock price is below 125! Buy it while it's cheap!"

Great job!
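Because runif() is random, the loop above prints a different number of prices each time it runs. This sketch makes a run reproducible with set.seed() (the seed value 42 is arbitrary) and adds a safety cap so the loop cannot run forever if 125 is never reached. The day counter n_days and the 1000-day limit are illustrative choices, not part of the original exercise.

```r
set.seed(42)  # make runif() reproducible; the seed value is arbitrary
stock_price <- 126.34
n_days <- 0

repeat {
  # New stock price
  stock_price <- stock_price * runif(1, .985, 1.01)
  n_days <- n_days + 1

  # Check
  if (stock_price < 125) {
    print("Stock price is below 125! Buy it while it's cheap!")
    break
  }

  # Safety valve: stop after 1000 simulated days even if 125 is never hit
  if (n_days >= 1000) {
    print("Gave up after 1000 days")
    break
  }
}
```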

3.1.2 When to break?

The order in which you execute your code inside the loop and check when you should break is important. The following would run the code a different number of times.

# Code, then check condition
repeat {
    code
    if(condition) {
        break
    }
}

# Check condition, then code
repeat {
    if(condition) {
        break
    }
    code
}

Let’s see this in an extension of the previous exercise. For the purposes of this example, the runif() function has been replaced with a static multiplier to remove randomness.

The structure of a repeat loop has been created. Fill in the blanks so that the loop checks if the stock_price is below 66, and breaks if so. Run this, and note the number of times that the stock price was printed.

Move the statement print(stock_price) to after the if statement, but still inside the repeat loop. Run the script again. How many times was stock_price printed now?

# Stock price
stock_price <- 67.55

repeat {
  # New stock price
  stock_price <- stock_price * .995
  print(stock_price)
 
  # Check
  if(stock_price < 66) {
    print("Stock price is below 66! Buy it while it's cheap!")
    break
  }
  
}

## [1] 67.21225
## [1] 66.87619
## [1] 66.54181
## [1] 66.2091
## [1] 65.87805
## [1] "Stock price is below 66! Buy it while it's cheap!"

Nice work!
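To make the difference concrete, here is a sketch that wraps both versions in one hypothetical helper, count_prints(). It runs the loop with print() either before or after the check and returns how many prices were printed.

```r
# Hypothetical helper: run the loop with print() before (TRUE) or after (FALSE) the check
count_prints <- function(print_first) {
  stock_price <- 67.55
  n_printed <- 0

  repeat {
    # New stock price
    stock_price <- stock_price * .995

    # Version 1: print, then check
    if (print_first) {
      print(stock_price)
      n_printed <- n_printed + 1
    }

    # Check
    if (stock_price < 66) {
      print("Stock price is below 66! Buy it while it's cheap!")
      break
    }

    # Version 2: check, then print
    if (!print_first) {
      print(stock_price)
      n_printed <- n_printed + 1
    }
  }

  n_printed
}

count_prints(print_first = TRUE)   # 5 prices printed
count_prints(print_first = FALSE)  # 4 prices printed: 65.87805 is never printed
```

When the check comes first, the loop breaks before it reaches the print() for the price that crossed the threshold.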

3.2 While loops

3.2.1 While with a print

While loops are slightly different from repeat loops. Like if statements, you specify the condition for them to run at the very beginning. There is no need for a break statement, because the condition is checked at the start of each iteration and the loop stops as soon as it is FALSE.

while (condition) {
    code
}

It might seem like the while loop is doing the exact same thing as the repeat loop, just with less code. In the examples here, this is true. So, why ever use the repeat loop? Occasionally, a loop that runs forever until something inside it tells it to stop is exactly what you want. If you are interested, check out Intentional Looping.

For the exercise, imagine that you have a debt of $5000 that you need to pay back. Each month, you pay off $500, until you’ve paid everything off. You will use a loop to model the process of paying off the debt each month, where in each iteration you decrease your total debt and print out the new total!

The variable debt has been created for you.

Fill in the while loop condition to check if debt is greater than 0. If this is true, decrease debt by 500.

# Initial debt
debt <- 5000

# While loop to pay off your debt
while (debt > 0) {
  debt <- debt - 500
  print(paste("Debt remaining", debt))
}

## [1] "Debt remaining 4500"
## [1] "Debt remaining 4000"
## [1] "Debt remaining 3500"
## [1] "Debt remaining 3000"
## [1] "Debt remaining 2500"
## [1] "Debt remaining 2000"
## [1] "Debt remaining 1500"
## [1] "Debt remaining 1000"
## [1] "Debt remaining 500"
## [1] "Debt remaining 0"

Aren’t loops fun?
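The condition is checked only at the start of each iteration, so a payment that does not divide the debt evenly would overshoot and leave a negative balance. Here is a sketch that assumes a hypothetical $600 monthly payment, caps the final payment with min(), and counts the months.

```r
debt <- 5000
payment <- 600   # hypothetical monthly payment that does not divide 5000 evenly
months <- 0

while (debt > 0) {
  # Never pay more than what is still owed
  this_payment <- min(payment, debt)
  debt <- debt - this_payment
  months <- months + 1
  print(paste("Month", months, "- debt remaining", debt))
}

months  # 9: eight payments of 600, then a final payment of 200
```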

3.2.2 While with a plot

Loops can be used for all kinds of fun examples! What if you wanted to visualize your debt decreasing over time? Like the last exercise, this one uses a loop to model paying it off, $500 at a time. However, at each iteration you will also append your remaining debt total to a plot, so that you can visualize the total decreasing as you go.

This exercise has already been done for you. Let’s talk about what is happening here.

First, initialize some variables:

debt = Your current debt

i = Incremented each time debt is reduced. The next point on the x axis.

x_axis = A vector of i’s. The x axis for the plots.

y_axis = A vector of debt. The y axis for the plots.

Also, create the first plot. Just a single point of your current debt.

Then, create a while loop. As long as you still have debt:

debt is reduced by 500.

i is incremented.

x_axis is extended by 1 more point.

y_axis is extended by the next debt point.

The next plot is created from the updated data.

After you run the code, you can page back through the plot history (the Previous Plot button in DataCamp) to view all 11 of the created plots!

debt <- 5000    # initial debt
i <- 0          # x axis counter
x_axis <- i     # x axis
y_axis <- debt  # y axis

# Initial plot
plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))

# Graph your debt
while (debt > 0) {

  # Updating variables
  debt <- debt - 500
  i <- i + 1
  x_axis <- c(x_axis, i)
  y_axis <- c(y_axis, debt)
  
  # Next plot
  plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
}

I bet you didn’t know you could make that, did you?

3.2.3 Break it

Sometimes, you have to end your while loop early. With the debt example, if you don’t have enough cash to pay off all of your debt, you won’t be able to continue paying it down. In this exercise, you will add an if statement and a break to let you know if you run out of money!

while (condition) {
    code
    if (breaking_condition) {
        break
    }
}

If breaking_condition is met, the while loop stops completely, and R continues with the lines after it. In this case, that condition will be running out of cash!

debt and cash have been defined for you.

First, fill in the while loop, but don’t touch the commented if statement. It should decrement cash and debt by 500 each time. Run this. What happens to cash when you reach 0 debt?

# debt and cash
debt <- 5000
cash <- 4000

# Pay off your debt...if you can!
while (debt > 0) {
  debt <- debt - 500
  cash <- cash - 500
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))

  # if (cash == 0) {
  #   print("You ran out of cash!")
  #   break
  # }
}

## [1] "Debt remaining: 4500 and Cash remaining: 3500"
## [1] "Debt remaining: 4000 and Cash remaining: 3000"
## [1] "Debt remaining: 3500 and Cash remaining: 2500"
## [1] "Debt remaining: 3000 and Cash remaining: 2000"
## [1] "Debt remaining: 2500 and Cash remaining: 1500"
## [1] "Debt remaining: 2000 and Cash remaining: 1000"
## [1] "Debt remaining: 1500 and Cash remaining: 500"
## [1] "Debt remaining: 1000 and Cash remaining: 0"
## [1] "Debt remaining: 500 and Cash remaining: -500"
## [1] "Debt remaining: 0 and Cash remaining: -1000"

Negative cash? That’s not good! Remove the comments and fill in the if statement so that the loop breaks when you run out of cash, that is, when cash equals 0. Run the entire program again.

# debt and cash
debt <- 5000
cash <- 4000

# Pay off your debt...if you can!
while (debt > 0) {
  debt <- debt - 500
  cash <- cash - 500
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))

  if (cash == 0) {
    print("You ran out of cash!")
    break
  }
}

## [1] "Debt remaining: 4500 and Cash remaining: 3500"
## [1] "Debt remaining: 4000 and Cash remaining: 3000"
## [1] "Debt remaining: 3500 and Cash remaining: 2500"
## [1] "Debt remaining: 3000 and Cash remaining: 2000"
## [1] "Debt remaining: 2500 and Cash remaining: 1500"
## [1] "Debt remaining: 2000 and Cash remaining: 1000"
## [1] "Debt remaining: 1500 and Cash remaining: 500"
## [1] "Debt remaining: 1000 and Cash remaining: 0"
## [1] "You ran out of cash!"

Nice job! This allows you to create more than one stopping condition in your loop.
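Testing cash == 0 works here only because 4000 is a multiple of 500. With a hypothetical starting cash of 4200, for example, cash would go from 200 straight to -300 and the == 0 check would never fire. Here is a sketch that instead checks whether the next payment is affordable before making it.

```r
# debt and cash (4200 is a hypothetical value, not a multiple of 500)
debt <- 5000
cash <- 4200
payment <- 500

while (debt > 0) {
  # Stop before paying if the next payment is not affordable
  if (cash < payment) {
    print("Not enough cash for the next payment!")
    break
  }
  debt <- debt - payment
  cash <- cash - payment
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))
}
```

The loop stops with 1000 of debt and 200 of cash left, and cash never goes negative.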

3.3 For loops

3.3.1 Loop over a vector

Last, but not least, in our discussion of loops is the for loop. When you know how many times you want to repeat an action, a for loop is a good option. The idea is that you step through a sequence one element at a time and perform an action at each step along the way. That sequence is commonly a vector of numbers (such as 1:10), but it could also be numbers that are not in any order, like c(2, 5, 4, 6), or even a character vector!

for (value in sequence) {
    code
}

In words this is saying, “for each value in my sequence, run this code.” Examples could be, “for each row of my data frame, print column 1”, or “for each word in my sentence, check if that word is DataCamp.”
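As a sketch of the second example, here is a loop over a character vector. The sentence and the counter n_found are made up for illustration.

```r
# A hypothetical sentence, split into a character vector of words
words <- c("I", "am", "learning", "R", "on", "DataCamp")
n_found <- 0

# For each word in my sentence, check if that word is DataCamp
for (word in words) {
  if (word == "DataCamp") {
    print(paste("Found it:", word))
    n_found <- n_found + 1
  }
}

n_found  # 1
```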

Let’s try an example! First, you will create a loop that prints out the values in a sequence from 1 to 10. Then, you will modify that loop to also sum the values from 1 to 10, where at each iteration the next value in the sequence is added to the running sum.

A vector seq and a variable sum have been defined for you.

Fill in the for loop, using seq as your sequence. Print out value during each iteration.

# Sequence
seq <- c(1:10)

# Print loop
for (value in seq) {
    print(value)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Use the loop to sum the numbers in seq. Each iteration, value should be added to sum, then sum is printed out.

# A sum variable
sum <- 0

# Sum loop
for (value in seq) {
    sum <- sum + value
    print(sum)
}

## [1] 1
## [1] 3
## [1] 6
## [1] 10
## [1] 15
## [1] 21
## [1] 28
## [1] 36
## [1] 45
## [1] 55

Great job! Let’s see what else you can do with for loops.
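The running totals printed above are what base R's cumsum() computes in a single vectorized call, which is a handy way to check a loop like this one. Note also that seq and sum are the names of base R functions. R still finds the function when you call sum(), but distinct variable names such as running_sum are less confusing.

```r
# cumsum() returns every running total at once, with no explicit loop
running_totals <- cumsum(1:10)
running_totals  # 1 3 6 10 15 21 28 36 45 55

# The last running total equals the overall sum
sum(1:10)       # 55
```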

3.3.2 Loop over data frame rows

Imagine that you are interested in the days where the stock price of Apple rises above 117. If it goes above this value, you want to print out the current date and stock price. If you have a stock data frame with date and apple columns, could you loop over the rows of the data frame to accomplish this? You certainly could!

Before you do so, note that you can get the number of rows in your data frame using nrow(stock). Then, you can create a sequence to loop over from 1:nrow(stock).

for (row in 1:nrow(stock)) {
    price <- stock[row, "apple"]
    date  <- stock[row, "date"]

    if(price > 117) {
        print(paste("On", date, 
                    "the stock price was", price))
    }
}
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-27 the stock price was 117.26"

This incorporates a number of things you have learned so far: if statements, subsetting, conditionals, and loops! Congratulations on learning so much!
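One caveat with 1:nrow(stock): if the data frame has no rows, 1:0 is the vector c(1, 0), so the loop would still run twice. seq_len(nrow(stock)) avoids this. Here is a sketch with a hypothetical empty data frame.

```r
# A hypothetical data frame with the same columns but no rows
empty_stock <- data.frame(date = as.Date(character(0)), apple = numeric(0))

1:nrow(empty_stock)         # 1 0: the loop would try to visit rows 1 and 0
seq_len(nrow(empty_stock))  # integer(0): nothing to loop over

n_iterations <- 0
for (row in seq_len(nrow(empty_stock))) {
  n_iterations <- n_iterations + 1
}
n_iterations                # 0: the loop body never ran
```

For a data frame with rows, seq_len(nrow(stock)) and 1:nrow(stock) produce the same sequence.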

The stocks data frame is available for you to use.

library(dplyr)  # provides %>% and select()
stock <- stocks %>% select(date, apple)

Fill in the blanks in the for loop to make the following true:

price should hold that iteration’s price

date should hold that iteration’s date

This time, you want to know if apple goes above 116.

If it does, print the date and price.

If it was below 116, print out the date and print that it was not an important day!

# Loop over stock rows
for (row in 1:nrow(stock)) {
    # $ returns plain vectors, so paste() shows a readable date even if stock is a tibble
    price <- stock$apple[row]
    date  <- stock$date[row]

    if(price > 116) {
        print(paste("On", date, 
                    "the stock price was", price))
    } else {
        print(paste("The date:", date, 
                    "is not an important day!"))
    }
}

## [1] "The date: 2016-12-01 is not an important day!"
## [1] "The date: 2016-12-02 is not an important day!"
## [1] "The date: 2016-12-05 is not an important day!"
## [1] "The date: 2016-12-06 is not an important day!"
## [1] "The date: 2016-12-07 is not an important day!"
## [1] "The date: 2016-12-08 is not an important day!"
## [1] "The date: 2016-12-09 is not an important day!"
## [1] "The date: 2016-12-12 is not an important day!"
## [1] "The date: 2016-12-13 is not an important day!"
## [1] "The date: 2016-12-14 is not an important day!"
## [1] "The date: 2016-12-15 is not an important day!"
## [1] "The date: 2016-12-16 is not an important day!"
## [1] "On 2016-12-19 the stock price was 116.64"
## [1] "On 2016-12-20 the stock price was 116.95"
## [1] "On 2016-12-21 the stock price was 117.06"
## [1] "On 2016-12-22 the stock price was 116.29"
## [1] "On 2016-12-23 the stock price was 116.52"
## [1] "On 2016-12-27 the stock price was 117.26"
## [1] "On 2016-12-28 the stock price was 116.76"
## [1] "On 2016-12-29 the stock price was 116.73"
## [1] "The date: 2016-12-30 is not an important day!"

Nice job!

3.3.3 Loop over matrix elements

So far, you have been looping over one-dimensional data structures. If you want to loop over the elements of a matrix (its rows and columns), you can use nested loops, that is, one loop inside another. You will use this idea to print out the correlations between three stocks.

The easiest way to think about this is that you are going to start on row1, and move to the right, hitting col1, col2, …, up until the last column in row1. Then, you move down to row2 and repeat the process.

my_matrix
     [,1]   [,2]  
[1,] "r1c1" "r1c2"
[2,] "r2c1" "r2c2"

# Loop over my_matrix
for(row in 1:nrow(my_matrix)) {
    for(col in 1:ncol(my_matrix)) {
        print(my_matrix[row, col])
    }
}
[1] "r1c1"
[1] "r1c2"
[1] "r2c1"
[1] "r2c2"
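To run the example yourself, here is one way to build my_matrix. This is an assumption, since its construction isn't shown. The sketch collects the visited elements in order instead of printing them. Note that matrix() fills column by column unless you set byrow = TRUE.

```r
# Build the 2 x 2 example matrix; matrix() fills it column by column
my_matrix <- matrix(c("r1c1", "r2c1", "r1c2", "r2c2"), nrow = 2)

# Record the order in which the nested loops visit the elements
visited <- character(0)
for (row in 1:nrow(my_matrix)) {
  for (col in 1:ncol(my_matrix)) {
    visited <- c(visited, my_matrix[row, col])
  }
}

visited  # "r1c1" "r1c2" "r2c1" "r2c2"
```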

The correlation matrix, corr, is available for you to use.

corr <- matrix(c(1.00, 0.96, 0.88,
                 0.96, 1.00, 0.74,
                 0.88, 0.74, 1.00),
               ncol = 3,
               dimnames = list(c("apple", "ibm", "micr"),
                               c("apple", "ibm", "micr")))

Print corr to get a peek at the data.

# Print out corr
corr

##       apple  ibm micr
## apple  1.00 0.96 0.88
## ibm    0.96 1.00 0.74
## micr   0.88 0.74 1.00

Fill in the nested for loop! It should satisfy the following:

The outer loop should be over the rows of corr.

The inner loop should be over the cols of corr.

The print statement should print the names of the current column and row, and also print their correlation.

# Create a nested loop
for(row in 1:nrow(corr)) {
    for(col in 1:ncol(corr)) {
        print(paste(colnames(corr)[col], "and", rownames(corr)[row],
                    "have a correlation of", corr[row,col]))
    }
}

## [1] "apple and apple have a correlation of 1"
## [1] "ibm and apple have a correlation of 0.96"
## [1] "micr and apple have a correlation of 0.88"
## [1] "apple and ibm have a correlation of 0.96"
## [1] "ibm and ibm have a correlation of 1"
## [1] "micr and ibm have a correlation of 0.74"
## [1] "apple and micr have a correlation of 0.88"
## [1] "ibm and micr have a correlation of 0.74"
## [1] "micr and micr have a correlation of 1"

Nice! That was a tough one. Nested loops can get slow, because the number of iterations is the product of the loop lengths: this 3 x 3 matrix needs 9 iterations, but a 1000 x 1000 matrix would need a million.
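Because a correlation matrix is symmetric, the loop above prints every pair twice, plus each stock paired with itself. Here is a sketch that starts the inner loop one column to the right of the diagonal, so each pair is visited once. It collects the sentences in a vector named pairs, which is an illustrative choice.

```r
corr <- matrix(c(1.00, 0.96, 0.88,
                 0.96, 1.00, 0.74,
                 0.88, 0.74, 1.00),
               ncol = 3,
               dimnames = list(c("apple", "ibm", "micr"),
                               c("apple", "ibm", "micr")))

pairs <- character(0)
n <- ncol(corr)

# Assumes at least 2 columns; with one column, 1:(n - 1) would be 1:0
for (row in 1:(n - 1)) {
  # Only columns to the right of the diagonal: no self-pairs, no repeats
  for (col in (row + 1):n) {
    pairs <- c(pairs, paste(rownames(corr)[row], "and", colnames(corr)[col],
                            "have a correlation of", corr[row, col]))
  }
}

print(pairs)
```

This runs 3 iterations instead of 9.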

3.3.4 Break and next

To finish your lesson on loops, let’s return to the concept of break, and the related concept of next. Just like with repeat and while loops, you can break out of a for loop completely by using the break statement. Additionally, if you just want to skip the current iteration and continue with the next one, you can use the next statement. This is useful when a particular value (a missing value, for example) would cause a problem, but you don’t want it to stop the whole loop.

for (value in sequence) {
    if(next_condition) {
        next
    }
    code
    if(breaking_condition) {
        break
    }
}

You don’t have to use both break and next at the same time; this simply shows the general structure of using them.

The point of using next at the beginning, before the code runs, is to check for a problem before it happens.
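For example, without an is.na() check, if (value > 117) stops with the error "missing value where TRUE/FALSE needed" as soon as value is NA. Here is a sketch with a hypothetical price vector (not the course's apple data) that uses both statements and counts what happened.

```r
# Hypothetical prices with a missing value
prices <- c(109.49, 109.90, NA, 112.12, 118.25, 116.00)

n_skipped <- 0
n_checked <- 0

for (value in prices) {
  # next: skip this iteration before the NA reaches the comparison below
  if (is.na(value)) {
    n_skipped <- n_skipped + 1
    next
  }
  n_checked <- n_checked + 1

  # break: stop the whole loop once the price is above 117
  if (value > 117) {
    print(paste("Time to sell at", value))
    break
  }
}

c(skipped = n_skipped, checked = n_checked)  # 1 skipped, 4 checked; 116.00 is never reached
```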

The apple vector is in your workspace.

Print out apple. You have some missing values!

# Print apple
apple

## [1] 48.99

Fill in the blanks in the loop to do the following:

Check if value is NA. If so, go to the next iteration.

Check if value is above 117. If so, break and sell!

Else print “Nothing to do here!”.

# Loop through apple. Next if NA. Break if above 117.
for (value in apple) {
    if(is.na(value)) {
        print("Skipping NA")
        next
    }
    
    if(value > 117) {
        print("Time to sell!")
        break
    } else {
        print("Nothing to do here!")
    }
}

## [1] "Nothing to do here!"

Awesome! You’ve become a master looper!

4 Functions

If data structures like data frames and vectors are how you hold your data, functions are how you tell R what to do with your data. In this chapter, you will learn about using built-in functions, creating your own unique functions, and you will finish off with a brief introduction to packages.

4.1 What are functions?

4.1.1 Function help and documentation

When you don’t know how to use a function, or don’t know what arguments it takes, where do you turn? Luckily for you, R has built-in documentation. For example, to get help for the names() function, you can type any of:

?names

?names()

help(names)

These all do the same thing; they take you straight to the help page for names()!

In the DataCamp console, this takes you to the RDocumentation website instead, but the information is the same!

Below, you will explore the documentation for a few other functions.

Use ? to look at the documentation for subset().

# Look at the documentation for subset
?subset

Use ? to look at the documentation for Sys.time().

# Look at the documentation for Sys.time
?Sys.time

Great work!

4.1.2 Optional arguments

Let’s look at some of the round() function’s help documentation. It simply rounds a numeric vector off to a specified number of decimal places.

round(x, digits = 0)

The first argument, x, is required. Without it, the function will not work!

The argument digits is known as an optional argument. Optional arguments are ones that don’t have to be set by the user, either because they are given a default value, or because the function can infer them from the other data you have given it. Even though they don’t have to be set, they often provide extra flexibility. Here, digits specifies the number of decimal places to round to.

Explore the round() function in the exercise!

Use round() on 5.4.

# Round 5.4
round(5.4)

## [1] 5

Use round() on 5.4, specify digits = 1.

# Round 5.4 with 1 decimal place
round(5.4, digits = 1)

## [1] 5.4

A vector numbers has been created for you.

numbers <- c(.002623, pi, 812.33345)

Use round() on numbers and specify digits = 3.

# Round numbers to 3 decimal places
round(numbers, digits = 3)

## [1]   0.003   3.142 812.333

Nice job! Optional arguments are great for adding extra features or tweaks.
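One detail from the round() help page is that a value exactly halfway between two candidates is rounded to the even digit, following the IEC 60559 standard. This can surprise anyone who expects 0.5 to always round up. A quick sketch:

```r
# Values exactly halfway between two integers go to the even one
round(0.5)   # 0
round(1.5)   # 2
round(2.5)   # 2
round(-2.5)  # -2
```

With decimal places, results can also depend on the fact that many decimals, such as 0.15, are not stored exactly in binary.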

4.1.3 Functions in functions

To write clean code, it is sometimes useful to use functions inside of other functions. This lets you use the result of one function directly in another one, without having to create an intermediate variable. You have already seen an example of this with print() and paste().

company <- c("Goldman Sachs", "J.P. Morgan", "Fidelity Investments")

for(i in 1:3) {
    print(paste("A large financial institution is", company[i]))
}
[1] "A large financial institution is Goldman Sachs"
[1] "A large financial institution is J.P. Morgan"
[1] "A large financial institution is Fidelity Investments"

paste() joins its arguments into a character string, and print() prints the result to the console.

The exercise below explores simplifying the calculation of the correlation matrix using nested functions. Three vectors of stock prices, apple, ibm, and micr, are available for you to use.

First, cbind() them together in the order of apple, ibm, micr. Save this as stocks.

# cbind() the stocks
stocks <- cbind(apple, ibm, micr)

Then, use cor() on stocks.

# cor() to create the correlation matrix
cor(stocks)

##       apple ibm micr
## apple     1  NA   NA
## ibm      NA   1   NA
## micr     NA  NA    1

Now, let’s see how this would work all at once. Use cbind() inside of cor() with the 3 stock vectors in the same order as above to create the correlation matrix.

# All at once! Nest cbind() inside of cor()
cor(cbind(apple, ibm, micr))

##       apple ibm micr
## apple     1  NA   NA
## ibm      NA   1   NA
## micr     NA  NA    1

Good Job!
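Every off-diagonal entry in the output above is NA. One common cause of NA from cor() is a missing value in one of the inputs: with the default use = "everything", any correlation involving an NA is NA. Here is a sketch with hypothetical price vectors (not the course data) that shows the use = "complete.obs" argument, which drops rows containing NA before computing.

```r
# Hypothetical prices; apple has one missing value
apple <- c(109.49, 109.90, NA, 112.12, 111.03)
ibm   <- c(159.82, 160.02, 159.84, 160.35, 164.79)
micr  <- c(59.20, 59.25, 60.22, 59.95, 61.37)

prices <- cbind(apple, ibm, micr)

cor(prices)                        # entries involving apple are NA
cor(prices, use = "complete.obs")  # row 3 is dropped, so every entry is a number
```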

4.2 Writing functions

4.2.1 Your first function

Time for your first function! This is a big step in an R programmer’s journey. “Functions are a fundamental building block of R: to master many of the more advanced techniques … you need a solid foundation in how functions work.” -Hadley Wickham

Here is the basic structure of a function:

func_name <- function(arguments) {
    body
}

And here is an example:

square <- function(x) {
    x^2
}

square(2)
[1] 4

Two things to remember from what Lore taught you are arguments and the function body. Arguments are user inputs that the function works on. They can be the data that the function manipulates, or options that affect the calculation. The body of the function is the code that actually performs the manipulation.

The value that a function returns is the value of the last expression evaluated in the function body. In the example, since x^2 is the last line of the body, that is what gets returned.
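The last expression is returned implicitly, but R also has return(), which exits the function immediately. Here is a sketch with a hypothetical safe_percent_to_decimal() that returns NA early for non-numeric input.

```r
# Hypothetical variant of percent_to_decimal() with an early return()
safe_percent_to_decimal <- function(percent) {
  # return() stops the function right here for non-numeric input
  if (!is.numeric(percent)) {
    return(NA)
  }
  # Otherwise the last evaluated expression is returned, as usual
  percent / 100
}

safe_percent_to_decimal(6)      # 0.06
safe_percent_to_decimal("six")  # NA
```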

In the exercise, you will create your first function to turn a percentage into a decimal, a useful calculation in finance!

Create a function named percent_to_decimal that takes 1 argument, percent, and returns percent divided by 100.

# Percent to decimal function
percent_to_decimal <- function(percent) {
    percent / 100
}

Call percent_to_decimal() on the percentage 6 (we aren’t using % here, but assume this is 6%).

# Use percent_to_decimal() on 6
percent_to_decimal(6)

## [1] 0.06

A variable pct has been created for you.

# Example percentage
pct <- 8

Call percent_to_decimal() on pct.

# Use percent_to_decimal() on pct
percent_to_decimal(pct)

## [1] 0.08

You just created your first function! Great job!

4.2.2 Multiple arguments (1)

As you saw in the optional arguments example, functions can have multiple arguments. These can help extend the flexibility of your function. Let’s see this in action.

pow <- function(x, power = 2) {
    x^power
}

pow(2)
[1] 4

pow(2, power = 3)
[1] 8

Instead of a square() function, we now have a version that works with any power.

The power argument is optional and has a default value of 2, but the user can easily change this. It is also an example of how you can add multiple arguments. Notice how the arguments are separated by a comma, and the default value is set using an equals sign.

Let’s add some more functionality to percent_to_decimal() that allows you to round the result to a certain number of digits.

Fill in the blanks in the improved percent_to_decimal() function to do the following:

Add a second optional argument named digits that defaults to 2.

In the body of the function, divide percent by 100 and assign this to decimal.

Use the round function on decimal, and set the second argument to digits to specify the number of decimal places.

# Percent to decimal function
percent_to_decimal <- function(percent, digits = 2) {
    decimal <- percent / 100    
    round(decimal, digits)
}

Your function will work on vectors with length >1 too. percents has been defined for you.

# percents
percents <- c(25.88, 9.045, 6.23)

Call percent_to_decimal() on percents. Do not specify any optional arguments.

# percent_to_decimal() with default digits
percent_to_decimal(percents)

## [1] 0.26 0.09 0.06

Call percent_to_decimal() on percents again. Specify digits = 4.

# percent_to_decimal() with digits = 4
percent_to_decimal(percents, digits = 4)

## [1] 0.2588 0.0905 0.0623

Way to go! Adding optional arguments isn’t so hard, right?

4.2.3 Multiple arguments (2)

Let’s think about a more complicated example. Do you remember present value from the Introduction to R for Finance course? If not, you may want to review that course's video on it. The idea is that you discount money that you will receive in the future at a specific interest rate, to represent the value of that money in today’s dollars. The general formula is:

present_value <- cash_flow * (1 + i / 100) ^ -year

Wouldn’t it be nice to have a function that did this calculation for you? Maybe something of the form:

present_value <- pv(cash_flow, i, year)

This function should work both on single numbers, like pv(1500, 5, 2), and on vectors of equal length, calculating an entire present value vector at once!

The percent_to_decimal() function is available for you to use.

Fill in the blanks in the function so it does the following:

Require the arguments: cash_flow, i, year

Create the discount multiplier: (1 + i / 100). Use the percent_to_decimal() function to convert i to a decimal.

Perform the present value calculation. Do not store this in a variable. As the last executed line, it will be returned automatically.

# Present value function
pv <- function(cash_flow, i, year) {
    
    # Discount multiplier
    mult <- 1 + percent_to_decimal(i)
    
    # Present value calculation
    cash_flow * mult ^ -year
}

Calculate the present value of $1200, at an interest rate of 7%, to be received 3 years from now.

# Calculate a present value
pv(1200, 7, 3)

## [1] 979.5575

Great! This seems like a pretty useful function!
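One thing to watch for is that the percent_to_decimal() used here is the version with digits = 2, so the interest rate itself is rounded before discounting. That is harmless for 7, but a rate such as 6.25 becomes 0.06. Here is a sketch that compares pv() with a hypothetical pv_precise(), which passes a larger digits value through to percent_to_decimal().

```r
# percent_to_decimal() as defined earlier, with digits = 2 by default
percent_to_decimal <- function(percent, digits = 2) {
  decimal <- percent / 100
  round(decimal, digits)
}

# pv() as written above: the rate is rounded to 2 decimal places first
pv <- function(cash_flow, i, year) {
  mult <- 1 + percent_to_decimal(i)
  cash_flow * mult ^ -year
}

# Hypothetical variant that keeps more precision in the rate
pv_precise <- function(cash_flow, i, year, digits = 10) {
  mult <- 1 + percent_to_decimal(i, digits = digits)
  cash_flow * mult ^ -year
}

pv(1200, 6.25, 3)          # discounts at 6% (0.0625 was rounded to 0.06)
pv_precise(1200, 6.25, 3)  # discounts at 6.25%
```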

4.2.4 Function scope (1)

Scoping is the set of rules R uses to look up a variable’s value when given its name. For example, given x <- 5, scoping is how R knows where to look to find that the value of x is 5.

Try this scoping exercise!

percent_to_decimal <- function(percent) {
    decimal <- percent / 100
    decimal
}

percent_to_decimal(6)
[1] 0.06

What does typing decimal now return?

Error
0
0.06

Great! decimal was defined to live only inside the percent_to_decimal() function. If you try to access decimal outside of the scope of that function, you will get an error because it does not exist!

4.2.5 Function scope (2)

Let’s try another one. Here, hundred is defined outside of the function scope, but is used inside of the function.

hundred <- 100

percent_to_decimal <- function(percent) {
    percent / hundred
}

What will percent_to_decimal(6) return?

Error
6
0.06

Good job! hundred was defined outside of the percent_to_decimal() function. When the percent_to_decimal function came across hundred, it first looked inside the scope of the function for hundred, and when it couldn’t find it, it looked up one level to find where it was defined in the global scope.
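The lookup runs from the inside out. A variable defined inside the function takes precedence over a global one with the same name, and assigning with <- inside the function creates a local variable rather than changing the global one. Here is a sketch with a hypothetical scope_demo() function.

```r
hundred <- 100

scope_demo <- function(percent) {
  # A local variable with the same name as the global one
  hundred <- 1000
  percent / hundred
}

scope_demo(6)  # 0.006: R finds the local hundred first
hundred        # 100: the assignment inside the function did not change the global value
```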

4.3 Packages

4.3.1 tidyquant package

The tidyquant package is focused on retrieving, manipulating, and scaling financial data analysis in the easiest way possible. To get the tidyquant package and start working with it, you first have to install it.

install.packages("tidyquant")

This places it on your local computer. You then have to load it into your current R session. This gives you access to all of the functions in the package.

library(tidyquant)

You need both steps for any CRAN package you want to use: install.packages() once per computer, and library() in every new R session where you use the package.

The exercise code is already written for you. You will explore some of the functions that tidyquant has for financial analysis.

The code is already written, but these instructions will walk you through the steps.

First, load the package with library() to access its functions.

# Load tidyquant
library(tidyquant)

Use the tidyquant function, tq_get() to get the stock price data for Apple.

# Pull Apple stock data
apple <- tq_get("AAPL", get = "stock.prices", 
                from = "2007-01-03", to = "2017-06-05")

Take a look at the data frame it returned.

# Take a look at what it returned
head(apple)

## # A tibble: 6 × 8
##   symbol date        open  high   low close     volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>      <dbl>    <dbl>
## 1 AAPL   2007-01-03  3.08  3.09  2.92  2.99 1238319600     2.56
## 2 AAPL   2007-01-04  3.00  3.07  2.99  3.06  847260400     2.61
## 3 AAPL   2007-01-05  3.06  3.08  3.01  3.04  834741600     2.59
## 4 AAPL   2007-01-08  3.07  3.09  3.05  3.05  797106800     2.61
## 5 AAPL   2007-01-09  3.09  3.32  3.04  3.31 3349298400     2.82
## 6 AAPL   2007-01-10  3.38  3.49  3.34  3.46 2952880000     2.96

Plot the stock price over time.

# Plot the stock price over time
plot(apple$date, apple$adjusted, type = "l")

Calculate daily returns for the adjusted stock price using tq_mutate(). This function “mutates” your data frame by adding a new column onto it. Here, that new column is the daily returns.

# Calculate daily stock returns for the adjusted price
apple <- tq_mutate(data = apple,
                   select = "adjusted",
                   mutate_fun = dailyReturn)

Sort the returns.

# Sort the returns from least to greatest
sorted_returns <- sort(apple$daily.returns)

Plot the sorted returns. You can see that Apple had a few days of losses >10%, and a number of days with gains of >5%.

# Plot them
plot(sorted_returns)

Sweet! There are well over 10,000 packages on CRAN alone. Check them out to see what other members of the community have done!

5 Apply

A popular alternative to loops in R is the apply family of functions. These are often more readable than loops, and are incredibly useful for scaling the data science workflow to perform a complicated calculation on any number of observations. Learn about them here!

5.1 Why use apply?

5.1.1 lapply() on a list

The first function in the apply family that you will learn is lapply(), which is short for “list apply.” When you have a list, and you want to apply the same function to each element of the list, lapply() is a potential solution that always returns another list. How might this work?

Let’s look at a simple example. Suppose you want to find the length of each vector in the following list.

my_list
$a
[1] 2 4 5

$b
[1] 10 14  5  3  4  5  6

# Using lapply
# Note: pass length without parentheses; lapply() calls it for you
lapply(my_list, FUN = length)
$a
[1] 3

$b
[1] 7

As noted in the video, if at first you thought about looping over each element in the list, and using length() at each iteration, you aren’t wrong. lapply() is the vectorized version of this kind of loop, and is often preferred (and simpler) in the R world.
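To see that the two approaches agree, here is a sketch that builds my_list from the values printed above and computes the lengths both ways. The names lengths_loop and lengths_apply are illustrative.

```r
my_list <- list(a = c(2, 4, 5), b = c(10, 14, 5, 3, 4, 5, 6))

# Loop version: preallocate a named list and fill it element by element
lengths_loop <- vector("list", length(my_list))
names(lengths_loop) <- names(my_list)
for (name in names(my_list)) {
  lengths_loop[[name]] <- length(my_list[[name]])
}

# lapply version: one line, same result
lengths_apply <- lapply(my_list, FUN = length)

identical(lengths_loop, lengths_apply)  # TRUE
```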

A list of daily stock returns as percentages called stock_return and the percent_to_decimal() function have been provided.

stock_return = list(apple = c(0.37446342, -0.71883530, 0.76986527, 0.98226467, 0.98171665, 1.63217981, -0.57042563, 1.66813769, 0.00000000, 0.54692248, 0.12951131, 0.57773562, 0.26577503, 0.09405729, -0.65778233, 0.19778141, 0.63508411, -0.42640287, -0.02569373, -0.77957680),
     ibm = c(0.1251408, -0.1124859, 0.3190691, 2.7689429, 0.3458948, 0.7014998, -0.6125390, 1.6858006, 0.1307267, -0.2907839, -0.7677657, -0.0299886, 0.5519558, -0.1610979, -0.1613578, -0.2095056, 0.2579329, -0.5683858, 0.2467056, -0.3661465), 
     micr = c(0.08445946, 1.63713080, -0.44835603, 2.36864053, -0.58660583, 1.57351254, 0.32273681, 1.30287920, -0.47634170, -0.15954052, -0.44742729, 2.11878010, -0.12574662, 0.00000000, 0.01573812, -0.48780488, 0.06325111, -0.45828066, -0.14287982, -1.20826709))

Print stock_return.

# Print stock_return
stock_return

## $apple
##  [1]  0.37446342 -0.71883530  0.76986527  0.98226467  0.98171665  1.63217981
##  [7] -0.57042563  1.66813769  0.00000000  0.54692248  0.12951131  0.57773562
## [13]  0.26577503  0.09405729 -0.65778233  0.19778141  0.63508411 -0.42640287
## [19] -0.02569373 -0.77957680
## 
## $ibm
##  [1]  0.1251408 -0.1124859  0.3190691  2.7689429  0.3458948  0.7014998
##  [7] -0.6125390  1.6858006  0.1307267 -0.2907839 -0.7677657 -0.0299886
## [13]  0.5519558 -0.1610979 -0.1613578 -0.2095056  0.2579329 -0.5683858
## [19]  0.2467056 -0.3661465
## 
## $micr
##  [1]  0.08445946  1.63713080 -0.44835603  2.36864053 -0.58660583  1.57351254
##  [7]  0.32273681  1.30287920 -0.47634170 -0.15954052 -0.44742729  2.11878010
## [13] -0.12574662  0.00000000  0.01573812 -0.48780488  0.06325111 -0.45828066
## [19] -0.14287982 -1.20826709

Fill in the lapply() function to apply percent_to_decimal() to each element in stock_return.

# lapply to change percents to decimal
lapply(stock_return, FUN = percent_to_decimal)

## $apple
##  [1]  0.00 -0.01  0.01  0.01  0.01  0.02 -0.01  0.02  0.00  0.01  0.00  0.01
## [13]  0.00  0.00 -0.01  0.00  0.01  0.00  0.00 -0.01
## 
## $ibm
##  [1]  0.00  0.00  0.00  0.03  0.00  0.01 -0.01  0.02  0.00  0.00 -0.01  0.00
## [13]  0.01  0.00  0.00  0.00  0.00 -0.01  0.00  0.00
## 
## $micr
##  [1]  0.00  0.02  0.00  0.02 -0.01  0.02  0.00  0.01  0.00  0.00  0.00  0.02
## [13]  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -0.01

Great work!

5.1.2 lapply() on a data frame

If, instead of a list, you had a data frame of stock returns, could you still use lapply()? Yes! Perhaps surprisingly, data frames are actually lists under the hood, and an lapply() call would apply the function to each column of the data frame.
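A quick way to confirm this, using a small data frame built for illustration:

```r
# A small data frame built for illustration
df <- data.frame(a = c(1, 2), b = c(4, 6))

is.list(df)            # TRUE: a data frame is a list of equal-length columns
length(df)             # 2: one list element per column
lapply(df, FUN = sum)  # one sum per column, returned as a list
```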

df
  a b
1 1 4
2 2 6

class(df)
[1] "data.frame"

lapply(df, FUN = sum)
$a
[1] 3

$b
[1] 10

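lapply() summed each column in the data frame, but still follows its convention of always returning a list. A data frame of daily stock returns as decimals called stock_return has been provided. (In the output reproduced below, stock_return is still the list of percentage returns from the previous exercise, so it prints as a list and its values are percentages rather than decimals.)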

Print stock_return to see the data frame.

# Print stock_return
stock_return

## $apple
##  [1]  0.37446342 -0.71883530  0.76986527  0.98226467  0.98171665  1.63217981
##  [7] -0.57042563  1.66813769  0.00000000  0.54692248  0.12951131  0.57773562
## [13]  0.26577503  0.09405729 -0.65778233  0.19778141  0.63508411 -0.42640287
## [19] -0.02569373 -0.77957680
## 
## $ibm
##  [1]  0.1251408 -0.1124859  0.3190691  2.7689429  0.3458948  0.7014998
##  [7] -0.6125390  1.6858006  0.1307267 -0.2907839 -0.7677657 -0.0299886
## [13]  0.5519558 -0.1610979 -0.1613578 -0.2095056  0.2579329 -0.5683858
## [19]  0.2467056 -0.3661465
## 
## $micr
##  [1]  0.08445946  1.63713080 -0.44835603  2.36864053 -0.58660583  1.57351254
##  [7]  0.32273681  1.30287920 -0.47634170 -0.15954052 -0.44742729  2.11878010
## [13] -0.12574662  0.00000000  0.01573812 -0.48780488  0.06325111 -0.45828066
## [19] -0.14287982 -1.20826709

Use lapply() to get the average (mean) of each column.

# lapply to get the average returns
lapply(stock_return, FUN = mean)

## $apple
## [1] 0.2838389
## 
## $ibm
## [1] 0.1926806
## 
## $micr
## [1] 0.2472939

Create a function for the Sharpe ratio. It should take the average of the returns, subtract the risk-free rate (.03%, or .0003 as a decimal) from it, and then divide by the standard deviation of the returns.

# Sharpe ratio
sharpe <- function(returns) {
    (mean(returns) - .0003) / sd(returns)
}
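The sketch below rebuilds sharpe() step by step on three made-up daily returns, to show exactly what it computes. Note that sd() uses the sample standard deviation (dividing by n - 1). The result is a per-period ratio with no annualization.

```r
sharpe <- function(returns) {
    (mean(returns) - .0003) / sd(returns)
}

r <- c(0.01, -0.02, 0.03)   # hypothetical daily returns as decimals

excess <- mean(r) - .0003                              # average excess return
vol <- sqrt(sum((r - mean(r))^2) / (length(r) - 1))    # sample sd, n - 1
manual <- excess / vol

all.equal(manual, sharpe(r))  # TRUE
```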

Use lapply() to calculate the Sharpe ratio of each column.

# lapply to get the sharpe ratio
lapply(stock_return, FUN = sharpe)

## $apple
## [1] 0.3961448
## 
## $ibm
## [1] 0.2366101
## 
## $micr
## [1] 0.2483864

Fantastic! Column-wise calculations with lapply() can be very useful!

5.1.3 FUN arguments

Often, the function that you want to apply will have other optional arguments that you may want to tweak. Consider the percent_to_decimal() function that allows the user to specify the number of decimal places.

percent_to_decimal(5.4, digits = 3)
[1] 0.054

In the call to lapply() you can specify the named optional arguments after the FUN argument, and they will get passed to the function that you are applying.

my_list
$a
[1] 2.444 3.500

$b
[1] 1.100 2.678 3.450

lapply(my_list, FUN = percent_to_decimal, digits = 4)
$a
[1] 0.0244 0.0350

$b
[1] 0.0110 0.0268 0.0345
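The mechanism is lapply()'s `...` argument: anything supplied after FUN is forwarded to every call. The percent_to_decimal() used here is defined earlier in the course. The version below is a hypothetical reconstruction that assumes the function divides by 100 and rounds to `digits` places; it reproduces the printed results.

```r
# Hypothetical reconstruction of percent_to_decimal(); the course's own
# definition may differ in detail
percent_to_decimal <- function(percent, digits = 2) {
    decimal <- percent / 100
    round(decimal, digits)
}

my_list <- list(a = c(2.444, 3.5), b = c(1.1, 2.678, 3.45))

# digits = 4 travels through lapply()'s ... to each percent_to_decimal() call
res <- lapply(my_list, FUN = percent_to_decimal, digits = 4)

# Same thing written out by hand for one element
identical(res$a, percent_to_decimal(my_list$a, digits = 4))  # TRUE
```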

In the exercise, you will extend your Sharpe ratio function to let the user input the risk-free rate as an argument, and then use it with lapply(). A data frame of daily stock returns as decimals called stock_return is available.

Extend sharpe() to accept the risk-free rate as an optional argument, rf. The default should be .0003.

# Extend sharpe() to allow optional argument
sharpe <- function(returns, rf = .0003) {
    (mean(returns) - rf) / sd(returns)
}
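Calling sharpe() without rf should match calling it with rf = .0003, which is a quick way to check the default. Because rf only shifts the numerator, raising it by some amount lowers the ratio by that amount divided by sd(returns). This is why the ratios in the next two steps barely move. A sketch on made-up returns:

```r
sharpe <- function(returns, rf = .0003) {
    (mean(returns) - rf) / sd(returns)
}

r <- c(0.01, -0.02, 0.03)  # hypothetical daily returns as decimals

# Omitting rf uses the default of .0003
identical(sharpe(r), sharpe(r, rf = .0003))  # TRUE

# A higher rf lowers the ratio by (change in rf) / sd(r)
all.equal(sharpe(r, rf = .0003) - sharpe(r, rf = .0004), 0.0001 / sd(r))  # TRUE
```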

Use lapply() on stock_return to find the Sharpe ratio if the risk-free rate is .0004.

# First lapply()
lapply(stock_return, FUN = sharpe, rf = .0004)

## $apple
## [1] 0.3960051
## 
## $ibm
## [1] 0.2364871
## 
## $micr
## [1] 0.2482859

Use lapply() on stock_return to find the Sharpe ratio if the risk-free rate is .0009.

# Second lapply()
lapply(stock_return, FUN = sharpe, rf = .0009)

## $apple
## [1] 0.3953065
## 
## $ibm
## [1] 0.2358721
## 
## $micr
## [1] 0.247783

Nice! It is common to pass optional arguments to the function that lapply() calls.

5.2 sapply() - simplify it!

5.2.1 sapply() vs. lapply()

lapply() is great, but sometimes you might want the returned data in a nicer form than a list. For instance, wouldn’t it be great if the Sharpe ratios came back as a vector rather than a list? Further analysis would likely be easier!

For this, you might want to consider sapply(), short for “simplify apply”. It works like lapply() but will attempt to simplify the output if it can. The basic syntax is the same, with a few additional arguments:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

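These additional optional arguments let you specify whether sapply() should try to simplify the output (simplify) and, when X is a character vector, whether its values should be used as names for the output (USE.NAMES). If X already has names, as a named list does, those names are kept either way.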
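USE.NAMES only has a visible effect when X is an unnamed character vector. A small sketch with a made-up vector of tickers:

```r
tickers <- c("apple", "ibm", "micr")  # hypothetical input

# USE.NAMES = TRUE (default): the strings themselves become the names
with_names <- sapply(tickers, FUN = nchar)
with_names     # named integer vector: apple 5, ibm 3, micr 4

# USE.NAMES = FALSE: same values, no names
without_names <- sapply(tickers, FUN = nchar, USE.NAMES = FALSE)
without_names  # 5 3 4
```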

In the exercise, you will recalculate Sharpe ratios using sapply() to simplify the output. stock_return and the sharpe() function are available for you.

First, use lapply() on stock_return to get the Sharpe ratio again.

# lapply() on stock_return
lapply(stock_return, FUN = sharpe)

## $apple
## [1] 0.3961448
## 
## $ibm
## [1] 0.2366101
## 
## $micr
## [1] 0.2483864

Now, use sapply() on stock_return to see the simplified Sharpe ratio output.

# sapply() on stock_return
sapply(stock_return, FUN = sharpe)

##     apple       ibm      micr 
## 0.3961448 0.2366101 0.2483864
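When every call returns a single number, as sharpe() does here, the simplification amounts to what unlist() would do to the lapply() result. A sketch with made-up returns:

```r
sharpe <- function(returns, rf = .0003) {
    (mean(returns) - rf) / sd(returns)
}

# Hypothetical daily returns for two stocks
rets <- list(apple = c(0.01, -0.02, 0.03), ibm = c(0.002, 0.004, -0.001))

simplified <- sapply(rets, FUN = sharpe)
flattened  <- unlist(lapply(rets, FUN = sharpe))

identical(simplified, flattened)  # TRUE: both are named numeric vectors
```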

Use sapply() on stock_return to get the Sharpe ratio with the arguments simplify = FALSE and USE.NAMES = FALSE. This is equivalent to lapply()!

# sapply() on stock_return with optional arguments
sapply(stock_return, FUN = sharpe, simplify = FALSE, USE.NAMES = FALSE)

## $apple
## [1] 0.3961448
## 
## $ibm
## [1] 0.2366101
## 
## $micr
## [1] 0.2483864

Perfect! It is interesting to see how sapply() can become lapply() with some additional options.

5.2.2 Failing to simplify

For interactive use, sapply() is great. It guesses the output type so that it can simplify, and normally that is fine. However, sapply() is not a safe choice inside functions you write. If sapply() cannot simplify your output, it silently falls back to returning a list, just like lapply(). This can break custom functions that expect sapply() to return a simplified vector.
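The danger is that sapply()'s return type depends on the data, not on your code. In this sketch, the same call returns a matrix or a list depending only on the input values, and an empty input gives back an empty list. The helper keep_gains() and the inputs are made up for illustration.

```r
# Hypothetical helper: keep only the positive returns
keep_gains <- function(x) x[x > 0]

# Both elements keep two values, so sapply() simplifies to a 2 x 2 matrix
a <- sapply(list(c(1, 2), c(3, 4)), FUN = keep_gains)
is.matrix(a)  # TRUE

# One element keeps one value, the other two, so sapply() returns a list
b <- sapply(list(c(1, -2), c(3, 4)), FUN = keep_gains)
is.list(b)    # TRUE

# With no input at all, sapply() returns an empty list
is.list(sapply(list(), FUN = length))  # TRUE
```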

Let’s look at an exercise using a list containing information about the stock market crash of 2008.

The list market_crash has been created for you.

Use sapply() to get the class() of each element in market_crash.

# Market crash with as.Date()
market_crash <- list(dow_jones_drop = 777.68, 
                     date = as.Date("2008-09-28"))
                     
# Find the classes with sapply()
sapply(market_crash, class)

## dow_jones_drop           date 
##      "numeric"         "Date"

A new list, market_crash2, has been created. The difference is in how the date is created!

Use lapply() to get the class() of each element in market_crash2.

Use sapply() to get the class() of each element in market_crash2.

# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68, 
                      date = as.POSIXct("2008-09-28"))

# Find the classes with lapply()
lapply(market_crash2, class)

## $dow_jones_drop
## [1] "numeric"
## 
## $date
## [1] "POSIXct" "POSIXt"

# Find the classes with sapply()
sapply(market_crash2, class)

## $dow_jones_drop
## [1] "numeric"
## 
## $date
## [1] "POSIXct" "POSIXt"

The date element in market_crash2 has two classes. Why couldn’t sapply() simplify this?

Nice job! See how sapply() returns a list like lapply() when it fails to simplify?

5.3 vapply() - specify your output!

5.3.1 vapply() vs. sapply()

In the last example, sapply() failed to simplify because the date element of market_crash2 had two classes (POSIXct and POSIXt). Notice, however, that no error was thrown! If a function you had written expected a simplified vector to be returned by sapply(), this would be confusing.

To account for this, there is a stricter apply function called vapply(). It has an extra argument, FUN.VALUE, where you specify the type and length of the output that each call to your applied function must return.

If you expected the return value of class() to be a character vector of length 1, you can specify that using vapply():

vapply(market_crash, class, FUN.VALUE = character(1))
dow_jones_drop           date 
     "numeric"         "Date"

Other examples of FUN.VALUE might be numeric(2) or logical(1). market_crash2 is again defined for you.
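As a sketch of those two templates, the made-up price list below is used with range() and numeric(2), and with a small hypothetical helper, above_150(), and logical(1).

```r
# Hypothetical closing prices for two stocks
prices <- list(apple = c(150, 155, 149), ibm = c(120, 118, 125))

# numeric(2): each call must return exactly two doubles (here, min and max)
rng <- vapply(prices, FUN = range, FUN.VALUE = numeric(2))
rng   # 2 x 2 matrix: apple 149 155, ibm 118 125

# logical(1): each call must return a single TRUE or FALSE
above_150 <- function(p) any(p > 150)
vapply(prices, FUN = above_150, FUN.VALUE = logical(1))  # apple TRUE, ibm FALSE
```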

Use sapply() again to find the class() of market_crash2 elements. Notice how it returns a list and not an error.

# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68, 
                      date = as.POSIXct("2008-09-28"))

# Find the classes with sapply()
sapply(market_crash2, class)

## $dow_jones_drop
## [1] "numeric"
## 
## $date
## [1] "POSIXct" "POSIXt"

Use vapply() on market_crash2 to find the class(). Specify FUN.VALUE = character(1). It should appropriately fail.

# Find the classes with vapply()
# Left commented out because this call stops with an error (date has two classes)
#vapply(market_crash2, class, FUN.VALUE = character(1))
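Wrapping the call in tryCatch() lets you watch the failure without stopping the script. Keeping only the first class makes every result fit character(1). This is done here with a hypothetical helper, first_class(), and is just one possible fix chosen for illustration.

```r
market_crash2 <- list(dow_jones_drop = 777.68,
                      date = as.POSIXct("2008-09-28"))

# vapply() stops because class() returns two strings for date;
# tryCatch() turns that error into its message text
msg <- tryCatch(
    vapply(market_crash2, class, FUN.VALUE = character(1)),
    error = function(e) conditionMessage(e)
)
msg

# Hypothetical helper: keep only the first class
first_class <- function(x) class(x)[1]
vapply(market_crash2, first_class, FUN.VALUE = character(1))
# dow_jones_drop: "numeric", date: "POSIXct"
```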

Great! This is much clearer since we expected a simplified vector.

5.3.2 More vapply()

The last example contrasted vapply() with sapply() to show vapply() failing appropriately, but what happens when it doesn’t fail? When there are no errors, vapply() returns a simplified result shaped by the FUN.VALUE argument.

The stock_return dataset containing daily returns for Apple, IBM, and Microsoft has been provided. The sharpe() function is also available.

Calculate the Sharpe ratio for each stock using vapply().

# Sharpe ratio for all stocks
vapply(stock_return, sharpe, FUN.VALUE = numeric(1))

##     apple       ibm      micr 
## 0.3961448 0.2366101 0.2483864

Use summary() on the apple column to get a six-number summary.

# Summarize Apple
summary(stock_return$apple)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.7796 -0.1259  0.2318  0.2838  0.6688  1.6681

vapply() the summary() function across stock_return to summarize each column.

# Summarize all stocks
vapply(stock_return, summary, FUN.VALUE = numeric(6))

##              apple        ibm        micr
## Min.    -0.7795768 -0.7677657 -1.20826709
## 1st Qu. -0.1258710 -0.2298252 -0.45083719
## Median   0.2317782  0.0475761 -0.06287331
## Mean     0.2838389  0.1926806  0.24729391
## 3rd Qu.  0.6687794  0.3257755  0.56777241
## Max.     1.6681377  2.7689429  2.36864053
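With an unnamed template like numeric(6), vapply() takes the row names from the names of the first result (here, summary()'s statistic labels) and the column names from the list names. A sketch on made-up returns:

```r
# Hypothetical daily returns for two stocks
rets <- list(apple = c(0.4, -0.7, 0.8, 1.0), ibm = c(0.1, -0.1, 0.3, 2.8))

stats <- vapply(rets, FUN = summary, FUN.VALUE = numeric(6))

dim(stats)       # 6 2: one row per statistic, one column per stock
rownames(stats)  # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
colnames(stats)  # "apple" "ibm"
```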

Good job! vapply() requires more thought when writing the function, but its robustness far outweighs that cost!

5.3.3 Anonymous functions

As a last exercise, you’ll learn about a concept called anonymous functions. So far, when calling an apply function like vapply(), you have been passing in named functions to FUN. Doesn’t it seem like a waste to have to create a function just for that specific vapply() call? Instead, you can use anonymous functions!

Named function:

percent_to_decimal <- function(percent) {
    percent / 100
}

Anonymous function:

function(percent) { percent / 100 }

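As you can see, anonymous functions are simply functions that aren’t assigned a name. To use one in vapply(), you might do the following. In this illustration, stock_return is a two-day subset holding only the apple and ibm columns, which is why FUN.VALUE is numeric(2):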

vapply(stock_return, FUN = function(percent) { percent / 100 }, 
       FUN.VALUE = numeric(2))
            apple          ibm
[1,]  0.003744634  0.001251408
[2,] -0.007188353 -0.001124859
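The sketch below confirms that the two forms behave identically. It builds a hypothetical two-row data frame from the first two apple and ibm values printed earlier in this chapter and runs both versions.

```r
# Hypothetical two-row subset of the returns (in percent)
small <- data.frame(apple = c(0.3744634, -0.7188353),
                    ibm   = c(0.1251408, -0.1124859))

percent_to_decimal <- function(percent) {
    percent / 100
}

named <- vapply(small, FUN = percent_to_decimal, FUN.VALUE = numeric(2))
anon  <- vapply(small, FUN = function(percent) { percent / 100 },
                FUN.VALUE = numeric(2))

identical(named, anon)  # TRUE: same body, the only difference is the name
```

From R 4.1.0 onward, `\(percent) percent / 100` is also accepted as a shorthand for the same anonymous function.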

stock_return is available to use.

Use vapply() to apply an anonymous function that returns a vector of the max() and min() (in that order) of each column of stock_return.

# Max and min
vapply(stock_return, 
       FUN = function(x) { c(max(x), min(x)) }, 
       FUN.VALUE = numeric(2))

##           apple        ibm      micr
## [1,]  1.6681377  2.7689429  2.368641
## [2,] -0.7795768 -0.7677657 -1.208267
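The rows above are labeled only [1,] and [2,]. One option is to give FUN.VALUE names; the vapply() documentation states that the row names are then taken from the template. A sketch on made-up returns:

```r
# Hypothetical daily returns for two stocks
rets <- list(apple = c(0.4, -0.7, 1.6), micr = c(2.3, -1.2, 0.1))

# A named template labels the rows "max" and "min"
extremes <- vapply(rets,
                   FUN = function(x) { c(max(x), min(x)) },
                   FUN.VALUE = c(max = 0, min = 0))
extremes
```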

Congratulations! You have just completed the course!

5.4 Congratulations

5.4.1 Congratulations

Congratulations! You have completed Intermediate R for Finance. I hope that with your new knowledge of if statements, loops, and functions, you’re equipped with the skills to start working on more advanced financial analysis.

5.4.2 Popular R packages in Finance

I encourage you to explore the finance packages that R’s community has to offer; the Empirical Finance Task View on CRAN has an extensive list. People around the world have already created some amazing projects, and hopefully, you can find one that interests you.

5.4.3 Let’s practice!

If you enjoyed this course, make sure you check out the other courses in DataCamp’s quantitative analyst track if you haven’t already. Thanks so much for working through this course with me - now get out there and put your skills to the test!