Intermediate R for Finance
Lore Dirick - DataCamp
Course Description
If you enjoyed the Introduction to R for Finance course, then you will love Intermediate R for Finance. Here, you will first learn the basics about how dates work in R, an important skill for the rest of the course. Your next step will be to explore the world of if statements, loops, and functions. These are powerful ideas that are essential to any financial data scientist’s toolkit. Finally, we will spend some time working with the family of apply functions as a vectorized alternative to loops. And of course, all examples will be finance related! Enjoy!
1 Dates
Welcome! Before we go deeper into the world of R, it will be nice to have an understanding of how dates and times are created. This chapter will teach you enough to begin working with dates, but only scratches the surface of what you can do with them.
1.1 An introduction to dates in R
1.1.1 What day is it?
R has a lot to offer in terms of dates and times. The two main classes of data for this are Date and POSIXct. Date is used for calendar date objects like “2015-01-22”. POSIXct is a way to represent datetime objects like “2015-01-22 08:39:40 EST”, meaning that it is 40 seconds after 8:39 AM Eastern Standard Time.
In practice, the best strategy is to use the simplest class that you need. Often, Date will be the simplest choice. This course will use the Date class almost exclusively, but it is important to be aware of POSIXct as well for storing intraday financial data.
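To make the distinction concrete, here is a minimal sketch that builds one object of each class. The variable names trade_date and trade_time are only illustrative, and the time zone is passed explicitly so the result does not depend on your machine's settings.

```r
# A calendar date: no time-of-day information
trade_date <- as.Date("2015-01-22")

# A datetime: the time zone is stated explicitly
trade_time <- as.POSIXct("2015-01-22 08:39:40", tz = "EST")

class(trade_date)   # "Date"
class(trade_time)   # "POSIXct" "POSIXt"

# Dropping the time of day turns the datetime back into a Date
as.Date(trade_time, tz = "EST")
```

If you only ever need the calendar day, the Date version is the simpler object to carry around.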
In the exercise below, you will explore your first date and time objects by asking R to return the current date and the current time.
Use Sys.Date() to have R return the current date.
# What is the current date?
Sys.Date()
## [1] "2022-10-22"
Use Sys.time() to have R return the current date and time. Notice the difference in capitalization of Date vs time.
# What is the current date and time?
Sys.time()
## [1] "2022-10-22 14:59:27 +07"
Store Sys.Date() in the variable today.
# Create the variable today
today <- Sys.Date()
Use class() on today to confirm its class.
# Confirm the class of today
class(today)
## [1] "Date"
Awesome! What else can you do with dates?
1.1.2 From char to date
You will often have to create dates yourself from character strings. The as.Date() function is the best way to do this:
# The Great Crash of 1929
great_crash <- as.Date("1929-11-29")
great_crash
[1] "1929-11-29"
class(great_crash)
[1] "Date"
Notice that the date is given in the format of “yyyy-mm-dd”. This is known as ISO format (ISO = International Organization for Standardization), and is the way R accepts and displays dates.
Internally, dates are stored as the number of days since January 1, 1970, and datetimes are stored as the number of seconds since then. You will confirm this in the exercises below.
Create a date variable named crash for “2008-09-29”, the date of the largest stock market point drop in a single day.
# Create crash
crash <- as.Date("2008-09-29")
Print crash.
# Print crash
crash
## [1] "2008-09-29"
Use as.numeric() on crash to convert it to the number of days since January 1, 1970.
# crash as a numeric
as.numeric(crash)
## [1] 14151
Wrap as.numeric() around Sys.time() to see the current time in number of seconds since January 1, 1970.
# Current time as a numeric
as.numeric(Sys.time())
## [1] 1666425568
Try converting “09/29/2008” to a date. What happens?
# Non-ISO date string: as.Date() needs an explicit format to read it
as.Date("09/29/2008", format = "%m/%d/%Y")
## [1] "2008-09-29"
Nice job! You’ll learn how to deal with non-standard date formats later on!
1.1.3 Many dates
Creating a single date is nice to know how to do, but with financial data you will often have a large number of dates to work with. When this is the case, you will need to convert multiple dates from character to date format. You can do this all at once using vectors. In fact, if you remembered that a single character is actually a vector of length 1, then you would know that you have been doing this all along!
# Create a vector of daily character dates
dates <- c("2017-01-01", "2017-01-02",
"2017-01-03", "2017-01-04")
as.Date(dates)
[1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04"
Like before, this might look like it returned another character vector, but internally these are all stored as numerics, with some special properties that only dates have.
Create a vector containing the four days from “2017-02-05” to “2017-02-08” inclusive. Call this dates.
# Create dates from "2017-02-05" to "2017-02-08" inclusive
dates <- c("2017-02-05", "2017-02-06", "2017-02-07", "2017-02-08")
Assign “Sunday”, “Monday”, “Tuesday”, “Wednesday”, in that order, as names() of the vector dates.
# Add names to dates
names(dates) <- c("Sunday", "Monday", "Tuesday", "Wednesday")
Subset dates using [ ] to retrieve only the date for “Monday”.
# Subset dates to only return the date for Monday
dates["Monday"]
## Monday
## "2017-02-06"
Nice job! Subsetting by name is very useful!
1.2 Date formats and extractor functions
1.2.1 Date formats (1)
As you saw earlier, R is picky about how it reads dates. To remind you, a call like as.Date("09/28/2008") throws an error because the string is not in the format R expects. The fix for this is to specify the format you are using through the format argument:
as.Date("09/28/2008", format = "%m/%d/%Y")
[1] "2008-09-28"
This might look strange, but the basic idea is that you are defining a character vector telling R that your date is in the form of mm/dd/yyyy. It then knows how to extract the components and switch to yyyy-mm-dd.
There are a number of different formats you can specify. Here are a few of them:
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)
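As a quick check of the numeric codes in the list, the sketch below applies each one to the date the examples in parentheses appear to come from, January 13, 1982. The variable names are illustrative. The name-based codes (%A, %a, %B, %b) depend on your locale, so no output is shown for them.

```r
d <- as.Date("1982-01-13")

year4  <- format(d, "%Y")  # "1982"
year2  <- format(d, "%y")  # "82"
month2 <- format(d, "%m")  # "01"
day2   <- format(d, "%d")  # "13"

# Weekday and month names are printed in the language of your locale
format(d, "%A, %B %d")
```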
In this exercise you will work with the date, “1930-08-30”, Warren Buffett’s birth date!
Use as.Date() and an appropriate format to convert “08,30,1930” to a date (it is in the form of “month,day,year”).
# "08,30,1930"
as.Date("08,30,1930", format = "%m,%d,%Y")
## [1] "1930-08-30"
Use as.Date() and an appropriate format to convert “Aug 30,1930” to a date.
# "Aug 30,1930"
as.Date("Aug 30,1930", format = "%b %d,%Y")
## [1] "1930-08-30"
Use as.Date() and an appropriate format to convert “30aug1930” to a date.
# "30aug1930"
as.Date("30aug1930", format = "%d%b%Y")
## [1] "1930-08-30"
Nice! Now you can work with all kinds of date formats.
1.2.2 Date formats (2)
Not only can you convert characters to dates, but you can convert objects that are already dates to differently formatted dates using format():
# The best point move in stock market history. A +936 point change in the Dow!
best_date <- as.Date("2008-10-13")
best_date
[1] "2008-10-13"
format(best_date, format = "%Y/%m/%d")
[1] "2008/10/13"
format(best_date, format = "%B %d, %Y")
[1] "October 13, 2008"
As a reminder, here are the formats:
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)
Create dates from char_dates, specifying the format so R reads them correctly.
char_dates <- c("1jan17", "2jan17", "3jan17", "4jan17", "5jan17")
# Create dates using as.Date() and the correct format
dates <- as.Date(char_dates, format = "%d%b%y")
Modify dates using format() so that each date looks like “Jan 04, 17”.
# Use format() to go from "2017-01-04" -> "Jan 04, 17"
format(dates, format = "%b %d, %y")
## [1] "Jan 01, 17" "Jan 02, 17" "Jan 03, 17" "Jan 04, 17" "Jan 05, 17"
Modify dates using format() so that each date looks like “01,04,2017”.
# Use format() to go from "2017-01-04" -> "01,04,2017"
format(dates, format = "%m,%d,%Y")
## [1] "01,01,2017" "01,02,2017" "01,03,2017" "01,04,2017" "01,05,2017"
Nice Job! This can be useful when reporting or exporting dates.
1.2.3 Subtraction of dates
Just like with numerics, arithmetic can be done on dates. In particular, you can find the difference between two dates, in days, by using subtraction:
today <- as.Date("2017-01-02")
tomorrow <- as.Date("2017-01-03")
one_year_away <- as.Date("2018-01-02")
tomorrow - today
Time difference of 1 days
one_year_away - today
Time difference of 365 days
Equivalently, you could use the difftime() function to find the time interval instead.
difftime(tomorrow, today)
Time difference of 1 days
# With some extra options!
difftime(tomorrow, today, units = "secs")
Time difference of 86400 secs
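A difftime result still carries its units. If you need a plain number, for example to feed into a return calculation, as.numeric() can extract it, optionally in different units. The sketch below reuses the dates from above; gap is just an illustrative name.

```r
today <- as.Date("2017-01-02")
one_year_away <- as.Date("2018-01-02")

# Ask for the interval in weeks instead of days
gap <- difftime(one_year_away, today, units = "weeks")
gap

# Strip the units to get a plain number (in weeks here)
weeks_apart <- as.numeric(gap)

# Or request the same interval expressed in days
days_apart <- as.numeric(gap, units = "days")  # 365
```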
A vector of dates, dates, has been created for you.
# Dates
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03"))
Create the variable origin containing “1970-01-01” as a date.
# Create the origin
origin <- as.Date("1970-01-01")
Use as.numeric() on dates to see how many days from January 1, 1970 it has been.
# Use as.numeric() on dates
as.numeric(dates)
## [1] 17167 17168 17169
Subtract origin from dates to confirm the results! (Notice how recycling is used here!)
# Find the difference between dates and origin
dates - origin
## Time differences in days
## [1] 17167 17168 17169
Great work!
1.2.4 months() and weekdays() and quarters(), oh my!
As a final lesson on dates, there are a few functions that are useful for extracting date components. One of those is months().
my_date <- as.Date("2017-01-02")
months(my_date)
[1] "January"
Two other useful functions are weekdays(), to extract the day of the week that your date falls on, and quarters(), to determine which quarter of the year (Q1-Q4) your date falls in.
A vector of dates, dates, has been created for you.
# dates
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))
Extract the months() from these dates.
# Extract the months
months(dates)
## [1] "January" "May" "August" "October"
Extract the quarters() from these dates.
# Extract the quarters
quarters(dates)
## [1] "Q1" "Q2" "Q3" "Q4"
Another vector of dates, dates2, has also been created for you.
# dates2
dates2 <- as.Date(c("2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"))
Use weekdays() to determine what day of the week the dates fell on, and assign them to the names of dates2 using names().
# Assign the weekdays() of dates2 as the names()
names(dates2) <- weekdays(dates2)
Print dates2.
# Print dates2
dates2
## Monday Tuesday Wednesday Thursday
## "2017-01-02" "2017-01-03" "2017-01-04" "2017-01-05"
Nice work! These functions, and a number of other ones, are useful for extracting information from dates.
2 If Statements and Operators
Imagine you own stock in a company. If the stock goes above a certain price, you might want to sell. If the stock drops below a certain price, you might want to buy it while it’s cheap! This kind of thinking can be implemented using operators and if statements. In this chapter, you will learn all about them, and create a program that tells you to buy or sell a stock.
2.1 Relational operators
2.1.1 Relational practice
In the video, Lore taught you all about different types of relational operators. For reference, here they are again:
>: Greater than
>=: Greater than or equal to
<: Less than
<=: Less than or equal to
==: Equality
!=: Not equal
These relational operators let us make comparisons in our data. If the comparison is true, then the relational operator will return TRUE, otherwise it will return FALSE.
apple <- 45.46
microsoft <- 67.88
apple <= microsoft
[1] TRUE
hello <- "Hello world"
# Case sensitive!
hello == "hello world"
[1] FALSE
Two stock prices, micr and apple, and two dates, today and tomorrow, have been created for you.
Is apple larger than micr?
# Stock prices
apple <- 48.99
micr <- 77.93
# Apple vs. Microsoft
apple > micr
## [1] FALSE
Check that apple and micr are not equal using !=.
# Not equals
apple != micr
## [1] TRUE
Is tomorrow less than today?
# Dates - today and tomorrow
today <- Sys.Date()
tomorrow <- Sys.Date() + 1
# Today vs. Tomorrow
tomorrow < today
## [1] FALSE
Amazing! Relational operators will be used throughout the course!
2.1.2 Vectorized operations
You can extend the concept of relational operators to vectors of any arbitrary length. Compare two vectors using > to get a logical vector back of the same length, holding TRUE when the first is greater than the second, and FALSE otherwise.
apple <- c(120.00, 120.08, 119.97, 121.88)
datacamp <- c(118.5, 124.21, 125.20, 120.22)
apple > datacamp
[1] TRUE FALSE FALSE TRUE
Comparing a vector and a single number works as well. R will recycle the number to be the same length as the vector:
apple > 120
[1] FALSE TRUE FALSE TRUE
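Because TRUE counts as 1 and FALSE as 0 in arithmetic, a logical vector like this is easy to summarize. The sketch below reuses the apple vector; the names above, n_above, share_above, and idx are illustrative.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)
above <- apple > 120

# How many days was the price above 120?
n_above <- sum(above)       # 2

# What share of days was it above 120?
share_above <- mean(above)  # 0.5

# On which positions did it happen?
idx <- which(above)         # 2 4
```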
Imagine how this could be used as a buy/sell signal in stock analysis! A data frame, stocks, is available for you to use.
date <- as.Date(c("2017-01-20",
                  "2017-01-23",
                  "2017-01-24",
                  "2017-01-25"))
ibm <- c(170.55, 171.03, 175.90, 178.29)
panera <- c(216.65, 216.06, 213.55, 212.22)
stocks <- data.frame(date = date, ibm = ibm, panera = panera)
Print stocks.
# Print stocks
stocks
## date ibm panera
## 1 2017-01-20 170.55 216.65
## 2 2017-01-23 171.03 216.06
## 3 2017-01-24 175.90 213.55
## 4 2017-01-25 178.29 212.22
You want to buy ibm when it crosses below 175. Use $ to select the ibm column and a relational operator to know when this happens. Add it to stocks as the column, ibm_buy.
# IBM range
stocks$ibm_buy <- stocks$ibm < 175
If panera crosses above 213, sell. Use a relational operator to know when this happens. Add it to stocks as the column, panera_sell.
# Panera range
stocks$panera_sell <- stocks$panera > 213
Is ibm ever above panera? Add the result to stocks as the column, ibm_vs_panera.
# IBM vs Panera
stocks$ibm_vs_panera <- stocks$ibm > stocks$panera
Print stocks.
# Print stocks
stocks
## date ibm panera ibm_buy panera_sell ibm_vs_panera
## 1 2017-01-20 170.55 216.65 TRUE TRUE FALSE
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE
## 3 2017-01-24 175.90 213.55 FALSE TRUE FALSE
## 4 2017-01-25 178.29 212.22 FALSE FALSE FALSE
Nice! More complex logic can always be created for useful buy and sell signals.
2.2 Logical operators
2.2.1 And / Or
You might want to check multiple relational conditions at once. What if you wanted to know if Apple stock was above 120, but below 121? Simple relational operators are not enough! For multiple conditions, you need the And operator &, and the Or operator |.
& (And): An intersection. a & b is true only if both a and b are true.
| (Or): A union. a | b is true if either a or b is true.
apple <- c(120.00, 120.08, 119.97, 121.88)
# Both conditions must hold
(apple > 120) & (apple < 121)
[1] FALSE TRUE FALSE FALSE
# Only one condition has to hold
(apple <= 120) | (apple > 121)
[1] TRUE FALSE TRUE TRUE
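The logical vectors produced by & and | can be used directly inside [ ] to pull out the matching prices. A minimal sketch with the same apple vector, where in_range and outside are illustrative names:

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

# Prices strictly between 120 and 121
in_range <- (apple > 120) & (apple < 121)
apple[in_range]   # 120.08

# Prices at or below 120, or above 121
outside <- (apple <= 120) | (apple > 121)
apple[outside]    # 120.00 119.97 121.88
```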
The stocks data frame is available for you to use.
Is ibm between 171 and 176? Add the logical vector to stocks as ibm_buy_range.
# IBM buy range
stocks$ibm_buy_range <- (stocks$ibm > 171) & (stocks$ibm < 176)
Check if panera drops below 213.20 or rises above 216.50, then add it to stocks as the column panera_spike.
# Panera spikes
stocks$panera_spike <- (stocks$panera < 213.20) | (stocks$panera > 216.50)
Suppose you are interested in dates after 2017-01-21 but before 2017-01-25, exclusive. Use as.Date() and & for this. Add the result to stocks as good_dates.
# Date range
stocks$good_dates <- (stocks$date > as.Date("2017-01-21")) & (stocks$date < as.Date("2017-01-25"))
Print stocks.
# Print stocks
stocks
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65 TRUE TRUE FALSE FALSE
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## 3 2017-01-24 175.90 213.55 FALSE TRUE FALSE TRUE
## 4 2017-01-25 178.29 212.22 FALSE FALSE FALSE FALSE
## panera_spike good_dates
## 1 TRUE FALSE
## 2 FALSE TRUE
## 3 FALSE TRUE
## 4 TRUE FALSE
Awesome! Combining logical and relational operators makes for powerful logic!
2.2.2 Not!
One last operator to introduce is !, or Not. You have already seen a similar operator, !=, so you might be able to guess what it does. Add ! in front of a logical expression, and it will flip that expression from TRUE to FALSE (and vice versa).
!TRUE
[1] FALSE
apple <- c(120.00, 120.08, 119.97, 121.88)
!(apple < 121)
[1] FALSE FALSE FALSE TRUE
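Negating a combined condition is the same as flipping each comparison and swapping & for |. The sketch below checks this on the apple vector. The names not_in_range and same_thing are illustrative.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

# Not (above 120 and below 121)
not_in_range <- !((apple > 120) & (apple < 121))

# Equivalent: at or below 120, or at or above 121
same_thing <- (apple <= 120) | (apple >= 121)

identical(not_in_range, same_thing)  # TRUE
```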
The stocks data frame is available for you to use.
Use ! and a relational operator to know when ibm is not above 176.
# IBM range
!(stocks$ibm > 176)
## [1] TRUE TRUE TRUE FALSE
A vector, missing, has been created, which contains missing data.
# Missing data
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)
The function is.na() checks for missing data. Use is.na() on missing.
# Is missing?
is.na(missing)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
What about the positions where data is not missing? ! can show you this. Use ! in front of is.na() to show positions where you do have data.
# Not missing?
!is.na(missing)
## [1] TRUE TRUE FALSE TRUE TRUE FALSE
Nice! This can help you remove NA’s from your data easily.
2.2.3 Logicals and subset()
Here’s a fun problem. You know how to create logical vectors that tell you when a certain condition is true, but can you subset a data frame to only contain rows where that condition is true?
If you took Introduction to R for Finance, you might remember the subset() function. subset() takes as arguments a data frame (or vector/matrix) and a logical vector of which rows to return:
stocks
date ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
3 2017-01-24 175.90 213.55
4 2017-01-25 178.29 212.22
subset(stocks, ibm < 175)
date ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
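For comparison, the same rows can be selected with bracket indexing and a logical vector. The sketch below rebuilds the small stocks data frame so that it runs on its own. The names by_subset and by_bracket are illustrative.

```r
stocks <- data.frame(
  date   = as.Date(c("2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25")),
  ibm    = c(170.55, 171.03, 175.90, 178.29),
  panera = c(216.65, 216.06, 213.55, 212.22)
)

# subset() looks up ibm inside the data frame for you
by_subset <- subset(stocks, ibm < 175)

# With brackets you spell out the column, and leave the column slot empty
by_bracket <- stocks[stocks$ibm < 175, ]

# Both keep the first two rows
by_subset
by_bracket
```

subset() is convenient for interactive exploration. The bracket form is more explicit about where each name comes from.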
Useful, right? The stocks data frame is available for you to use.
Subset stocks to include rows where panera is greater than 216.
# Panera range
subset(stocks, panera > 216)
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65 TRUE TRUE FALSE FALSE
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## panera_spike good_dates
## 1 TRUE FALSE
## 2 FALSE TRUE
Subset stocks to retrieve the row where date is equal to “2017-01-23”. Don’t forget as.Date()!
# Specific date
subset(stocks, date == as.Date("2017-01-23"))
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## panera_spike good_dates
## 2 FALSE TRUE
Subset stocks to retrieve rows where ibm is less than 175 and panera is less than 216.50.
# IBM and Panera joint range
subset(stocks, ibm < 175 & panera < 216.50)
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## panera_spike good_dates
## 2 FALSE TRUE
Awesome! This is a great function for interactively looking at different pieces of your data frame.
2.2.4 All together now!
Great! You have learned a lot about operators and subsetting. This will serve you well in future data analysis projects. Let’s do one last exercise that combines a number of operators together.
A new version of the stocks data frame is available for you to use.
View stocks. It contains Apple and Microsoft prices for December, 2016.
# View stocks
stocks
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65 TRUE TRUE FALSE FALSE
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## 3 2017-01-24 175.90 213.55 FALSE TRUE FALSE TRUE
## 4 2017-01-25 178.29 212.22 FALSE FALSE FALSE FALSE
## panera_spike good_dates
## 1 TRUE FALSE
## 2 FALSE TRUE
## 3 FALSE TRUE
## 4 TRUE FALSE
Use weekdays() on the date column, and assign it to stocks as the column, weekday.
# Weekday investigation
stocks$weekday <- weekdays(stocks$date)
View stocks now. The missing data is on weekends! This makes sense: the stock market is not open on weekends.
# View stocks again
stocks
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65 TRUE TRUE FALSE FALSE
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## 3 2017-01-24 175.90 213.55 FALSE TRUE FALSE TRUE
## 4 2017-01-25 178.29 212.22 FALSE FALSE FALSE FALSE
## panera_spike good_dates weekday
## 1 TRUE FALSE Friday
## 2 FALSE TRUE Monday
## 3 FALSE TRUE Tuesday
## 4 TRUE FALSE Wednesday
Remove the missing data using subset(). Use !is.na() on apple as your condition. Assign this new data frame to stocks_no_NA.
# Remove missing data
stocks_no_NA <- subset(stocks, !is.na(apple))
Now, you are interested in days when apple was above 117, or when micr was above 63. Use relational operators, |, and subset() to accomplish this with stocks_no_NA.
# Apple and Microsoft joint range
subset(stocks_no_NA, apple > 117 | micr > 63)
## date ibm panera ibm_buy panera_sell ibm_vs_panera ibm_buy_range
## 1 2017-01-20 170.55 216.65 TRUE TRUE FALSE FALSE
## 2 2017-01-23 171.03 216.06 TRUE TRUE FALSE TRUE
## 3 2017-01-24 175.90 213.55 FALSE TRUE FALSE TRUE
## 4 2017-01-25 178.29 212.22 FALSE FALSE FALSE FALSE
## panera_spike good_dates weekday
## 1 TRUE FALSE Friday
## 2 FALSE TRUE Monday
## 3 FALSE TRUE Tuesday
## 4 TRUE FALSE Wednesday
Woo! Hopefully you can see how useful these operators can be in a financial data science workflow.
2.3 If statements
2.3.1 If this
If statements are great for adding extra logical flow to your code. First, let’s look at the basic structure of an if statement:
if(condition) {
code
}
The condition is anything that returns a single TRUE or FALSE. If the condition is TRUE, then the code inside gets executed. Otherwise, the code gets skipped and the program continues. Here is an example:
apple <- 54.3
if(apple < 70) {
print("Apple is less than 70")
}
[1] "Apple is less than 70"
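Since the condition must be a single TRUE or FALSE, a comparison on a whole vector has to be reduced first. The functions any() and all() do exactly that. A minimal sketch, with message_out and all_above as illustrative names:

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

message_out <- "No signal"

# any() collapses the logical vector to one TRUE/FALSE
if (any(apple > 121)) {
  message_out <- "At least one price is above 121"
}
print(message_out)

# all() is TRUE only when every element passes the test
all_above <- all(apple > 119)  # TRUE
```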
Relational operators are a common way to create the condition in the if statement! The variable, micr, has been created for you.
Write an if statement that checks whether micr is less than 55, and if it is, prints “Buy!”.
micr <- 48.55
# Print "Buy!" if micr is less than 55
if( micr < 55 ) {
print("Buy!")
}
## [1] "Buy!"
Great! Since micr was less than 55, the statement was printed.
2.3.2 If this, Else that
An extension of the if statement is to perform a different action if the condition is false. You can do this by adding else after your if statement:
if(condition) {
code if true
} else {
code if false
}
Extend the previous exercise by adding an else statement that prints “Do nothing!”.
micr <- 57.44
# Fill in the blanks
if( micr < 55 ) {
  print("Buy!")
} else {
  print("Do nothing!")
}
## [1] "Do nothing!"
Great! Since micr was greater than 55, “Do nothing!” was printed.
2.3.3 If this, Else If that, Else that other thing
To add even more logic, you can follow the pattern of if, else if, else. You can add as many else if’s as you need for your control logic.
if(condition1) {
code if condition1 is true
} else if(condition2) {
code if condition2 is true
} else {
code if both are false
}
If micr is less than 55, print “Buy!”
If micr is greater than or equal to 55 and micr is less than 75, print “Do nothing!”
Otherwise, print “Sell!”
micr <- 105.67
# Fill in the blanks
if( micr < 55 ) {
  print("Buy!")
} else if( micr >= 55 & micr < 75 ) {
  print("Do nothing!")
} else {
  print("Sell!")
}
## [1] "Sell!"
Great! Since micr was above both thresholds, neither condition was met and the final else statement was run.
2.3.4 Can you If inside an If?
Sometimes it makes sense to have nested if statements to add even more control. In the following exercise, you will add an if statement that checks if you are holding a share of the Microsoft stock before you attempt to sell it.
Here is the structure of nested if statements. It should look somewhat familiar:
if(condition1) {
  if(condition2) {
    code if both pass
  } else {
    code if 1 passes, 2 fails
  }
} else {
  code if 1 fails
}
The variables, micr and shares, have been created for you.
Add a nested if statement that checks whether shares is greater than or equal to 1 before you decide to sell.
If so, print “Sell!”.
Otherwise, print “Not enough shares to sell!”.
micr <- 105.67
shares <- 1
# Fill in the blanks
if( micr < 55 ) {
  print("Buy!")
} else if( micr >= 55 & micr < 75 ) {
  print("Do nothing!")
} else {
  if( shares >= 1 ) {
    print("Sell!")
  } else {
    print("Not enough shares to sell!")
  }
}
## [1] "Sell!"
Great! Since micr was above both thresholds, the final else statement was run, and because shares was at least 1, “Sell!” was printed.
2.3.5 ifelse()
A powerful function to know about is ifelse(). It creates an if statement in 1 line of code, and more than that, it works on entire vectors!
Suppose you have a vector of stock prices. What if you want to return “Buy!” each time apple > 110, and “Do nothing!” otherwise? A simple if statement would not be enough to solve this problem. However, with ifelse() you can do:
apple
[1] 109.49 109.90 109.11 109.95 111.03 112.12
ifelse(test = apple > 110, yes = "Buy!", no = "Do nothing!")
[1] "Do nothing!" "Do nothing!" "Do nothing!" "Do nothing!" "Buy!"
[6] "Buy!"
ifelse() evaluates the test to get a logical vector, and where the logical vector is TRUE it replaces TRUE with whatever is in yes. Similarly, FALSE is replaced by no.
The stocks data frame is available for you to use.
library(tidyverse)
stocks <- tribble(~date, ~apple, ~micr,
"2016-12-01", 109.49, 59.20,
"2016-12-02", 109.90, 59.25,
"2016-12-05", 109.11, 60.22,
"2016-12-06", 109.95, 59.95,
"2016-12-07", 111.03, 61.37,
"2016-12-08", 112.12, 61.01,
"2016-12-09", 113.95, 61.97,
"2016-12-12", 113.30, 62.17,
"2016-12-13", 115.19, 62.98,
"2016-12-14", 115.19, 62.68,
"2016-12-15", 115.82, 62.58,
"2016-12-16", 115.97, 62.30,
"2016-12-19", 116.64, 63.62,
"2016-12-20", 116.95, 63.54,
"2016-12-21", 117.06, 63.54,
"2016-12-22", 116.29, 63.55,
"2016-12-23", 116.52, 63.24,
"2016-12-27", 117.26, 63.28,
"2016-12-28", 116.76, 62.99,
"2016-12-29", 116.73, 62.90,
"2016-12-30", 115.82, 62.14) %>% mutate(date = as.Date(date))
Use ifelse() to test if micr is above 60 but below 62. When true, return a 1 and when false return a 0. Add the result to stocks as the column, micr_buy.
# Microsoft test
stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)
Use ifelse() to test if apple is greater than 117. The returned value should be the date column if TRUE, and NA otherwise.
# Apple test
stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)
Print stocks. The dates in apple_date became numeric! ifelse() strips the date of its attributes before returning it, so it becomes a numeric.
# Print stocks
stocks
## # A tibble: 21 × 5
## date apple micr micr_buy apple_date
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2016-12-01 109. 59.2 0 NA
## 2 2016-12-02 110. 59.2 0 NA
## 3 2016-12-05 109. 60.2 1 NA
## 4 2016-12-06 110. 60.0 0 NA
## 5 2016-12-07 111. 61.4 1 NA
## 6 2016-12-08 112. 61.0 1 NA
## 7 2016-12-09 114. 62.0 1 NA
## 8 2016-12-12 113. 62.2 0 NA
## 9 2016-12-13 115. 63.0 0 NA
## 10 2016-12-14 115. 62.7 0 NA
## # … with 11 more rows
Assign the apple_date column the class() of “Date”.
# Change the class() of apple_date.
class(stocks$apple_date) <- "Date"
Print stocks again.
# Print stocks again
stocks
## # A tibble: 21 × 5
## date apple micr micr_buy apple_date
## <date> <dbl> <dbl> <dbl> <date>
## 1 2016-12-01 109. 59.2 0 NA
## 2 2016-12-02 110. 59.2 0 NA
## 3 2016-12-05 109. 60.2 1 NA
## 4 2016-12-06 110. 60.0 0 NA
## 5 2016-12-07 111. 61.4 1 NA
## 6 2016-12-08 112. 61.0 1 NA
## 7 2016-12-09 114. 62.0 1 NA
## 8 2016-12-12 113. 62.2 0 NA
## 9 2016-12-13 115. 63.0 0 NA
## 10 2016-12-14 115. 62.7 0 NA
## # … with 11 more rows
Nice job! ifelse() is certainly powerful, just make sure it is working like you expect!
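One way to sidestep the lost Date class, offered here as an alternative sketch rather than the course's own solution, is to start from a vector of missing dates and fill in only the positions that pass the test. The short dates and apple vectors below are a four-row excerpt of the December 2016 data. The names keep and apple_date are illustrative.

```r
dates <- as.Date(c("2016-12-20", "2016-12-21", "2016-12-22", "2016-12-27"))
apple <- c(116.95, 117.06, 116.29, 117.26)

# Start with an all-NA vector that already has class Date
apple_date <- rep(as.Date(NA), length(dates))

# Fill in the dates where the condition holds
keep <- apple > 117
apple_date[keep] <- dates[keep]

class(apple_date)  # "Date"
apple_date
```

Because the result is a Date from the start, no class() repair is needed afterwards.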
3 Loops
Loops can be useful for doing the same operation to each element of your data structure. In this chapter you will learn all about repeat, while, and for loops!
3.1 Repeat loops
3.1.1 Repeat, repeat, repeat
Loops are a core concept in programming. They are used in almost every language. In R, there is another way of performing repeated actions using apply functions, but we will save those until chapter 5. For now, let’s look at the repeat loop!
This is the simplest loop. You use repeat, and inside the curly braces perform some action. You must specify when you want to break out of the loop. Otherwise it runs for eternity!
repeat {
code
if(condition) {
break
}
}
Do not do the following. This is an infinite loop! In words, you are telling R to repeat your code for eternity.
repeat {
code
}
Fill in the condition in the if statement to break when stock_price is below 125.
# Stock price
stock_price <- 126.34
repeat {
  # New stock price
  stock_price <- stock_price * runif(1, .985, 1.01)
  print(stock_price)
  # Check
  if(stock_price < 125) {
    print("Stock price is below 125! Buy it while it's cheap!")
    break
  }
}
## [1] 124.6735
## [1] "Stock price is below 125! Buy it while it's cheap!"
Great job!
3.1.2 When to break?
The order in which you execute your code inside the loop and check when you should break is important. The following would run the code a different number of times.
# Code, then check condition
repeat {
  code
  if(condition) {
    break
  }
}
# Check condition, then code
repeat {
  if(condition) {
    break
  }
  code
}
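To see the difference in numbers, the sketch below runs a fixed price decay twice and counts how often the "action" step executes, once when it comes before the check and once when it comes after. The counters count_before and count_after stand in for a print() call and are illustrative names.

```r
# Variant A: act first, then check
price <- 67.55
count_before <- 0
repeat {
  price <- price * 0.995
  count_before <- count_before + 1   # action placed before the check
  if (price < 66) {
    break
  }
}

# Variant B: check first, then act
price <- 67.55
count_after <- 0
repeat {
  price <- price * 0.995
  if (price < 66) {
    break
  }
  count_after <- count_after + 1     # action placed after the check
}

count_before  # 5
count_after   # 4
```

In variant B the final iteration breaks before reaching the action, so it runs one time fewer.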
Let’s see this in an extension of the previous exercise. For the purposes of this example, the runif() function has been replaced with a static multiplier to remove randomness.
A repeat loop has been created. Fill in the blanks so that the loop checks if the stock_price is below 66, and breaks if so. Run this, and note the number of times that the stock price was printed.
Move print(stock_price) to after the if statement, but still inside the repeat loop. Run the script again. How many times was the stock_price printed now?
# Stock price
stock_price <- 67.55
repeat {
  # New stock price
  stock_price <- stock_price * .995
  print(stock_price)
  # Check
  if(stock_price < 66) {
    print("Stock price is below 66! Buy it while it's cheap!")
    break
  }
}
## [1] 67.21225
## [1] 66.87619
## [1] 66.54181
## [1] 66.2091
## [1] 65.87805
## [1] "Stock price is below 66! Buy it while it's cheap!"
Nice work!
3.2 While loops
3.2.1 While with a print
While loops are slightly different from repeat loops. Like if statements, you specify the condition for them to run at the very beginning. There is no need for a break statement because the condition is checked at each iteration.
while (condition) {
code
}
It might seem like the while loop is doing the exact same thing as the repeat loop, just with less code. In our cases, this is true. So, why ever use the repeat loop? Occasionally, there are cases when using a repeat loop to run forever is desired. If you are interested, click here and check out Intentional Looping.
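As an illustration of that equivalence, here is the same debt countdown written both ways, with a counter instead of printing. The variable names are illustrative. Note that the repeat version needs its break test at the top to mirror the while condition.

```r
# while: the condition is checked before every iteration
debt_while <- 5000
payments_while <- 0
while (debt_while > 0) {
  debt_while <- debt_while - 500
  payments_while <- payments_while + 1
}

# repeat: the same logic, with an explicit break
debt_repeat <- 5000
payments_repeat <- 0
repeat {
  if (debt_repeat <= 0) {
    break
  }
  debt_repeat <- debt_repeat - 500
  payments_repeat <- payments_repeat + 1
}

payments_while   # 10
payments_repeat  # 10
```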
For the exercise, imagine that you have a debt of $5000 that you need to pay back. Each month, you pay off $500, until you’ve paid everything off. You will use a loop to model the process of paying off the debt each month, where on each iteration you decrease your total debt and print out the new total!
The variable debt has been created for you.
Fill in the while loop condition to check if debt is greater than 0. If this is true, decrease debt by 500.
# Initial debt
debt <- 5000
# While loop to pay off your debt
while (debt > 0) {
  debt <- debt - 500
  print(paste("Debt remaining", debt))
}
## [1] "Debt remaining 4500"
## [1] "Debt remaining 4000"
## [1] "Debt remaining 3500"
## [1] "Debt remaining 3000"
## [1] "Debt remaining 2500"
## [1] "Debt remaining 2000"
## [1] "Debt remaining 1500"
## [1] "Debt remaining 1000"
## [1] "Debt remaining 500"
## [1] "Debt remaining 0"
Aren’t loops fun?
3.2.2 While with a plot
Loops can be used for all kinds of fun examples! What if you wanted to visualize your debt decreasing over time? Like the last exercise, this one uses a loop to model paying it off, $500 at a time. However, at each iteration you will also append your remaining debt total to a plot, so that you can visualize the total decreasing as you go.
This exercise has already been done for you. Let’s talk about what is happening here.
First, initialize some variables:
debt = Your current debt
i = Incremented each time debt is reduced. The next point on the x axis.
x_axis = A vector of i’s. The x axis for the plots.
y_axis = A vector of debt. The y axis for the plots.
Then, create a while loop. As long as you still have debt:
debt is reduced by 500.
i is incremented.
x_axis is extended by 1 more point.
y_axis is extended by the next debt point.
After you run the code, you can use Previous Plot to go back and view all 11 of the created plots!
debt <- 5000 # initial debt
i <- 0 # x axis counter
x_axis <- i # x axis
y_axis <- debt # y axis
# Initial plot
plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
# Graph your debt
while (debt > 0) {
  # Updating variables
  debt <- debt - 500
  i <- i + 1
  x_axis <- c(x_axis, i)
  y_axis <- c(y_axis, debt)
  # Next plot
  plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
}
I bet you didn’t know you could make that, did you?
3.2.3 Break it
Sometimes, you have to end your while loop early. With the debt example, if you don’t have enough cash to pay off all of your debt, you won’t be able to continue paying it down. In this exercise, you will add an if statement and a break to let you know if you run out of money!
while (condition) {
code
if (breaking_condition) {
break
}
}
The while loop will completely stop, and all lines after it will be run, if the breaking_condition is met. In this case, that condition will be running out of cash!
debt and cash have been defined for you.
First, fill in the while loop, leaving the commented-out if statement as it is. The loop should decrease cash and debt by 500 each time. Run this. What happens to cash when you reach 0 debt?
# debt and cash
debt <- 5000
cash <- 4000
# Pay off your debt...if you can!
while (debt > 0) {
  debt <- debt - 500
  cash <- cash - 500
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))
  # if (cash == 0) {
  #   print("You ran out of cash!")
  #   break
  # }
}
## [1] "Debt remaining: 4500 and Cash remaining: 3500"
## [1] "Debt remaining: 4000 and Cash remaining: 3000"
## [1] "Debt remaining: 3500 and Cash remaining: 2500"
## [1] "Debt remaining: 3000 and Cash remaining: 2000"
## [1] "Debt remaining: 2500 and Cash remaining: 1500"
## [1] "Debt remaining: 2000 and Cash remaining: 1000"
## [1] "Debt remaining: 1500 and Cash remaining: 500"
## [1] "Debt remaining: 1000 and Cash remaining: 0"
## [1] "Debt remaining: 500 and Cash remaining: -500"
## [1] "Debt remaining: 0 and Cash remaining: -1000"
Now uncomment the if statement so that the loop will break if you run out of cash. Specifically, if cash equals 0. Run the entire program again.
# debt and cash
debt <- 5000
cash <- 4000
# Pay off your debt...if you can!
while (debt > 0) {
  debt <- debt - 500
  cash <- cash - 500
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))
  if (cash == 0) {
    print("You ran out of cash!")
    break
  }
}
## [1] "Debt remaining: 4500 and Cash remaining: 3500"
## [1] "Debt remaining: 4000 and Cash remaining: 3000"
## [1] "Debt remaining: 3500 and Cash remaining: 2500"
## [1] "Debt remaining: 3000 and Cash remaining: 2000"
## [1] "Debt remaining: 2500 and Cash remaining: 1500"
## [1] "Debt remaining: 2000 and Cash remaining: 1000"
## [1] "Debt remaining: 1500 and Cash remaining: 500"
## [1] "Debt remaining: 1000 and Cash remaining: 0"
## [1] "You ran out of cash!"
Nice job! This allows you to create more than one stopping condition in your loop.
3.3 For loops
3.3.1 Loop over a vector
Last, but not least, in our discussion of loops is the for loop. When you know how many times you want to repeat an action, a for loop is a good option. The idea of the for loop is that you are stepping through a sequence, one at a time, and performing an action at each step along the way. That sequence is commonly a vector of numbers (such as the sequence from 1:10), but could also be numbers that are not in any order like c(2, 5, 4, 6), or even a sequence of characters!
for (value in sequence) {
code
}
In words this is saying, “for each value in my sequence, run this code.” Examples could be, “for each row of my data frame, print column 1”, or “for each word in my sentence, check if that word is DataCamp.”
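As a quick illustration of the second example, here is a minimal sketch that loops over the words of a sentence. The sentence and the variable names (sentence, words, matches) are made up for illustration.

```r
# Hypothetical sentence, split into a character vector of words
sentence <- "I am learning R with DataCamp"
words <- strsplit(sentence, " ")[[1]]

# Count how many words match "DataCamp"
matches <- 0
for (word in words) {
  if (word == "DataCamp") {
    matches <- matches + 1
    print(paste("Found it:", word))
  }
}
```

The loop variable word takes each element of the character vector in turn, exactly as value does for a numeric sequence.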
Let’s try an example! First, you will create a loop that prints out the values in a sequence from 1 to 10. Then, you will modify that loop to also sum the values from 1 to 10, where at each iteration the next value in the sequence is added to the running sum.
A vector seq and a variable sum have been defined for you.
Fill in the for loop, using seq as your sequence. Print out value during each iteration.
# Sequence
seq <- c(1:10)
# Print loop
for (value in seq) {
  print(value)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
Write a second for loop over seq. Each iteration, value should be added to sum, then sum is printed out.
# A sum variable
sum <- 0
# Sum loop
for (value in seq) {
  sum <- sum + value
  print(sum)
}
## [1] 1
## [1] 3
## [1] 6
## [1] 10
## [1] 15
## [1] 21
## [1] 28
## [1] 36
## [1] 45
## [1] 55
Great job! Let’s see what else you can do with for loops.
3.3.2 Loop over data frame rows
Imagine that you are interested in the days where the stock price of Apple rises above 117. If it goes above this value, you want to print out the current date and stock price. If you have a stock data frame with a date and apple price column, could you loop over the rows of the data frame to accomplish this? You certainly could!
Before you do so, note that you can get the number of rows in your data frame using nrow(stock). Then, you can create a sequence to loop over from 1:nrow(stock).
for (row in 1:nrow(stock)) {
price <- stock[row, "apple"]
date <- stock[row, "date"]
if(price > 117) {
print(paste("On", date,
"the stock price was", price))
}
}
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-27 the stock price was 117.26"
This incorporates a number of things we have learned so far. If statements, subsetting vectors, conditionals, and loops! Congratulations for learning so much!
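One caveat worth knowing about 1:nrow(stock): if the data frame has zero rows, 1:0 counts backwards and yields c(1, 0), so the loop body would run on rows that do not exist. The sketch below uses a made-up empty data frame (empty_stock is a hypothetical name) to show how seq_len() avoids this.

```r
# Hypothetical empty data frame with the same columns as stock
empty_stock <- data.frame(date = as.Date(character(0)), apple = numeric(0))

# 1:nrow() yields c(1, 0), so a loop over it would run twice
bad_index <- 1:nrow(empty_stock)

# seq_len() yields an empty sequence, so a loop over it never runs
good_index <- seq_len(nrow(empty_stock))

runs <- 0
for (row in good_index) {
  runs <- runs + 1
}
```

For data frames that are known to be non-empty, as in this exercise, both forms behave the same.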
The stocks data frame is available for you to use.
# Keep only the date and apple columns (select() and %>% come from dplyr)
stock <- stocks %>% select(date, apple)
Fill in the blanks in the for loop to make the following true:
price should hold that iteration’s price.
date should hold that iteration’s date.
This time, you want to know if apple goes above 116.
If it does, print the date and price.
If it is not above 116, print out the date and print that it was not an important day!
# Loop over stock rows
for (row in 1:nrow(stock)) {
  price <- stock[row, "apple"]
  date <- stock[row, "date"]
  if (price > 116) {
    print(paste("On", date,
                "the stock price was", price))
  } else {
    print(paste("The date:", date,
                "is not an important day!"))
  }
}
## [1] "The date: 17136 is not an important day!"
## [1] "The date: 17137 is not an important day!"
## [1] "The date: 17140 is not an important day!"
## [1] "The date: 17141 is not an important day!"
## [1] "The date: 17142 is not an important day!"
## [1] "The date: 17143 is not an important day!"
## [1] "The date: 17144 is not an important day!"
## [1] "The date: 17147 is not an important day!"
## [1] "The date: 17148 is not an important day!"
## [1] "The date: 17149 is not an important day!"
## [1] "The date: 17150 is not an important day!"
## [1] "The date: 17151 is not an important day!"
## [1] "On 17154 the stock price was 116.64"
## [1] "On 17155 the stock price was 116.95"
## [1] "On 17156 the stock price was 117.06"
## [1] "On 17157 the stock price was 116.29"
## [1] "On 17158 the stock price was 116.52"
## [1] "On 17162 the stock price was 117.26"
## [1] "On 17163 the stock price was 116.76"
## [1] "On 17164 the stock price was 116.73"
## [1] "The date: 17165 is not an important day!"
Nice job!
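The dates in the output above appear as numbers such as 17156 rather than as 2016-12-21. R stores a Date as the number of days since 1970-01-01, and that raw count can surface when the value handed to paste() is not a plain Date vector (for example, if stock is a tibble, which is only an assumption about this particular session). The sketch below uses a small made-up data frame, stock_demo, to show how to convert a day count back to a date and how to format the date explicitly before pasting.

```r
# A Date is stored as the number of days since 1970-01-01
as.Date(17156, origin = "1970-01-01")

# Hypothetical three-row data frame with values taken from the output above
stock_demo <- data.frame(
  date  = as.Date(c("2016-12-21", "2016-12-22", "2016-12-27")),
  apple = c(117.06, 116.29, 117.26)
)

messages <- character(0)
for (row in seq_len(nrow(stock_demo))) {
  price <- stock_demo$apple[row]
  # format() turns the Date into text before it reaches paste()
  date <- format(stock_demo$date[row])
  if (price > 117) {
    messages <- c(messages, paste("On", date, "the stock price was", price))
  }
}
print(messages)
```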
3.3.3 Loop over matrix elements
So far, you have been looping over 1 dimensional data types. If you want to loop over elements in a matrix (columns and rows), then you will have to use nested loops. You will use this idea to print out the correlations between three stocks.
The easiest way to think about this is that you are going to start on row1, and move to the right, hitting col1, col2, …, up until the last column in row1. Then, you move down to row2 and repeat the process.
my_matrix
     [,1]   [,2]
[1,] "r1c1" "r1c2"
[2,] "r2c1" "r2c2"
# Loop over my_matrix
for(row in 1:nrow(my_matrix)) {
  for(col in 1:ncol(my_matrix)) {
    print(my_matrix[row, col])
  }
}
[1] "r1c1"
[1] "r1c2"
[1] "r2c1"
[1] "r2c2"
The correlation matrix, corr, is available for you to use.
corr <- matrix(c(1.00, 0.96, 0.88, 0.96, 1.00, 0.74, 0.88, 0.74, 1.00), ncol = 3, dimnames = list(c("apple", "ibm", "micr"), c("apple", "ibm", "micr")))
Print corr to get a peek at the data.
# Print out corr
corr
##       apple  ibm micr
## apple  1.00 0.96 0.88
## ibm    0.96 1.00 0.74
## micr   0.88 0.74 1.00
Fill in the nested for loop! The outer loop should be over the rows of corr.
The inner loop should be over the cols of corr.
# Create a nested loop
for(row in 1:nrow(corr)) {
  for(col in 1:ncol(corr)) {
    print(paste(colnames(corr)[col], "and", rownames(corr)[row],
                "have a correlation of", corr[row,col]))
  }
}
## [1] "apple and apple have a correlation of 1"
## [1] "ibm and apple have a correlation of 0.96"
## [1] "micr and apple have a correlation of 0.88"
## [1] "apple and ibm have a correlation of 0.96"
## [1] "ibm and ibm have a correlation of 1"
## [1] "micr and ibm have a correlation of 0.74"
## [1] "apple and micr have a correlation of 0.88"
## [1] "ibm and micr have a correlation of 0.74"
## [1] "micr and micr have a correlation of 1"
Nice! That was a tough one. Nested loops can get computationally expensive, because the inner body runs once for every combination of row and column.
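Because a correlation matrix is symmetric, each pair of stocks appears twice in the output above, and every stock is also paired with itself. One possible variation, sketched here with the same corr values, starts the inner loop one column to the right of the current row so that each distinct pair is visited once. The names tickers and pairs_seen are introduced only for this illustration.

```r
# Same correlation values as the corr matrix above
tickers <- c("apple", "ibm", "micr")
corr <- matrix(c(1.00, 0.96, 0.88,
                 0.96, 1.00, 0.74,
                 0.88, 0.74, 1.00),
               ncol = 3, dimnames = list(tickers, tickers))

pairs_seen <- 0
for (row in seq_len(nrow(corr) - 1)) {
  # Only visit columns to the right of the diagonal
  for (col in (row + 1):ncol(corr)) {
    pairs_seen <- pairs_seen + 1
    print(paste(rownames(corr)[row], "and", colnames(corr)[col],
                "have a correlation of", corr[row, col]))
  }
}
```

For a 3 x 3 matrix this cuts the work from 9 iterations to 3, and the saving grows with the size of the matrix.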
3.3.4 Break and next
To finish your lesson on loops, let’s return to the concept of break, and the related concept of next. Just like with repeat and while loops, you can break out of a for loop completely by using the break statement. Additionally, if you just want to skip the current iteration, and continue the loop, you can use the next statement. This can be useful if your loop encounters an error, but you don’t want it to break everything.
for (value in sequence) {
if(next_condition) {
next
}
code
if(breaking_condition) {
break
}
}
You don’t have to use both break and next at the same time; this simply shows the general structure of using them.
The point of using next at the beginning, before the code runs, is to check for a problem before it happens.
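Here is a minimal sketch of this structure, using a made-up price vector (prices, not the apple vector from the exercise) that contains both a missing value and a price above 117. The counter checked is added only to make the control flow visible.

```r
# Hypothetical prices with one missing value
prices <- c(109.49, NA, 116.52, 117.26, 116.76)

checked <- 0
for (value in prices) {
  # Skip missing values before any comparison is attempted
  if (is.na(value)) {
    print("Skipping NA")
    next
  }
  checked <- checked + 1
  if (value > 117) {
    print("Time to sell!")
    break
  } else {
    print("Nothing to do here!")
  }
}
```

Without the next check, if (NA > 117) would stop the loop with an error, because the condition evaluates to NA rather than TRUE or FALSE. The last price, 116.76, is never examined because break ends the loop first.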
The apple vector is in your workspace.
Print apple. You have some missing values!
# Print apple
apple
## [1] 48.99
Fill in the blanks in the loop to do the following:
Check if value is NA. If so, go to the next iteration.
Check if value is above 117. If so, break and sell!
Else print “Nothing to do here!”.
# Loop through apple. Next if NA. Break if above 117.
for (value in apple) {
  if(is.na(value)) {
    print("Skipping NA")
    next
  }
  if(value > 117) {
    print("Time to sell!")
    break
  } else {
    print("Nothing to do here!")
  }
}
## [1] "Nothing to do here!"
Awesome! You’ve become a master looper!
4 Functions
If data structures like data frames and vectors are how you hold your data, functions are how you tell R what to do with your data. In this chapter, you will learn about using built-in functions, creating your own unique functions, and you will finish off with a brief introduction to packages.
4.1 What are functions?
4.1.1 Function help and documentation
When you don’t know how to use a function, or don’t know what arguments it takes, where do you turn? Luckily for you, R has built-in documentation. For example, to get help for the names() function, you can type one of:
?names
?names()
help(names)
These all do the same thing; they take you straight to the help page for names()!
In the DataCamp console, this takes you to the RDocumentation site to get help from there, but the information is all the same!
Below, you will explore the documentation for a few other functions.
Use ? to look at the documentation for subset().
# Look at the documentation for subset
?subset
Use ? to look at the documentation for Sys.time().
# Look at the documentation for Sys.time
?Sys.time
Great work!
4.1.2 Optional arguments
Let’s look at some of the round() function’s help documentation. It simply rounds a numeric vector off to a specified number of decimal places.
round(x, digits = 0)
The first argument, x, is required. Without it, the function will not work!
The argument digits is known as an optional argument. Optional arguments are ones that don’t have to be set by the user, either because they are given a default value, or because the function can infer them from the other data you have given it. Even though they don’t have to be set, they often provide extra flexibility. Here, digits specifies the number of decimal places to round to.
Explore the round() function in the exercise!
Use round() on 5.4.
# Round 5.4
round(5.4)
## [1] 5
Use round() on 5.4, and specify digits = 1.
# Round 5.4 with 1 decimal place
round(5.4, digits = 1)
## [1] 5.4
A vector numbers has been created for you.
numbers <- c(.002623, pi, 812.33345)
Use round() on numbers and specify digits = 3.
# Round numbers to 3 decimal places
round(numbers, digits = 3)
## [1] 0.003 3.142 812.333
Nice job! Optional arguments are great for adding extra features or tweaks.
4.1.3 Functions in functions
To write clean code, sometimes it is useful to use functions inside of other functions. This lets you use the result of one function directly in another one, without having to create an intermediate variable. You have actually already seen an example of this with print() and paste().
company <- c("Goldman Sachs", "J.P. Morgan", "Fidelity Investments")
for(i in 1:3) {
print(paste("A large financial institution is", company[i]))
}
[1] "A large financial institution is Goldman Sachs"
[1] "A large financial institution is J.P. Morgan"
[1] "A large financial institution is Fidelity Investments"
paste() strings together the character vectors, and print() prints the result to the console.
The exercise below explores simplifying the calculation of the correlation matrix using nested functions. Three vectors of stock prices, apple, ibm, and micr, are available for you to use.
First, cbind() them together in the order of apple, ibm, micr. Save this as stocks.
# cbind() the stocks
stocks <- cbind(apple, ibm, micr)
Then, use cor() on stocks.
# cor() to create the correlation matrix
cor(stocks)
##       apple ibm micr
## apple     1  NA   NA
## ibm      NA   1   NA
## micr     NA  NA    1
Now, let’s see how this would work all at once. Nest cbind() inside of cor() with the 3 stock vectors in the same order as above to create the correlation matrix.
# All at once! Nest cbind() inside of cor()
cor(cbind(apple, ibm, micr))
##       apple ibm micr
## apple     1  NA   NA
## ibm      NA   1   NA
## micr     NA  NA    1
Good Job!
4.2 Writing functions
4.2.1 Your first function
Time for your first function! This is a big step in an R programmer’s journey. “Functions are a fundamental building block of R: to master many of the more advanced techniques … you need a solid foundation in how functions work.” -Hadley Wickham
Here is the basic structure of a function:
func_name <- function(arguments) {
body
}
And here is an example:
square <- function(x) {
x^2
}
square(2)
[1] 4
Two things to remember from what Lore taught you are arguments and the function body. Arguments are user inputs that the function works on. They can be the data that the function manipulates, or options that affect the calculation. The body of the function is the code that actually performs the manipulation.
The value that a function returns is simply the value of the last expression evaluated in the function body. In the example, since x^2 is the last line of the body, that is what gets returned.
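To make the return rule concrete, here is a small sketch comparing an implicit return with an explicit return(). The function names square_implicit and square_explicit are made up for illustration.

```r
# Implicit return: the value of the last evaluated expression is returned
square_implicit <- function(x) {
  x^2
}

# Explicit return(): exits the function immediately with the given value
square_explicit <- function(x) {
  return(x^2)
  x^3  # never reached
}

square_implicit(3)
square_explicit(3)
```

Both calls give 9. An explicit return() is mostly useful for leaving a function early, for example inside an if statement.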
In the exercise, you will create your first function to turn a percentage into a decimal, a useful calculation in finance!
Create a function named percent_to_decimal that takes 1 argument, percent, and returns percent divided by 100.
# Percent to decimal function
percent_to_decimal <- function(percent) {
  percent / 100
}
Call percent_to_decimal() on the percentage 6 (we aren’t using % here, but assume this is 6%).
# Use percent_to_decimal() on 6
percent_to_decimal(6)
## [1] 0.06
A variable pct has been created for you.
# Example percentage
pct <- 8
Call percent_to_decimal() on pct.
# Use percent_to_decimal() on pct
percent_to_decimal(pct)
## [1] 0.08
You just created your first function! Great job!
4.2.2 Multiple arguments (1)
As you saw in the optional arguments example, functions can have multiple arguments. These can help extend the flexibility of your function. Let’s see this in action.
pow <- function(x, power = 2) {
x^power
}
pow(2)
[1] 4
pow(2, power = 3)
[1] 8
Instead of a square() function, we now have a version that works with any power. The power argument is optional and has a default value of 2, but the user can easily change this. It is also an example of how you can add multiple arguments. Notice how the arguments are separated by a comma, and the default value is set using an equals sign.
Let’s add some more functionality to percent_to_decimal() that allows you to round the percentage to a certain number of digits.
Fill in the blanks in the improved percent_to_decimal() function to do the following:
Add a second optional argument named digits that defaults to 2.
In the body of the function, divide percent by 100 and assign this to decimal.
Use the round function on decimal, and set the second argument to digits to specify the number of decimal places.
# Percent to decimal function
percent_to_decimal <- function(percent, digits = 2) {
  decimal <- percent / 100
  round(decimal, digits)
}
percents has been defined for you.
# percents
percents <- c(25.88, 9.045, 6.23)
Call percent_to_decimal() on percents. Do not specify any optional arguments.
# percent_to_decimal() with default digits
percent_to_decimal(percents)
## [1] 0.26 0.09 0.06
Call percent_to_decimal() on percents again. Specify digits = 4.
# percent_to_decimal() with digits = 4
percent_to_decimal(percents, digits = 4)
## [1] 0.2588 0.0905 0.0623
Way to go! Adding optional arguments isn’t so hard, right?
4.2.3 Multiple arguments (2)
Let’s think about a more complicated example. Do you remember present value from the Introduction to R for Finance course? If not, you can review the video for that here. The idea is that you want to discount money that you will get in the future at a specific interest rate to represent the value of that money in today’s dollars. The following general formula was developed to help with this:
present_value <- cash_flow * (1 + i / 100) ^ -year
Wouldn’t it be nice to have a function that did this calculation for you? Maybe something of the form:
present_value <- pv(cash_flow, i, year)
This function should work if you pass in numerics like pv(1500, 5, 2) and it should work if you pass in vectors of equal length to calculate an entire present value vector at once!
The percent_to_decimal() function is available for you to use.
Fill in the blanks in the pv() function:
The function should take three arguments: cash_flow, i, year.
The discount multiplier is (1 + i / 100). Use the percent_to_decimal() function to convert i to a decimal.
# Present value function
pv <- function(cash_flow, i, year) {
  # Discount multiplier
  mult <- 1 + percent_to_decimal(i)
  # Present value calculation
  cash_flow * mult ^ -year
}
# Calculate a present value
pv(1200, 7, 3)
## [1] 979.5575
Great! This seems like a pretty useful function!
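The task description mentions that pv() should also accept vectors of equal length. Here is a quick sketch of that, re-defining both functions so the block stands alone. The cash flows, rates, and years are made-up inputs.

```r
percent_to_decimal <- function(percent, digits = 2) {
  round(percent / 100, digits)
}

pv <- function(cash_flow, i, year) {
  mult <- 1 + percent_to_decimal(i)
  cash_flow * mult ^ -year
}

# Hypothetical inputs: three cash flows with their own rates and horizons
cash_flows <- c(1500, 2000, 1200)
rates      <- c(5, 6, 7)
years      <- c(2, 3, 3)

# Arithmetic in R is element-wise, so pv() works on whole vectors at once
present_values <- pv(cash_flows, rates, years)
present_values
```

One thing to keep in mind with this design: because percent_to_decimal() defaults to digits = 2, a rate such as 5.25 would be rounded to 0.05 before discounting. Passing a larger digits value inside pv() would be one way to avoid that.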
4.2.4 Function scope (1)
Scoping is the process of how R looks up a variable’s value when given a name. For example, given x <- 5, scoping is how R knows where to look to find that the value of x is 5.
Try this scoping exercise!
percent_to_decimal <- function(percent) {
decimal <- percent / 100
decimal
}
percent_to_decimal(6)
[1] 0.06
What does typing decimal now return?
Error
0
0.06
Great! decimal was defined to live only inside the percent_to_decimal() function. If you try to access decimal outside of the scope of that function, you will get an error because it does not exist!
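This can be checked without triggering the error. The sketch below assumes a fresh session in which no variable named decimal has been created. The helper names result and decimal_visible are introduced only for illustration.

```r
percent_to_decimal <- function(percent) {
  # decimal is created inside the function and disappears when it returns
  decimal <- percent / 100
  decimal
}

result <- percent_to_decimal(6)

# exists() reports whether a name can be found from the current environment
decimal_visible <- exists("decimal")
decimal_visible
```

The function still returns 0.06, but decimal_visible is FALSE: the name decimal was never created outside the function.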
4.2.5 Function scope (2)
Let’s try another one. Here, hundred is defined outside of the function scope, but is used inside of the function.
hundred <- 100
percent_to_decimal <- function(percent) {
percent / hundred
}
What will percent_to_decimal(6) return?
Error
6
0.06
Good job! hundred was defined outside of the percent_to_decimal() function. When the percent_to_decimal() function came across hundred, it first looked inside the scope of the function for hundred, and when it couldn’t find it, it looked up one level to find where it was defined in the global scope.
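The lookup order can be sketched by adding a second, hypothetical function (percent_to_decimal_local) that defines its own hundred. The local definition is found first, so the global one is never consulted and is left unchanged.

```r
hundred <- 100

percent_to_decimal <- function(percent) {
  percent / hundred
}
from_global <- percent_to_decimal(6)   # uses the global hundred

# Hypothetical variant: a local hundred takes priority over the global one
percent_to_decimal_local <- function(percent) {
  hundred <- 1000
  percent / hundred
}
from_local <- percent_to_decimal_local(6)

hundred  # still 100: the assignment inside the function was local
```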
4.3 Packages
4.3.1 tidyquant package
The tidyquant package is focused on retrieving, manipulating, and scaling financial data analysis in the easiest way possible. To get the tidyquant package and start working with it, you first have to install it.
install.packages("tidyquant")
This places it on your local computer. You then have to load it into your current R session. This gives you access to all of the functions in the package.
library(tidyquant)
These steps of installing and librarying packages are necessary for any CRAN package you want to use.
The exercise code is already written for you. You will explore some of the functions that tidyquant has for financial analysis.
The code is already written, but these instructions will walk you through the steps.
# Library tidyquant
library(tidyquant)
Use the tidyquant function, tq_get(), to get the stock price data for Apple.
# Pull Apple stock data
apple <- tq_get("AAPL", get = "stock.prices",
                from = "2007-01-03", to = "2017-06-05")
# Take a look at what it returned
head(apple)
## # A tibble: 6 × 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2007-01-03 3.08 3.09 2.92 2.99 1238319600 2.56
## 2 AAPL 2007-01-04 3.00 3.07 2.99 3.06 847260400 2.61
## 3 AAPL 2007-01-05 3.06 3.08 3.01 3.04 834741600 2.59
## 4 AAPL 2007-01-08 3.07 3.09 3.05 3.05 797106800 2.61
## 5 AAPL 2007-01-09 3.09 3.32 3.04 3.31 3349298400 2.82
## 6 AAPL 2007-01-10 3.38 3.49 3.34 3.46 2952880000 2.96
# Plot the stock price over time
plot(apple$date, apple$adjusted, type = "l")
Calculate the daily returns with tq_mutate(). This function “mutates” your data frame by adding a new column onto it. Here, that new column is the daily returns.
# Calculate daily stock returns for the adjusted price
apple <- tq_mutate(data = apple,
                   select = "adjusted",
                   mutate_fun = dailyReturn)
# Sort the returns from least to greatest
sorted_returns <- sort(apple$daily.returns)
# Plot them
plot(sorted_returns)
Sweet! There are over 10000 packages out there. Check them out to see what other members of the community have done!
5 Apply
A popular alternative to loops in R is the family of apply functions. These are often more readable than loops, and are incredibly useful for scaling the data science workflow to perform a complicated calculation on any number of observations. Learn about them here!
5.1 Why use apply?
5.1.1 lapply() on a list
The first function in the apply family that you will learn is lapply(), which is short for “list apply.” When you have a list, and you want to apply the same function to each element of the list, lapply() is a potential solution that always returns another list. How might this work?
Let’s look at a simple example. Suppose you want to find the length of each vector in the following list.
my_list
$a
[1] 2 4 5
$b
[1] 10 14 5 3 4 5 6
# Using lapply
# Note that you don't need parentheses when calling length
lapply(my_list, FUN = length)
$a
[1] 3
$b
[1] 7
As noted in the video, if at first you thought about looping over each element in the list, and using length() at each iteration, you aren’t wrong. lapply() is the vectorized version of this kind of loop, and is often preferred (and simpler) in the R world.
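The equivalence between the loop and lapply() can be sketched as follows, using the same my_list values. The names loop_result and lapply_result are introduced only for the comparison.

```r
my_list <- list(a = c(2, 4, 5), b = c(10, 14, 5, 3, 4, 5, 6))

# Loop version: pre-allocate a list and fill it element by element
loop_result <- vector("list", length(my_list))
names(loop_result) <- names(my_list)
for (name in names(my_list)) {
  loop_result[[name]] <- length(my_list[[name]])
}

# lapply version: one call, same result
lapply_result <- lapply(my_list, FUN = length)

identical(loop_result, lapply_result)
```

The loop needs explicit bookkeeping for the result list and its names, which lapply() handles on its own.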
A list of daily stock returns as percentages called stock_return and the percent_to_decimal() function have been provided.
stock_return <- list(apple = c(0.37446342, -0.71883530, 0.76986527, 0.98226467, 0.98171665, 1.63217981, -0.57042563, 1.66813769, 0.00000000, 0.54692248, 0.12951131, 0.57773562, 0.26577503, 0.09405729, -0.65778233, 0.19778141, 0.63508411, -0.42640287, -0.02569373, -0.77957680),
ibm = c(0.1251408, -0.1124859, 0.3190691, 2.7689429, 0.3458948, 0.7014998, -0.6125390, 1.6858006, 0.1307267, -0.2907839, -0.7677657, -0.0299886, 0.5519558, -0.1610979, -0.1613578, -0.2095056, 0.2579329, -0.5683858, 0.2467056, -0.3661465),
micr = c(0.08445946, 1.63713080, -0.44835603, 2.36864053, -0.58660583, 1.57351254, 0.32273681, 1.30287920, -0.47634170, -0.15954052, -0.44742729, 2.11878010, -0.12574662, 0.00000000, 0.01573812, -0.48780488, 0.06325111, -0.45828066, -0.14287982, -1.20826709))
Print stock_return.
# Print stock_return
stock_return
## $apple
## [1] 0.37446342 -0.71883530 0.76986527 0.98226467 0.98171665 1.63217981
## [7] -0.57042563 1.66813769 0.00000000 0.54692248 0.12951131 0.57773562
## [13] 0.26577503 0.09405729 -0.65778233 0.19778141 0.63508411 -0.42640287
## [19] -0.02569373 -0.77957680
##
## $ibm
## [1] 0.1251408 -0.1124859 0.3190691 2.7689429 0.3458948 0.7014998
## [7] -0.6125390 1.6858006 0.1307267 -0.2907839 -0.7677657 -0.0299886
## [13] 0.5519558 -0.1610979 -0.1613578 -0.2095056 0.2579329 -0.5683858
## [19] 0.2467056 -0.3661465
##
## $micr
## [1] 0.08445946 1.63713080 -0.44835603 2.36864053 -0.58660583 1.57351254
## [7] 0.32273681 1.30287920 -0.47634170 -0.15954052 -0.44742729 2.11878010
## [13] -0.12574662 0.00000000 0.01573812 -0.48780488 0.06325111 -0.45828066
## [19] -0.14287982 -1.20826709
Use the lapply() function to apply percent_to_decimal() to each element in stock_return.
# lapply to change percents to decimal
lapply(stock_return, FUN = percent_to_decimal)
## $apple
## [1] 0.00 -0.01 0.01 0.01 0.01 0.02 -0.01 0.02 0.00 0.01 0.00 0.01
## [13] 0.00 0.00 -0.01 0.00 0.01 0.00 0.00 -0.01
##
## $ibm
## [1] 0.00 0.00 0.00 0.03 0.00 0.01 -0.01 0.02 0.00 0.00 -0.01 0.00
## [13] 0.01 0.00 0.00 0.00 0.00 -0.01 0.00 0.00
##
## $micr
## [1] 0.00 0.02 0.00 0.02 -0.01 0.02 0.00 0.01 0.00 0.00 0.00 0.02
## [13] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.01
Great work!
5.1.2 lapply() on a data frame
If, instead of a list, you had a data frame of stock returns, could you still use lapply()? Yes! Perhaps surprisingly, data frames are actually lists under the hood, and an lapply() call would apply the function to each column of the data frame.
df
a b
1 1 4
2 2 6
class(df)
[1] "data.frame"
lapply(df, FUN = sum)
$a
[1] 3
$b
[1] 10
lapply() summed each column in the data frame, but still follows its convention of always returning a list. A data frame of daily stock returns as decimals called stock_return has been provided.
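The claim that a data frame is a list under the hood can be checked directly. This sketch rebuilds the small df from the example above; the name col_sums is introduced only for illustration.

```r
# Data frame matching the df shown above
df <- data.frame(a = c(1, 2), b = c(4, 6))

is.list(df)      # a data frame is a list of equal-length columns
length(df)       # so its length is the number of columns

col_sums <- lapply(df, FUN = sum)
class(col_sums)  # lapply() returns a plain list, not a data frame
```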
Print stock_return to see the data frame.
# Print stock_return
stock_return
## $apple
## [1] 0.37446342 -0.71883530 0.76986527 0.98226467 0.98171665 1.63217981
## [7] -0.57042563 1.66813769 0.00000000 0.54692248 0.12951131 0.57773562
## [13] 0.26577503 0.09405729 -0.65778233 0.19778141 0.63508411 -0.42640287
## [19] -0.02569373 -0.77957680
##
## $ibm
## [1] 0.1251408 -0.1124859 0.3190691 2.7689429 0.3458948 0.7014998
## [7] -0.6125390 1.6858006 0.1307267 -0.2907839 -0.7677657 -0.0299886
## [13] 0.5519558 -0.1610979 -0.1613578 -0.2095056 0.2579329 -0.5683858
## [19] 0.2467056 -0.3661465
##
## $micr
## [1] 0.08445946 1.63713080 -0.44835603 2.36864053 -0.58660583 1.57351254
## [7] 0.32273681 1.30287920 -0.47634170 -0.15954052 -0.44742729 2.11878010
## [13] -0.12574662 0.00000000 0.01573812 -0.48780488 0.06325111 -0.45828066
## [19] -0.14287982 -1.20826709
Use lapply() to get the average (mean) of each column.
# lapply to get the average returns
lapply(stock_return, FUN = mean)
## $apple
## [1] 0.2838389
##
## $ibm
## [1] 0.1926806
##
## $micr
## [1] 0.2472939
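Create a function for the sharpe ratio. It should take the average of the returns, subtract the risk free rate (.03%) from it, and then divide by the standard deviation of the returns.
# Sharpe ratio
sharpe <- function(returns) {
  (mean(returns) - .0003) / sd(returns)
}
Use lapply() to calculate the sharpe ratio of each column.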
# lapply to get the sharpe ratio
lapply(stock_return, FUN = sharpe)
## $apple
## [1] 0.3961448
##
## $ibm
## [1] 0.2366101
##
## $micr
## [1] 0.2483864
Fantastic! Column-wise calculations with lapply() can be very useful!
5.1.3 FUN arguments
Often, the function that you want to apply will have other optional arguments that you may want to tweak. Consider the percent_to_decimal() function that allows the user to specify the number of decimal places.
percent_to_decimal(5.4, digits = 3)
[1] 0.054
In the call to lapply() you can specify the named optional arguments after the FUN argument, and they will get passed to the function that you are applying.
my_list
$a
[1] 2.444 3.500
$b
[1] 1.100 2.678 3.450
lapply(my_list, FUN = percent_to_decimal, digits = 4)
$a
[1] 0.0244 0.0350
$b
[1] 0.0110 0.0268 0.0345
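An equivalent way to write the call above is with an anonymous function. The sketch below re-defines percent_to_decimal() and my_list so that it runs on its own; via_dots and via_anonymous are names made up for the comparison.

```r
percent_to_decimal <- function(percent, digits = 2) {
  round(percent / 100, digits)
}
my_list <- list(a = c(2.444, 3.500), b = c(1.100, 2.678, 3.450))

# Extra named arguments after FUN are passed on to percent_to_decimal()
via_dots <- lapply(my_list, FUN = percent_to_decimal, digits = 4)

# Equivalent form with an anonymous function
via_anonymous <- lapply(my_list, function(x) percent_to_decimal(x, digits = 4))

identical(via_dots, via_anonymous)
```

Passing the argument after FUN is shorter. The anonymous function form is handy when the element being looped over is not the first argument of the function.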
In the exercise, you will extend the capability of your sharpe ratio function to allow the user to input the risk free rate as an argument, and then use this with lapply(). A data frame of daily stock returns as decimals called stock_return is available.
Extend sharpe to allow the input of the risk free rate as an optional argument. The default should be set at .0003.
# Extend sharpe() to allow optional argument
sharpe <- function(returns, rf = .0003) {
  (mean(returns) - rf) / sd(returns)
}
Use lapply() on stock_return to find the sharpe ratio if the risk free rate is .0004.
# First lapply()
lapply(stock_return, FUN = sharpe, rf = .0004)
## $apple
## [1] 0.3960051
##
## $ibm
## [1] 0.2364871
##
## $micr
## [1] 0.2482859
Use lapply() on stock_return to find the sharpe ratio if the risk free rate is .0009.
# Second lapply()
lapply(stock_return, FUN = sharpe, rf = .0009)
## $apple
## [1] 0.3953065
##
## $ibm
## [1] 0.2358721
##
## $micr
## [1] 0.247783
Nice! It is common to pass optional arguments to the function that lapply() calls.
5.2 sapply() - simplify it!
5.2.1 sapply() vs. lapply()
lapply() is great, but sometimes you might want the returned data in a nicer form than a list. For instance, with the sharpe ratio, wouldn’t it be great if the returned sharpe ratios were in a vector rather than a list? Further analysis would likely be easier!
For this, you might want to consider sapply(), or simplify apply. It performs exactly like lapply(), but will attempt to simplify the output if it can. The basic syntax is the same, with a few additional arguments:
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
These additional optional arguments let you specify if you want sapply() to try and simplify the output, and if you want it to use the names of the object in the output.
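Simplification is not limited to vectors. As a sketch with made-up returns (the list returns and its values are hypothetical): when every call returns a vector of the same length greater than one, sapply() builds a matrix instead.

```r
# Hypothetical returns for two stocks
returns <- list(apple = c(0.37, -0.72, 0.77), ibm = c(0.13, -0.11, 0.32))

# One value per element: simplified to a named numeric vector
means <- sapply(returns, FUN = mean)

# Two values per element (min and max): simplified to a 2 x 2 matrix
ranges <- sapply(returns, FUN = range)

# simplify = FALSE keeps the list form, like lapply()
as_list <- sapply(returns, FUN = mean, simplify = FALSE)
```

In the matrix, each column corresponds to one element of the input list.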
In the exercise, you will recalculate sharpe ratios using sapply() to simplify the output. stock_return and the sharpe function are available for you.
First, use lapply() on stock_return to get the sharpe ratio again.
# lapply() on stock_return
lapply(stock_return, FUN = sharpe)
## $apple
## [1] 0.3961448
##
## $ibm
## [1] 0.2366101
##
## $micr
## [1] 0.2483864
Now, use sapply() on stock_return to see the simplified sharpe ratio output.
# sapply() on stock_return
sapply(stock_return, FUN = sharpe)
## apple ibm micr
## 0.3961448 0.2366101 0.2483864
Use sapply() on stock_return to get the sharpe ratio with the arguments simplify = FALSE and USE.NAMES = FALSE. This is equivalent to lapply()!
# sapply() on stock_return with optional arguments
sapply(stock_return, FUN = sharpe, simplify = FALSE, USE.NAMES = FALSE)
## $apple
## [1] 0.3961448
##
## $ibm
## [1] 0.2366101
##
## $micr
## [1] 0.2483864
Perfect! It is interesting to see how sapply() can become lapply() with some additional options.
5.2.2 Failing to simplify
For interactive use, sapply() is great. It guesses the output type so that it can simplify, and normally that is fine. However, sapply() is not a safe option to be used when writing functions. If sapply() cannot simplify your output, then it will default to returning a list just like lapply(). This can be dangerous and break custom functions if you wrote them expecting sapply() to return a simplified vector.
Let’s look at an exercise using a list containing information about the stock market crash of 2008.
The list market_crash has been created for you.
Use sapply() to get the class() of each element in market_crash.
# Market crash with as.Date()
market_crash <- list(dow_jones_drop = 777.68,
                     date = as.Date("2008-09-28"))
# Find the classes with sapply()
sapply(market_crash, class)
## dow_jones_drop date
## "numeric" "Date"
A new list, market_crash2, has been created. The difference is in the creation of the date!
Use lapply() to get the class() of each element in market_crash2.
Use sapply() to get the class() of each element in market_crash2.
# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68,
                      date = as.POSIXct("2008-09-28"))
# Find the classes with lapply()
lapply(market_crash2, class)
## $dow_jones_drop
## [1] "numeric"
##
## $date
## [1] "POSIXct" "POSIXt"
# Find the classes with sapply()
sapply(market_crash2, class)
## $dow_jones_drop
## [1] "numeric"
##
## $date
## [1] "POSIXct" "POSIXt"
date in market_crash2 has multiple classes. Why couldn’t sapply() simplify this?
Nice job! See how sapply() returns a list like lapply() when it fails to simplify?
5.3 vapply() - specify your output!
5.3.1 vapply() vs. sapply()
In the last example, sapply() failed to simplify because the date element of market_crash2 had two classes (POSIXct and POSIXt). Notice, however, that no error was thrown! If a function you had written expected sapply() to return a simplified vector, the list it received instead could cause confusing failures further down.
To account for this, there is a stricter apply function called vapply(), which takes an extra argument, FUN.VALUE, where you specify the type and length of the output that should be returned each time your applied function is called.
If you expected the return value of class() to be a character vector of length 1, you can specify that using vapply():
vapply(market_crash, class, FUN.VALUE = character(1))
dow_jones_drop date
"numeric" "Date"
Other examples of FUN.VALUE might be numeric(2) or logical(1). market_crash2 is again defined for you.
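As a sketch of what those other templates look like in use, the calls below work on a small made-up data frame (toy_returns and has_loss() are hypothetical, not part of the course data). logical(1) fits a function that returns a single TRUE or FALSE, and numeric(2) fits one that returns two numbers, such as range().

```r
# Hypothetical returns, made up for illustration
toy_returns <- data.frame(
  apple = c(0.37, -0.72, 0.54),
  ibm   = c(0.12, -0.11, 0.25)
)

# Hypothetical helper: TRUE if any return in the vector is negative
has_loss <- function(x) {
  any(x < 0)
}

# One logical value per column
any_loss <- vapply(toy_returns, has_loss, FUN.VALUE = logical(1))

# Two numbers per column: range() returns the minimum and the maximum
low_high <- vapply(toy_returns, range, FUN.VALUE = numeric(2))

print(any_loss)
print(low_high)
```

When the template has length 1, the result is a named vector. When it is longer, the result is a matrix with one column per input element.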
Use sapply() again to find the class() of market_crash2 elements. Notice how it returns a list and not an error.
# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68,
                      date = as.POSIXct("2008-09-28"))
# Find the classes with sapply()
sapply(market_crash2, class)
## $dow_jones_drop
## [1] "numeric"
##
## $date
## [1] "POSIXct" "POSIXt"
Use vapply() on market_crash2 to find the class(). Specify FUN.VALUE = character(1). It should appropriately fail.
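# Find the classes with vapply()
# Left commented out because the call stops with an error: class() returns
# two values for date, but FUN.VALUE = character(1) allows only one
#vapply(market_crash2, class, FUN.VALUE = character(1))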
Great! This is much clearer since we expected a simplified vector.
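If you would like to see the failure without stopping your script, one option is to wrap the call in tryCatch(). This is only a sketch and not part of the original exercise. The tz = "UTC" argument is an added assumption so that the example does not depend on the local time zone.

```r
# Same structure as market_crash2, with an explicit time zone (an assumption)
market_crash2 <- list(dow_jones_drop = 777.68,
                      date = as.POSIXct("2008-09-28", tz = "UTC"))

# Capture the error instead of stopping; the handler returns the condition object
result <- tryCatch(
  vapply(market_crash2, class, FUN.VALUE = character(1)),
  error = function(e) e
)

# result is an error condition, not a character vector
print(inherits(result, "error"))

# The message explains which element broke the FUN.VALUE contract
print(conditionMessage(result))
```

The point of the design is that vapply() fails at the call that violates your expectation, instead of handing a list to later code.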
5.3.2 More vapply()
The difference between vapply() and sapply() was shown in the last example to demonstrate vapply() appropriately failing, but what about when it doesn’t fail? When there are no errors, vapply() returns a simplified result according to the FUN.VALUE argument.
The stock_return dataset containing daily returns for Apple, IBM, and Microsoft has been provided. The sharpe() function is also available.
Use vapply() to calculate the Sharpe ratio for each stock in stock_return.
# Sharpe ratio for all stocks
vapply(stock_return, sharpe, FUN.VALUE = numeric(1))
## apple ibm micr
## 0.3961448 0.2366101 0.2483864
Use summary() on the apple column to get a 6 number summary.
# Summarize Apple
summary(stock_return$apple)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7796 -0.1259 0.2318 0.2838 0.6688 1.6681
Use vapply() to apply the summary() function across stock_return to summarize each column.
# Summarize all stocks
vapply(stock_return, summary, FUN.VALUE = numeric(6))
## apple ibm micr
## Min. -0.7795768 -0.7677657 -1.20826709
## 1st Qu. -0.1258710 -0.2298252 -0.45083719
## Median 0.2317782 0.0475761 -0.06287331
## Mean 0.2838389 0.1926806 0.24729391
## 3rd Qu. 0.6687794 0.3257755 0.56777241
## Max. 1.6681377 2.7689429 2.36864053
Good job! vapply() requires more thought up front, because you have to state the type and length of each result, but its robustness far outweighs that cost!
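One place this robustness can show up is with missing values. For a numeric vector that contains NA, summary() adds an NA's entry, so its result has 7 elements instead of 6. The sketch below uses made-up data with one hypothetical missing value (toy_returns is not the course dataset) to compare how sapply() and vapply() react.

```r
# Hypothetical returns with one missing value in the ibm column
toy_returns <- data.frame(
  apple = c(0.37, -0.72, 0.54, 0.10),
  ibm   = c(0.12, NA, 0.25, -0.11)
)

# sapply() cannot build a matrix from results of length 6 and 7,
# so it returns a list without complaining
loose <- sapply(toy_returns, summary)

# vapply() refuses the 7-element result; tryCatch() captures the error
strict <- tryCatch(
  vapply(toy_returns, summary, FUN.VALUE = numeric(6)),
  error = function(e) e
)

print(class(loose))
print(inherits(strict, "error"))
```

With sapply(), the change from a matrix to a list would only surface later, wherever the result is used. With vapply(), the problem is reported at the call itself.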
5.3.3 Anonymous functions
As a last exercise, you’ll learn about a concept called anonymous functions. So far, when calling an apply function like vapply(), you have been passing in named functions to FUN. Doesn’t it seem like a waste to have to create a function just for that specific vapply() call? Instead, you can use anonymous functions!
Named function:
percent_to_decimal <- function(percent) {
  percent / 100
}
Anonymous function:
function(percent) { percent / 100 }
As you can see, anonymous functions are basically functions that aren’t assigned a name. To use them in vapply() you might do:
vapply(stock_return, FUN = function(percent) { percent / 100 },
FUN.VALUE = numeric(2))
apple ibm
[1,] 0.003744634 0.001251408
[2,] -0.007188353 -0.001124859
stock_return is available to use.
Use vapply() to apply an anonymous function that returns a vector of the max() and min() (in that order) of each column of stock_return.
# Max and min
vapply(stock_return,
FUN = function(x) { c(max(x), min(x)) },
FUN.VALUE = numeric(2))
## apple ibm micr
## [1,] 1.6681377 2.7689429 2.368641
## [2,] -0.7795768 -0.7677657 -1.208267
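The rows in the result above are labelled [1,] and [2,] because the anonymous function returns an unnamed vector. A small variation, sketched here on made-up data (toy_returns is a hypothetical stand-in for stock_return), is to name the elements inside the anonymous function so that the matrix labels its own rows.

```r
# Hypothetical returns, made up for illustration
toy_returns <- data.frame(
  apple = c(0.37, -0.72, 0.54),
  ibm   = c(0.12, -0.11, 0.25)
)

# Naming the elements of the returned vector gives the result row names
max_min <- vapply(toy_returns,
                  FUN = function(x) { c(max = max(x), min = min(x)) },
                  FUN.VALUE = numeric(2))

print(max_min)

# Rows can now be selected by name instead of by position
print(max_min["max", ])
```

This is the same mechanism that labelled the rows Min., 1st Qu., and so on in the summary() example earlier.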
Congratulations! You have just completed the course!
5.4 Congratulations
5.4.1 Congratulations
Congratulations! You have completed Intermediate R for Finance. I hope that with your new knowledge of if statements, loops, and functions, you’re equipped with the skills to start working on more advanced financial analysis.
5.4.2 Popular R packages in Finance
I encourage you to check out all of the finance packages that R’s community has to offer. Check out the Empirical Finance Task View on CRAN for an extensive list. People around the world have already created some amazing projects, and hopefully, you can find one that interests you.
5.4.3 Let’s practice!
If you enjoyed this course, make sure you check out the other courses in DataCamp’s quantitative analyst track if you haven’t already. Thanks so much for working through this course with me - now get out there and put your skills to the test!