Importing and Managing Financial Data in R
Lore Dirick & Davis Vaughan
Course Description
If you’ve ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.
1 Introduction and downloading data
A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.
1.1 Welcome to the course!
1.1.1 Introducing getSymbols()
The getSymbols() function from the quantmod package provides a consistent interface to import data from various sources into your workspace. By default, getSymbols() imports the data as an xts object.
This exercise will introduce you to getSymbols(). You will use it to import QQQ data from Yahoo! Finance. QQQ is an exchange-traded fund that tracks the Nasdaq 100 index, and Yahoo! Finance is the default data source for getSymbols().
You use the Symbols argument to specify the instrument (i.e. the ticker symbol) you want to import. Since Symbols is the first argument to getSymbols(), you usually just type the instrument name and omit Symbols =.
Load the quantmod package using the library() function.
# Load the quantmod package
library(quantmod)
Use getSymbols() to import QQQ data. Make sure the data gets assigned to QQQ (auto.assign = TRUE).
# Import QQQ data from Yahoo! Finance
getSymbols("QQQ")
## [1] "QQQ"
Use the str() function to view the structure of the QQQ object that getSymbols() created. Note the src and updated attributes.
# Look at the structure of the object getSymbols created
str(QQQ)
## An 'xts' object on 2007-01-03/2022-10-21 containing:
## Data: num [1:3980, 1:6] 43.5 43.3 44 43.9 44 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2022-10-22 13:20:14"
Use the head() function to view the first few rows of QQQ.
# Look at the first few rows of QQQ
head(QQQ)
## QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03 43.46 44.06 42.52 43.24 167689500 37.88621
## 2007-01-04 43.30 44.21 43.15 44.06 136853500 38.60469
## 2007-01-05 43.95 43.95 43.48 43.85 138958800 38.42067
## 2007-01-08 43.89 44.12 43.64 43.88 106401600 38.44696
## 2007-01-09 44.01 44.29 43.63 44.10 121577500 38.63972
## 2007-01-10 43.96 44.66 43.82 44.62 121070100 39.09535
Awesome! How easy was that!?
1.1.2 Data sources
In the last exercise, you imported data from Yahoo! Finance. The src argument allows you to tell getSymbols() to import data from a different data source.
In this exercise, you will import data from Alpha Vantage and FRED. Alpha Vantage is a source similar to Yahoo! Finance. FRED is an online database of economic time series data created and maintained by the Federal Reserve Bank of St. Louis.
getSymbols() imports data from Yahoo! Finance by default because src = “yahoo” by default. The src values for Alpha Vantage and FRED are “av” and “FRED”, respectively.
The quantmod package has already been loaded for you.
Use getSymbols() to import QQQ data from Alpha Vantage by setting src.
# Import QQQ data from Alpha Vantage
getSymbols("QQQ", src = "av", api.key = "4EQRAZ3VA8ERX4BV")
## [1] "QQQ"
Use str() to view the structure of QQQ; note the src attribute.
# Look at the structure of QQQ
str(QQQ)
## An 'xts' object on 2022-06-01/2022-10-21 containing:
## Data: num [1:100, 1:5] 310 305 309 311 304 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:5] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "alphavantage"
## $ updated: Date[1:1], format: "2022-10-21"
Use getSymbols() to import U.S. GDP data from FRED by setting src (the symbol for U.S. GDP is “GDP”).
# Import GDP data from FRED
getSymbols("GDP", src = "FRED")
## [1] "GDP"
Use str() to view the structure of GDP; note the src attribute.
# Look at the structure of GDP
str(GDP)
## An 'xts' object on 1947-01-01/2022-04-01 containing:
## Data: num [1:302, 1] 243 246 250 260 266 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "GDP"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "FRED"
## $ updated: POSIXct[1:1], format: "2022-10-22 13:20:15"
There are a number of data sources out there, providing a wealth of information. Try them all!
1.1.3 Make getSymbols() return the data it retrieves
In the last exercise, getSymbols() automatically created an object named after the symbol you provided. This exercise will teach you to make getSymbols() return the data, so you can assign the output yourself.
There are two arguments that will make getSymbols() return the data:
- Set auto.assign = FALSE.
- Set env = NULL.
The two methods are functionally equivalent, but auto.assign = FALSE describes the behavior better. Use it because you will be more likely to remember what auto.assign = FALSE means in the future.
Import SPY data from Yahoo! Finance and assign it to spy by using the auto.assign argument.
# Assign SPY data to 'spy' using auto.assign argument
spy <- getSymbols("SPY", auto.assign = FALSE)
Use the str() function to view the structure of the spy object.
# Look at the structure of the 'spy' object
str(spy)
## An 'xts' object on 2007-01-03/2022-10-21 containing:
## Data: num [1:3980, 1:6] 142 141 141 141 141 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2022-10-22 13:20:16"
Import JNJ data and assign it to jnj by using the env argument.
# Assign JNJ data to 'jnj' using env argument
jnj <- getSymbols("JNJ", env = NULL)
Use str() again, but on the jnj object.
# Look at the structure of the 'jnj' object
str(jnj)
## An 'xts' object on 2007-01-03/2022-10-21 containing:
## Data: num [1:3980, 1:6] 66.1 65.9 66.8 66.5 66.6 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "JNJ.Open" "JNJ.High" "JNJ.Low" "JNJ.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2022-10-22 13:20:17"
Great! Turning off auto.assign
is useful if you want to assign the data to an object yourself.
1.2 Introduction to Quandl
1.2.1 Introducing Quandl()
Similar to how the quantmod package provides getSymbols() to import data from various sources, the Quandl package provides access to the Quandl databases via one simple function: Quandl().
Recall that getSymbols() uses the Symbols and src arguments to specify the instrument and data source, respectively. The Quandl() function specifies both the data source and the instrument via its code argument, in the form “DATABASE/DATASET”.
Two other ways Quandl() differs from getSymbols() are:
- Quandl() returns a data.frame by default.
- Quandl() will not automatically assign the data to an object.
If you plan on importing a lot of data using Quandl(), you might consider opening a free account with them in order to get an API key.
Load the Quandl package using the library() function.
# Load the Quandl package
library(Quandl)
Quandl.api_key('u4L9zFqBdxLdKiczee_G')
Use the Quandl() function to download U.S. GDP data from FRED and assign it to an object named gdp (the Quandl code is “FRED/GDP”).
# Import GDP data from FRED
gdp <- Quandl("FRED/GDP")
Use the str() function to view the structure of the gdp object.
# Look at the structure of the object returned by Quandl
str(gdp)
## 'data.frame': 300 obs. of 2 variables:
## $ Date : Date, format: "2021-10-01" "2021-07-01" ...
## $ Value: num 23992 23202 22741 22038 21478 ...
## - attr(*, "freq")= chr "quarterly"
Quandl provides access to a large amount of data series. Their website has documentation for it all!
1.2.2 Return data type
The Quandl() function returns a data.frame by default. It can return other classes via the type argument.
The possible values for type are:
- “raw” (a data.frame),
- “ts” (time-series objects from the stats package),
- “zoo”,
- “xts”, and
- “timeSeries” (from the timeSeries package in the RMetrics suite).
In this exercise, you will learn how to use the type argument to make Quandl() return an xts and a zoo object.
The Quandl package has already been loaded for you.
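The exercises below use the xts and zoo types. As a small aside (not part of the course exercises, and assuming the Quandl package and API key set up above), the remaining type values behave the same way; for example, type = “ts” returns a stats time-series object:
# Aside: request the same series as a stats::ts object
gdp_ts <- Quandl("FRED/GDP", type = "ts")
class(gdp_ts)      # "ts"
frequency(gdp_ts)  # 4, because GDP is a quarterly series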
Use the type argument to make Quandl() import the GDP data as an xts object. Assign the result to gdp_xts.
# Import GDP data from FRED as xts
gdp_xts <- Quandl("FRED/GDP", type = "xts")
Use str() to view the structure of gdp_xts.
# Look at the structure of gdp_xts
str(gdp_xts)
## An 'xts' object on 1947 Q1/2021 Q4 containing:
## Data: num [1:300, 1] 243 246 250 260 266 ...
## Indexed by objects of class: [yearqtr] TZ: UTC
## xts Attributes:
## NULL
Use the type argument to make Quandl() import the GDP data as a zoo object. Assign the result to gdp_zoo.
# Import GDP data from FRED as zoo
gdp_zoo <- Quandl("FRED/GDP", type = "zoo")
Use str() to view the structure of gdp_zoo.
# Look at the structure of gdp_zoo
str(gdp_zoo)
## 'zooreg' series from 1947 Q1 to 2021 Q4
## Data: num [1:300] 243 246 250 260 266 ...
## Index: 'yearqtr' num [1:300] 1947 Q1 1947 Q2 1947 Q3 1947 Q4 ...
## Frequency: 4
1.3 Finding data from internet sources
1.3.1 Find stock ticker from Yahoo Finance
You need the instrument identifier to import data from an internet data source. They can often be found on the data source website. In this exercise, you will search Yahoo Finance for the ticker symbol for Pfizer stock.
Note that some sources may not provide data for certain symbols, even if
you can see the data displayed on their website in tables and/or
charts. getSymbols()
will error if the data is not available for download.
Create an object called symbol containing the three-letter Yahoo Finance ticker symbol for Pfizer stock. To find the symbol, go to Yahoo Finance and search for “Pfizer”. Note that there are two parts to this ticker; we only need what comes before the period (the period and suffix specify other exchanges).
# Create an object containing the Pfizer ticker symbol
symbol <- "PFE"
Use getSymbols() to import the data.
# Use getSymbols to import the data for 'symbol'
getSymbols(symbol)
## [1] "PFE"
Use the head() function to view the first few rows of the object created by getSymbols().
# Look at the data in the object getSymbols created
head(PFE)
## PFE.Open PFE.High PFE.Low PFE.Close PFE.Volume PFE.Adjusted
## 2007-01-03 24.81025 25.06641 24.64896 24.94307 42839619 13.26248
## 2007-01-04 25.02846 25.20873 24.94307 25.02846 33987495 13.30789
## 2007-01-05 25.18975 25.26566 24.82922 24.95256 33046589 13.26753
## 2007-01-08 24.94307 25.06641 24.56357 24.81974 45557569 13.19690
## 2007-01-09 24.89564 24.99051 24.70588 24.82922 32994627 13.20195
## 2007-01-10 24.76281 24.91461 24.66793 24.85769 36412854 13.21709
Looking up identifiers online is common when seeking data about a new instrument, so it’s good to get comfortable with the process!
1.3.2 Download exchange rate data from Oanda
Oanda.com provides historical foreign exchange data for many currency pairs. Currency pairs are expressed as two currencies, the “base” and the “quote”, separated by a “/”. For example, the U.S. Dollar to Euro exchange rate would be “USD/EUR”.
Note that getSymbols() will automatically convert “USD/EUR” to a valid name by removing the “/”. For example, getSymbols(“USD/EUR”) would create an object named USDEUR.
Also, Oanda.com only provides 180 days of historical data. getSymbols() will warn and return as much data as possible if you request data from more than 180 days ago. You can use the from and to arguments to set a date range; both should be strings in “%Y-%m-%d” format (e.g. “2016-02-06”).
Create an object called currency_pair with the symbols for the British Pound and the Canadian Dollar, so we can use it to get British Pound to Canadian Dollar exchange rate data. quantmod::oanda.currencies contains a list of currencies provided by Oanda.com.
# Create a currency_pair object
currency_pair <- "GBP/CAD"
Use getSymbols() to load data for currency_pair. Remember to specify src!
# Load British Pound to Canadian Dollar exchange rate data
getSymbols(currency_pair, src = "oanda")
## [1] "GBP/CAD"
Use str() to examine the data getSymbols() created. Remember, it will remove the “/”!
# Examine object using str()
str(GBPCAD)
## An 'xts' object on 2022-04-26/2022-10-21 containing:
## Data: num [1:179, 1] 1.62 1.61 1.6 1.6 1.62 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "GBP.CAD"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "oanda"
## $ updated: POSIXct[1:1], format: "2022-10-22 13:20:20"
Try to load data from 190 days ago. Sys.Date() will give you today’s date.
# Try to load data from 190 days ago
getSymbols(currency_pair, from = Sys.Date() - 190, to = Sys.Date(), src = "oanda")
## [1] "GBP/CAD"
Easy enough, right? Did you notice what happened when you tried to get data further back than 180 days?
1.3.3 Find and import Unemployment Rate data from FRED
Both getSymbols() and Quandl() provide access to the FRED database.
In this exercise, you will find the FRED symbol for the United States civilian unemployment rate. Then you will use the series name to download the data directly from FRED using getSymbols(), and from the Quandl database using Quandl().
Remember that getSymbols() specifies the data source using the src argument and that Quandl() specifies it as part of the Quandl code (i.e. database/series).
Both the quantmod and Quandl packages have been loaded for you, so you just need to find the symbol and call getSymbols() and Quandl().
Create an object called series_name. To find the FRED series name, go to the FRED website and search for “United States civilian unemployment rate”.
# Create a series_name object
series_name <- "UNRATE"
Use getSymbols() to load the data. Remember to set the src argument!
# Load the data using getSymbols
getSymbols(series_name, src = "FRED")
## [1] "UNRATE"
Create an object called quandl_code (prepend the Quandl database name (FRED) to series_name).
# Create a quandl_code object
quandl_code <- "FRED/UNRATE"
Use Quandl() to load the data. Assign the data to an object named unemploy_rate. Remember to include the database in the Quandl code!
# Load the data using Quandl
unemploy_rate <- Quandl(quandl_code)
FRED contains a huge number of economic data series. Be sure to check them out for more information!
2 Extracting and transforming data
You’ve learned how to import data from online sources, now it’s time to see how to extract columns from the imported data. After you’ve learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.
2.1 Extracting columns from financial time series
2.1.1 Extract one column from one instrument
The quantmod package provides several helper functions to extract specific columns from an object, based on the column name. The Op(), Hi(), Lo(), Cl(), Vo(), and Ad() functions can be used to extract the open, high, low, close, volume, and adjusted close columns, respectively.
In this exercise, you will use two of these functions on an xts object named DC. The DC object contains fictitious DataCamp OHLC (open, high, low, close) stock prices created by randomizing some real financial market data. DC is similar to the xts objects created by getSymbols().
While it’s not necessary to complete the exercise, you can learn more about all the extractor functions from help(“OHLC.Transformations”).
# edited by cliex159
symbol <- "DC"
# Use getSymbols to import the data for 'symbol'
getSymbols(symbol)
## [1] "DC"
Create an object called dc_close that contains only the close column of DC.
# Extract the close column
dc_close <- Cl(DC)
Use head() to look at dc_close.
# Look at the head of dc_close
head(dc_close)
## DC.Close
## 2022-04-05 6.90
## 2022-04-06 5.41
## 2022-04-07 5.09
## 2022-04-08 5.06
## 2022-04-11 5.10
## 2022-04-12 4.94
Extract the volume column from DC and assign it to an object named dc_volume.
# Extract the volume column
dc_volume <- Vo(DC)
Use head() to look at dc_volume.
# Look at the head of dc_volume
head(dc_volume)
## DC.Volume
## 2022-04-05 829300
## 2022-04-06 1675200
## 2022-04-07 836700
## 2022-04-08 520700
## 2022-04-11 411400
## 2022-04-12 252900
Great! Now let’s look at how to extract multiple columns from one instrument.
2.1.2 Extract multiple columns from one instrument
The quantmod package provides functions to extract a single column, and also has functions to extract specific sets of columns.
Recall OHLC stands for open, high, low, close. Now you can guess which columns the OHLC()
and HLC()
functions extract. There’s also an OHLCV()
function, which adds the volume column.
These functions are helpful when you need to pass a set of columns to another function. For example, you might need to pass the high, low, and close columns (in that order) to a technical indicator function.
In this exercise, you will use two of these extractor functions on the same DC
object you used in the last exercise.
Create an object called dc_hlc that contains the high, low, and close columns of DC.
# Extract the high, low, and close columns
dc_hlc <- HLC(DC)
Use head() to look at dc_hlc.
# Look at the head of dc_hlc
head(dc_hlc)
## DC.High DC.Low DC.Close
## 2022-04-05 8.47 5.02 6.90
## 2022-04-06 8.00 5.08 5.41
## 2022-04-07 5.20 4.41 5.09
## 2022-04-08 5.30 4.88 5.06
## 2022-04-11 5.58 4.97 5.10
## 2022-04-12 5.15 4.92 4.94
Extract the open, high, low, close, and volume columns from DC and assign the result to an object named dc_ohlcv.
# Extract the open, high, low, close, and volume columns
dc_ohlcv <- OHLCV(DC)
Use head() to look at dc_ohlcv.
# Look at the head of dc_ohlcv
head(dc_ohlcv)
## DC.Open DC.High DC.Low DC.Close DC.Volume
## 2022-04-05 5.02 8.47 5.02 6.90 829300
## 2022-04-06 7.81 8.00 5.08 5.41 1675200
## 2022-04-07 5.05 5.20 4.41 5.09 836700
## 2022-04-08 5.03 5.30 4.88 5.06 520700
## 2022-04-11 5.10 5.58 4.97 5.10 411400
## 2022-04-12 5.15 5.15 4.92 4.94 252900
Good job! There is also a more flexible function called getPrice()
. Let’s look at that next.
2.1.3 Use getPrice to extract other columns
The extractor functions you learned in the previous two exercises do not cover all use cases. Sometimes you might have one object that contains the same price column for multiple instruments. Other times, you might have an object with price data (e.g. bid, ask, trade) that do not have an explicit extractor function.
The getPrice()
function in the quantmod package can extract any column by name by using the prefer
argument. It can also extract columns for a specific instrument by using the symbol
argument, which is useful when an object contains several instruments with the same price type.
You can use regular expressions for both the prefer
and symbol
arguments, because they are passed to the base::grep()
function internally.
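As a quick illustration with made-up data (hypothetical column names, not the exercise object), anchoring the pattern with “$” is what keeps “Open Interest” from matching:
# Illustrative sketch: prefer and symbol are treated as regular expressions
library(xts)
toy_dates <- as.Date("2016-01-04") + 0:1
toy <- xts(matrix(c(37.6, 36.9, 100, 110, 38.1, 37.5), nrow = 2), order.by = toy_dates)
colnames(toy) <- c("CL1.Open", "CL1.Open Interest", "BZ1.Open")
getPrice(toy, symbol = "CL1", prefer = "Open$")  # matches CL1.Open only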
Use a single Quandl() call to import CME futures data for CL and BZ into an xts object named oil_data.
# Download CME data for CL and BZ as an xts object
oil_data <- Quandl(code = c("CHRIS/CME_CL1", "CHRIS/CME_BZ1"), type = "xts")
Use colnames() to view the column names of the oil_data object.
# Look at the column names of the oil_data object
colnames(oil_data)
## [1] "CHRIS.CME_CL1 - Open"
## [2] "CHRIS.CME_CL1 - High"
## [3] "CHRIS.CME_CL1 - Low"
## [4] "CHRIS.CME_CL1 - Last"
## [5] "CHRIS.CME_CL1 - Change"
## [6] "CHRIS.CME_CL1 - Settle"
## [7] "CHRIS.CME_CL1 - Volume"
## [8] "CHRIS.CME_CL1 - Previous Day Open Interest"
## [9] "CHRIS.CME_BZ1 - Open"
## [10] "CHRIS.CME_BZ1 - High"
## [11] "CHRIS.CME_BZ1 - Low"
## [12] "CHRIS.CME_BZ1 - Last"
## [13] "CHRIS.CME_BZ1 - Change"
## [14] "CHRIS.CME_BZ1 - Settle"
## [15] "CHRIS.CME_BZ1 - Volume"
## [16] "CHRIS.CME_BZ1 - Previous Day Open Interest"
Use a getPrice() call to extract the Open price for CL into cl_open. Set prefer = “Open$” to match “Open” and not “Open Interest”.
# Extract the Open price for CL1
cl_open <- getPrice(oil_data, symbol = "CL1", prefer = "Open$")
Use xts’ ISO-8601 subsetting to view cl_open for the month of January, 2016 (don’t use a range, and you don’t need to specify any days).
# Look at January, 2016 using xts' ISO-8601 subsetting
cl_open["2016-01"]
## CHRIS.CME_CL1 - Open
## 2016-01-04 37.60
## 2016-01-05 36.90
## 2016-01-06 36.18
## 2016-01-07 34.09
## 2016-01-08 33.30
## 2016-01-11 32.94
## 2016-01-12 30.44
## 2016-01-13 30.48
## 2016-01-14 30.60
## 2016-01-15 31.18
## 2016-01-19 29.20
## 2016-01-20 28.33
## 2016-01-21 28.35
## 2016-01-22 29.84
## 2016-01-25 32.05
## 2016-01-26 29.81
## 2016-01-27 30.55
## 2016-01-28 32.19
## 2016-01-29 33.70
Good job! getPrice()
is a very flexible way to retrieve the columns you need.
2.2 Importing and transforming multiple instruments
2.2.1 Use Quandl to download quarterly returns data
Sometimes you need to aggregate and/or transform raw data before you can continue your analysis. The Quandl()
function allows you to specify common aggregations and transformations via the collapse
and/or transform
arguments. The Quandl API takes care of the details for you.
You will use the same CL and BZ data from the CME as last exercise, but
this time you will apply some aggregations and transformations on the
raw data.
The Quandl codes for CL and BZ are provided in the object quandl_codes.
# edited by cliex159
quandl_codes <- c("CHRIS/CME_CL1", "CHRIS/CME_BZ1")
Use the collapse and type arguments with Quandl() to download quarterly data as an xts object named qtr_price.
# Download quarterly CL and BZ prices
qtr_price <- Quandl(quandl_codes, type = "xts", collapse = "quarterly")
# View the high prices for both series
Hi(qtr_price)
## CHRIS.CME_CL1 - High CHRIS.CME_BZ1 - High
## 1983 Q1 29.60 NA
## 1983 Q2 31.40 NA
## 1983 Q3 30.55 NA
## 1983 Q4 29.80 NA
## 1984 Q1 30.85 NA
## 1984 Q2 29.80 NA
## 1984 Q3 29.68 NA
## 1984 Q4 26.62 NA
## 1985 Q1 28.33 NA
## 1985 Q2 27.14 NA
## 1985 Q3 29.10 NA
## 1985 Q4 26.60 NA
## 1986 Q1 11.20 NA
## 1986 Q2 13.00 NA
## 1986 Q3 15.15 NA
## 1986 Q4 18.00 NA
## 1987 Q1 18.85 NA
## 1987 Q2 20.53 NA
## 1987 Q3 19.70 NA
## 1987 Q4 16.90 NA
## 1988 Q1 17.10 NA
## 1988 Q2 15.49 NA
## 1988 Q3 13.82 NA
## 1988 Q4 17.28 NA
## 1989 Q1 21.48 NA
## 1989 Q2 20.38 NA
## 1989 Q3 20.14 NA
## 1989 Q4 21.91 NA
## 1990 Q1 20.30 NA
## 1990 Q2 17.31 NA
## 1990 Q3 39.95 NA
## 1990 Q4 28.70 NA
## 1991 Q1 19.72 NA
## 1991 Q2 20.71 NA
## 1991 Q3 22.32 NA
## 1991 Q4 19.15 NA
## 1992 Q1 19.49 NA
## 1992 Q2 22.15 NA
## 1992 Q3 21.76 NA
## 1992 Q4 19.65 NA
## 1993 Q1 20.47 NA
## 1993 Q2 19.14 NA
## 1993 Q3 18.95 NA
## 1993 Q4 14.48 NA
## 1994 Q1 14.90 NA
## 1994 Q2 19.50 NA
## 1994 Q3 18.42 NA
## 1994 Q4 17.84 NA
## 1995 Q1 19.35 NA
## 1995 Q2 17.59 NA
## 1995 Q3 17.64 NA
## 1995 Q4 19.57 NA
## 1996 Q1 21.65 NA
## 1996 Q2 21.25 NA
## 1996 Q3 24.62 NA
## 1996 Q4 25.96 NA
## 1997 Q1 20.88 NA
## 1997 Q2 19.90 NA
## 1997 Q3 21.48 NA
## 1997 Q4 17.84 NA
## 1998 Q1 15.89 NA
## 1998 Q2 14.55 NA
## 1998 Q3 16.18 NA
## 1998 Q4 12.14 NA
## 1999 Q1 17.05 NA
## 1999 Q2 19.37 NA
## 1999 Q3 24.70 NA
## 1999 Q4 26.55 NA
## 2000 Q1 27.15 NA
## 2000 Q2 32.95 NA
## 2000 Q3 30.95 NA
## 2000 Q4 26.95 NA
## 2001 Q1 26.64 NA
## 2001 Q2 26.50 NA
## 2001 Q3 23.49 NA
## 2001 Q4 20.43 NA
## 2002 Q1 26.38 NA
## 2002 Q2 26.99 NA
## 2002 Q3 30.90 NA
## 2002 Q4 31.35 NA
## 2003 Q1 31.30 NA
## 2003 Q2 30.21 NA
## 2003 Q3 29.38 NA
## 2003 Q4 33.20 NA
## 2004 Q1 36.00 NA
## 2004 Q2 37.10 NA
## 2004 Q3 50.10 NA
## 2004 Q4 43.80 NA
## 2005 Q1 56.10 NA
## 2005 Q2 57.90 NA
## 2005 Q3 67.03 NA
## 2005 Q4 61.25 NA
## 2006 Q1 67.14 NA
## 2006 Q2 74.15 NA
## 2006 Q3 63.10 NA
## 2006 Q4 61.20 NA
## 2007 Q1 66.70 NA
## 2007 Q2 71.00 NA
## 2007 Q3 83.76 NA
## 2007 Q4 96.78 NA
## 2008 Q1 106.78 NA
## 2008 Q2 143.67 NA
## 2008 Q3 102.46 NA
## 2008 Q4 45.54 NA
## 2009 Q1 50.00 NA
## 2009 Q2 73.38 NA
## 2009 Q3 70.72 NA
## 2009 Q4 80.00 NA
## 2010 Q1 83.85 NA
## 2010 Q2 76.83 NA
## 2010 Q3 80.18 NA
## 2010 Q4 92.06 NA
## 2011 Q1 106.83 NA
## 2011 Q2 95.82 NA
## 2011 Q3 83.23 NA
## 2011 Q4 100.16 107.27
## 2012 Q1 104.15 124.04
## 2012 Q2 85.34 98.28
## 2012 Q3 92.71 113.35
## 2012 Q4 91.99 111.31
## 2013 Q1 97.35 110.08
## 2013 Q2 97.82 103.43
## 2013 Q3 102.76 108.66
## 2013 Q4 99.39 111.49
## 2014 Q1 101.97 108.33
## 2014 Q2 105.76 113.10
## 2014 Q3 94.90 97.79
## 2014 Q4 54.02 57.78
## 2015 Q1 48.73 56.37
## 2015 Q2 59.69 63.82
## 2015 Q3 45.85 49.00
## 2015 Q4 37.79 37.95
## 2016 Q1 39.04 40.02
## 2016 Q2 49.62 50.30
## 2016 Q3 48.30 49.28
## 2016 Q4 54.09 56.13
## 2017 Q1 50.85 53.32
## 2017 Q2 46.35 48.20
## 2017 Q3 51.77 57.79
## 2017 Q4 60.51 66.78
## 2018 Q1 65.26 70.43
## 2018 Q2 74.46 79.70
## 2018 Q3 73.73 82.85
## 2018 Q4 46.53 53.80
## 2019 Q1 60.73 68.89
## 2019 Q2 59.80 66.80
## 2019 Q3 56.57 62.09
## 2019 Q4 61.88 68.96
## 2020 Q1 21.89 23.86
## 2020 Q2 40.08 41.62
## 2020 Q3 40.37 40.97
## 2020 Q4 48.58 51.91
## 2021 Q1 61.17 64.77
## 2021 Q2 73.81 72.88
Create a qtr_return object that contains the % change of the raw data. See the Quandl API documentation for how to specify row-on-row % change.
# Download quarterly CL and BZ returns
qtr_return <- Quandl(quandl_codes, type = "xts", collapse = "quarterly", transform = "rdiff")
Use getPrice() to view the Settle returns.
# View the settle price returns for both series
getPrice(qtr_return, prefer = "Settle")
## CHRIS.CME_CL1 - Settle CHRIS.CME_BZ1 - Settle
## 1983 Q2 0.071355411 NA
## 1983 Q3 -0.032504780 NA
## 1983 Q4 -0.025032938 NA
## 1984 Q1 0.042229730 NA
## 1984 Q2 -0.035656402 NA
## 1984 Q3 -0.003025210 NA
## 1984 Q4 -0.109575185 NA
## 1985 Q1 0.071185157 NA
## 1985 Q2 -0.050194415 NA
## 1985 Q3 0.082247860 NA
## 1985 Q4 -0.095598349 NA
## 1986 Q1 -0.603802281 NA
## 1986 Q2 0.226487524 NA
## 1986 Q3 0.155712050 NA
## 1986 Q4 0.214624238 NA
## 1987 Q1 0.049609810 NA
## 1987 Q2 0.077535847 NA
## 1987 Q3 -0.034499754 NA
## 1987 Q4 -0.147524247 NA
## 1988 Q1 0.022754491 NA
## 1988 Q2 -0.112412178 NA
## 1988 Q3 -0.118073879 NA
## 1988 Q4 0.289454001 NA
## 1989 Q1 0.171113689 NA
## 1989 Q2 0.003962358 NA
## 1989 Q3 -0.006906759 NA
## 1989 Q4 0.083954297 NA
## 1990 Q1 -0.070577452 NA
## 1990 Q2 -0.158284024 NA
## 1990 Q3 1.314586995 NA
## 1990 Q4 -0.280182232 NA
## 1991 Q1 -0.309774965 NA
## 1991 Q2 0.047376465 NA
## 1991 Q3 0.081225681 NA
## 1991 Q4 -0.139901035 NA
## 1992 Q1 0.016736402 NA
## 1992 Q2 0.111111111 NA
## 1992 Q3 0.005092593 NA
## 1992 Q4 -0.101796407 NA
## 1993 Q1 0.048205128 NA
## 1993 Q2 -0.077788650 NA
## 1993 Q3 -0.003183024 NA
## 1993 Q4 -0.245875466 NA
## 1994 Q1 0.043754411 NA
## 1994 Q2 0.309668695 NA
## 1994 Q3 -0.050593702 NA
## 1994 Q4 -0.034257749 NA
## 1995 Q1 0.079391892 NA
## 1995 Q2 -0.092331768 NA
## 1995 Q3 0.008045977 NA
## 1995 Q4 0.114595211 NA
## 1996 Q1 0.098209719 NA
## 1996 Q2 -0.025617140 NA
## 1996 Q3 0.165391969 NA
## 1996 Q4 0.063166530 NA
## 1997 Q1 -0.212577160 NA
## 1997 Q2 -0.029887310 NA
## 1997 Q3 0.069696970 NA
## 1997 Q4 -0.167138810 NA
## 1998 Q1 -0.115079365 NA
## 1998 Q2 -0.091607944 NA
## 1998 Q3 0.138222849 NA
## 1998 Q4 -0.253407683 NA
## 1999 Q1 0.390871369 NA
## 1999 Q2 0.150954654 NA
## 1999 Q3 0.270606532 NA
## 1999 Q4 0.044471644 NA
## 2000 Q1 0.050781250 NA
## 2000 Q2 0.208178439 NA
## 2000 Q3 -0.051076923 NA
## 2000 Q4 -0.130998703 NA
## 2001 Q1 -0.019029851 NA
## 2001 Q2 -0.001521491 NA
## 2001 Q3 -0.107428571 NA
## 2001 Q4 -0.153222364 NA
## 2002 Q1 0.326108871 NA
## 2002 Q2 0.020904599 NA
## 2002 Q3 0.133655994 NA
## 2002 Q4 0.024630542 NA
## 2003 Q1 -0.005128205 NA
## 2003 Q2 -0.027384021 NA
## 2003 Q3 -0.032792315 NA
## 2003 Q4 0.113698630 NA
## 2004 Q1 0.099630996 NA
## 2004 Q2 0.036073826 NA
## 2004 Q3 0.339811066 NA
## 2004 Q4 -0.124697824 NA
## 2005 Q1 0.275028769 NA
## 2005 Q2 0.019855596 NA
## 2005 Q3 0.172389381 NA
## 2005 Q4 -0.078502415 NA
## 2006 Q1 0.091579292 NA
## 2006 Q2 0.109560258 NA
## 2006 Q3 -0.149059922 NA
## 2006 Q4 -0.029566047 NA
## 2007 Q1 0.078951679 NA
## 2007 Q2 0.073022620 NA
## 2007 Q3 0.155348048 NA
## 2007 Q4 0.175361254 NA
## 2008 Q1 0.058345489 NA
## 2008 Q2 0.378224060 NA
## 2008 Q3 -0.281142857 NA
## 2008 Q4 -0.556836248 NA
## 2009 Q1 0.113452915 NA
## 2009 Q2 0.407370117 NA
## 2009 Q3 0.010301903 NA
## 2009 Q4 0.123920125 NA
## 2010 Q1 0.055443548 NA
## 2010 Q2 -0.097063037 NA
## 2010 Q3 0.057384636 NA
## 2010 Q4 0.142678504 NA
## 2011 Q1 0.167870431 NA
## 2011 Q2 -0.105884558 NA
## 2011 Q3 -0.169985328 NA
## 2011 Q4 0.247853535 NA
## 2012 Q1 0.042396034 0.144347178
## 2012 Q2 -0.175305766 -0.204101562
## 2012 Q3 0.085098870 0.149182004
## 2012 Q4 -0.004013450 -0.011388914
## 2013 Q1 0.058919625 -0.009810098
## 2013 Q2 -0.006890877 -0.071441556
## 2013 Q3 0.059755592 0.060787001
## 2013 Q4 -0.038209714 0.022423180
## 2014 Q1 0.032107295 -0.027436823
## 2014 Q2 0.037310494 0.042687454
## 2014 Q3 -0.134858119 -0.157440370
## 2014 Q4 -0.415642826 -0.394422732
## 2015 Q1 -0.106438896 -0.038723182
## 2015 Q2 0.249369748 0.153874070
## 2015 Q3 -0.241802590 -0.239345809
## 2015 Q4 -0.178531825 -0.229274344
## 2016 Q1 0.035097192 0.062231760
## 2016 Q2 0.260563380 0.254545455
## 2016 Q3 -0.001862197 -0.012479871
## 2016 Q4 0.113598673 0.144109254
## 2017 Q1 -0.058078928 -0.058792090
## 2017 Q2 -0.090118577 -0.092939618
## 2017 Q3 0.122284970 0.200751252
## 2017 Q4 0.169343913 0.157455683
## 2018 Q1 0.074809666 0.055105105
## 2018 Q2 0.141823221 0.130496656
## 2018 Q3 -0.012137559 0.041289023
## 2018 Q4 -0.380068259 -0.367988395
## 2019 Q1 0.324377890 0.308148432
## 2019 Q2 -0.027768540 -0.026904518
## 2019 Q3 -0.075252266 -0.086701728
## 2019 Q4 0.129276863 0.125041132
## 2020 Q1 -0.664592204 -0.667446622
## 2020 Q2 0.917480469 0.809586631
## 2020 Q3 0.024191495 -0.004860267
## 2020 Q4 0.206364993 0.264957265
## 2021 Q1 0.219291014 0.226640927
## 2021 Q2 0.233603786 0.136606862
Quick Quandling! Quandl()
simplifies aggregating and transforming price data.
2.2.2 Extract the Close column from many instruments
The previous exercise taught you how to use do.call(merge, eapply(env, fun))
to apply a function to each object in an environment and then combine all the results into one object.
Let’s use what you learned to solve a very common problem. Often you will need to load similar data for many instruments, extract a column, and create one object that contains that specific column for every instrument.
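If the pattern is new to you, here is a tiny self-contained sketch (toy objects, not the exercise data): eapply() applies a function to every object in an environment and returns a list, and do.call(merge, ...) passes that list to merge() to build a single xts object.
# Toy illustration of do.call(merge, eapply(env, fun))
demo_env <- new.env()
demo_env$a <- xts(matrix(1:3, dimnames = list(NULL, "a.Close")), order.by = as.Date("2020-01-01") + 0:2)
demo_env$b <- xts(matrix(4:6, dimnames = list(NULL, "b.Close")), order.by = as.Date("2020-01-01") + 0:2)
do.call(merge, eapply(demo_env, Cl))  # one xts object with columns a.Close and b.Close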
Use new.env() to create a new environment named data_env.
# Symbols
symbols <- c("AAPL", "MSFT", "IBM")
# Create new environment
data_env <- new.env()
Use getSymbols() to load data into data_env by using the env argument.
# Load symbols into data_env
getSymbols(symbols, env = data_env)
## [1] "AAPL" "MSFT" "IBM"
# Extract the close column from each object and combine into one xts object
close_data <- do.call(merge, eapply(data_env, Cl))
Use head() to view close_data.
# View the head of close_data
head(close_data)
## AAPL.Close IBM.Close MSFT.Close
## 2007-01-03 2.992857 92.99236 29.86
## 2007-01-04 3.059286 93.98662 29.81
## 2007-01-05 3.037500 93.13576 29.64
## 2007-01-08 3.052500 94.55067 29.93
## 2007-01-09 3.306071 95.66921 29.96
## 2007-01-10 3.464286 94.54111 29.66
Great job!
3 Managing data from multiple sources
Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to getSymbols()
.
You will see how to customize defaults by data source, and then how to
customize defaults by symbol. You will also learn how to handle
problematic instrument symbols.
3.1 Setting default arguments for getSymbols()
3.1.1 Set a default data source
Recall that getSymbols()
imports from Yahoo Finance by default. This exercise will teach you how to change the default data source with the setDefaults()
function.
The first argument to setDefaults()
is the function you
want to update, and the remaining arguments are name = value pairs of
the arguments you want to update and the new default value.
Note that this only works with getSymbols()
because getSymbols()
actively checks to see if you want to use a different default value.
Set the src argument so getSymbols() pulls from Alpha Vantage instead of Yahoo by default.
# Set the default to pull data from Alpha Vantage
setDefaults(getSymbols, src = "av")
Use getSymbols() to pull data for “GOOG”.
# Load GOOG data
getSymbols("GOOG", api.key = "4EQRAZ3VA8ERX4BV")
## [1] "GOOG"
Use str() to verify that the src attribute on GOOG is “alphavantage”.
# Verify the data was actually pulled from Alpha Vantage
str(GOOG)
## An 'xts' object on 2022-06-01/2022-10-21 containing:
## Data: num [1:100, 1:5] 2299 2284 2320 2335 2313 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:5] "GOOG.Open" "GOOG.High" "GOOG.Low" "GOOG.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "alphavantage"
## $ updated: Date[1:1], format: "2022-10-21"
Great! Setting a default source can be useful if you use that source often.
3.1.2 Set default arguments for a getSymbols source
You can also use setDefaults() on individual getSymbols() source methods. This exercise will teach you how to change the default value for the from argument to getSymbols.yahoo().
You can find the arguments for a specific method by using help() (e.g. help(“getSymbols.yahoo”)) or by calling args() to print them to the console (e.g. args(getSymbols.yahoo)). Calling getDefaults() will show you the current default values (if there are any).
Remember, you are not supposed to call getSymbols.yahoo()
directly!
Use the args() function to print all the default arguments to getSymbols.yahoo().
# Look at getSymbols.yahoo arguments
args(getSymbols.yahoo)
## function (Symbols, env, return.class = "xts", index.class = "Date",
## from = "2007-01-01", to = Sys.Date(), ..., periodicity = "daily",
## curl.options = list())
## NULL
Set the default from value for getSymbols.yahoo() to “2000-01-01”.
# Set default 'from' value for getSymbols.yahoo
setDefaults(getSymbols.yahoo, from = "2000-01-01")
Use getDefaults() to confirm that the from default was set correctly.
# Confirm defaults were set correctly
getDefaults("getSymbols.yahoo")
## $from
## [1] "'2000-01-01'"
Awesome! Now you know how to set default arguments for specific getSymbols()
sources.
3.2 Setting per-instrument default arguments
3.2.1 Set default data source for one symbol
Changing the default source for one instrument is useful if multiple
sources use the same symbol for different instruments. For example, getSymbols(“CP”, src = “yahoo”)
would load Canadian Pacific Railway data from the New York Stock Exchange. But getSymbols(“CP”, src = “FRED”)
would load Corporate Profits After Tax from the U.S. Bureau of Economic Analysis.
You can use setSymbolLookup()
to specify the default data source for an instrument. In this exercise, you will learn how to make getSymbols(“CP”)
load the corporate profit data from FRED instead of the railway stock data from Yahoo Finance.
setSymbolLookup()
can take any number of name = value
pairs, where name
is the symbol and value
is a named list of getSymbols()
arguments for that one symbol.
# edited by cliex159
getSymbols("CP", api.key = "4EQRAZ3VA8ERX4BV")
## [1] "CP"
Use head() to look at the first few rows of data for CP, which has been loaded for you.
# Look at the first few rows of CP
head(CP)
## CP.Open CP.High CP.Low CP.Close CP.Volume
## 2022-06-01 71.91 73.620 71.410 73.05 2949267
## 2022-06-02 73.17 76.550 73.170 76.36 3651456
## 2022-06-03 75.73 76.080 74.685 75.18 2178119
## 2022-06-06 75.99 76.460 75.420 75.69 1993966
## 2022-06-07 74.87 76.475 74.240 76.41 1932792
## 2022-06-08 75.78 75.820 73.950 74.30 1812080
Use setSymbolLookup() to set FRED as the source for CP.
# Set the source for CP to FRED
setSymbolLookup(CP = list(src = "FRED"))
Use getSymbols() to load the data again.
# Load CP data again
getSymbols("CP", api.key = "4EQRAZ3VA8ERX4BV")
## [1] "CP"
Look at the first few rows of CP and compare the output to the output from the first instruction.
# Look at the first few rows of CP
head(CP)
## CP.Open CP.High CP.Low CP.Close CP.Volume
## 2022-06-01 71.91 73.620 71.410 73.05 2949267
## 2022-06-02 73.17 76.550 73.170 76.36 3651456
## 2022-06-03 75.73 76.080 74.685 75.18 2178119
## 2022-06-06 75.99 76.460 75.420 75.69 1993966
## 2022-06-07 74.87 76.475 74.240 76.41 1932792
## 2022-06-08 75.78 75.820 73.950 74.30 1812080
Great! Occasionally this happens, and it is useful to set a single symbol to be pulled from a specific source.
3.2.2 Save and load symbol lookup table
The previous exercise taught you how to set default arguments on a per-symbol basis, but those settings only last for the current session.
This exercise will teach you how to save and load symbol-based defaults by using saveSymbolLookup()
and loadSymbolLookup()
, respectively. You can use the file
arguments to specify a file to store your defaults.
You can also use the getSymbolLookup()
function to check per-symbol defaults before you try to load data using getSymbols()
.
The source for CP has already been set to “FRED”.
Use saveSymbolLookup() to save the symbol lookup table to a file named “my_symbol_lookup.rda”.
# Save symbol lookup table to file
saveSymbolLookup(file = "my_symbol_lookup.rda")
Set the default source for CP to “yahoo” using what you learned in the prior exercise.
# Set default source for CP to "yahoo"
setSymbolLookup(CP = "yahoo")
Use getSymbolLookup() to verify that the default source for CP actually changed from “FRED” to “yahoo”.
# Verify the default source is "yahoo" for "CP"
getSymbolLookup("CP")
## $CP
## $CP$src
## [1] "yahoo"
Use loadSymbolLookup() to load the saved symbol lookup table.
# Load symbol lookup table from file
loadSymbolLookup(file = "my_symbol_lookup.rda")
Awesome! This will let you load the same lookup table even if you close out of R.
3.2.3 How not to specify the getSymbols() source
There are several ways you can specify the data source when you use getSymbols(). Which of the following is a way you should not use getSymbols() to pull data from a specific source?
- Use the src argument.
- Use setSymbolLookup().
- Call getSymbols.source directly.
Correct! The methods of getSymbols()
should not be called directly.
3.3 Handling instrument symbols that clash or are not valid R names
3.3.1 Access the object using get() or backticks
At some point, you might download data for an instrument that does not have a syntactically valid name. Syntactically valid names contain letters, numbers, “.”, and “_”, and must start with a letter or a “.” not followed by a number.
For example, the symbol for Berkshire Hathaway class A shares is “BRK-A”, which is not a syntactically valid name because it contains a “-” character. Another example is Chinese stocks, which have symbols composed of numbers. The Yahoo Finance symbol for the SSE Composite Index is “000001.SS”.
You can use the get() function or backticks (`) to access objects that do not have syntactically valid names.
Use getSymbols() to load data for “BRK-A”.
# Load BRK-A data
getSymbols("BRK-A", api.key = "4EQRAZ3VA8ERX4BV")
## [1] "BRK-A"
Use backticks (`) and head() to look at the first few rows of data.
# Use backticks (`) and head() to look at the loaded data
head(`BRK-A`)
## BRK-A.Open BRK-A.High BRK-A.Low BRK-A.Close BRK-A.Volume
## 2022-06-01 476594.3 476594.3 464900.0 469620.0 2143
## 2022-06-02 472134.3 472886.0 463312.3 472710.0 2056
## 2022-06-03 469925.1 471789.2 465850.0 467650.0 1949
## 2022-06-06 472899.5 473900.0 467632.3 468400.0 2003
## 2022-06-07 467649.5 471499.9 464551.0 471499.9 1808
## 2022-06-08 472089.5 472089.5 462300.5 463300.0 2059
Use get() to assign the data loaded by getSymbols() to an object named BRK.A.
# Use get() to assign the BRK-A data to an object named BRK.A
BRK.A <- get("BRK-A")
Great! Just remember to use backticks or get()
if you ever run into invalid characters.
3.3.2 Create valid name for one instrument
If you are only downloading data for a single symbol and that symbol is not a syntactically valid name, you can set auto.assign = FALSE
in your call to getSymbols()
. That will allow you to directly assign the output to a syntactically valid name.
You may also want to convert the column names to syntactically valid
names. That is a good idea if you plan to use the data in functions that
expect column names to be syntactically valid names (e.g. lm()
).
Use getSymbols() and its auto.assign argument to create an object named BRK.A containing data for “BRK-A”.
# Create BRK.A object
setDefaults(getSymbols, src = "yahoo")
BRK.A <- getSymbols("BRK-A", auto.assign = FALSE)
setDefaults(getSymbols, src = "av")
Use colnames() to create an object named col_names that contains the column names of the BRK.A object.
# Create col_names object from the colnames() of BRK.A
col_names <- colnames(BRK.A)
Use make.names() to convert col_names to syntactically valid names.
# Convert col_names into syntactically valid names
colnames(BRK.A) <- make.names(col_names)
Fantastic! Now you can fix tricky ticker symbols in the column names of your data.
3.3.3 Create valid names for multiple instruments
An earlier exercise taught you how to use setSymbolLookup()
to set a default data source for getSymbols()
. You can also use setSymbolLookup()
to create a mapping between the instrument symbol and the name of the R object.
This is useful if you want to download data for a lot of symbols that are not syntactically valid names, or symbols whose names conflict with other R variable names.
An example of a name that conflicts is the symbol for AT&T’s stock, T
, which is often used as a short form for the logical value TRUE
.
To change the name of a given symbol, arguments must be passed to setSymbolLookup()
as a list, like so: setSymbolLookup(NEW_NAME = list(name = “OLD_NAME”))
.
Set the name for “BRK-A” to “BRK.A”.
# Set name for BRK-A to BRK.A
setSymbolLookup(BRK.A = list(name = "BRK-A"))
Set the name for “T” (AT&T) to “ATT”.
# Set name for T to ATT
setSymbolLookup(ATT = list(name = "T"))
Use a single getSymbols() call to load data for both instruments using the new names you assigned to them.
# Load BRK.A and ATT data
getSymbols(c("BRK.A", "ATT"), api.key = "4EQRAZ3VA8ERX4BV")
## NULL
Awesome job! Now you can map troublesome tickers to new names with setSymbolLookup()
.
4 Aligning data with different periodicities
You’ve learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.
4.1 Making irregular data regular
4.1.1 Create a zero-width and regular xts object
In order to create regular data from an irregular dataset, the first thing you need is a regular sequence of date-times that span the dates of your irregular dataset. A “regular” sequence of date-times has equally-spaced time points.
In this exercise, you will use the irregular_xts
object to
create a zero-width xts object that has a regular daily index. A
zero-width xts object has an index of date-times, but no data columns.
# edited by cliex159
library(xts)
irregular_xts <- xts(c(4, 21, 1, 34),
                     order.by = as.Date(c("2016-01-02", "2016-01-05",
                                          "2016-01-07", "2016-01-11")))
colnames(irregular_xts) <- "data"
Use the start() function to create an object named start_date.
# Extract the start date of the series
start_date <- start(irregular_xts)
Use the end() function to create an object named end_date.
# Extract the end date of the series
end_date <- end(irregular_xts)
Use the seq() function to create an object named regular_index containing a sequence of date-times from start_date to end_date by day.
# Create a regular date sequence
regular_index <- seq(start_date, end_date, by = "day")
Use the xts() constructor to create a zero-width xts object. Store it in regular_xts.
# Create a zero-width xts object
regular_xts <- xts(, order.by = regular_index)
Fantastic! Making regular date-time sequences is useful in many time-series applications.
4.1.2 Use merge to make an irregular index regular
The previous exercise taught you how to make a zero-width xts object with a regular time index. You can use the zero-width object to regularize an irregular xts object.
The regularized series usually has missing values (NA
)
because the irregular data does not have a value for all observations in
the regular index. This exercise will teach you how to handle these
missing values when you merge()
the two series.
The irregular_xts
and regular_xts
objects from the previous exercise are available in your workspace.
Use the merge() function to combine irregular_xts and regular_xts into an object named merged_xts.
# Merge irregular_xts and regular_xts
merged_xts <- merge(irregular_xts, regular_xts)
Use head() to look at the first few rows of merged_xts.
# Look at the first few rows of merged_xts
head(merged_xts)
##            data
## 2016-01-02    4
## 2016-01-03   NA
## 2016-01-04   NA
## 2016-01-05   21
## 2016-01-06   NA
## 2016-01-07    1
Create merged_filled_xts by using the fill argument to merge() to replace the NA values with the last observation carried forward (na.locf).
# Use the fill argument to fill NA with their previous value
merged_filled_xts <- merge(irregular_xts, regular_xts, fill = na.locf)
Use head() to look at the first few rows of merged_filled_xts.
# Look at the first few rows of merged_filled_xts
head(merged_filled_xts)
##            data
## 2016-01-02    4
## 2016-01-03    4
## 2016-01-04    4
## 2016-01-05   21
## 2016-01-06   21
## 2016-01-07    1
Awesome! Filling forward is a useful operation, but be careful to make sure it is what you want!
4.2 Aggregating to lower frequency
4.2.1 Aggregate daily data and merge with monthly data
Sometimes two series have the same periodicity, but represent timestamps
differently. For example, monthly series may be timestamped with the
first or last date of the month. This difference creates many NA
when series are merged. The yearmon
class from the zoo package helps solve this problem.
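For intuition (a small aside, not part of the exercise): every date within a given month maps to the same yearmon value, which is why first-of-month and end-of-month timestamps line up once both indexes are converted.
# Aside: two different days in the same month collapse to one yearmon value
library(zoo)
as.yearmon(as.Date("1954-07-01")) == as.yearmon(as.Date("1954-07-31"))  # TRUE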
In this exercise, you will aggregate the FRED daily Fed Funds rate (DFF) to a monthly periodicity and merge it with the FRED monthly Fed Funds rate (FEDFUNDS). The DFF aggregate will be timestamped with the last day of the month, while FEDFUNDS is timestamped with the first day of the month.
The FEDFUNDS and DFF data have already been downloaded from FRED for you, using getSymbols(c(“FEDFUNDS”, “DFF”), src = “FRED”).
# edited by cliex159
getSymbols(c("FEDFUNDS", "DFF"), src = "FRED")
## [1] "FEDFUNDS" "DFF"
Use apply.monthly() with mean() to calculate the average of all days for each month. Assign the result to monthly_fedfunds.
# Aggregate DFF to monthly averages
monthly_fedfunds <- apply.monthly(DFF, mean)
Use as.yearmon() to convert the index to yearmon.
# Convert index to yearmon
index(monthly_fedfunds) <- as.yearmon(index(monthly_fedfunds))
Create merged_fedfunds by merging FEDFUNDS with the monthly aggregate you created in the first step.
# Merge FEDFUNDS with the monthly aggregate
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)
Use head() to check the output of merged_fedfunds.
# Look at the first few rows of the merged object
head(merged_fedfunds)
## FEDFUNDS DFF
## 1954-07-01 0.80 0.7993548
## 1954-08-01 1.22 1.2206452
## 1954-09-01 1.07 1.0666667
## 1954-10-01 0.85 0.8487097
## 1954-11-01 0.83 0.8336667
## 1954-12-01 1.28 1.2777419
Awesome! You will often need to aggregate to a lower frequency to align multiple time series.
4.2.2 Align series to first and last day of month
Sometimes you may not be able to use convenience classes like yearmon
to represent timestamps. This exercise will teach you how to manually
align merged data to the timestamp representation you prefer.
First you merge the lower-frequency data with the aggregate data, then use na.locf()
to fill the NA
forward (or backward, using fromLast = TRUE
). Then you can subset the result using the index of the object with the representation you prefer.
Your workspace contains FEDFUNDS
, monthly_fedfunds
(the result of apply.monthly(DFF, mean)
), and merged_fedfunds
(the result of merge(FEDFUNDS, monthly_fedfunds)
where the monthly_fedfunds
index is a Date
). Note the NA
values in merged_fedfunds
.
Use na.locf() to fill the NA values in merged_fedfunds. Assign the result to merged_fedfunds_locf.
# Fill NA forward
merged_fedfunds_locf <- na.locf(merged_fedfunds)
Subset merged_fedfunds_locf by index(monthly_fedfunds) to create an xts object with timestamps at month-end. Name the result aligned_last_day.
# Extract index values containing last day of month
aligned_last_day <- merged_fedfunds_locf[index(monthly_fedfunds)]
Use the fromLast argument to na.locf() to fill the NA values with the next observation. Assign the result to merged_fedfunds_locb.
# Fill NA backward
merged_fedfunds_locb <- na.locf(merged_fedfunds, fromLast = TRUE)
Subset merged_fedfunds_locb by index(FEDFUNDS) to create an xts object with timestamps on the first of the month. Name the result aligned_first_day.
# Extract index values containing first day of month
aligned_first_day <- merged_fedfunds_locb[index(FEDFUNDS)]
Great! Knowing how to manually align merged data will definitely come in handy!
4.2.3 Aggregate to weekly, ending on Wednesdays
In this exercise, you will learn a general aggregation technique to aggregate daily data to weekly, but with weeks ending on Wednesdays. This is often done in stock market research to avoid intra-week seasonality.
The period.apply()
function takes an xts object, time
period end points, and an aggregation function. Then it applies the
function to each group of data between the end points.
The endpoints()
function can calculate the time period end points for period.apply()
,
and you can use custom end points too. But your custom end points
vector must start with zero and end with the total number of
observations you’re going to aggregate, just like the output of endpoints()
.
You’ll use .indexwday()
to find the Wednesday of each week. It returns a number between 0-6, where Sunday=0.
This exercise will use the daily Fed Funds data (DFF) from prior exercises.
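For comparison (an illustrative aside, assuming DFF is loaded as in the earlier exercises), endpoints() builds this kind of vector automatically for standard periods; your custom Wednesday end points below must follow the same convention of starting at 0 and ending at nrow(DFF).
# Aside: endpoints() returns row numbers that end each period, from 0 to nrow(x)
ep <- endpoints(DFF, on = "weeks")
head(ep)
tail(ep, 1) == nrow(DFF)  # TRUE: the last end point is always the final row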
Use .indexwday() to get the weekdays from the DFF index. Assign the result to index_weekdays.
# Extract index weekdays (Sunday = 0)
index_weekdays <- .indexwday(DFF)
Use the which() function to find the locations of the Wednesdays in index_weekdays. Store the result in wednesdays.
# Find locations of Wednesdays
wednesdays <- which(index_weekdays == 3)
Create end_points so it starts with 0 and ends with the total number of rows, like the output of endpoints().
# Create custom end points
end_points <- c(0, wednesdays, nrow(DFF))
Use period.apply() and end_points to aggregate DFF to weekly averages. Assign the result to weekly_mean.
# Calculate weekly mean using custom end points
weekly_mean <- period.apply(DFF, end_points, mean)
Wonderful! There are many ways to convert a time series to a lower frequency.
4.3 Aggregating and combining intraday data
4.3.1 Combine data that have timezones
Recall that xts objects store the time index as seconds since midnight, 1970-01-01 in the UTC timezone. merge()
uses this underlying index and returns a result with the first object’s timezone.
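A short aside (not part of the exercise) showing what “same underlying index” means: the same instant has the same seconds-since-epoch value regardless of the timezone used to display it.
# Aside: one instant, two display timezones, identical underlying value
t_utc <- as.POSIXct("2010-01-09 14:00:00", tz = "UTC")
t_chi <- as.POSIXct("2010-01-09 08:00:00", tz = "America/Chicago")
as.numeric(t_utc) == as.numeric(t_chi)  # TRUE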
This exercise provides an example. The two objects in your workspace are
identical except for the index timezone. The index values are the same
instances in time, but measured in different locations. The london
object’s timezone is Europe/London and the chicago
object’s timezone is America/Chicago.
# edited by cliex159
library(lubridate)
london <- xts(c(1, 2, 3, 4, 5), order.by = ymd_hms("2010-01-09 01:00:00", "2010-01-09 04:00:00", "2010-01-09 06:00:00", "2010-01-09 14:00:00", "2010-01-10 12:00:00"), tz = "GMT")
colnames(london) <- "London"
chicago <- xts(c(1, 2, 3, 4, 5), order.by = ymd_hms("2010-01-08 19:00:00", "2010-01-08 22:00:00", "2010-01-09 00:00:00", "2010-01-09 08:00:00", "2010-01-10 06:00:00"), tz = "CST")
colnames(chicago) <- "Chicago"
Use merge() to combine the two objects so the result will have a London timezone.
# Create merged object with a London timezone
tz_london <- merge(london, chicago)
Use str() to look at the structure of tz_london and note the TZ and the local time range printed in the first line of output.
# Look at tz_london structure
str(tz_london)
## An 'xts' object on 2010-01-08 19:00:00/2010-01-10 12:00:00 containing:
## Data: num [1:10, 1:2] NA NA NA 1 2 3 NA 4 NA 5 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "London" "Chicago"
## Indexed by objects of class: [POSIXct,POSIXt] TZ: GMT
## xts Attributes:
## NULL
Change the order of the merge() command so the result will have a Chicago timezone.
# Create merged object with a Chicago timezone
tz_chicago <- merge(chicago, london)
Use str() to look at the structure of tz_chicago and note how the TZ and local time range have changed.
# Look at tz_chicago structure
str(tz_chicago)
## An 'xts' object on 2010-01-08 19:00:00/2010-01-10 12:00:00 containing:
## Data: num [1:10, 1:2] 1 2 3 NA NA NA 4 NA 5 NA ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "Chicago" "London"
## Indexed by objects of class: [POSIXct,POSIXt] TZ: CST
## xts Attributes:
## NULL
Great job! Combining multiple timezones is easy with xts!
4.3.2 Make irregular intra-day data regular
Earlier you learned how to create a regular daily series from irregular daily data. Now you will create regular intra-day data from an irregular series.
Intra-day financial data often does not span a full 24 hours. Most markets are usually closed part of the day. This exercise assumes markets open at 9AM and close at 4PM Monday-Friday.
Your data may not have an observation exactly at the market open and/or close. So, you would not be able to use start()
and end()
as you could with the daily data. You need to specify the start and end date-times to create this regular sequence.
The regular date-time sequence will include periods when markets are closed, but you can use xts’ time-of-day subsetting to extract only the periods the market is open.
# edited by cliex159
irregular_xts <- xts(seq(1, 20, 1),
                     order.by = ymd_hms("2010-01-04 09:00:00", "2010-01-04 11:00:00",
                                        "2010-01-04 18:00:00", "2010-01-04 22:00:00",
                                        "2010-01-05 00:00:00", "2010-01-05 05:00:00",
                                        "2010-01-05 13:00:00", "2010-01-05 17:00:00",
                                        "2010-01-05 22:00:00", "2010-01-06 02:00:00",
                                        "2010-01-06 06:00:00", "2010-01-06 07:00:00",
                                        "2010-01-06 17:00:00", "2010-01-06 22:00:00",
                                        "2010-01-06 23:00:00", "2010-01-07 02:00:00",
                                        "2010-01-07 03:00:00", "2010-01-07 08:00:00",
                                        "2010-01-08 00:00:00", "2010-01-08 06:00:00"),
                     tz = "UTC")
colnames(irregular_xts) <- "data"
# Create a regular date-time sequence
regular_index <- seq(as.POSIXct("2010-01-04 09:00"), as.POSIXct("2010-01-08 16:00"), by = "30 min")
Use the x and order.by arguments of xts() to create a zero-width xts object.
# Create a zero-width xts object
regular_xts <- xts(x = NULL, order.by = regular_index)
Create merged_xts by merging irregular_xts and regular_xts. Use the fill argument to replace NA values with their previous value using na.locf().
# Merge irregular_xts and regular_xts, filling NA with their previous value
merged_xts <- merge(regular_xts, irregular_xts, fill = na.locf)
Use time-of-day subsetting to keep only the trading day (09:00-16:00). Assign the result to trade_day.
# Subset to trading day (09:00-16:00)
trade_day <- merged_xts["T09:00/T16:00"]
Great job! Now you know how to subset your intra-day data to only contain the trading day!
4.3.3 Fill missing values by trading day
The previous exercise carried the last observation of the prior day forward into the first observation of the following day. This exercise will show you how to fill missing values by trading day, without using the prior day’s final value.
You will use the same split-lapply-rbind paradigm from the Introduction to xts and zoo course. For reference, the pattern is below.
x_split <- split(x, f = "months")
x_list <- lapply(x_split, cummax)
x_list_rbind <- do.call(rbind, x_list)
Recall that the do.call(rbind, …)
syntax allows you to pass a list of objects to rbind()
instead of having to type all their names.
Your workspace has a trade_day
object that contains the regular series from the previous exercise, but without any NA
filled in.
# edited by cliex159
trade_day <- xts(c(1, rep(NA, 3), 2, rep(NA, 18), 7, rep(NA, 51)), index(trade_day))
Create a daily_list object by using split() to put the trade_day data into a list of data for each day.
# Split trade_day into days
daily_list <- split(trade_day, f = "days")
Use lapply() to fill the NA values for each day’s data in the daily_list list.
# Use lapply to call na.locf for each day in daily_list
daily_filled <- lapply(daily_list, FUN = na.locf)
Use do.call() and rbind() to convert daily_filled to a single xts object named filled_by_trade_day.
# Use do.call to rbind the results
filled_by_trade_day <- do.call(rbind, daily_filled)
You’re amazing! You used advanced functions to transform data for each trading day!
4.3.4 Aggregate irregular intra-day data
Intraday data can be huge, with hundreds of thousands of observations per day, millions per month, and hundreds of millions per year. These datasets often need to be aggregated before you can work with them.
You learned how to aggregate daily data in the Introduction to xts and zoo course. This exercise will use to.period()
to aggregate intraday data to an OHLC series. You often need to specify both period
and k
arguments to aggregate intraday data.
The intraday_xts
object contains one trading day of random data.
# edited by cliex159
load("DC.RData")
intraday_xts <- DC[, 1]
Use to.period() to convert intraday_xts into a 5-second price series called xts_5sec.
# Convert raw prices to 5-second prices
xts_5sec <- to.period(intraday_xts, period = "seconds", k = 5)
Use to.period() to convert intraday_xts into a 10-minute price series called xts_10min.
# Convert raw prices to 10-minute prices
xts_10min <- to.period(intraday_xts, period = "minutes", k = 10)
Use to.period() to convert intraday_xts into a 1-hour price series called xts_1hour.
# Convert raw prices to 1-hour prices
xts_1hour <- to.period(intraday_xts, period = "hours", k = 1)
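If you want to confirm the aggregation behaved as expected, periodicity() and nrow() from xts give a quick summary (a small sketch, not part of the original exercise).
# Sketch: check the periodicity and size of the aggregated series
periodicity(xts_10min)
nrow(intraday_xts)
nrow(xts_10min)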
Fantastic! You just aggregated a large amount of data like a pro!
5 Importing text data, and adjusting for corporate actions
You’ve learned the core workflow of importing and manipulating financial data. Now you will see how to import data from text files of various formats. Then you will learn how to check data for weirdness and handle missing values. Finally, you will learn how to adjust stock prices for splits and dividends.
5.1 Importing text files
5.1.1 Import well-formatted daily OHLC data
You can use getSymbols()
to import a well-formatted CSV. In
this case, well-formatted means the file contains data for a single
instrument with date, open, high, low, close, volume, and adjusted close
columns, in that order. You might have noticed that this is the same
format as getSymbols()
returns when you download data from internet sources.
getSymbols()
allows you to use a directory of CSV files as a source (like Yahoo Finance and FRED). In this exercise, you will be using AMZN.csv
in your working directory. It contains some randomized Amazon.com data from the first half of 2002. You can use dir()
to see the file in your working directory.
Use getSymbols() to load AMZN.csv.
# Load AMZN sourced from the csv
getSymbols("AMZN", src = "csv")
## [1] "AMZN"
Look at the structure of AMZN.
# Look at AMZN structure
str(AMZN)
## An 'xts' object on 2002-01-02/2002-06-28 containing:
## Data: num [1:124, 1:6] 11.1 11.3 12.5 11.8 12.8 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "AMZN.Open" "AMZN.High" "AMZN.Low" "AMZN.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "csv"
## $ updated: POSIXct[1:1], format: "2022-10-22 13:20:43"
Great! Well-formatted CSV files are very easy to work with.
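If your CSV files live somewhere other than the working directory, the csv source also accepts dir and extension arguments; a minimal sketch, where "data/prices" is a hypothetical folder used only for illustration.
# Sketch: read AMZN.csv from a subdirectory instead of the working directory
getSymbols("AMZN", src = "csv", dir = "data/prices", extension = "csv")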
5.1.2 Import text files in other formats
The previous exercise taught you how to import well-formatted CSV data using getSymbols()
. Unfortunately, most data are not well-formatted.
The zoo package provides several functions to import text files as zoo objects. The main function is read.zoo()
, which wraps read.table()
. The xts class extends zoo, so you can easily convert the result of read.zoo()
into an xts object by using as.xts()
.
In this exercise, you will learn how you can use read.zoo()
to import the same AMZN.csv
data from the previous exercise.
Use read.zoo() to import AMZN.csv. Assign the output to an object named amzn_zoo.
# Import the AMZN.csv file using read.zoo
<- read.zoo("AMZN.csv", sep = ",", header = TRUE) amzn_zoo
Convert amzn_zoo to an xts object. Assign the output to an object named amzn_xts.
# Convert to xts
amzn_xts <- as.xts(amzn_zoo)
Look at the first few rows of amzn_xts.
# Look at the first few rows of amzn_xts
head(amzn_xts)
## AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume AMZN.Adjusted
## 2002-01-02 11.13 11.01 10.46 10.87 6674703 10.87
## 2002-01-03 11.26 12.25 10.76 11.99 11441553 11.99
## 2002-01-04 12.46 12.62 11.71 12.10 12619402 12.10
## 2002-01-07 11.76 12.73 12.01 12.48 12296059 12.48
## 2002-01-08 12.82 12.43 11.35 11.96 6821418 11.96
## 2002-01-09 12.07 12.49 10.93 11.59 10669376 11.59
Awesome! As you will see, read.zoo()
is a very flexible import function for time series.
5.1.3 Handle date and time in separate columns
read.zoo()
makes it easy to import data when the date and time are in separate columns. The index.column
argument allows you to specify the name or number of the column(s)
containing the index data. That’s all you need to do if the date and
time are specified in the standard format (“%Y-%m-%d”
for date, and “%H:%M:%S”
for time).
In this exercise, you will use the index.column
argument to specify the date and time columns of the file. Your working directory has a file named UNE.csv
that contains some 5-minute OHLC data for the energy company, Unron. You will first use read.csv() to find the column names of the date and time columns.
Read the first few rows of UNE.csv using read.csv(). Assign the output to une_data.
# Read data with read.csv
<- read.csv("UNE.csv", nrows = 5) une_data
Look at the structure of une_data and note the column names and locations.
# Look at the structure of une_data
str(une_data)
## 'data.frame': 5 obs. of 6 variables:
## $ Date : chr "2016-11-14" "2016-11-14" "2016-11-14" "2016-11-14" ...
## $ Time : chr "09:05:00" "09:10:00" "09:15:00" "09:20:00" ...
## $ Open : num 10.4 10.1 10.2 10.1 10.2
## $ High : num 10.4 10.2 10.2 10.2 10.2
## $ Low : num 10.2 10.1 10.2 10.1 10.2
## $ Close: num 10.2 10.2 10.2 10.2 10.2
Use read.zoo() to import UNE.csv, specifying index.column as the names of the date and time columns. Assign the output to une_zoo.
# Specify Date and Time index column names
<- read.zoo("UNE.csv", index.column = c("Date", "Time"), sep = ",", header = TRUE) une_zoo
Look at the first few rows of une_zoo.
# Look at first few rows of data
head(une_zoo)
## Open High Low Close
## 2016-11-14 09:05:00 10.38 10.38 10.21 10.22
## 2016-11-14 09:10:00 10.12 10.22 10.12 10.22
## 2016-11-14 09:15:00 10.18 10.22 10.15 10.22
## 2016-11-14 09:20:00 10.09 10.22 10.07 10.22
## 2016-11-14 09:25:00 10.22 10.22 10.21 10.22
## 2016-11-14 09:30:00 10.16 10.22 10.15 10.22
Good job! The index.column
argument is great if your dates and times are in separate columns!
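If your dates or times are not in the standard format, read.zoo() also has format and tz arguments that combine with index.column; a minimal sketch, assuming a hypothetical file UNE_mdY.csv whose dates look like 11/14/2016.
# Sketch: parse a non-standard date format (file name and format are hypothetical)
une_alt <- read.zoo("UNE_mdY.csv", index.column = c("Date", "Time"),
                    format = "%m/%d/%Y %H:%M:%S", tz = "",
                    sep = ",", header = TRUE)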
5.1.4 Read text file containing multiple instruments
The previous exercises work if each file contains only one instrument. Some software and data vendors may provide data for all instruments in one file. This exercise will teach you how to import a file that contains multiple instruments.
Once again, you can use read.zoo()
. This time you will be using its split
argument, which allows you to specify the name or number of the
column(s) that contain the variables that identify unique observations.
The two_symbols.csv
file in your working directory contains
bid/ask data for two instruments, where each row has one bid or ask
observation for one instrument. You will use the split
argument to import the data into an object that has both bid and ask prices for both instruments on one row.
Read the first few rows of two_symbols.csv using read.csv(). Assign the output to two_symbols_data.
# Read data with read.csv
<- read.csv("two_symbols.csv", nrows = 5) two_symbols_data
Look at the structure of two_symbols_data and note the column names and locations.
# Look at the structure of two_symbols_data
str(two_symbols_data)
## 'data.frame': 5 obs. of 4 variables:
## $ Date : chr "2016-01-01 10:43:01" "2016-01-01 10:43:01" "2016-01-01 10:43:01" "2016-01-01 10:43:01" ...
## $ Symbol: chr "A" "A" "B" "B" ...
## $ Type : chr "Bid" "Ask" "Bid" "Ask" ...
## $ Price : num 58.2 58.2 29 29 58.2
Use read.zoo() to import two_symbols.csv, specifying split as the names of the symbol and type columns. Assign the output to two_symbols_zoo.
# Specify Symbol and Type index column names
<- read.zoo("two_symbols.csv", split = c("Symbol", "Type"), sep = ",", header = TRUE) two_symbols_zoo
Look at the first few rows of two_symbols_zoo.
# Look at first few rows of data
head(two_symbols_zoo)
## A.Ask B.Ask A.Bid B.Bid
## 2016-01-01 10:43:01 58.24 28.98 58.23 28.96
## 2016-01-01 10:43:02 58.25 28.99 58.24 28.97
Great! The split
argument is the key for importing multiple instruments.
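As with the single-instrument file, you could convert the result to xts and work with the columns directly; a small sketch (my own addition).
# Sketch: convert the wide zoo object to xts and compute the bid/ask spread for symbol A
two_symbols_xts <- as.xts(two_symbols_zoo)
spread_A <- two_symbols_xts[, "A.Ask"] - two_symbols_xts[, "A.Bid"]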
5.2 Checking for weirdness
5.2.1 Handle missing values
In chapter 3, you used na.locf()
to fill missing values
with the previous non-missing value. You can use interpolation when
carrying the previous value forward isn’t appropriate. In this exercise,
you will explore two interpolation methods: linear and spline.
Linear interpolation calculates values that lie on a line between two known data points. This is a good choice for fairly linear data, like a series with a strong trend. Spline interpolation is more appropriate for series without a strong trend, because it calculates a non-linear approximation using multiple data points.
Use these two methods to interpolate the three missing values for the 10-year Treasury rate in the object DGS10
. Then compare the results with the output of na.locf()
.
# edited by cliex159
DGS10 = xts(c(4.94, 4.85, 4.78, 4.79, 4.85, NA, 4.99, 4.97,
              4.86, 4.80, 4.84, NA, NA, 4.64, 4.57, 4.63),
            ymd("2001-08-27", "2001-08-28", "2001-08-29", "2001-08-30",
                "2001-08-31", "2001-09-03", "2001-09-04", "2001-09-05",
                "2001-09-06", "2001-09-07", "2001-09-10", "2001-09-11",
                "2001-09-12", "2001-09-13", "2001-09-14", "2001-09-17"))
Use na.locf() to fill missing values with the last observation carried forward, then use na.approx() to fill in missing values using linear interpolation.
# fill NA using last observation carried forward
locf <- na.locf(DGS10)
# fill NA using linear interpolation
approx <- na.approx(DGS10)
Use na.spline() to fill in missing values using spline interpolation.
# fill NA using spline interpolation
spline <- na.spline(DGS10)
Merge locf, approx, and spline into one object named na_filled.
# merge into one object
na_filled <- merge(locf, approx, spline)
Plot the combined na_filled object to compare the three methods.
# plot combined object
plot(na_filled, col = c("black", "red", "green"))
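Because the differences can be hard to see in the plot, it may also help to print the three fills side by side around the gaps (my own addition; the dates come from the DGS10 object above).
# Sketch: compare locf, linear, and spline fills around the missing observations
na_filled["2001-09-01/2001-09-14"]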
Excellent! Now you know multiple ways to interpolate missing values.
5.2.2 Visualize imported data
It’s important to check your imported data is reasonable. A plot is a
quick and easy way to spot oddities. In this exercise, you will use the plot()
function to visualize some AAPL data from Yahoo Finance.
A stock split caused a huge price change in June 2014. Apple simultaneously increased the number of shares outstanding and decreased its stock price, leaving the company value unchanged. For example, a 2-for-1 split would double the shares outstanding and cut the stock price in half. (Note: more recent Yahoo Finance downloads, such as the one used here, already back-adjust raw prices for splits, so the June 2014 jump may not appear in the close-price plot.)
You will also use the quantmod extractor functions Cl()
and Ad()
to access the close and adjusted close columns, respectively. Yahoo
Finance provides the split- and/or dividend-adjusted close column. You
will learn more about the adjustment process in the next video.
# edited by cliex159
getSymbols("AAPL", src='yahoo', from = "2007-01-01", to = "2017-09-17")
## [1] "AAPL"
AAPL
is available in your workspace - consisting of Apple
stock data (imported from Yahoo Finance) from January 2007 through
September 15, 2017.
Look at the tail() of AAPL.
# Look at the last few rows of AAPL data
tail(AAPL)
## AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 2017-09-08 40.2150 40.2875 39.6325 39.6575 114446000 37.63755
## 2017-09-11 40.1250 40.5125 39.9725 40.3750 126323200 38.31851
## 2017-09-12 40.6525 40.9900 39.6925 40.2150 286856000 38.16667
## 2017-09-13 39.9675 39.9900 39.4775 39.9125 179629600 37.87957
## 2017-09-14 39.7475 39.8500 39.5225 39.5700 95042800 37.55452
## 2017-09-15 39.6175 40.2425 39.5000 39.9700 196458400 37.93414
# Plot close price
plot(Cl(AAPL))
# Plot adjusted close price
plot(Ad(AAPL))
Good job! Visualization is a great way to get a feel for what the data is like.
5.3 Adjusting for corporate actions
5.3.1 Adjust for stock splits and dividends
Stock splits can create large historical price changes even though they do not change the value of the company. So, you must adjust all pre-split prices in order to calculate historical returns correctly.
Similarly, you must adjust all pre-dividend prices. Dividends do reduce the company’s value by the amount of the dividend payment, but the investor’s return isn’t affected because they receive the offsetting dividend payment.
In this exercise, you will learn how to use the adjustOHLC()
function to adjust raw historical OHLC prices for splits and dividends, so historical returns can be calculated accurately.
Yahoo Finance provides raw prices and a split- and dividend-adjusted close column. The output of adjustOHLC()
should match Yahoo’s adjusted close column. AAPL
data from Yahoo Finance is already loaded in your workspace.
While not necessary to complete this exercise, Yahoo Finance provides an accessible example of the adjusted close calculation, if you’re interested in learning more.
Look at the first few rows of AAPL.
# Look at first few rows of AAPL
head(AAPL)
## AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 2007-01-03 3.081786 3.092143 2.925000 2.992857 1238319600 2.555398
## 2007-01-04 3.001786 3.069643 2.993571 3.059286 847260400 2.612116
## 2007-01-05 3.063214 3.078571 3.014286 3.037500 834741600 2.593514
## 2007-01-08 3.070000 3.090357 3.045714 3.052500 797106800 2.606322
## 2007-01-09 3.087500 3.320714 3.041071 3.306071 3349298400 2.822829
## 2007-01-10 3.383929 3.492857 3.337500 3.464286 2952880000 2.957918
Use adjustOHLC() to adjust the AAPL data. Assign the result to an object named aapl_adjusted.
# Adjust AAPL for splits and dividends
aapl_adjusted <- adjustOHLC(AAPL)
Look at the first few rows of aapl_adjusted. In the original course data, the close column of aapl_adjusted matched the adjusted close column of AAPL; in this more recent download Yahoo's raw prices are already back-adjusted for later splits, so the two columns differ here.
# Look at first few rows of aapl_adjusted
head(aapl_adjusted)
## AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 2007-01-03 0.3960780 0.3974091 0.3759275 0.3846486 1238319600 2.555398
## 2007-01-04 0.3857962 0.3945173 0.3847404 0.3931862 847260400 2.612116
## 2007-01-05 0.3936911 0.3956648 0.3874027 0.3903862 834741600 2.593514
## 2007-01-08 0.3945632 0.3971795 0.3914419 0.3923141 797106800 2.606322
## 2007-01-09 0.3968124 0.4267855 0.3908452 0.4249036 3349298400 2.822829
## 2007-01-10 0.4349101 0.4489097 0.4289429 0.4452377 2952880000 2.957918
Great! The adjustOHLC()
function takes care of the hard work for you!
5.3.2 Download split and dividend data
In the previous exercise, you used adjustOHLC()
to adjust
raw historical OHLC prices for splits and dividends, but it only works
for OHLC data. It will not work if you only have close prices, and it
does not return any of the split or dividend data it uses.
You need the dates and values for each split and dividend to adjust a non-OHLC price series, or if you simply want to analyze the raw split and dividend data.
You can download the split and dividend data from Yahoo Finance using the quantmod functions getSplits()
and getDividends()
,
respectively. The historical dividend data from Yahoo Finance is
adjusted for splits. If you want to download unadjusted dividend data,
you need to set split.adjust = FALSE
in your call to getDividends()
.
Use getSplits() to import AAPL split data into splits.
# Download AAPL split data
splits <- getSplits("AAPL")
Use getDividends() to import split-adjusted AAPL dividend data into dividends.
# Download AAPL dividend data
dividends <- getDividends("AAPL")
Look at the first few rows of dividends. The values are small and in fractional cents because they have been split-adjusted.
# Look at the first few rows of dividends
head(dividends)
## AAPL.div
## 1987-05-11 2.392857e-06
## 1987-08-10 4.785714e-06
## 1987-11-17 6.375000e-06
## 1988-02-12 6.375000e-06
## 1988-05-16 6.375000e-06
## 1988-08-15 6.375000e-06
Use getDividends() to import unadjusted AAPL dividend data into raw_dividends.
# Download AAPL dividend data that is not split-adjusted
raw_dividends <- getDividends("AAPL", split.adjust = FALSE)
Look at the first few rows of raw_dividends. Note they differ from the values in dividends.
# Look at the first few rows of raw_dividends
head(raw_dividends)
## AAPL.div
## 1987-05-11 0.000536
## 1987-08-10 0.000536
## 1987-11-17 0.000714
## 1988-02-12 0.000714
## 1988-05-16 0.000714
## 1988-08-15 0.000714
Good job! It’s important to get splits and dividends correct when calculating historical returns.
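The two dividend series differ by the cumulative split factor; a small sketch (my own addition) that recovers it, assuming both objects downloaded above are in your workspace.
# Sketch: the ratio of raw to split-adjusted dividends is the cumulative split factor
# (for the earliest AAPL dividends, 2 * 2 * 2 * 7 * 4 = 224)
head(raw_dividends / dividends)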
5.3.3 Adjust univariate data for splits and dividends
If you only have close prices, you can adjust them with adjRatios()
. It has 3 arguments: splits
, dividends
, and close
. It returns an xts object with split and dividend adjustment ratios in columns “Split”
and “Div”
, respectively.
You need to provide split data via the splits
argument to calculate the split ratio. To calculate the dividend ratio, you need to provide raw dividends and raw prices via the dividends
and close
arguments, respectively.
Once you have the split and dividend adjustment ratios, you calculate the adjusted price by multiplying the unadjusted price by both the split and dividend adjustment ratios.
Your workspace contains splits
, dividends
, and raw_dividends
from prior exercises, and AAPL
data from Yahoo Finance.
Use adjRatios() to calculate the split and dividend adjustment ratios. Assign the output to ratios.
# Calculate split and dividend adjustment ratios
ratios <- adjRatios(splits = splits, dividends = raw_dividends, close = Cl(AAPL))
Use the Split and Div columns of ratios to calculate the adjusted close price for AAPL. Assign the output to aapl_adjusted.
# Use the Split and Div columns to calculate adjusted close for AAPL
aapl_adjusted <- Cl(AAPL) * ratios[, "Split"] * ratios[, "Div"]
# Look at first few rows of adjusted close for AAPL
head(Ad(AAPL))
## AAPL.Adjusted
## 2007-01-03 2.555398
## 2007-01-04 2.612116
## 2007-01-05 2.593514
## 2007-01-08 2.606322
## 2007-01-09 2.822829
## 2007-01-10 2.957918
# Look at first few rows of aapl_adjusted
head(aapl_adjusted)
## AAPL.Close
## 2007-01-03 0.3846486
## 2007-01-04 0.3931862
## 2007-01-05 0.3903862
## 2007-01-08 0.3923141
## 2007-01-09 0.4249036
## 2007-01-10 0.4452377
Amazing! You just calculated the adjusted close price! Congrats!
5.3.4 When to adjust data
When do you need to back-adjust all historical data for an instrument?
- Every time you receive new price data.
- Never.
- Any time there’s a new split or dividend.
Good job! Continue to the next exercise.
5.4 Congratulations!
That concludes this course on importing and managing financial data in R.
Well done! You’ve learned tools that will help you wrangle many types of time-series data that you will encounter in the real world. Go forth and analyze!