Importing and Managing Financial Data in R

Lore Dirick & Davis Vaughan


Course Description

If you’ve ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.

1 Introduction and downloading data

A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.

1.1 Welcome to the course!

1.1.1 Introducing getSymbols()

The getSymbols() function from the quantmod package provides a consistent interface for importing data from various sources into your workspace. By default, getSymbols() imports the data as an xts object.

This exercise will introduce you to getSymbols(). You will use it to import QQQ data from Yahoo! Finance. QQQ is an exchange-traded fund that tracks the Nasdaq 100 index, and Yahoo! Finance is the default data source for getSymbols().

You use the Symbols argument to specify the instrument (i.e. the ticker symbol) you want to import. Since Symbols is the first argument to getSymbols(), you usually just type the instrument name and omit Symbols =.
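For example, these two calls are equivalent (a small illustration, assuming quantmod is loaded):

```r
# Explicitly naming the Symbols argument...
getSymbols(Symbols = "QQQ")

# ...is the same as passing the ticker positionally
getSymbols("QQQ")
```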

  • Load the quantmod package using the library() function.
  • # Load the quantmod package
    library(quantmod)
  • Now use getSymbols() to import QQQ data. Make sure the data gets assigned to QQQ (auto.assign = TRUE).
  • # Import QQQ data from Yahoo! Finance
    getSymbols("QQQ")
    ## [1] "QQQ"
  • Use the str() function to view the structure of the QQQ object that getSymbols() created. Note the src and updated attributes.
  • # Look at the structure of the object getSymbols created
    str(QQQ)
    ## An 'xts' object on 2007-01-03/2022-10-21 containing:
    ##   Data: num [1:3980, 1:6] 43.5 43.3 44 43.9 44 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:6] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "yahoo"
    ##  $ updated: POSIXct[1:1], format: "2022-10-22 13:20:14"
  • Use the head() function to view the first few rows of QQQ.
  • # Look at the first few rows of QQQ
    head(QQQ)
    ##            QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
    ## 2007-01-03    43.46    44.06   42.52     43.24  167689500     37.88621
    ## 2007-01-04    43.30    44.21   43.15     44.06  136853500     38.60469
    ## 2007-01-05    43.95    43.95   43.48     43.85  138958800     38.42067
    ## 2007-01-08    43.89    44.12   43.64     43.88  106401600     38.44696
    ## 2007-01-09    44.01    44.29   43.63     44.10  121577500     38.63972
    ## 2007-01-10    43.96    44.66   43.82     44.62  121070100     39.09535

    Awesome! How easy was that!?

    1.1.2 Data sources

    In the last exercise, you imported data from Yahoo! Finance. The src argument allows you to tell getSymbols() to import data from a different data source.

    In this exercise, you will import data from Alpha Vantage and FRED. Alpha Vantage is a source similar to Yahoo! Finance. FRED is an online database of economic time series data created and maintained by the Federal Reserve Bank of St. Louis.

    getSymbols() imports data from Yahoo! Finance because the default is src = “yahoo”. The src values for Alpha Vantage and FRED are “av” and “FRED”, respectively.

    The quantmod package has already been loaded for you.

  • Use getSymbols() to import QQQ data from Alpha Vantage by setting src.
  • # Import QQQ data from Alpha Vantage
    getSymbols("QQQ", src = "av", api.key = "YOUR_API_KEY")
    ## [1] "QQQ"
  • Use str() to view the structure of QQQ; note the src attribute.
  • # Look at the structure of QQQ
    str(QQQ)
    ## An 'xts' object on 2022-06-01/2022-10-21 containing:
    ##   Data: num [1:100, 1:5] 310 305 309 311 304 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:5] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "alphavantage"
    ##  $ updated: Date[1:1], format: "2022-10-21"
  • Now use getSymbols() to import U.S. GDP data from FRED by setting src (the symbol for U.S. GDP is “GDP”).
  • # Import GDP data from FRED
    getSymbols("GDP", src = "FRED")
    ## [1] "GDP"
  • Use str() to view the structure of GDP; note the src attribute.
  • # Look at the structure of GDP
    str(GDP)
    ## An 'xts' object on 1947-01-01/2022-04-01 containing:
    ##   Data: num [1:302, 1] 243 246 250 260 266 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr "GDP"
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "FRED"
    ##  $ updated: POSIXct[1:1], format: "2022-10-22 13:20:15"

    There are a number of data sources out there, providing a wealth of information. Try them all!

    1.1.3 Make getSymbols() return the data it retrieves

    In the last exercise, getSymbols() automatically created an object named after the symbol you provided. This exercise will teach you how to make getSymbols() return the data instead, so you can assign the output yourself.

    There are two arguments that will make getSymbols() return the data:

    1. Set auto.assign = FALSE.
    2. Set env = NULL.

    The two methods are functionally equivalent, but auto.assign = FALSE describes the behavior better, so prefer it; you will be more likely to remember what it means when you revisit your code.

  • Download SPY data and assign it to an object named spy by using the auto.assign argument.
  • # Assign SPY data to 'spy' using auto.assign argument
    spy <- getSymbols("SPY", auto.assign = FALSE)
  • Use the str() function to view the structure of the spy object.
  • # Look at the structure of the 'spy' object
    str(spy)
    ## An 'xts' object on 2007-01-03/2022-10-21 containing:
    ##   Data: num [1:3980, 1:6] 142 141 141 141 141 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:6] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close" ...
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "yahoo"
    ##  $ updated: POSIXct[1:1], format: "2022-10-22 13:20:16"
  • Download JNJ data and assign it to an object named jnj by using the env argument.
  • # Assign JNJ data to 'jnj' using env argument
    jnj <- getSymbols("JNJ", env = NULL)
  • Use the str() again, but on the jnj object.
  • # Look at the structure of the 'jnj' object
    str(jnj)
    ## An 'xts' object on 2007-01-03/2022-10-21 containing:
    ##   Data: num [1:3980, 1:6] 66.1 65.9 66.8 66.5 66.6 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:6] "JNJ.Open" "JNJ.High" "JNJ.Low" "JNJ.Close" ...
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "yahoo"
    ##  $ updated: POSIXct[1:1], format: "2022-10-22 13:20:17"

    Great! Turning off auto.assign is useful if you want to assign the data to an object yourself.

    1.2 Introduction to Quandl

    1.2.1 Introducing Quandl()

    Similar to how the quantmod package provides getSymbols() to import data from various sources, the Quandl package provides access to the Quandl databases via one simple function: Quandl().

    Recall that getSymbols() uses the Symbols and src arguments to specify the instrument and data source, respectively. The Quandl() function specifies both the data source and the instrument via its code argument, in the form “DATABASE/DATASET”.

    Two other ways Quandl() differs from getSymbols() are:

    1. Quandl() returns a data.frame by default.
    2. Quandl() will not automatically assign the data to an object.

    If you plan on importing a lot of data using Quandl(), you might consider opening a free account with them in order to get an API key.

  • Load the Quandl package using the library() function.
  • # Load the Quandl package
    library(Quandl)
    Quandl.api_key("YOUR_API_KEY")
  • Now use the Quandl() function to download U.S. GDP data from FRED and assign it to an object, gdp (the Quandl code is “FRED/GDP”).
  • # Import GDP data from FRED
    gdp <- Quandl("FRED/GDP")
  • Use the str() function to view the structure of the gdp object.
  • # Look at the structure of the object returned by Quandl
    str(gdp)
    ## 'data.frame':    300 obs. of  2 variables:
    ##  $ Date : Date, format: "2021-10-01" "2021-07-01" ...
    ##  $ Value: num  23992 23202 22741 22038 21478 ...
    ##  - attr(*, "freq")= chr "quarterly"

    Quandl provides access to a large amount of data series. Their website has documentation for it all!

    1.2.2 Return data type

    The Quandl() function returns a data.frame by default. It can return other classes via the type argument.

    The possible values for type are:

    1. “raw” (a data.frame),
    2. “ts” (time-series objects from the stats package),
    3. “zoo”,
    4. “xts”, and
    5. “timeSeries” (from the timeSeries package in the RMetrics suite).

    In this exercise, you will learn how to use the type argument to make Quandl() return an xts and a zoo object.

    The Quandl package has already been loaded for you.

  • Use the type argument to make Quandl() import the GDP data as an xts object. Assign the result to gdp_xts.
  • # Import GDP data from FRED as xts
    gdp_xts <- Quandl("FRED/GDP", type = "xts")
  • Use str() to view the structure of gdp_xts.
  • # Look at the structure of gdp_xts
    str(gdp_xts)
    ## An 'xts' object on 1947 Q1/2021 Q4 containing:
    ##   Data: num [1:300, 1] 243 246 250 260 266 ...
    ##   Indexed by objects of class: [yearqtr] TZ: UTC
    ##   xts Attributes:  
    ##  NULL
  • Use the type argument to make Quandl() import the GDP data as a zoo object. Assign the result to gdp_zoo.
  • # Import GDP data from FRED as zoo
    gdp_zoo <- Quandl("FRED/GDP", type = "zoo")
  • Use str() to view the structure of gdp_zoo.
  • # Look at the structure of gdp_zoo
    str(gdp_zoo)
    ## 'zooreg' series from 1947 Q1 to 2021 Q4
    ##   Data: num [1:300] 243 246 250 260 266 ...
    ##   Index:  'yearqtr' num [1:300] 1947 Q1 1947 Q2 1947 Q3 1947 Q4 ...
    ##   Frequency: 4
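    The remaining type values follow the same pattern. For example, a “ts” request might look like this (a sketch, assuming the Quandl package is loaded and the same FRED/GDP series):

```r
# Import GDP data from FRED as a ts object
gdp_ts <- Quandl("FRED/GDP", type = "ts")

# ts objects carry start, end, and frequency attributes;
# quarterly data has frequency 4
frequency(gdp_ts)
```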

    1.3 Finding data from internet sources

    1.3.1 Find stock ticker from Yahoo Finance

    You need an instrument identifier to import data from an internet data source. These identifiers can often be found on the data source’s website. In this exercise, you will search Yahoo Finance for the ticker symbol for Pfizer stock.

    Note that some sources may not provide data for certain symbols, even if you can see the data displayed on their website in tables and/or charts. getSymbols() will error if the data is not available for download.

  • Create an object called symbol from the three-letter Yahoo Finance ticker symbol for Pfizer stock. To find the symbol, go to Yahoo Finance and search for “Pfizer”. Note that there are two parts to this ticker. We only need what comes before the period (the period and suffix specify other exchanges).
  • # Create an object containing the Pfizer ticker symbol
    symbol <- "PFE"
  • Use getSymbols() to import the data.
  • # Use getSymbols to import the data for 'symbol'
    getSymbols(symbol)
    ## [1] "PFE"
  • Use the head() function to view the first few rows of the object created by getSymbols().
  • # Look at the data in the object getSymbols created
    head(PFE)
    ##            PFE.Open PFE.High  PFE.Low PFE.Close PFE.Volume PFE.Adjusted
    ## 2007-01-03 24.81025 25.06641 24.64896  24.94307   42839619     13.26248
    ## 2007-01-04 25.02846 25.20873 24.94307  25.02846   33987495     13.30789
    ## 2007-01-05 25.18975 25.26566 24.82922  24.95256   33046589     13.26753
    ## 2007-01-08 24.94307 25.06641 24.56357  24.81974   45557569     13.19690
    ## 2007-01-09 24.89564 24.99051 24.70588  24.82922   32994627     13.20195
    ## 2007-01-10 24.76281 24.91461 24.66793  24.85769   36412854     13.21709

    Looking up identifiers online is common when seeking data about a new instrument, so it’s good to get comfortable with the process!

    1.3.2 Download exchange rate data from Oanda

    Oanda.com provides historical foreign exchange data for many currency pairs. Currency pairs are expressed as two currencies, the “base” and the “quote”, separated by a “/”. For example, the U.S. Dollar to Euro exchange rate would be “USD/EUR”.

    Note that getSymbols() will automatically convert “USD/EUR” to a valid name by removing the “/”. For example, getSymbols(“USD/EUR”) would create an object named USDEUR.

    Also, Oanda.com only provides 180 days of historical data. getSymbols() will warn and return as much data as possible if you request data from more than 180 days ago. You can use the from and to arguments to set a date range; both should be strings in “%Y-%m-%d” format (e.g. “2016-02-06”).
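    For example, a request for the last 30 days, safely inside Oanda’s 180-day window, might look like this (a sketch, assuming quantmod is loaded):

```r
# Request one month of USD/EUR data using from and to
getSymbols("USD/EUR", src = "oanda",
           from = Sys.Date() - 30, to = Sys.Date())

# getSymbols() removes the "/" from the object name
head(USDEUR)
```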

  • Create an object called currency_pair with the symbols for the British Pound and the Canadian Dollar so we can use this to get British Pound to Canadian Dollar exchange rate data. quantmod::oanda.currencies contains a list of currencies provided by Oanda.com.
  • # Create a currency_pair object
    currency_pair <- "GBP/CAD"
  • Use getSymbols() to load data for currency_pair. Remember to specify src!
  • # Load British Pound to Canadian Dollar exchange rate data
    getSymbols(currency_pair, src = "oanda")
    ## [1] "GBP/CAD"
  • Use str() to examine the data getSymbols() created. Remember, it will remove the /!
  • # Examine object using str()
    str(GBPCAD)
    ## An 'xts' object on 2022-04-26/2022-10-21 containing:
    ##   Data: num [1:179, 1] 1.62 1.61 1.6 1.6 1.62 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr "GBP.CAD"
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "oanda"
    ##  $ updated: POSIXct[1:1], format: "2022-10-22 13:20:20"
  • Try to load data from 190 days ago to today. Remember that Sys.Date() will give you today’s date.
  • # Try to load data from 190 days ago
    getSymbols(currency_pair, from = Sys.Date() - 190, to = Sys.Date(), src = "oanda")
    ## [1] "GBP/CAD"

    Easy enough, right? Did you notice what happened when you tried to get data further back than 180 days?

    1.3.3 Find and import Unemployment Rate data from FRED

    Both getSymbols() and Quandl() provide access to the FRED database. In this exercise, you will find the FRED symbol for the United States civilian unemployment rate. Then you will use the series name to download the data directly from FRED using getSymbols(), and from the Quandl database using Quandl().

    Remember that getSymbols() specifies the data source using the src argument and that Quandl() specifies it as part of the Quandl code (i.e. database/series).

    Both the quantmod and Quandl packages have been loaded for you, so you just need to find the symbol and call getSymbols() and Quandl().

  • Assign the FRED series name to an object called series_name. To find the FRED series name, go to the FRED website and search for “United States civilian unemployment rate”.
  • # Create a series_name object
    series_name <- "UNRATE"
  • Use getSymbols() to load the data. Remember to set the src argument!
  • # Load the data using getSymbols
    getSymbols(series_name, src = "FRED")
    ## [1] "UNRATE"
  • Create an object named quandl_code (prepend the Quandl database name (FRED) to the series_name).
  • # Create a quandl_code object
    quandl_code <- "FRED/UNRATE"
  • Use Quandl() to load the data. Assign the data to an object named unemploy_rate. Remember to include the database in the Quandl code!
  • # Load the data using Quandl
    unemploy_rate <- Quandl(quandl_code)

    FRED contains a huge number of economic data series. Be sure to check them out for more information!

    2 Extracting and transforming data

    You’ve learned how to import data from online sources; now it’s time to see how to extract columns from the imported data. After you’ve learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.

    2.1 Extracting columns from financial time series

    2.1.1 Extract one column from one instrument

    The quantmod package provides several helper functions to extract specific columns from an object, based on the column name. The Op(), Hi(), Lo(), Cl(), Vo(), and Ad() functions extract the open, high, low, close, volume, and adjusted close columns, respectively.

    In this exercise, you will use two of these functions on an xts object named DC. The DC object contains fictitious DataCamp OHLC (open, high, low, close) stock prices created by randomizing some real financial market data. DC is similar to the xts objects created by getSymbols().

    While it’s not necessary to complete the exercise, you can learn more about all the extractor functions from help(“OHLC.Transformations”).
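    As a quick illustration with the QQQ object downloaded in chapter 1 (a sketch; any OHLCV xts created by getSymbols() works the same way):

```r
# Extract the open and adjusted close columns
qqq_open <- Op(QQQ)
qqq_adjusted <- Ad(QQQ)

head(qqq_open)
```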

    # edited by cliex159
    symbol <- "DC"
    
    # Use getSymbols to import the data for 'symbol'
    getSymbols(symbol)
    ## [1] "DC"
  • Use the appropriate extractor function to create an object named dc_close that contains only the close column of DC.
  • # Extract the close column
    dc_close <- Cl(DC)
  • View the first few rows of dc_close.
  • # Look at the head of dc_close
    head(dc_close)
    ##            DC.Close
    ## 2022-04-05     6.90
    ## 2022-04-06     5.41
    ## 2022-04-07     5.09
    ## 2022-04-08     5.06
    ## 2022-04-11     5.10
    ## 2022-04-12     4.94
  • Now extract the volume column from DC and assign it to an object named dc_volume.
  • # Extract the volume column
    dc_volume <- Vo(DC)
  • View the first few rows of dc_volume.
  • # Look at the head of dc_volume
    head(dc_volume)
    ##            DC.Volume
    ## 2022-04-05    829300
    ## 2022-04-06   1675200
    ## 2022-04-07    836700
    ## 2022-04-08    520700
    ## 2022-04-11    411400
    ## 2022-04-12    252900

    Great! Now let’s look at how to extract multiple columns from one instrument.

    2.1.2 Extract multiple columns from one instrument

    The quantmod package provides functions to extract a single column, and also has functions to extract specific sets of columns.

    Recall OHLC stands for open, high, low, close. Now you can guess which columns the OHLC() and HLC() functions extract. There’s also an OHLCV() function, which adds the volume column.

    These functions are helpful when you need to pass a set of columns to another function. For example, you might need to pass the high, low, and close columns (in that order) to a technical indicator function.
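    For instance, the ATR() function from the TTR package (which quantmod loads) expects a high-low-close input, so the output of HLC() can be passed straight in (a sketch, assuming the DC object from the exercise is available):

```r
library(TTR)

# Average True Range wants high, low, close -- in that order
dc_atr <- ATR(HLC(DC), n = 14)
head(dc_atr)
```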

    In this exercise, you will use two of these extractor functions on the same DC object you used in the last exercise.

  • Use one extractor function to create an object named dc_hlc that contains the high, low, and close columns of DC.
  • # Extract the high, low, and close columns
    dc_hlc <- HLC(DC)
  • View the first few rows of dc_hlc.
  • # Look at the head of dc_hlc
    head(dc_hlc)
    ##            DC.High DC.Low DC.Close
    ## 2022-04-05    8.47   5.02     6.90
    ## 2022-04-06    8.00   5.08     5.41
    ## 2022-04-07    5.20   4.41     5.09
    ## 2022-04-08    5.30   4.88     5.06
    ## 2022-04-11    5.58   4.97     5.10
    ## 2022-04-12    5.15   4.92     4.94
  • Now extract the open, high, low, close, and volume columns from DC and assign the result to an object named dc_ohlcv.
  • # Extract the open, high, low, close, and volume columns
    dc_ohlcv <- OHLCV(DC)
  • View the first few rows of dc_ohlcv.
  • # Look at the head of dc_ohlcv
    head(dc_ohlcv)
    ##            DC.Open DC.High DC.Low DC.Close DC.Volume
    ## 2022-04-05    5.02    8.47   5.02     6.90    829300
    ## 2022-04-06    7.81    8.00   5.08     5.41   1675200
    ## 2022-04-07    5.05    5.20   4.41     5.09    836700
    ## 2022-04-08    5.03    5.30   4.88     5.06    520700
    ## 2022-04-11    5.10    5.58   4.97     5.10    411400
    ## 2022-04-12    5.15    5.15   4.92     4.94    252900

    Good job! There is also a more flexible function called getPrice(). Let’s look at that next.

    2.1.3 Use getPrice to extract other columns

    The extractor functions you learned in the previous two exercises do not cover all use cases. Sometimes you might have one object that contains the same price column for multiple instruments. Other times, you might have an object with price columns (e.g. bid, ask, trade) that have no explicit extractor function.

    The getPrice() function in the quantmod package can extract any column by name by using the prefer argument. It can also extract columns for a specific instrument by using the symbol argument, which is useful when an object contains several instruments with the same price type.

    You can use regular expressions for both the prefer and symbol arguments, because they are passed to the base::grep() function internally.

  • Finish the Quandl() call to import CME futures data for CL and BZ into an xts object, oil_data.
  • # Download CME data for CL and BZ as an xts object
    oil_data <- Quandl(code = c("CHRIS/CME_CL1", "CHRIS/CME_BZ1"), type = "xts")
  • Use colnames() to view the column names of the oil_data object.
  • # Look at the column names of the oil_data object
    colnames(oil_data)
    ##  [1] "CHRIS.CME_CL1 - Open"                      
    ##  [2] "CHRIS.CME_CL1 - High"                      
    ##  [3] "CHRIS.CME_CL1 - Low"                       
    ##  [4] "CHRIS.CME_CL1 - Last"                      
    ##  [5] "CHRIS.CME_CL1 - Change"                    
    ##  [6] "CHRIS.CME_CL1 - Settle"                    
    ##  [7] "CHRIS.CME_CL1 - Volume"                    
    ##  [8] "CHRIS.CME_CL1 - Previous Day Open Interest"
    ##  [9] "CHRIS.CME_BZ1 - Open"                      
    ## [10] "CHRIS.CME_BZ1 - High"                      
    ## [11] "CHRIS.CME_BZ1 - Low"                       
    ## [12] "CHRIS.CME_BZ1 - Last"                      
    ## [13] "CHRIS.CME_BZ1 - Change"                    
    ## [14] "CHRIS.CME_BZ1 - Settle"                    
    ## [15] "CHRIS.CME_BZ1 - Volume"                    
    ## [16] "CHRIS.CME_BZ1 - Previous Day Open Interest"
  • Finish the getPrice() call to extract the Open price for CL into cl_open. Set prefer = “Open$” to match “Open” and not “Open Interest”.
  • # Extract the Open price for CL1
    cl_open <- getPrice(oil_data, symbol = "CL1", prefer = "Open$")
  • Use an ISO-8601 string to subset cl_open for the month of January, 2016 (don’t use a range, and you don’t need to specify any days).
  • # Look at January, 2016 using xts' ISO-8601 subsetting
    cl_open["2016-01"]
    ##            CHRIS.CME_CL1 - Open
    ## 2016-01-04                37.60
    ## 2016-01-05                36.90
    ## 2016-01-06                36.18
    ## 2016-01-07                34.09
    ## 2016-01-08                33.30
    ## 2016-01-11                32.94
    ## 2016-01-12                30.44
    ## 2016-01-13                30.48
    ## 2016-01-14                30.60
    ## 2016-01-15                31.18
    ## 2016-01-19                29.20
    ## 2016-01-20                28.33
    ## 2016-01-21                28.35
    ## 2016-01-22                29.84
    ## 2016-01-25                32.05
    ## 2016-01-26                29.81
    ## 2016-01-27                30.55
    ## 2016-01-28                32.19
    ## 2016-01-29                33.70

    Good job! getPrice() is a very flexible way to retrieve the columns you need.
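    Because both arguments are regular expressions, you can just as easily pull a different column for the other contract (a sketch, reusing the oil_data object from above):

```r
# Extract the Settle price for BZ1; "Settle" needs no "$" anchor
# because no other column name contains that word
bz_settle <- getPrice(oil_data, symbol = "BZ1", prefer = "Settle")
head(bz_settle)
```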

    2.2 Importing and transforming multiple instruments

    2.2.1 Use Quandl to download quarterly returns data

    Sometimes you need to aggregate and/or transform raw data before you can continue your analysis. The Quandl() function allows you to specify common aggregations and transformations via the collapse and/or transform arguments. The Quandl API takes care of the details for you.

    You will use the same CL and BZ data from the CME as in the last exercise, but this time you will apply some aggregations and transformations to the raw data. The Quandl codes for CL and BZ are provided in the object quandl_codes.

    # edited by cliex159
    quandl_codes <- c("CHRIS/CME_CL1", "CHRIS/CME_BZ1")
  • Use the collapse and type arguments with Quandl() to download quarterly data as an xts object named qtr_price.
  • # Download quarterly CL and BZ prices
    qtr_price <- Quandl(quandl_codes, type = "xts", collapse = "quarterly")
  • Use a quantmod extractor function to view the high prices for both instruments.
  • # View the high prices for both series
    Hi(qtr_price)
    ##         CHRIS.CME_CL1 - High CHRIS.CME_BZ1 - High
    ## 1983 Q1                29.60                   NA
    ## 1983 Q2                31.40                   NA
    ## 1983 Q3                30.55                   NA
    ## 1983 Q4                29.80                   NA
    ## ...
    ## 2011 Q3                83.23                   NA
    ## 2011 Q4               100.16               107.27
    ## 2012 Q1               104.15               124.04
    ## ...
    ## 2021 Q1                61.17                64.77
    ## 2021 Q2                73.81                72.88

    (Output truncated. Note that the BZ series only begins in 2011 Q4, so earlier quarters are NA.)
  • Modify the command from the first instruction to create a qtr_return object that contains the % change of the raw data. See the Quandl API documentation for how to specify row-on-row % change.
  • # Download quarterly CL and BZ returns
    qtr_return <- Quandl(quandl_codes, type = "xts", collapse = "quarterly", transform = "rdiff")
  • Use getPrice() to view the Settle returns.
  • # View the settle price returns for both series
    getPrice(qtr_return, prefer = "Settle")
    ##         CHRIS.CME_CL1 - Settle CHRIS.CME_BZ1 - Settle
    ## 1983 Q2            0.071355411                     NA
    ## 1983 Q3           -0.032504780                     NA
    ## 1983 Q4           -0.025032938                     NA
    ## 1984 Q1            0.042229730                     NA
    ## 1984 Q2           -0.035656402                     NA
    ## 1984 Q3           -0.003025210                     NA
    ## ...
    ## 2020 Q2            0.917480469            0.809586631
    ## 2020 Q3            0.024191495           -0.004860267
    ## 2020 Q4            0.206364993            0.264957265
    ## 2021 Q1            0.219291014            0.226640927
    ## 2021 Q2            0.233603786            0.136606862

    Quick Quandling! Quandl() simplifies aggregating and transforming price data.
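
    The rdiff transform corresponds to simple period-over-period returns, which you could also compute yourself from the raw prices. A minimal base-R sketch with made-up numbers:

```r
# rdiff is the simple return: (x[t] - x[t-1]) / x[t-1]
prices <- c(100, 110, 99)
returns <- diff(prices) / head(prices, -1)
returns  # 0.10 -0.10
```

    On an xts object of prices the same idea is x / lag(x) - 1; passing transform = "rdiff" simply asks Quandl to do this computation server-side.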

    2.2.2 Extract the Close column from many instruments

    The previous exercise taught you how to use do.call(merge, eapply(env, fun)) to apply a function to each object in an environment and then combine all the results into one object.

    Let’s use what you learned to solve a very common problem. Often you will need to load similar data for many instruments, extract a column, and create one object that contains that specific column for every instrument.

  • Use new.env() to create a new environment named data_env.
  • # Symbols
    symbols <- c("AAPL", "MSFT", "IBM")
    
    # Create new environment
    data_env <- new.env()
  • Use getSymbols() to load data into data_env by using the env argument.
  • # Load symbols into data_env
    getSymbols(symbols, env = data_env)
    ## [1] "AAPL" "MSFT" "IBM"
  • Finish the commands that have been started for you. Fill in the blanks with the appropriate functions.
  • # Extract the close column from each object and combine into one xts object
    close_data <- do.call(merge, eapply(data_env, Cl))
  • View the first few rows of close_data.
  • # View the head of close_data
    head(close_data)
    ##            AAPL.Close IBM.Close MSFT.Close
    ## 2007-01-03   2.992857  92.99236      29.86
    ## 2007-01-04   3.059286  93.98662      29.81
    ## 2007-01-05   3.037500  93.13576      29.64
    ## 2007-01-08   3.052500  94.55067      29.93
    ## 2007-01-09   3.306071  95.66921      29.96
    ## 2007-01-10   3.464286  94.54111      29.66

    Great job!
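
    One caveat with this pattern: eapply() visits the objects in an environment in no particular order (notice the columns above came back as AAPL, IBM, MSFT rather than in the order of symbols). A base-R sketch of imposing your own order:

```r
# eapply() returns results in an arbitrary order, so reorder explicitly
env <- new.env()
env$AAPL <- 1; env$MSFT <- 2; env$IBM <- 3

vals <- unlist(eapply(env, identity))
vals <- vals[c("AAPL", "MSFT", "IBM")]  # impose a deterministic order
vals
```

    For the exercise above, the analogous step would be close_data <- close_data[, paste0(symbols, ".Close")].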

    3 Managing data from multiple sources

    Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to getSymbols(). You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.

    3.1 Setting default arguments for getSymbols()

    3.1.1 Set a default data source

    Recall that getSymbols() imports from Yahoo Finance by default. This exercise will teach you how to change the default data source with the setDefaults() function.

    The first argument to setDefaults() is the function you want to update, and the remaining arguments are name = value pairs of the arguments you want to update and the new default value.

    Note that this only works with getSymbols() because getSymbols() actively checks to see if you want to use a different default value.

  • Set the src argument to pull from Alpha Vantage instead of Yahoo by default.
  • # Set the default to pull data from Alpha Vantage
    setDefaults(getSymbols, src = "av")
  • Use getSymbols() to pull data for “GOOG”.
  • # Load GOOG data
    getSymbols("GOOG", api.key = "4EQRAZ3VA8ERX4BV")
    ## [1] "GOOG"
  • Use str() to verify that src attribute on GOOG is “alphavantage”.
  • # Verify the data was actually pulled from Alpha Vantage
    str(GOOG)
    ## An 'xts' object on 2022-06-01/2022-10-21 containing:
    ##   Data: num [1:100, 1:5] 2299 2284 2320 2335 2313 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:5] "GOOG.Open" "GOOG.High" "GOOG.Low" "GOOG.Close" ...
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "alphavantage"
    ##  $ updated: Date[1:1], format: "2022-10-21"

    Great! Setting a default source can be useful if you use that source often.

    3.1.2 Set default arguments for a getSymbols source

    You can also use setDefaults() on individual getSymbols() source methods. This exercise will teach you how to change the default value for the from argument to getSymbols.yahoo().

    You can find the arguments for a specific method by using help() (e.g. help("getSymbols.yahoo")) or by calling args() to print them to the console (e.g. args(getSymbols.yahoo)). Calling getDefaults() will show you the current default values (if there are any).

    Remember, you are not supposed to call getSymbols.yahoo() directly!

  • Use the args() function to print all the default arguments to getSymbols.yahoo().
  • # Look at getSymbols.yahoo arguments
    args(getSymbols.yahoo)
    ## function (Symbols, env, return.class = "xts", index.class = "Date", 
    ##     from = "2007-01-01", to = Sys.Date(), ..., periodicity = "daily", 
    ##     curl.options = list()) 
    ## NULL
  • Set the default of from for getSymbols.yahoo() to “2000-01-01”.
  • # Set default 'from' value for getSymbols.yahoo
    setDefaults(getSymbols.yahoo, from = "2000-01-01")
  • Use getDefaults() to confirm that the from default was set correctly.
  • # Confirm defaults were set correctly
    getDefaults("getSymbols.yahoo")
    ## $from
    ## [1] "'2000-01-01'"

    Awesome! Now you know how to set default arguments for specific getSymbols() sources.
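
    If you later want to revert to the package defaults, quantmod also provides unsetDefaults(). A short sketch (assuming quantmod is installed; nothing is downloaded):

```r
library(quantmod)

# Set a custom default, inspect it, then remove it again
setDefaults(getSymbols.yahoo, from = "2000-01-01")
getDefaults("getSymbols.yahoo")                  # shows the stored 'from'
unsetDefaults(getSymbols.yahoo, confirm = FALSE)
getDefaults("getSymbols.yahoo")                  # NULL again
```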

    3.2 Setting per-instrument default arguments

    3.2.1 Set default data source for one symbol

    Changing the default source for one instrument is useful if multiple sources use the same symbol for different instruments. For example, getSymbols(“CP”, src = “yahoo”) would load Canadian Pacific Railway data from the New York Stock Exchange. But getSymbols(“CP”, src = “FRED”) would load Corporate Profits After Tax from the U.S. Bureau of Economic Analysis.

    You can use setSymbolLookup() to specify the default data source for an instrument. In this exercise, you will learn how to make getSymbols(“CP”) load the corporate profit data from FRED instead of the railway stock data from Yahoo Finance.

    setSymbolLookup() can take any number of name = value pairs, where name is the symbol and value is a named list of getSymbols() arguments for that one symbol.

    # edited by cliex159
    getSymbols("CP", api.key = "4EQRAZ3VA8ERX4BV")
    ## [1] "CP"
  • Use head() to look at the first few rows of data for CP, which has been loaded for you.
  • # Look at the first few rows of CP
    head(CP)
    ##            CP.Open CP.High CP.Low CP.Close CP.Volume
    ## 2022-06-01   71.91  73.620 71.410    73.05   2949267
    ## 2022-06-02   73.17  76.550 73.170    76.36   3651456
    ## 2022-06-03   75.73  76.080 74.685    75.18   2178119
    ## 2022-06-06   75.99  76.460 75.420    75.69   1993966
    ## 2022-06-07   74.87  76.475 74.240    76.41   1932792
    ## 2022-06-08   75.78  75.820 73.950    74.30   1812080
  • Complete the call to setSymbolLookup() to use FRED as the source for CP.
  • # Set the source for CP to FRED
    setSymbolLookup(CP = list(src = "FRED"))
  • Use getSymbols() to load the data again.
  • # Load CP data again
    getSymbols("CP", api.key = "4EQRAZ3VA8ERX4BV")
    ## [1] "CP"
  • Look at the first few rows of CP and compare the output to the output from the first instruction.
  • # Look at the first few rows of CP
    head(CP)
    ##            CP.Open CP.High CP.Low CP.Close CP.Volume
    ## 2022-06-01   71.91  73.620 71.410    73.05   2949267
    ## 2022-06-02   73.17  76.550 73.170    76.36   3651456
    ## 2022-06-03   75.73  76.080 74.685    75.18   2178119
    ## 2022-06-06   75.99  76.460 75.420    75.69   1993966
    ## 2022-06-07   74.87  76.475 74.240    76.41   1932792
    ## 2022-06-08   75.78  75.820 73.950    74.30   1812080

    Great! Symbol clashes between data sources happen occasionally, and it is useful to be able to pull a single symbol from a specific source.
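
    Because setSymbolLookup() accepts any number of name = value pairs, you can route several symbols at once. A sketch (these settings are stored locally; nothing is downloaded):

```r
library(quantmod)

# Route two symbols to FRED in a single call
setSymbolLookup(CP = list(src = "FRED"), DFF = list(src = "FRED"))

# Inspect the stored per-symbol defaults
getSymbolLookup(c("CP", "DFF"))
```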

    3.2.2 Save and load symbol lookup table

    The previous exercise taught you how to set default arguments on a per-symbol basis, but those settings only last for the current session.

    This exercise will teach you how to save and load symbol-based defaults by using saveSymbolLookup() and loadSymbolLookup(), respectively. You can use the file arguments to specify a file to store your defaults.

    You can also use the getSymbolLookup() function to check per-symbol defaults before you try to load data using getSymbols().

    The source for CP has already been set to “FRED”.

  • Use saveSymbolLookup() to save the symbol lookup table to a file named “my_symbol_lookup.rda”.
  • # Save symbol lookup table to file
    saveSymbolLookup(file = "my_symbol_lookup.rda")
  • Set the source for CP to “yahoo” using what you learned in the prior exercise.
  • # Set default source for CP to "yahoo"
    setSymbolLookup(CP = "yahoo")
  • Use getSymbolLookup() to verify that the default source for CP actually changed from “FRED” to “yahoo”.
  • # Verify the default source is "yahoo" for "CP"
    getSymbolLookup("CP")
    ## $CP
    ## $CP$src
    ## [1] "yahoo"
  • Now use loadSymbolLookup() to load the saved symbol lookup table.
  • # Load symbol lookup table from file
    loadSymbolLookup(file = "my_symbol_lookup.rda")

    Awesome! This will let you load the same lookup table even if you close out of R.
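
    A round-trip sketch using a temporary file, so your working directory stays clean (assumes quantmod is installed; nothing is downloaded):

```r
library(quantmod)

# Save the current lookup table, overwrite a setting, then restore it
setSymbolLookup(CP = list(src = "FRED"))
lookup_file <- tempfile(fileext = ".rda")
saveSymbolLookup(file = lookup_file)

setSymbolLookup(CP = "yahoo")          # change the setting
loadSymbolLookup(file = lookup_file)   # restore the saved table
getSymbolLookup("CP")                  # src is "FRED" again
```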

    3.2.3 How not to specify the getSymbols() source

    There are several ways you can specify the data source when you use getSymbols(). Which of the following is a way you should not use getSymbols() to pull data from a specific source?

    • Use the src argument.
    • Use setSymbolLookup().
    • Call getSymbols.source directly.

    Correct! The methods of getSymbols() should not be called directly.

    3.3 Handling instrument symbols that clash or are not valid R names

    3.3.1 Access the object using get() or backticks

    At some point, you might download data for an instrument that does not have a syntactically valid name. Syntactically valid names contain letters, numbers, “.”, and “_”, and must start with a letter, or a “.” followed by a non-number.

    For example, the symbol for Berkshire Hathaway class A shares is “BRK-A”, which is not a syntactically valid name because it contains a “-” character. Another example is Chinese stocks, which have symbols composed of numbers. The Yahoo Finance symbol for the SSE Composite Index is “000001.SS”.

    You can use the get() function or backticks (`) to access objects that do not have syntactically valid names.
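
    A quick way to test a symbol is to check whether make.names() leaves it unchanged (a base-R sketch; this does not catch reserved words):

```r
# A name is syntactically valid if make.names() returns it unchanged
is_valid_name <- function(x) identical(make.names(x), x)

is_valid_name("QQQ")        # TRUE
is_valid_name("BRK-A")      # FALSE: "-" is not allowed
is_valid_name("000001.SS")  # FALSE: starts with a number
```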

  • Use getSymbols() to load data for “BRK-A”.
  • # Load BRK-A data
    getSymbols("BRK-A", api.key = "4EQRAZ3VA8ERX4BV")
    ## [1] "BRK-A"
  • Use backticks (`) and head() to look at the first few rows of data.
  • # Use backticks (`) and head() to look at the loaded data
    head(`BRK-A`)
    ##            BRK-A.Open BRK-A.High BRK-A.Low BRK-A.Close BRK-A.Volume
    ## 2022-06-01   476594.3   476594.3  464900.0    469620.0         2143
    ## 2022-06-02   472134.3   472886.0  463312.3    472710.0         2056
    ## 2022-06-03   469925.1   471789.2  465850.0    467650.0         1949
    ## 2022-06-06   472899.5   473900.0  467632.3    468400.0         2003
    ## 2022-06-07   467649.5   471499.9  464551.0    471499.9         1808
    ## 2022-06-08   472089.5   472089.5  462300.5    463300.0         2059
  • Use get() to assign the data loaded by getSymbols() to an object named BRK.A.
  • # Use get() to assign the BRK-A data to an object named BRK.A
    BRK.A <- get("BRK-A")

    Great! Just remember to use backticks or get() if you ever run into invalid characters.

    3.3.2 Create valid name for one instrument

    If you are only downloading data for a single symbol and that symbol is not a syntactically valid name, you can set auto.assign = FALSE in your call to getSymbols(). That will allow you to directly assign the output to a syntactically valid name.

    You may also want to convert the column names to syntactically valid names. That is a good idea if you plan to use the data in functions that expect column names to be syntactically valid names (e.g. lm()).

  • Use getSymbols() and its auto.assign argument to create an object named BRK.A containing data for “BRK-A”.
  • # Create BRK.A object
    setDefaults(getSymbols, src = "yahoo")
    BRK.A <- getSymbols("BRK-A", auto.assign = FALSE)
    
    setDefaults(getSymbols, src = "av")
  • Use colnames() to create an object named col_names that contains the column names for the BRK.A object.
  • # Create col_names object from the colnames() of BRK.A
    col_names <- colnames(BRK.A)
  • Use make.names() to convert col_names to syntactically valid names.
  • # Convert col_names into syntactically valid names
    colnames(BRK.A) <- make.names(col_names)

    Fantastic! Now you can fix tricky ticker symbols in the column names of your data.

    3.3.3 Create valid names for multiple instruments

    An earlier exercise taught you how to use setSymbolLookup() to set a default data source for getSymbols(). You can also use setSymbolLookup() to create a mapping between the instrument symbol and the name of the R object.

    This is useful if you want to download data for many symbols that are not syntactically valid names, or for symbols whose names conflict with other R variable names.

    An example of a name that conflicts is the symbol for AT&T’s stock, T, which is often used as a short form for the logical value TRUE.

    To change the name of a given symbol, arguments must be passed to setSymbolLookup() as a list, like so: setSymbolLookup(NEW_NAME = list(name = “OLD_NAME”)).

  • Set the name for “BRK-A” to “BRK.A”.
  • # Set name for BRK-A to BRK.A
    setSymbolLookup(BRK.A = list(name = "BRK-A"))
  • Set the name for “T” (AT&T) to “ATT”.
  • # Set name for T to ATT
    setSymbolLookup(ATT = list(name = "T"))
  • Use one getSymbols() call to load data for both instruments using the new names you assigned to them.
  • # Load BRK.A and ATT data
    getSymbols(c("BRK.A", "ATT"), api.key = "4EQRAZ3VA8ERX4BV")
    ## NULL

    Awesome job! Now you can map troublesome tickers to new names with setSymbolLookup().

    4 Aligning data with different periodicities

    You’ve learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.

    4.1 Making irregular data regular

    4.1.1 Create a zero-width and regular xts object

    In order to create regular data from an irregular dataset, the first thing you need is a regular sequence of date-times that span the dates of your irregular dataset. A “regular” sequence of date-times has equally-spaced time points.

    In this exercise, you will use the irregular_xts object to create a zero-width xts object that has a regular daily index. A zero-width xts object has an index of date-times, but no data columns.

    # edited by cliex159
    library(xts)
    irregular_xts <- xts(c(4, 21, 1, 34),
                         order.by = as.Date(c("2016-01-02",
                                              "2016-01-05",
                                              "2016-01-07",
                                              "2016-01-11")))
    colnames(irregular_xts) <- "data"
  • Use the start() function to create an object named start_date.
  • # Extract the start date of the series
    start_date <- start(irregular_xts)
  • Use the end() function to create an object named end_date.
  • # Extract the end date of the series
    end_date <- end(irregular_xts)
  • Use the seq() function to create an object named regular_index containing a sequence of date-times from start_date to end_date by day.
  • # Create a regular date sequence
    regular_index <- seq(start_date, end_date, by = "day")
  • Use the xts() constructor to create a zero-width xts object. Store it in regular_xts.
  • # Create a zero-width xts object
    regular_xts <- xts(, order.by = regular_index)

    Fantastic! Making regular date-time sequences is useful in many time-series applications.

    4.1.2 Use merge to make an irregular index regular

    The previous exercise taught you how to make a zero-width xts object with a regular time index. You can use the zero-width object to regularize an irregular xts object.

    The regularized series usually has missing values (NA) because the irregular data does not have a value for all observations in the regular index. This exercise will teach you how to handle these missing values when you merge() the two series.

    The irregular_xts and regular_xts objects from the previous exercise are available in your workspace.

  • Use the merge() function to combine irregular_xts and regular_xts into an object named merged_xts.
  • # Merge irregular_xts and regular_xts
    merged_xts <- merge(irregular_xts, regular_xts)
  • Use head() to look at the first few rows of merged_xts.
  • # Look at the first few rows of merged_xts
    head(merged_xts)
    ##            data
    ## 2016-01-02    4
    ## 2016-01-03   NA
    ## 2016-01-04   NA
    ## 2016-01-05   21
    ## 2016-01-06   NA
    ## 2016-01-07    1
  • Create an object named merged_filled_xts by using the fill argument to merge() to replace the NA with the last observation, carried forward (na.locf).
  • # Use the fill argument to fill NA with their previous value
    merged_filled_xts <- merge(irregular_xts, regular_xts, fill = na.locf)
  • Use head() to look at the first few rows of merged_filled_xts.
  • # Look at the first few rows of merged_filled_xts
    head(merged_filled_xts)
    ##            data
    ## 2016-01-02    4
    ## 2016-01-03    4
    ## 2016-01-04    4
    ## 2016-01-05   21
    ## 2016-01-06   21
    ## 2016-01-07    1

    Awesome! Filling forward is a useful operation, but be careful to make sure it is what you want!

    4.2 Aggregating to lower frequency

    4.2.1 Aggregate daily data and merge with monthly data

    Sometimes two series have the same periodicity, but represent timestamps differently. For example, monthly series may be timestamped with the first or last date of the month. This difference creates many NA when series are merged. The yearmon class from the zoo package helps solve this problem.
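
    The key property of yearmon is that every date within a month converts to the same index value, so month-start and month-end timestamps align after conversion. A small sketch using only zoo:

```r
library(zoo)

# First-of-month and last-of-month dates collapse to the same yearmon
first_day <- as.yearmon(as.Date("2007-01-01"))
last_day  <- as.yearmon(as.Date("2007-01-31"))
first_day == last_day  # TRUE
```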

    In this exercise, you will aggregate the FRED daily Fed Funds rate (DFF) to a monthly periodicity and merge it with the FRED monthly Fed Funds rate (FEDFUNDS). The monthly DFF aggregate will be timestamped with the last day of the month, while FEDFUNDS is timestamped with the first day of the month.

    The FEDFUNDS and DFF data have already been downloaded from FRED for you, using getSymbols(c(“FEDFUNDS”, “DFF”), src = “FRED”).

    # edited by cliex159
    getSymbols(c("FEDFUNDS", "DFF"), src = "FRED")
    ## [1] "FEDFUNDS" "DFF"
  • Use apply.monthly() with mean() to calculate the average of all days for each month. Assign the result to monthly_fedfunds.
  • # Aggregate DFF to monthly averages
    monthly_fedfunds <- apply.monthly(DFF, mean)
  • Finish the command to use as.yearmon() to convert the index to yearmon.
  • # Convert index to yearmon
    index(monthly_fedfunds) <- as.yearmon(index(monthly_fedfunds))
  • Create an object named merged_fedfunds by merging FEDFUNDS with the monthly aggregate you created in the first step.
  • # Merge FEDFUNDS with the monthly aggregate
    merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)
  • Use head() to check the output of merged_fedfunds.
  • # Look at the first few rows of the merged object
    head(merged_fedfunds)
    ##            FEDFUNDS       DFF
    ## 1954-07-01     0.80 0.7993548
    ## 1954-08-01     1.22 1.2206452
    ## 1954-09-01     1.07 1.0666667
    ## 1954-10-01     0.85 0.8487097
    ## 1954-11-01     0.83 0.8336667
    ## 1954-12-01     1.28 1.2777419

    Awesome! You will often need to aggregate to a lower frequency to align multiple time series.

    4.2.2 Align series to first and last day of month

    Sometimes you may not be able to use convenience classes like yearmon to represent timestamps. This exercise will teach you how to manually align merged data to the timestamp representation you prefer.

    First you merge the lower-frequency data with the aggregate data, then use na.locf() to fill the NA forward (or backward, using fromLast = TRUE). Then you can subset the result using the index of the object with the representation you prefer.

    Your workspace contains FEDFUNDS, monthly_fedfunds (the result of apply.monthly(DFF, mean)), and merged_fedfunds (the result of merge(FEDFUNDS, monthly_fedfunds) where the monthly_fedfunds index is a Date). Note the NA values in merged_fedfunds.

  • Use na.locf() to fill the NA values in merged_fedfunds. Assign the result to merged_fedfunds_locf.
  • # Fill NA forward
    merged_fedfunds_locf <- na.locf(merged_fedfunds)
  • Subset merged_fedfunds_locf by index(monthly_fedfunds) to create an xts object with timestamps at month-end. Name the result aligned_last_day.
  • # Extract index values containing last day of month
    aligned_last_day <- merged_fedfunds_locf[index(monthly_fedfunds)]
  • Use the fromLast argument to na.locf() to fill the NA values with the next observation. Assign the result to merged_fedfunds_locb.
  • # Fill NA backward
    merged_fedfunds_locb <- na.locf(merged_fedfunds, fromLast = TRUE)
  • Subset merged_fedfunds_locb by index(FEDFUNDS) to create an xts object with timestamps on the first of the month. Name the result aligned_first_day.
  • # Extract index values containing first day of month
    aligned_first_day <- merged_fedfunds_locb[index(FEDFUNDS)]

    Great! Knowing how to manually align merged data will definitely come in handy!

    4.2.3 Aggregate to weekly, ending on Wednesdays

    In this exercise, you will learn a general aggregation technique to aggregate daily data to weekly, but with weeks ending on Wednesdays. This is often done in stock market research to avoid intra-week seasonality.

    The period.apply() function takes an xts object, time period end points, and an aggregation function. Then it applies the function to each group of data between the end points.

    The endpoints() function can calculate the time period end points for period.apply(), and you can use custom end points too. But your custom end points vector must start with zero and end with the total number of observations you’re going to aggregate, just like the output of endpoints().

    You’ll use .indexwday() to find the Wednesday of each week. It returns a number from 0 to 6, where Sunday = 0 (so Wednesday = 3).
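    The custom end points idea can be sketched on invented data (two weeks of daily values starting on a Monday):

    ```r
    library(xts)

    # Two invented weeks of daily data starting Monday 2020-01-06
    x <- xts(1:14, seq(as.Date("2020-01-06"), by = "day", length.out = 14))

    # Locate the Wednesdays (.indexwday(): Sunday = 0, Wednesday = 3)
    wednesdays <- which(.indexwday(x) == 3)

    # Custom end points must start with 0 and end with nrow(x)
    end_points <- c(0, wednesdays, nrow(x))

    # Average each Thursday-through-Wednesday group
    weekly_mean <- period.apply(x, end_points, mean)
    ```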

    This exercise will use the daily Fed Funds data (DFF) from prior exercises.

  • Use .indexwday() to get the week days from the DFF index. Assign the result to index_weekdays.
  • # Extract index weekdays (Sunday = 0)
    index_weekdays <- .indexwday(DFF)
  • Use the which() function to find the locations of the Wednesdays in index_weekdays. Store the result in wednesdays.
  • # Find locations of Wednesdays
    wednesdays <- which(index_weekdays == 3)
  • Complete the command to make the end_points start with 0 and end with the total number of rows, like the output of endpoints().
  • # Create custom end points
    end_points <- c(0, wednesdays, nrow(DFF))
  • Use period.apply() and end_points to aggregate DFF to weekly averages. Assign the result to weekly_mean.
  • # Calculate weekly mean using custom end points
    weekly_mean <- period.apply(DFF, end_points, mean)

    Wonderful! There are many ways to convert a time series to a lower frequency.

    4.3 Aggregating and combining intraday data

    4.3.1 Combine data that have timezones

    Recall that xts objects store the time index as seconds since midnight, 1970-01-01 in the UTC timezone. merge() uses this underlying index and returns a result with the first object’s timezone.

    This exercise provides an example. The two objects in your workspace are identical except for the index timezone. The index values are the same instances in time, but measured in different locations. The london object’s timezone is Europe/London and the chicago object’s timezone is America/Chicago.

    library(lubridate)
    london <- xts(c(1, 2, 3, 4, 5),
                  order.by = ymd_hms("2010-01-09 01:00:00", "2010-01-09 04:00:00",
                                     "2010-01-09 06:00:00", "2010-01-09 14:00:00",
                                     "2010-01-10 12:00:00"),
                  tz = "GMT")
    colnames(london) <- "London"
    
    chicago <- xts(c(1, 2, 3, 4, 5),
                   order.by = ymd_hms("2010-01-08 19:00:00", "2010-01-08 22:00:00",
                                      "2010-01-09 00:00:00", "2010-01-09 08:00:00",
                                      "2010-01-10 06:00:00"),
                   tz = "CST")
    colnames(chicago) <- "Chicago"
  • Fill in the object names so the result of the merge() will have a London timezone.
  • # Create merged object with a London timezone
    tz_london <- merge(london, chicago)
  • Use str() to look at the structure of tz_london and note the TZ and the local time range printed in the first line of output.
  • # Look at tz_london structure
    str(tz_london)
    ## An 'xts' object on 2010-01-08 19:00:00/2010-01-10 12:00:00 containing:
    ##   Data: num [1:10, 1:2] NA NA NA 1 2 3 NA 4 NA 5 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:2] "London" "Chicago"
    ##   Indexed by objects of class: [POSIXct,POSIXt] TZ: GMT
    ##   xts Attributes:  
    ##  NULL
  • Now complete the merge() command so the result will have a Chicago timezone.
  • # Create merged object with a Chicago timezone
    tz_chicago <- merge(chicago, london)
  • Use str() to look at the structure of tz_chicago and note how the TZ and local time range have changed.
  • # Look at tz_chicago structure
    str(tz_chicago)
    ## An 'xts' object on 2010-01-08 19:00:00/2010-01-10 12:00:00 containing:
    ##   Data: num [1:10, 1:2] 1 2 3 NA NA NA 4 NA 5 NA ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:2] "Chicago" "London"
    ##   Indexed by objects of class: [POSIXct,POSIXt] TZ: CST
    ##   xts Attributes:  
    ##  NULL

    Great job! Combining multiple timezones is easy with xts!

    4.3.2 Make irregular intra-day data regular

    Earlier you learned how to create a regular daily series from irregular daily data. Now you will create regular intra-day data from an irregular series.

    Intra-day financial data often does not span a full 24 hours, because most markets are closed for part of the day. This exercise assumes markets open at 9AM and close at 4PM, Monday through Friday.

    Your data may not have an observation exactly at the market open and/or close. So, you would not be able to use start() and end() as you could with the daily data. You need to specify the start and end date-times to create this regular sequence.

    The regular date-time sequence will include periods when markets are closed, but you can use xts’ time-of-day subsetting to extract only the periods the market is open.
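    A minimal sketch of this merge-then-subset approach, using invented times and values:

    ```r
    library(xts)

    # A regular hourly grid spanning two invented trading days
    grid <- seq(as.POSIXct("2020-01-06 09:00", tz = "UTC"),
                as.POSIXct("2020-01-07 16:00", tz = "UTC"), by = "1 hour")
    regular <- xts(x = NULL, order.by = grid)   # zero-width xts

    # A few irregular observations
    obs <- xts(c(10, 11, 12),
               as.POSIXct(c("2020-01-06 09:30:00", "2020-01-06 15:00:00",
                            "2020-01-07 10:00:00"), tz = "UTC"))

    # Merge onto the grid, then keep only the 9AM-4PM window
    merged    <- merge(regular, obs)
    trade_day <- merged["T09:00/T16:00"]
    ```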

    irregular_xts = xts(seq(1,20,1),ymd_hms(
    "2010-01-04 09:00:00",
    "2010-01-04 11:00:00",
    "2010-01-04 18:00:00",
    "2010-01-04 22:00:00",
    "2010-01-05 00:00:00",
    "2010-01-05 05:00:00",
    "2010-01-05 13:00:00",
    "2010-01-05 17:00:00",
    "2010-01-05 22:00:00",
    "2010-01-06 02:00:00",
    "2010-01-06 06:00:00",
    "2010-01-06 07:00:00",
    "2010-01-06 17:00:00",
    "2010-01-06 22:00:00",
    "2010-01-06 23:00:00",
    "2010-01-07 02:00:00",
    "2010-01-07 03:00:00",
    "2010-01-07 08:00:00",
    "2010-01-08 00:00:00",
    "2010-01-08 06:00:00"), tz="UTC")
    colnames(irregular_xts) = "data"
  • Finish the command to create a regular 30-minute date-time sequence between 09:00 Monday and 16:00 Friday.
  • # Create a regular date-time sequence
    regular_index <- seq(as.POSIXct("2010-01-04 09:00"), as.POSIXct("2010-01-08 16:00"), by = "30 min")
  • Set the values for x and order.by to create a zero-width xts object.
  • # Create a zero-width xts object
    regular_xts <- xts(x = NULL, order.by = regular_index)
  • Create merged_xts by merging irregular_xts and regular_xts. Use the fill argument to replace NA with their previous value using na.locf().
  • # Merge irregular_xts and regular_xts, filling NA with their previous value
    merged_xts <- merge(regular_xts, irregular_xts, fill = na.locf)
  • Use xts’ time-of-day subsetting to extract observations between 9AM and 4PM every day. Assign the result to trade_day.
  • # Subset to trading day (09:00-16:00)
    trade_day <- merged_xts["T09:00/T16:00"]

    Great job! Now you know how to subset your intra-day data to only contain the trading day!

    4.3.3 Fill missing values by trading day

    The previous exercise carried the last observation of the prior day forward into the first observation of the following day. This exercise will show you how to fill missing values by trading day, without using the prior day’s final value.

    You will use the same split-lapply-rbind paradigm from the Introduction to xts and zoo course. For reference, the pattern is below.

    x_split <- split(x, f = "months")
    x_list <- lapply(x_split, cummax)
    x_list_rbind <- do.call(rbind, x_list)
    

    Recall that the do.call(rbind, …) syntax allows you to pass a list of objects to rbind() instead of having to type all their names.

    Your workspace has a trade_day object that contains the regular series from the previous exercise, but without any NA filled in.

    trade_day <- xts(c(1, rep(NA, 3), 2, rep(NA, 18), 7, rep(NA, 51)), index(trade_day))
  • Create a daily_list object by using split() to put the trade_day data into a list of data for each day.
  • # Split trade_day into days
    daily_list <- split(trade_day, f = "days")
  • Now use lapply() to fill the NA for each day’s data in the daily_list list.
  • # Use lapply to call na.locf for each day in daily_list
    daily_filled <- lapply(daily_list, FUN = na.locf)
  • Finally, use do.call() and rbind() to convert daily_filled to a single xts object named filled_by_trade_day.
  • # Use do.call to rbind the results
    filled_by_trade_day <- do.call(rbind, daily_filled)

    You’re amazing! You used advanced functions to transform data for each trading day!

    4.3.4 Aggregate irregular intra-day data

    Intraday data can be huge, with hundreds of thousands of observations per day, millions per month, and hundreds of millions per year. These datasets often need to be aggregated before you can work with them.

    You learned how to aggregate daily data in the Introduction to xts and zoo course. This exercise will use to.period() to aggregate intraday data to an OHLC series. You often need to specify both period and k arguments to aggregate intraday data.
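    A minimal sketch on invented data (one hour of second-by-second prices), showing how period and k together control the bar size:

    ```r
    library(xts)

    # One invented hour of second-by-second prices
    set.seed(42)
    times  <- seq(as.POSIXct("2020-01-06 09:00:00", tz = "UTC"),
                  by = "1 sec", length.out = 3600)
    prices <- xts(100 + cumsum(rnorm(3600, sd = 0.01)), times)

    # Aggregate 3,600 raw prices into six 10-minute OHLC bars
    bars_10min <- to.period(prices, period = "minutes", k = 10)
    ```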

    The intraday_xts object contains one trading day of random data.

    load("DC.RData")
    intraday_xts <- DC[,1]
  • Use to.period() to convert intraday_xts into a 5-second price series called xts_5sec.
  • # Convert raw prices to 5-second prices
    xts_5sec <- to.period(intraday_xts, period = "seconds", k = 5)
  • Use to.period() to convert intraday_xts into a 10-minute price series called xts_10min.
  • # Convert raw prices to 10-minute prices
    xts_10min <- to.period(intraday_xts, period = "minutes", k = 10)
  • Use to.period() to convert intraday_xts into a 1-hour price series called xts_1hour.
  • # Convert raw prices to 1-hour prices
    xts_1hour <- to.period(intraday_xts, period = "hours", k = 1)

    Fantastic! You just aggregated a large amount of data like a pro!

    5 Importing text data, and adjusting for corporate actions

    You’ve learned the core workflow of importing and manipulating financial data. Now you will see how to import data from text files of various formats. Then you will learn how to check data for weirdness and handle missing values. Finally, you will learn how to adjust stock prices for splits and dividends.

    5.1 Importing text files

    5.1.1 Import well-formatted daily OHLC data

    You can use getSymbols() to import a well-formatted CSV. In this case, well-formatted means the file contains data for a single instrument with date, open, high, low, close, volume, and adjusted close columns, in that order. You might have noticed that this is the same format as getSymbols() returns when you download data from internet sources.

    getSymbols() can also use a directory of CSV files as a data source, just as it uses Yahoo! Finance and FRED. In this exercise, you will use the AMZN.csv file in your working directory. It contains some randomized Amazon.com data from the first half of 2002. You can use dir() to see the file in your working directory.
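    As a rough, self-contained sketch, you can write a tiny file to a temporary directory and point getSymbols() at it via the dir argument of the csv method. The XYZ symbol and all prices below are invented, and this assumes the default seven-column layout (Date, Open, High, Low, Close, Volume, Adjusted):

    ```r
    library(quantmod)

    # Write a tiny, invented well-formatted file to a temporary directory
    csv_dir <- tempdir()
    writeLines(c("Date,Open,High,Low,Close,Volume,Adjusted",
                 "2020-01-06,10.0,11.0,9.5,10.5,1000,10.5",
                 "2020-01-07,10.5,12.0,10.0,11.5,1500,11.5"),
               file.path(csv_dir, "XYZ.csv"))

    # Point getSymbols() at the directory of CSV files
    getSymbols("XYZ", src = "csv", dir = csv_dir)
    ```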

  • Use getSymbols() to load AMZN.csv.
  • # Load AMZN sourced from the csv
    getSymbols("AMZN", src = "csv")
    ## [1] "AMZN"
  • Look at the structure of AMZN.
  • # Look at AMZN structure
    str(AMZN)
    ## An 'xts' object on 2002-01-02/2002-06-28 containing:
    ##   Data: num [1:124, 1:6] 11.1 11.3 12.5 11.8 12.8 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : NULL
    ##   ..$ : chr [1:6] "AMZN.Open" "AMZN.High" "AMZN.Low" "AMZN.Close" ...
    ##   Indexed by objects of class: [Date] TZ: UTC
    ##   xts Attributes:  
    ## List of 2
    ##  $ src    : chr "csv"
    ##  $ updated: POSIXct[1:1], format: "2022-10-22 13:20:43"

    Great! Well-formatted CSV files are very easy to work with.

    5.1.2 Import text files in other formats

    The previous exercise taught you how to import well-formatted CSV data using getSymbols(). Unfortunately, most data are not well-formatted.

    The zoo package provides several functions to import text files as zoo objects. The main function is read.zoo(), which wraps read.table(). The xts class extends zoo, so you can easily convert the result of read.zoo() into an xts object by using as.xts().

    In this exercise, you will learn how you can use read.zoo() to import the same AMZN.csv data from the previous exercise.

  • Use read.zoo() to import AMZN.csv. Assign the output to an object named amzn_zoo.
  • # Import the AMZN.csv file using read.zoo
    amzn_zoo <- read.zoo("AMZN.csv", sep = ",", header = TRUE)
  • Convert amzn_zoo to an xts object. Assign the output to an object named amzn_xts.
  • # Convert to xts
    amzn_xts <- as.xts(amzn_zoo)
  • Look at the first few rows of amzn_xts.
  • # Look at the first few rows of amzn_xts
    head(amzn_xts)
    ##            AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume AMZN.Adjusted
    ## 2002-01-02     11.13     11.01    10.46      10.87     6674703         10.87
    ## 2002-01-03     11.26     12.25    10.76      11.99    11441553         11.99
    ## 2002-01-04     12.46     12.62    11.71      12.10    12619402         12.10
    ## 2002-01-07     11.76     12.73    12.01      12.48    12296059         12.48
    ## 2002-01-08     12.82     12.43    11.35      11.96     6821418         11.96
    ## 2002-01-09     12.07     12.49    10.93      11.59    10669376         11.59

    Awesome! As you will see, read.zoo() is a very flexible import function for time series.

    5.1.3 Handle date and time in separate columns

    read.zoo() makes it easy to import data when the date and time are in separate columns. The index.column argument allows you to specify the name or number of the column(s) containing the index data. That’s all you need to do if the date and time are specified in the standard format (“%Y-%m-%d” for date, and “%H:%M:%S” for time).
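    A minimal, self-contained sketch of the same idea, using a text connection instead of a file (the data below are invented):

    ```r
    library(zoo)

    # Invented data with the date and time in separate, standard-format columns
    txt <- "Date,Time,Close
    2020-01-06,09:05:00,10.2
    2020-01-06,09:10:00,10.3
    2020-01-06,09:15:00,10.4"

    # Both columns named in index.column are combined into one date-time index
    z <- read.zoo(textConnection(txt), sep = ",", header = TRUE,
                  index.column = c("Date", "Time"))
    ```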

    In this exercise, you will use the index.column argument to specify the date and time columns of the file. Your working directory has a file named UNE.csv that contains some 5-minute OHLC data for the energy company, Unron. You will first use read.csv() to find the names of the date and time columns.

  • Import the first 5 lines of UNE.csv using read.csv(). Assign the output to une_data.
  • # Read data with read.csv
    une_data <- read.csv("UNE.csv", nrows = 5)
  • Look at the structure of une_data and note the column names and locations.
  • # Look at the structure of une_data
    str(une_data)
    ## 'data.frame':    5 obs. of  6 variables:
    ##  $ Date : chr  "2016-11-14" "2016-11-14" "2016-11-14" "2016-11-14" ...
    ##  $ Time : chr  "09:05:00" "09:10:00" "09:15:00" "09:20:00" ...
    ##  $ Open : num  10.4 10.1 10.2 10.1 10.2
    ##  $ High : num  10.4 10.2 10.2 10.2 10.2
    ##  $ Low  : num  10.2 10.1 10.2 10.1 10.2
    ##  $ Close: num  10.2 10.2 10.2 10.2 10.2
  • Use read.zoo() to import UNE.csv, setting index.column to the names of the date and time columns. Assign the output to une_zoo.
  • # Specify Date and Time index column names
    une_zoo <- read.zoo("UNE.csv", index.column = c("Date", "Time"), sep = ",", header = TRUE)
  • Look at the first few rows of une_zoo.
  • # Look at first few rows of data
    head(une_zoo)
    ##                      Open  High   Low Close
    ## 2016-11-14 09:05:00 10.38 10.38 10.21 10.22
    ## 2016-11-14 09:10:00 10.12 10.22 10.12 10.22
    ## 2016-11-14 09:15:00 10.18 10.22 10.15 10.22
    ## 2016-11-14 09:20:00 10.09 10.22 10.07 10.22
    ## 2016-11-14 09:25:00 10.22 10.22 10.21 10.22
    ## 2016-11-14 09:30:00 10.16 10.22 10.15 10.22

    Good job! The index.column argument is great if your dates and times are in separate columns!

    5.1.4 Read text file containing multiple instruments

    The previous exercises work if each file contains only one instrument. Some software and data vendors may provide data for all instruments in one file. This exercise will teach you how to import a file that contains multiple instruments.

    Once again, you can use read.zoo(). This time you will be using its split argument, which allows you to specify the name or number of the column(s) that identify unique observations.

    The two_symbols.csv file in your working directory contains bid/ask data for two instruments, where each row has one bid or ask observation for one instrument. You will use the split argument to import the data into an object that has both bid and ask prices for both instruments on one row.

  • Import the first 5 lines of two_symbols.csv using read.csv(). Assign the output to two_symbols_data.
  • # Read data with read.csv
    two_symbols_data <- read.csv("two_symbols.csv", nrows = 5)
  • Look at the structure of two_symbols_data and note the column names and locations.
  • # Look at the structure of two_symbols_data
    str(two_symbols_data)
    ## 'data.frame':    5 obs. of  4 variables:
    ##  $ Date  : chr  "2016-01-01 10:43:01" "2016-01-01 10:43:01" "2016-01-01 10:43:01" "2016-01-01 10:43:01" ...
    ##  $ Symbol: chr  "A" "A" "B" "B" ...
    ##  $ Type  : chr  "Bid" "Ask" "Bid" "Ask" ...
    ##  $ Price : num  58.2 58.2 29 29 58.2
  • Use read.zoo() to import two_symbols.csv, specifying split as the names of the symbol and type columns. Assign the output to two_symbols_zoo.
  • # Specify Symbol and Type index column names
    two_symbols_zoo <- read.zoo("two_symbols.csv", split = c("Symbol", "Type"), sep = ",", header = TRUE)
  • Look at the first few rows of two_symbols_zoo.
  • # Look at first few rows of data
    head(two_symbols_zoo)
    ##                     A.Ask B.Ask A.Bid B.Bid
    ## 2016-01-01 10:43:01 58.24 28.98 58.23 28.96
    ## 2016-01-01 10:43:02 58.25 28.99 58.24 28.97

    Great! The split argument is the key for importing multiple instruments.

    5.2 Checking for weirdness

    5.2.1 Handle missing values

    In chapter 3, you used na.locf() to fill missing values with the previous non-missing value. You can use interpolation when carrying the previous value forward isn’t appropriate. In this exercise, you will explore two interpolation methods: linear and spline.

    Linear interpolation calculates values that lie on a line between two known data points. This is a good choice for fairly linear data, like a series with a strong trend. Spline interpolation is more appropriate for series without a strong trend, because it calculates a non-linear approximation using multiple data points.
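    A quick way to see the difference on a tiny invented series:

    ```r
    library(zoo)

    # A short invented series with one missing value
    x <- zoo(c(1, 2, NA, 4, 8))

    # Linear interpolation fills the NA with the value on the line between
    # the neighboring points: here (2 + 4) / 2 = 3
    na.approx(x)

    # Spline interpolation fits a curve through multiple points,
    # so the filled value can differ from the linear one
    na.spline(x)
    ```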

    Use these two methods to interpolate the three missing values for the 10-year Treasury rate in the object DGS10. Then compare the results with the output of na.locf().

    DGS10 <- xts(c(4.94, 4.85, 4.78, 4.79, 4.85, NA, 4.99, 4.97,
                   4.86, 4.80, 4.84, NA, NA, 4.64, 4.57, 4.63),
                 ymd("2001-08-27",
    "2001-08-28",
    "2001-08-29",
    "2001-08-30",
    "2001-08-31",
    "2001-09-03",
    "2001-09-04",
    "2001-09-05",
    "2001-09-06",
    "2001-09-07",
    "2001-09-10",
    "2001-09-11",
    "2001-09-12",
    "2001-09-13",
    "2001-09-14",
    "2001-09-17"))
  • Complete the command to use na.approx() to fill in missing values using linear interpolation.
  • # fill NA using last observation carried forward
    locf <- na.locf(DGS10)
    
    # fill NA using linear interpolation
    approx <- na.approx(DGS10)
  • Complete the command to use na.spline() to fill in missing values using spline interpolation.
  • # fill NA using spline interpolation
    spline <- na.spline(DGS10)
  • Merge locf, approx, and spline into one object named na_filled.
  • # merge into one object
    na_filled <- merge(locf, approx, spline)
  • Complete the command to plot na_filled.
  • # plot combined object
    plot(na_filled, col = c("black", "red", "green"))

    Excellent! Now you know multiple ways to interpolate missing values.

    5.2.2 Visualize imported data

    It’s important to check that your imported data are reasonable. A plot is a quick and easy way to spot oddities. In this exercise, you will use the plot() function to visualize some AAPL data from Yahoo Finance.

    A stock split caused a huge price change in June 2014. Apple simultaneously increased the number of shares outstanding and decreased its stock price, leaving the company value unchanged. For example, a 2-for-1 split would double the shares outstanding, and reduce the stock price by 1/2.
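    The split arithmetic is easy to verify (all numbers below are invented):

    ```r
    # A 2-for-1 split doubles the shares outstanding and halves the price,
    # leaving the value of the position unchanged
    shares_before <- 100
    price_before  <- 650

    shares_after <- shares_before * 2
    price_after  <- price_before / 2

    # The position is worth the same before and after the split
    shares_before * price_before == shares_after * price_after
    ```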

    You will also use the quantmod extractor functions Cl() and Ad() to access the close and adjusted close columns, respectively. Yahoo Finance provides the split- and/or dividend-adjusted close column. You will learn more about the adjustment process in the next video.

    # edited by cliex159
    getSymbols("AAPL", src = "yahoo", from = "2007-01-01", to = "2017-09-17")
    ## [1] "AAPL"

    AAPL is available in your workspace. It contains Apple stock data (imported from Yahoo! Finance) from January 2007 through September 15, 2017.

  • Look at the tail() of AAPL.
  • # Look at the last few rows of AAPL data
    tail(AAPL)
    ##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
    ## 2017-09-08   40.2150   40.2875  39.6325    39.6575   114446000      37.63755
    ## 2017-09-11   40.1250   40.5125  39.9725    40.3750   126323200      38.31851
    ## 2017-09-12   40.6525   40.9900  39.6925    40.2150   286856000      38.16667
    ## 2017-09-13   39.9675   39.9900  39.4775    39.9125   179629600      37.87957
    ## 2017-09-14   39.7475   39.8500  39.5225    39.5700    95042800      37.55452
    ## 2017-09-15   39.6175   40.2425  39.5000    39.9700   196458400      37.93414
  • Plot the AAPL close price.
  • # Plot close price
    plot(Cl(AAPL))
  • Plot the AAPL adjusted close price.
  • # Plot adjusted close price
    plot(Ad(AAPL))

    Good job! Visualization is a great way to get a feel for what the data is like.

    5.3 Adjusting for corporate actions

    5.3.1 Adjust for stock splits and dividends

    Stock splits can create large historical price changes even though they do not change the value of the company. So, you must adjust all pre-split prices in order to calculate historical returns correctly.

    Similarly, you must adjust all pre-dividend prices. Dividends do reduce the company’s value by the amount of the dividend payment, but the investor’s return isn’t affected because they receive the offsetting dividend payment.

    In this exercise, you will learn how to use the adjustOHLC() function to adjust raw historical OHLC prices for splits and dividends, so historical returns can be calculated accurately.

    Yahoo Finance provides raw prices and a split- and dividend-adjusted close column. The output of adjustOHLC() should match Yahoo’s adjusted close column. AAPL data from Yahoo Finance is already loaded in your workspace.

    While not necessary to complete this exercise, Yahoo Finance provides an accessible example of the adjusted close calculation, if you’re interested in learning more.

  • Look at the first few rows of AAPL
  • # Look at first few rows of AAPL
    head(AAPL)
    ##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
    ## 2007-01-03  3.081786  3.092143 2.925000   2.992857  1238319600      2.555398
    ## 2007-01-04  3.001786  3.069643 2.993571   3.059286   847260400      2.612116
    ## 2007-01-05  3.063214  3.078571 3.014286   3.037500   834741600      2.593514
    ## 2007-01-08  3.070000  3.090357 3.045714   3.052500   797106800      2.606322
    ## 2007-01-09  3.087500  3.320714 3.041071   3.306071  3349298400      2.822829
    ## 2007-01-10  3.383929  3.492857 3.337500   3.464286  2952880000      2.957918
  • Use adjustOHLC() to adjust the AAPL data. Assign the result to an object named aapl_adjusted.
  • # Adjust AAPL for splits and dividends
    aapl_adjusted <- adjustOHLC(AAPL)
  • Look at the first few rows of aapl_adjusted. Notice the close column of aapl_adjusted is the same as the adjusted close column of AAPL.
  • # Look at first few rows of aapl_adjusted
    head(aapl_adjusted)
    ##            AAPL.Open AAPL.High  AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
    ## 2007-01-03 0.3960780 0.3974091 0.3759275  0.3846486  1238319600      2.555398
    ## 2007-01-04 0.3857962 0.3945173 0.3847404  0.3931862   847260400      2.612116
    ## 2007-01-05 0.3936911 0.3956648 0.3874027  0.3903862   834741600      2.593514
    ## 2007-01-08 0.3945632 0.3971795 0.3914419  0.3923141   797106800      2.606322
    ## 2007-01-09 0.3968124 0.4267855 0.3908452  0.4249036  3349298400      2.822829
    ## 2007-01-10 0.4349101 0.4489097 0.4289429  0.4452377  2952880000      2.957918

    Great! The adjustOHLC() function takes care of the hard work for you!

    5.3.2 Download split and dividend data

    In the previous exercise, you used adjustOHLC() to adjust raw historical OHLC prices for splits and dividends, but it only works for OHLC data. It will not work if you only have close prices, and it does not return any of the split or dividend data it uses.

    You need the dates and values for each split and dividend to adjust a non-OHLC price series, or if you simply want to analyze the raw split and dividend data.

    You can download the split and dividend data from Yahoo Finance using the quantmod functions getSplits() and getDividends(), respectively. The historical dividend data from Yahoo Finance is adjusted for splits. If you want to download unadjusted dividend data, you need to set split.adjust = FALSE in your call to getDividends().

  • Use getSplits() to import AAPL split data into splits.
  • # Download AAPL split data
    splits <- getSplits("AAPL")
  • Use getDividends() to import split-adjusted AAPL dividend data into dividends.
  • # Download AAPL dividend data
    dividends <- getDividends("AAPL")
  • Look at the first few rows of dividends. The values are small and in fractional cents because they have been split-adjusted.
  • # Look at the first few rows of dividends
    head(dividends)
    ##                AAPL.div
    ## 1987-05-11 2.392857e-06
    ## 1987-08-10 4.785714e-06
    ## 1987-11-17 6.375000e-06
    ## 1988-02-12 6.375000e-06
    ## 1988-05-16 6.375000e-06
    ## 1988-08-15 6.375000e-06
  • Use getDividends() to import unadjusted AAPL dividend data into raw_dividends.
  • # Download AAPL dividend data that is not split-adjusted
    raw_dividends <- getDividends("AAPL", split.adjust = FALSE)
  • Look at the first few rows of raw_dividends. Note they differ from the values in dividends.
  • # Look at the first few rows of raw_dividends
    head(raw_dividends)
    ##            AAPL.div
    ## 1987-05-11 0.000536
    ## 1987-08-10 0.000536
    ## 1987-11-17 0.000714
    ## 1988-02-12 0.000714
    ## 1988-05-16 0.000714
    ## 1988-08-15 0.000714

    Good job! It’s important to get splits and dividends correct when calculating historical returns.

    5.3.3 Adjust univariate data for splits and dividends

    If you only have close prices, you can adjust them with adjRatios(). It has 3 arguments: splits, dividends, and close. It returns an xts object with split and dividend adjustment ratios in columns “Split” and “Div”, respectively.

    You need to provide split data via the splits argument to calculate the split ratio. To calculate the dividend ratio, you need to provide raw dividends and raw prices via the dividends and close arguments, respectively.

    Once you have the split and dividend adjustment ratios, you calculate the adjusted price by multiplying the unadjusted price by both the split and dividend adjustment ratios.

    Your workspace contains splits, dividends, and raw_dividends from prior exercises, and AAPL data from Yahoo Finance.

  • Use adjRatios() to calculate the split and dividend adjustment ratios. Assign the output to ratios.
  • # Calculate split and dividend adjustment ratios
    ratios <- adjRatios(splits = splits, dividends = raw_dividends, close = Cl(AAPL))
  • Calculate the adjusted close for AAPL by extracting the close price and multiplying it by both adjustment ratios and assign it to aapl_adjusted.
  • # Use the Split and Div columns to calculate adjusted close for AAPL
    aapl_adjusted <- Cl(AAPL) * ratios[, "Split"] * ratios[, "Div"]
  • Look at the first few rows of the adjusted close column provided by Yahoo Finance.
  • # Look at first few rows of adjusted close for AAPL
    head(Ad(AAPL))
    ##            AAPL.Adjusted
    ## 2007-01-03      2.555398
    ## 2007-01-04      2.612116
    ## 2007-01-05      2.593514
    ## 2007-01-08      2.606322
    ## 2007-01-09      2.822829
    ## 2007-01-10      2.957918
  • Look at the first few rows of the adjusted close price you calculated, and compare that to the adjusted close from Yahoo Finance.
  • # Look at first few rows of aapl_adjusted
    head(aapl_adjusted)
    ##            AAPL.Close
    ## 2007-01-03  0.3846486
    ## 2007-01-04  0.3931862
    ## 2007-01-05  0.3903862
    ## 2007-01-08  0.3923141
    ## 2007-01-09  0.4249036
    ## 2007-01-10  0.4452377

    Amazing! You just calculated the adjusted close price! Congrats!

    5.3.4 When to adjust data

    When do you need to back-adjust all historical data for an instrument?

    • Every time you receive new price data.
    • Never.
    • Any time there’s a new split or dividend.

    Good job! Continue to the next exercise.

    5.4 Congratulations!

    That concludes this course on importing and managing financial data in R.

    Well done! You’ve learned tools that will help you wrangle many types of time-series data that you will encounter in the real world. Go forth and analyze!