• 1 R Time Series Visualization Tools
    • 1.1 Refresher on xts and the plot() function Arnaud Amsellem The R Trader
    • 1.2 Other useful visualizing functions
  • 2 Univariate Time Series
  • 3 Multivariate Time Series
  • 4 Case study: Visually selecting a stock that improves your existing portfolio

1 R Time Series Visualization Tools

1.1 Refresher on xts and the plot() function

1.1.1 plot() function - basic parameters

The plot.xts() function is the most useful tool in the R time series data visualization artillery. It is fairly similar to general plotting, but its x-axis contains a time scale. You can use plot() instead of plot.xts() if the object used in the function is an xts object.

Let’s look at a few examples:

> # Basic syntax
> plot(mydata)

> # Add title and double thickness of line
> plot(mydata, main = "Stock XYZ", lwd = 2)

> # Add labels for X and Y axes
> plot(mydata, xlab = "X axis", ylab = "Y axis")

As you can see, there is a wide variety of parameters for the function, allowing endless possibilities. Note that each call to plot() creates an entirely new plot using only the parameters defined in that particular call.

Furthermore, to display the first few rows of a dataset mydata to your console, use head(mydata). To display only the names of the columns, use colnames(mydata). You can also select a particular column of a dataset by specifying its title after a dollar sign, like in mydata$mycolumn.

library(readr)
library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:xts’:

    first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(xts)
data <- read_table2("stocks_01.csv",
                    col_names = FALSE,
                    skip = 1,
                    col_types = cols(
                      X2 = col_number(),
                      X3 = col_number(),
                      X4 = col_number(),
                      X5 = col_number()
                      ))
data <- rename(data, index = X1, 
       yahoo = X2, 
       microsoft = X3,
       citigroup = X4,
       dow_chemical = X5)
head(data)
index        yahoo   microsoft   citigroup   dow_chemical
<date>       <dbl>   <dbl>       <dbl>       <dbl>
2015-01-02   50.17   44.30501    53.45259    42.48209
2015-01-05   49.13   43.89759    51.76803    41.16822
2015-01-06   49.21   43.25329    49.94556    40.50662
2015-01-07   48.59   43.80284    50.40857    40.44139
2015-01-08   50.23   45.09144    51.16711    41.44776
2015-01-09   49.72   44.71244    50.02437    41.38254
data$index <- as.Date(data$index)
data <- as.xts(data [, -1], order.by = data$index)
# Display the first few lines of the data
head(data)
           yahoo microsoft citigroup dow_chemical
2015-01-02 50.17  44.30501  53.45259     42.48209
2015-01-05 49.13  43.89759  51.76803     41.16822
2015-01-06 49.21  43.25329  49.94556     40.50662
2015-01-07 48.59  43.80284  50.40857     40.44139
2015-01-08 50.23  45.09144  51.16711     41.44776
2015-01-09 49.72  44.71244  50.02437     41.38254
# Display the column names of the data
colnames(data)
[1] "yahoo"        "microsoft"    "citigroup"    "dow_chemical"
# Plot yahoo data and add title
plot(data$yahoo, main = "yahoo")


# Replot yahoo data with labels for X and Y axes
plot(data$yahoo, main = "yahoo", xlab = "date", ylab = "price")

You can add even more customization with the plot() function using other options. As you saw in the video, the lines() function is especially helpful when you want to modify an existing plot.

Let’s look at another example:

> # Use bars instead of points and add subtitle
> plot(mydata, type = "h", sub = "Subtitle")

> # Triple thickness of line and change color to red
> lines(mydata, col = "red", lwd = 3)
# Plot the second time series and change title
plot(data[ ,2], main = "microsoft")


# Replot with same title, add subtitle, use bars
plot(data[ ,2], main = "microsoft", sub = "Daily closing price since 2015", type = "h")


# Change line color to red
lines(data[ ,2], col = "red")

1.1.2 Control graphic parameters

In R, it is also possible to tailor the window layout using the par() function.

To set up a graphical window for multiple charts with nr rows and nc columns, assign the vector c(nr, nc) to the option mfrow. To adjust the size of the margins and of the characters in the text, set the options mex and cex, respectively, to the appropriate decimal value. Like plot(), each call to par() only changes the parameters named in that particular call.

> # Create 3x1 graphical window
> par(mfrow = c(3, 1))

> # Also reduce margin and character sizes by half
> par(mfrow = c(2, 1), mex = 0.5, cex = 0.5)
# Plot two charts on same graphical window
par(mfrow = c(2 , 1))
plot(data[,1], main = "yahoo")
plot(data[,2], main = "microsoft")


# Replot with reduced margin and character sizes
par(mfrow = c(2 , 1), mex = 0.6, cex = 0.8)

plot(data[,1], main = "yahoo")
plot(data[,2], main = "microsoft")
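
Settings made with par() stay in effect on the current graphics device until they are changed again, so it can help to save the previous values and restore them once the multi-panel figure is done. The following is a minimal sketch on a made-up series (x and old_par are illustrative names, not part of the exercise data):

```r
# Save the current settings while switching to a 2x1 layout.
# par() returns the previous values of the parameters it changes.
old_par <- par(mfrow = c(2, 1), mex = 0.6, cex = 0.8)

# Two illustrative charts on synthetic data (not the stock data above)
x <- cumsum(c(0.5, -0.2, 0.3, 0.1, -0.4, 0.6))
plot(x, type = "l", main = "Series A (synthetic)")
plot(rev(x), type = "l", main = "Series B (synthetic)")

# Restore the previous layout so later plots are unaffected
par(old_par)
layout_after <- par("mfrow")
```

After par(old_par), the next call to plot() fills the whole window again.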

1.2 Other useful visualizing functions

1.2.1 Adding an extra series to an existing chart

A great way to visually compare two time series is to display them on the same chart with different scales.

Suppose you already have a plot of mydata. As you saw in the video, you can use lines(mydata2) to add a new time series mydata2 to this existing plot. If you want a scale for this time series on the right side of the plot with equally spaced tick marks, use axis(side, at), where side is an integer specifying which side of the plot the axis should be drawn on, and at is set equal to pretty(mydata2).

Finally, to distinguish these two time series, you can add a legend with the legend() function.

> # x specifies location of legend in plot
> legend(x = "bottomright",
         # legend specifies text label(s)
         legend = c("Stock X", "Stock Y"),
         # col specifies color(s)
         col = c("black", "red"),
         # lty specifies line type(s)
         lty = c(1, 1))

Since there are two time series in the plot, some options in legend() are set to a vector of length two.

# Plot the "microsoft" series
plot(data$microsoft, main = "Stock prices since 2015")


# Add the "dow_chemical" series in red
lines(data$dow_chemical, col = "red")

# Add a Y axis on the right side of the chart
axis(side = 4, at = pretty(data$dow_chemical))

# Add a legend in the top left corner
legend(x = "topleft",
       legend = c("microsoft", "dow_chemical"),
       col = c("black", "red"),
       lty = c(1, 1))

1.2.2 Highlighting events in a time series

You have also learned that it is possible to use the function abline() to add straight lines through an existing plot. Specifically, you can draw a horizontal line to identify a particular level by setting h to a specific Y value, and a vertical line to identify a particular date by setting v to a specific X value:

> abline(h = NULL, v = NULL, ...)

Recall that the index of an xts object is made up of date objects, so the X values of a plot will also contain dates. In this exercise, you will use indexing as well as as.Date("YYYY-MM-DD") and mean() to visually compare the average of the Citigroup stock market prices to its price on January 4, 2016, after it was affected by turbulence in the Chinese stock market.

# Plot the "citigroup" time series
plot.zoo(data$citigroup, main = "Citigroup")

# Create vert_line to identify January 4th, 2016 in citigroup
vert_line <- as.Date("2016-01-04")

# Add a red vertical line using vert_line
abline(v = vert_line, col = "red")

# Create hori_line to identify average price of citigroup
hori_line <- mean(data$citigroup)

# Add a blue horizontal line using hori_line
abline(h = hori_line, col = "blue")

1.2.3 Highlighting a specific period in a time series

To highlight a specific period in a time series, you can display it in the plot in a different background color. The chart.TimeSeries() function in the PerformanceAnalytics package offers a very easy and flexible way of doing this.

Let’s examine some of the arguments of this function:

chart.TimeSeries(R, period.areas, period.color)

R is an xts, time series, or zoo object of asset returns, period.areas are shaded areas specified by a start and end date in a vector of xts date ranges like c("1926-10/1927-11"), and period.color draws the shaded region in whichever color is specified.

library(PerformanceAnalytics)
package ‘PerformanceAnalytics’ was built under R version 4.0.3
Attaching package: ‘PerformanceAnalytics’

The following object is masked from ‘package:graphics’:

    legend
# Create period to hold the first 3 months of 2015
period <- c("2015-01/2015-03")

# Highlight the first three months of 2015 
chart.TimeSeries(data$citigroup, period.areas = period)


# Highlight the first three months of 2015 in light grey
chart.TimeSeries(data$citigroup, period.areas = period, period.color = "lightgrey")

1.2.4 A fancy stock chart

# Plot the microsoft series
plot(data$microsoft, main = "Dividend date and amount")


# Add the citigroup series
lines(data$citigroup, col = "orange", lwd = 2)

# Add a new y axis for the citigroup series
axis(side = 4, at = pretty(data$citigroup), col = "orange")

In this exercise, you will add a legend to the chart that you just created, containing the names of the companies and the dates and values of the latest dividends.

Fill in the pre-written code with the following variables containing the dividend values and dates for both companies:

  • citi_div_value
  • citi_div_date
  • micro_div_value
  • micro_div_date
citi_div_value <- "$0.16"
citi_div_date <- "13 Nov. 2016"
micro_div_value <- "$0.39"
micro_div_date <- "15 Nov. 2016"

Recall that the default color of a plotted line is black, and that the values for legend, col, and lty in legend() should be set to vectors of the same length as the number of time series plotted in your chart.

# Same plot as the previous exercise
plot(data$microsoft, main = "Dividend date and amount")

lines(data$citigroup, col = "orange", lwd = 2)
axis(side = 4, at = pretty(data$citigroup), col = "orange")

# Create the two legend strings
micro <- paste0("Microsoft div. of ", micro_div_value, " on ", micro_div_date)
citi <- paste0("Citigroup div. of ", citi_div_value, " on ", citi_div_date)

# Create the legend in the bottom right corner
legend(x = "bottomright", legend = c(micro, citi), col = c("black", "orange"), lty = c(1, 1))

2 Univariate Time Series

2.1 Univariate time series analysis

2.1.1 Representing a univariate time series

The very first step in the analysis of any time series is to determine whether the time series has the right mathematical properties to apply the standard statistical framework. If not, you must transform the time series first.

In finance, price series are often transformed to differenced data, making it a return series. In R, the ROC() (which stands for “Rate of Change”) function from the TTR package does this automatically to a price or volume series x:

ROC(x)

In this exercise, you will compare plots of the Apple daily prices and Apple daily returns using the stock data contained in data.

data <- read_table2("apple_daily_returns.csv")
Parsed with column specification:
cols(
  `"Index"` = col_date(format = ""),
  `"Apple"` = col_double()
)
data <- rename(data, index = `"Index"`, 
       apple = `"Apple"`)
data$index <- as.Date(data$index)
data <- as.xts(data [, -1], order.by = data$index)
library(TTR)
package ‘TTR’ was built under R version 4.0.3
# Plot Apple's stock price 
plot(data$apple, main = "Apple stock price")

# Create a time series called rtn
rtn <- ROC(data$apple)

# Plot Apple daily price and daily returns 
par(mfrow = c(1, 2))

plot(data$apple, main = "Apple stock price")
plot(rtn)
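
To make the price-to-return transformation concrete, here is a base-R sketch on made-up prices. It assumes that ROC() computes continuously compounded (log) returns by default and pads the first observation with NA (worth confirming with ?ROC), which would explain why the first row of rtn is dropped in later exercises. The vector price and both return variables are illustrative names:

```r
# Hypothetical closing prices (illustrative values, not the Apple data)
price <- c(100, 102, 101, 105, 104)

# Continuously compounded (log) returns: difference of log prices
log_rtn <- diff(log(price))

# Simple (discrete) returns: percentage change between consecutive prices
simple_rtn <- price[-1] / price[-length(price)] - 1

# The two are linked by log(1 + simple return) = log return
round(cbind(simple_rtn, log_rtn), 4)
```

For small daily moves the two definitions give nearly identical numbers; they diverge as the moves get larger.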

2.2 Other visualization tools

2.2.1 Histogram of returns

A simple chart of returns does not reveal much about the time series properties; often, data must be displayed in a different format to visualize interesting features.

The density function, represented by the histogram of returns, indicates the most common returns in a time series without taking time into account. In R, these are calculated with the hist() and density() functions.

To create a histogram with 20 buckets, a title, and no X axis label:

> hist(amazon_stocks,
       breaks = 20,
       main = "AMAZON return distribution",
       xlab = "")

Recall that you can use the lines() function to add a new time series, even with different line properties like color and thickness, to an existing plot.

In this exercise, you will create a histogram of the Apple daily returns data for the last two years contained in rtn.

# Create a histogram of Apple stock returns
hist(rtn,
main = "Apple stock return distribution",
probability = TRUE)

# Add a density line
lines(density(rtn[-1,]))

# Redraw a thicker, red density line
lines(density(rtn[-1,]), lwd = 2, col = "red")

It looks like Apple might have some extreme returns!
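
The reason for probability = TRUE in the histogram call is that it rescales the bars to densities, which puts them on the same vertical scale as the curve from density(). A small sketch on simulated returns (the series r is synthetic, not the Apple data) checks that the bar areas add up to 1:

```r
set.seed(1)                       # reproducible synthetic returns
r <- rnorm(500, mean = 0, sd = 0.02)

# Compute the histogram without drawing it
h <- hist(r, breaks = 20, plot = FALSE)

# With probability = TRUE the bar heights are densities, so the bar
# areas (height x width) add up to 1 and match the density() scale
total_area <- sum(h$density * diff(h$breaks))

hist(r, breaks = 20, probability = TRUE, main = "Synthetic returns")
lines(density(r), col = "red", lwd = 2)
```

Without probability = TRUE the bars show counts, and the density line would sit almost flat along the bottom of the chart.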

2.2.2 Box and whisker plot

A box and whisker plot gives information regarding the shape, variability, and center (or median) of a data set. It is particularly useful for displaying skewed data.

By comparing the data set to a standard normal distribution, you can identify departures from normality (asymmetry, skewness, etc). The lines extending from the boxes are known as whiskers; they indicate variability outside the upper and lower quartiles. Points that fall beyond the whiskers are outliers, usually plotted as individual dots in line with the whiskers.

Use boxplot() to create a horizontal box and whisker plot:

> boxplot(amazon_stocks,
          horizontal = TRUE,
          main = "Amazon return distribution")

In this exercise, you will draw a box and whisker plot for Apple stock returns in rtn.

rtn <- as.data.frame(rtn[-1, ])
# Draw box and whisker plot for the Apple returns
boxplot(rtn,
horizontal = TRUE)


# Draw a box and whisker plot of a normal distribution
boxplot(rnorm(1000),
horizontal = TRUE)

# Redraw both plots on the same graphical window
par(mfrow = c(2, 1))

boxplot(rtn,
horizontal = TRUE)
boxplot(rnorm(1000),
horizontal = TRUE)

Boxplots are useful for quickly getting a feel of the location and variability in your data.
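
The numbers behind a box and whisker plot can be inspected directly with boxplot.stats() from base R. The sketch below uses simulated returns with one deliberately planted extreme value (r is synthetic and the value -0.12 is made up) to show where outliers end up:

```r
set.seed(42)
# Synthetic daily returns with one planted extreme value
r <- c(rnorm(250, mean = 0, sd = 0.01), -0.12)

# boxplot.stats() returns the five numbers drawn by boxplot()
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# and the points beyond the whiskers in $out
bs <- boxplot.stats(r)
bs$stats
bs$out

boxplot(r, horizontal = TRUE, main = "Synthetic returns")
```

The planted value appears in bs$out and is drawn as an isolated dot to the left of the lower whisker.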

2.2.3 Autocorrelation

Another important piece of information is the relationship between one point in the time series and points that come before it. This is called autocorrelation and it can be displayed as a chart which indicates the correlation between points separated by various time lags.

In R, you can plot the autocorrelation function using acf(). By default, it displays lags up to 10*log10(N) for a series of N observations, which is about 27 lags for two years of daily data (i.e. the correlation between points n and n - 1, n and n - 2, n and n - 3 and so on). The autocorrelogram, or the autocorrelation chart, tells you how any point in the time series is related to its past as well as how significant this relationship is. The significance levels are given by 2 horizontal lines above and below 0.

> acf(amazon_stocks,
      main = "AMAZON return autocorrelations")
# Draw autocorrelation plot
acf(rtn, main = "Apple return autocorrelation")


# Redraw with a maximum lag of 10
acf(rtn, main = "Apple return autocorrelation", lag.max = 10)

Autocorrelation helps you understand time-lagged relationships in your data.
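
To see what a single bar of the autocorrelogram represents, the lag-1 value can be reproduced by hand. This is a sketch on simulated, serially uncorrelated data (x is synthetic); it also computes the approximate 95% bounds that acf() draws as horizontal lines, assuming the default confidence level of 0.95:

```r
set.seed(7)
x <- rnorm(300)                  # synthetic, serially uncorrelated series
n <- length(x)
m <- mean(x)

# Lag-1 sample autocorrelation computed by hand
r1_manual <- sum((x[-1] - m) * (x[-n] - m)) / sum((x - m)^2)

# The same quantity from acf(); element 1 is lag 0, element 2 is lag 1
a <- acf(x, lag.max = 10, plot = FALSE)
r1_acf <- a$acf[2]

# Approximate 95% significance bounds drawn on the autocorrelogram
bound <- qnorm(0.975) / sqrt(n)
```

Bars that stay inside plus or minus bound are not distinguishable from zero autocorrelation at that level.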

2.2.4 q-q plot

A q-q plot is a plot of the quantiles of one dataset against the quantiles of a second dataset. This is often used to understand whether the data matches the standard statistical framework, i.e. a normal distribution.

If the data is normally distributed, the points in the q-q plot follow a straight diagonal line. This is useful to check for normality at a glance but note that it is not an accurate statistical test.

To create a q-q plot, use the qqnorm() function, and add a reference line showing where perfectly normally distributed data would fall with qqline():

> qqnorm(amazon_stocks,
         main = "AMAZON return QQ-plot")

> qqline(amazon_stocks,
         col = "red")

In the context of this course, the first dataset is Apple stock return and the second dataset is a standard normal distribution. In this exercise, you will check how Apple stock returns in rtn deviate from a normal distribution.

# Create q-q plot
qqnorm(rtn[[1]],
main = "Apple return QQ-plot")

# Add a red line showing normality
qqline(rtn[[1]], col = "red")

It does not look like Apple returns fit a normal distribution very well in the tails.
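
The construction of a normal q-q plot can be sketched by hand: sort the observations and pair them with standard normal quantiles at evenly spread probabilities. The example below uses simulated heavy-tailed data (r is synthetic, drawn from a t distribution as a stand-in for returns with fat tails) and checks the result against qqnorm():

```r
set.seed(3)
r <- rt(200, df = 3) * 0.01      # synthetic heavy-tailed "returns"
n <- length(r)

# Theoretical standard normal quantiles at n plotting positions
theo <- qnorm(ppoints(n))
# Sample quantiles are simply the sorted observations
samp <- sort(r)

# qqnorm() pairs up the same values; plot.it = FALSE returns them
qq <- qqnorm(r, plot.it = FALSE)

plot(theo, samp, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
qqline(r, col = "red")
```

With heavy tails, the points at both ends bend away from the red line, which is the pattern described for the Apple returns.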

2.3 How to use everything we learned so far?

2.3.1 A comprehensive time series diagnostic

Each plotting function that you’ve learned so far provides a different piece of insight about a time series. By putting together the histogram, the box and whisker plot, the autocorrelogram, and the q-q plot, you can gather a lot of useful information about time series behavior.

In this exercise, you will explore the Apple return data in the rtn series available in your workspace. Draw a histogram of rtn, scale it to a probability density, and add a red line to the plot showing the density of rtn.

rtn <- rtn[[1]]
# Draw histogram and add red density line
hist(rtn,
probability = TRUE)
lines(density(rtn),
col = "red")


# Draw box and whisker plot
boxplot(rtn)


# Draw autocorrelogram
acf(rtn)


# Draw q-q plot and add a red line for normality
qqnorm(rtn)
qqline(rtn, col = "red")

To allow a quick and efficient diagnostic, it is often more convenient to display the four charts above on the same graphical window.

# Set up 2x2 graphical window
par(mfrow = c(2, 2))

# Recreate all four plots
hist(rtn, probability = TRUE)
lines(density(rtn), col = "red")

boxplot(rtn)

acf(rtn)

qqnorm(rtn)
qqline(rtn, col = "red")

  1. The best suited tool to identify asymmetry in a time series is the histogram

  2. If a time series is upward sloping, its distribution will be skewed to the right

  3. Outliers in a time series are the points outside the whiskers in a box and whisker plot

3 Multivariate Time Series

3.1 Dealing with higher dimensions

3.1.1 Two time series grouped or stacked

In the first chapter, you learned how to use axis() to plot two lines on the same graphic with different Y scales. Should you want to compare them, however, you may find other kinds of graphs more insightful. One solution is to plot both time series as barcharts. There are two types:

  • Grouped barchart: for a single period, there are as many bars as time series
  • Stacked barchart: for each period, there is a single bar, and each time series is represented by a portion of the bar proportional to the value of the time series at this date (i.e. the total at each period adds up to 100%)

You are provided with a dataset (portfolio) containing the weights of stocks A (stocka) and B (stockb) in your portfolio for each month in 2016. You will use the barplot() function to create both types of charts.

startDate <- as.Date("2016-01-01")
endDate <- as.Date("2016-12-01")
date <- seq.Date(startDate, endDate, by = "month")
portfolio <- matrix(c(0.1, 0.4, 0.5, 0.5, 0.2, 0.3, 0.7, 0.8, 0.7, 0.2, 
0.1, 0.2, 0.9, 0.6, 0.5, 0.5, 0.8, 0.7, 0.3, 0.2, 0.3, 0.8, 0.9, 
0.8),
ncol = 2)
colnames(portfolio) <- c("stocka", "stockb")
portfolio <- xts(portfolio, order.by = date)
portfolio
           stocka stockb
2016-01-01    0.1    0.9
2016-02-01    0.4    0.6
2016-03-01    0.5    0.5
2016-04-01    0.5    0.5
2016-05-01    0.2    0.8
2016-06-01    0.3    0.7
2016-07-01    0.7    0.3
2016-08-01    0.8    0.2
2016-09-01    0.7    0.3
2016-10-01    0.2    0.8
2016-11-01    0.1    0.9
2016-12-01    0.2    0.8
# Plot stacked barplot
barplot(portfolio)


# Plot grouped barplot
barplot(portfolio,
beside = TRUE)

The two types of barplot display the same information in very different ways.
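
One detail worth checking when using barplot() on a plain matrix is orientation: it draws one bar (or one group of bars) per column. To get one bar per period, as in the grouped and stacked definitions above, the periods need to be in the columns. The sketch below uses a small made-up weight matrix w; how barplot() treats an xts object may differ, so this is an assumption to verify on your own data:

```r
# Hypothetical weights: rows are months, columns are stocks
w <- cbind(stocka = c(0.1, 0.4, 0.5),
           stockb = c(0.9, 0.6, 0.5))
rownames(w) <- c("Jan", "Feb", "Mar")

# barplot() draws one bar (or group) per COLUMN of a matrix,
# so transpose to get one bar per month
mid_stacked <- barplot(t(w), main = "Stacked", legend.text = TRUE)
mid_grouped <- barplot(t(w), beside = TRUE, main = "Grouped")
```

barplot() returns the bar midpoints: three values for the stacked chart (one bar per month) and a 2 x 3 matrix for the grouped chart (two bars per month).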

3.1.2 Visualizing bivariate relationships

If you want to go even further than simply plotting variables and instead investigate whether any relationship exists between 2 variables, you can draw a scatterplot. This is a graph where the values of two variables are plotted along two axes.

The pattern of the resulting points is used to reveal the presence of any correlation; usually, a regression line is added to identify the tendency, if there is any:

  • An upward sloping regression line indicates a positive linear relationship between A and B (when A goes up, B tends to go up as well)
  • A downward sloping regression line indicates a negative linear relationship between A and B

You can draw a scatterplot and then create a regression model with the following functions:

> plot(x = A, y = B)
> lm(B ~ A)

In this exercise, you will draw a scatterplot and regression line for the return series for the SP500 (sp500) and Citigroup (citi) from January 2015 to January 2017.

library(quantmod)
package ‘quantmod’ was built under R version 4.0.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Version 0.4-0 included new data defaults. See ?getSymbols.
getSymbols(c("^GSPC", "C"), from = "2015-01-01", to = "2017-01-01", src =  "yahoo", adjust =  TRUE)
‘getSymbols’ currently uses auto.assign=TRUE by default, but will
use auto.assign=FALSE in 0.5-0. You will still be able to use
‘loadSymbols’ to automatically load data. getOption("getSymbols.env")
and getOption("getSymbols.auto.assign") will still be checked for
alternate defaults.

This message is shown once per session and may be disabled by setting 
options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.

incomplete final line found by readTableHeader on 'https://query1.finance.yahoo.com/v7/finance/download/^GSPC?period1=-2208988800&period2=1603756800&interval=1d&events=div&crumb=f9HDxFyHZMj'incomplete final line found by readTableHeader on 'https://query1.finance.yahoo.com/v7/finance/download/^GSPC?period1=-2208988800&period2=1603756800&interval=1d&events=split&crumb=f9HDxFyHZMj'incomplete final line found by readTableHeader on 'https://query1.finance.yahoo.com/v7/finance/download/^GSPC?period1=-2208988800&period2=1603756800&interval=1d&events=split&crumb=f9HDxFyHZMj'
[1] "^GSPC" "C"    
sp500 <- ROC(GSPC$GSPC.Adjusted)
citi <- ROC(C$C.Adjusted)
# Draw the scatterplot
plot(x = coredata(sp500), y = coredata(citi))

# Draw a regression line
abline(reg = lm(citi ~ sp500),
lwd = 2,
col = "red")

It looks like there is definitely a positive linear relationship between these two variables.
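
When adding a regression line, the formula has to match the axes: the variable on the left of ~ is the one on the Y axis, and the one on the right is on the X axis. A minimal sketch on simulated returns (x, y and the coefficient 1.2 are made up for illustration) also shows how the fitted slope relates to the correlation:

```r
set.seed(10)
x <- rnorm(200, sd = 0.01)               # synthetic "index" returns
y <- 1.2 * x + rnorm(200, sd = 0.005)    # synthetic "stock" returns

# The response (left of ~) goes on the y-axis, the predictor on the x-axis
fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])

plot(x = x, y = y)
abline(reg = fit, col = "red", lwd = 2)

# The fitted slope equals the correlation rescaled by the two volatilities
slope_from_cor <- cor(x, y) * sd(y) / sd(x)
```

Swapping the formula to lm(x ~ y) while keeping the same plot would draw a line with a different slope that does not describe the plotted relationship.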

3.2 Multivariate time series

3.2.1 Correlation matrix

What if you want to evaluate the relationship between multiple time series? The most common tool is a correlation matrix, which is a table showing correlation coefficients between pairs of variables. Several types of correlation exist, but the most used ones are:

  • Pearson correlation: measures the linear relationship between 2 variables
  • Spearman rank correlation: measures the statistical dependency between the rankings of 2 variables (not necessarily linear)

The latter is used when no assumption is made about the distribution of the data. All this is achieved in R using the function cor(). You can use the method argument to select the desired correlation type. "pearson" is the default method, but you can specify "spearman" as well.

In this exercise, you will calculate the correlation matrix of the data provided in the dataset my_data containing the returns for the S&P 500 index and 5 stocks: Citigroup, Microsoft, Apple, Dow Chemical and Yahoo.

library(readr)
library(xts)
my_data <- read_csv("stocks_02.csv")
Parsed with column specification:
cols(
  Index = col_date(format = ""),
  sp500 = col_double(),
  citigroup = col_double(),
  microsoft = col_double(),
  apple = col_double(),
  dowchemical = col_double(),
  yahoo = col_double()
)
#my_data$index <- as.Date(my_data$index)
my_data <- as.xts(my_data [, -1], order.by = my_data$Index)

# Create correlation matrix using Pearson method
cor(my_data, method = "pearson")
                sp500 citigroup microsoft     apple dowchemical     yahoo
sp500       1.0000000 0.5097953 0.3743215 0.3576966   0.5217243 0.2900962
citigroup   0.5097953 1.0000000 0.4841408 0.4291841   0.5085190 0.4029490
microsoft   0.3743215 0.4841408 1.0000000 0.5133469   0.3954523 0.4329388
apple       0.3576966 0.4291841 0.5133469 1.0000000   0.3627755 0.3413626
dowchemical 0.5217243 0.5085190 0.3954523 0.3627755   1.0000000 0.2938749
yahoo       0.2900962 0.4029490 0.4329388 0.3413626   0.2938749 1.0000000
# Create correlation matrix using Spearman method
cor(my_data, method = "spearman")
                sp500 citigroup microsoft     apple dowchemical     yahoo
sp500       1.0000000 0.5192579 0.4244237 0.3518853   0.5316235 0.3262037
citigroup   0.5192579 1.0000000 0.4976477 0.4374850   0.5607511 0.3780730
microsoft   0.4244237 0.4976477 1.0000000 0.5128477   0.4684114 0.4448179
apple       0.3518853 0.4374850 0.5128477 1.0000000   0.3681791 0.3680715
dowchemical 0.5316235 0.5607511 0.4684114 0.3681791   1.0000000 0.3464743
yahoo       0.3262037 0.3780730 0.4448179 0.3680715   0.3464743 1.0000000

Notice how the two methods calculate different correlation values.
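
The difference between the two measures is easiest to see on a relationship that is perfectly increasing but not linear. In this sketch (x and y are made-up values), the Spearman correlation is exactly 1 while the Pearson correlation is well below it, and the Spearman value is recovered as the Pearson correlation of the ranks:

```r
x <- 1:20
y <- exp(x / 2)                  # increasing, but far from linear

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman is the Pearson correlation of the ranks
spearman_by_rank <- cor(rank(x), rank(y))
```

For return series with a few extreme observations, the rank-based measure is less influenced by those extremes, which is one reason the two matrices above differ.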

3.2.2 Scatterplots for multiple pairs of data

In the previous exercise, you saw a numerical representation of the relationship between pairs of data through a correlation matrix. It’s also possible to have a graphical representation of those relationships using scatterplots.

Specifically, the pairs() function represents the relationship between pairs of time series as a faceted scatterplot of all pairs at once. This is very convenient for a quick comparison between pairs of time series.

In this exercise, you will draw scatterplots of the stock data in my_data from the previous exercise.

# Create scatterplot matrix
pairs(coredata(my_data))


# Create upper panel scatterplot matrix
pairs(coredata(my_data), lower.panel = NULL)

When you have a small number of time series to compare, a scatterplot matrix can be useful to visualize everything at once.

3.2.3 Correlation plot

R offers other ways of displaying the correlation matrix. With the corrplot package, the visualization of correlations is made easier and more powerful by allowing you to represent the correlations with numbers, symbols, colors, and more.

In this exercise, you will use the provided correlation matrix cor_mat and the corrplot() function to draw some correlation charts.

library(corrplot)
package ‘corrplot’ was built under R version 4.0.3
corrplot 0.84 loaded
cor_mat <- cor(my_data, method = "pearson")
# Create correlation matrix
corrplot(cor_mat)


# Create correlation matrix with numbers
corrplot(cor_mat, method = "number")


# Create correlation matrix with colors
corrplot(cor_mat, method = "color")


# Create upper triangle correlation matrix with numbers
corrplot(cor_mat, method = "number", type = "upper")

3.3 Higher dimension time series

3.3.1 Correlation matrix as heatmap

Should you want to check correlations between hundreds of time series, representing correlations with numbers is not really helpful - for a dataset of 100 elements, you would have to analyze 10,000 (100 x 100) correlation numbers!

In this case, a heatmap is a better suited tool. A heatmap is a map or diagram in which data values are represented as colors. When using one, it might also be useful to reorder the correlation matrix to make it more readable. You can create heatmaps using corrplot(method = "color").

In this exercise, you will create some heatmaps with the same correlation matrix cor_mat as from the previous exercise.

# Draw heatmap of cor_mat
corrplot(cor_mat, method = "color")


# Draw upper heatmap
corrplot(cor_mat,
type = "upper", method = "color")

Draw the upper heatmap, ordering the matrix using hclust in the order argument:

# Draw upper heatmap ordered by hierarchical clustering
corrplot(cor_mat,
         type = "upper", method = "color", order = "hclust")
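
To illustrate what reordering by hierarchical clustering does, here is a base-R sketch that does not rely on corrplot. It treats 1 - correlation as a distance, which is one common choice and not necessarily what corrplot does internally. The return series are simulated so that a and b move together and c and d move together (all names and values are made up):

```r
set.seed(5)
# Synthetic returns: two pairs of series driven by separate common factors
f1 <- rnorm(250); f2 <- rnorm(250)
rets <- cbind(a = f1 + rnorm(250, sd = 0.5),
              c = f2 + rnorm(250, sd = 0.5),
              b = f1 + rnorm(250, sd = 0.5),
              d = f2 + rnorm(250, sd = 0.5))
cm <- cor(rets)

# Treat 1 - correlation as a distance and cluster the series
ord <- hclust(as.dist(1 - cm))$order
cm_ordered <- cm[ord, ord]

# Base-R heatmap of the reordered matrix (no further reordering)
heatmap(cm_ordered, Rowv = NA, Colv = NA, symm = TRUE, scale = "none")
```

After reordering, strongly correlated series sit next to each other, so blocks of similar color become visible along the diagonal.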

4 Case study: Visually selecting a stock that improves your existing portfolio

4.1 Case study presentation

4.1.1 Current portfolio description

Your savings are invested in a portfolio made of 3 stocks: Yahoo, Apple and Microsoft. Each stock has the same weight in the portfolio at 33%. You have some extra cash to invest, but before going any further, you want to gather some information on your existing portfolio.

In this exercise, you are provided with a dataset data containing the value and the return of the portfolio over time, in value and return, respectively.

data <- read_csv("existing_portfolio.csv")
Parsed with column specification:
cols(
  Index = col_date(format = ""),
  value = col_double(),
  return = col_double()
)
data <- as.xts(data [, -1], order.by = data$Index)
# Plot the portfolio value
plot(data$value, main = "Portfolio Value")


# Plot the portfolio return
plot(data$return, main = "Portfolio Return")


# Plot a histogram of portfolio return 
hist(data$return,
probability = TRUE)

# Add a density line
lines(density(data$return),
lwd = 2,
col = "red")

4.2 New stocks

4.2.1 New stocks description

In this exercise, you will review plotting multiple graphs on the same graphical window.

The new dataset data containing four new stocks is available in your workspace:

  • Goldman Sachs (GS)
  • Coca-Cola (KO)
  • Walt Disney (DIS)
  • Caterpillar (CAT)

data <- read_csv("stocks_03.csv")
Parsed with column specification:
cols(
  Index = col_date(format = ""),
  GS = col_double(),
  KO = col_double(),
  DIS = col_double(),
  CAT = col_double()
)
data <- as.xts(data [, -1], order.by = data$Index)
# Plot the four stocks on the same graphical window
par(mfrow = c(2, 2),
mex = 0.8,
cex = 0.8)
plot(data$GS)
plot(data$KO)
plot(data$DIS)
plot(data$CAT)

Now that you know what the new stocks look like, you want to find out if any of them provide diversification benefits to your existing portfolio. You can do this by looking at the correlation of each stock to our portfolio, visualized through regression lines.

In this exercise, you are provided with four individual series containing the return of the same four stocks:

  • Goldman Sachs (gs)
  • Coca-Cola (ko)
  • Walt Disney (dis)
  • Caterpillar (cat)

The returns of your existing portfolio, in portfolio, are also available in your workspace. Now it’s your turn to analyze the relationships!

library(TTR)
portfolio <- read_csv("existing_portfolio.csv")
Parsed with column specification:
cols(
  Index = col_date(format = ""),
  value = col_double(),
  return = col_double()
)
portfolio <- as.xts(portfolio [, -1], order.by = portfolio$Index)
gs <- ROC(coredata(data$GS))
gs <- gs[-1]
ko <- ROC(coredata(data$KO))
ko <- ko[-1]
dis <- ROC(coredata(data$DIS))
dis <- dis[-1]
cat <- ROC(coredata(data$CAT))
cat <- cat[-1]
# Draw the scatterplot of gs against the portfolio
plot(x = gs, y = portfolio$return)

# Add a regression line in red
# (the response on the left of ~ must be the variable on the Y axis)
abline(reg = lm(as.numeric(portfolio$return) ~ gs),
       col = "red",
       lwd = 2)

# Plot scatterplots and regression lines to a 2x2 window
par(mfrow = c(2, 2))

plot(x = gs, y = portfolio$return)
abline(reg = lm(as.numeric(portfolio$return) ~ gs),
       col = "red",
       lwd = 2)

plot(x = ko, y = portfolio$return)
abline(reg = lm(as.numeric(portfolio$return) ~ ko),
       col = "red",
       lwd = 2)

plot(x = dis, y = portfolio$return)
abline(reg = lm(as.numeric(portfolio$return) ~ dis),
       col = "red",
       lwd = 2)

plot(x = cat, y = portfolio$return)
abline(reg = lm(as.numeric(portfolio$return) ~ cat),
       col = "red",
       lwd = 2)

Coca-Cola seems to provide the most diversification benefit based on low correlation to the portfolio.
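
The visual comparison can be backed up by computing the correlation of each candidate with the portfolio and picking the lowest. The sketch below uses small hand-made series with generic names (p, q and stock_a to stock_d are hypothetical and unrelated to the actual stock data), built so that the correlations are known in advance:

```r
# Small hand-made return series (hypothetical, for illustration only)
p <- c(1, -1, 1, -1, 1, -1, 1, -1) / 100   # existing portfolio
q <- c(1, 1, -1, -1, 1, 1, -1, -1) / 100   # a pattern uncorrelated with p

# Candidates mix the portfolio pattern and the unrelated pattern
candidates <- cbind(stock_a = p + 0.2 * q,
                    stock_b = 0.2 * p + q,
                    stock_c = p + q,
                    stock_d = p + 0.5 * q)

# Correlation of each candidate with the portfolio
cors <- cor(candidates, p)[, 1]
round(cors, 2)

# The lowest correlation suggests the largest diversification benefit
best <- names(which.min(cors))
```

Here stock_b, which is mostly driven by the pattern unrelated to the portfolio, comes out with the lowest correlation.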

4.2.2 Compare old and new portfolios

Great work. You decide to buy stocks in Coca-Cola, and now your portfolio is made of equal proportions of four stocks: Yahoo, Microsoft, Apple and Coca-Cola.

In this exercise, you are given a dataset old.vs.new.portfolio with the following self-explanatory columns:

old.portfolio.value

new.portfolio.value

old.portfolio.rtn

new.portfolio.rtn

old.vs.new.portfolio <- read_csv("old_vs_new_portfolio.csv")
Parsed with column specification:
cols(
  Index = col_date(format = ""),
  old.portfolio.value = col_double(),
  new.portfolio.value = col_double(),
  old.portfolio.rtn = col_double(),
  new.portfolio.rtn = col_double()
)
old.vs.new.portfolio <- as.xts(old.vs.new.portfolio [, -1], order.by = old.vs.new.portfolio$Index)
# Plot new and old portfolio values on same chart
plot(old.vs.new.portfolio$old.portfolio.value)

lines(old.vs.new.portfolio$new.portfolio.value, col = "red")



# Plot density of the new and old portfolio returns on same chart
plot(density(old.vs.new.portfolio$old.portfolio.rtn))
lines(density(old.vs.new.portfolio$new.portfolio.rtn), col = "red")

The new portfolio seems to have less variation based on the density lines.

4.2.3 A more accurate comparison of portfolios

Looking at the value and distribution of returns of your portfolio is a good start, but it doesn’t necessarily tell the whole story. You could obviously look at many other charts and metrics, but ultimately what matters is performance, and specifically periods of poor performance.

The PerformanceAnalytics package provides additional tools to get a finer view of your portfolio. In particular, the charts.PerformanceSummary() function provides a quick and easy way to display the portfolio value, returns, and periods of poor performance, also known as drawdowns.

In this exercise, you will use this new function on the same old and new portfolio data in old.vs.new.portfolio from the previous exercise.

# Draw value, return, drawdowns of old portfolio
charts.PerformanceSummary(old.vs.new.portfolio$old.portfolio.rtn)


# Draw value, return, drawdowns of new portfolio
charts.PerformanceSummary(old.vs.new.portfolio$new.portfolio.rtn)


# Draw both portfolios on same chart
charts.PerformanceSummary(old.vs.new.portfolio[, c(3, 4)])

The new portfolio looks to have a higher cumulative return and lower drawdown for this period of time.

On what grounds should you add a new stock to your portfolio?

Correlation to your existing portfolio to assess diversification, return histogram to assess risk and box and whisker plot to assess average return

---
title: "Visualizing Time Series Data in R"
output:
  html_notebook:
    toc: true
    toc_float: true
    toc_collapsed: false
    number_sections: true
    toc_depth: 3
---
# R Time Series Visualization Tools

## Refresher on xts and the plot() function Arnaud Amsellem The R Trader

### plot() function - basic parameters

The plot.xts() function is the most useful tool in the R time series data visualization arsenal. It is fairly similar to general plotting, but its x-axis contains a time scale. You can use plot() instead of plot.xts() if the object used in the function is an xts object.

Let's look at a few examples:

    > # Basic syntax
    > plot(mydata)

    > # Add title and double thickness of line
    > plot(mydata, main = "Stock XYZ", lwd = 2)

    > # Add labels for X and Y axes
    > plot(mydata, xlab = "X axis", ylab = "Y axis")

As you can see, the function accepts a wide variety of parameters, allowing endless possibilities. Note that each call of plot() creates an entirely new plot using only the parameters that are defined in that particular call.

Furthermore, to display the first few rows of a dataset mydata to your console, use head(mydata). To display only the names of the columns, use colnames(mydata). You can also select a particular column of a dataset by specifying its title after a dollar sign, like in mydata$mycolumn.

```{r}
library(readr)
library(dplyr)
library(xts)
data <- read_table2("stocks_01.csv",
                    col_names = FALSE,
                    skip = 1,
                    col_types = cols(
                      X2 = col_number(),
                      X3 = col_number(),
                      X4 = col_number(),
                      X5 = col_number()
                      ))
data <- rename(data, index = X1, 
       yahoo = X2, 
       microsoft = X3,
       citigroup = X4,
       dow_chemical = X5)
head(data)
data$index <- as.Date(data$index)
data <- as.xts(data [, -1], order.by = data$index)
```
```{r}
# Display the first few lines of the data
head(data)

# Display the column names of the data
colnames(data)

# Plot yahoo data and add title
plot(data$yahoo, main = "yahoo")

# Replot yahoo data with labels for X and Y axes
plot(data$yahoo, main = "yahoo", xlab = "date", ylab = "price")
```
You can add even more customization with the plot() function using other options. As you saw in the video, the lines() function is especially helpful when you want to modify an existing plot.

Let's look at another example:

    > # Use bars instead of points and add subtitle
    > plot(mydata, type = "h", sub = "Subtitle")
    
    > # Triple thickness of line and change color to red
    > lines(mydata, col = "red", lwd = 3)


```{r}
# Plot the second time series and change title
plot(data[ ,2], main = "microsoft")

# Replot with same title, add subtitle, use bars
plot(data[ ,2], main = "microsoft", sub = "Daily closing price since 2015", type = "h")

# Change line color to red
lines(data[ ,2], col = "red")
```
### Control graphic parameters

In R, it is also possible to tailor the window layout using the par() function.

To set up a graphical window for multiple charts with nr rows and nc columns, assign the vector c(nr, nc) to the option mfrow. To adjust the size of the margins and characters in the text, set the appropriate decimal value to the options mex and cex, respectively. Like plot(), each call to par() only implements the parameters in that particular call.

    > # Create 3x1 graphical window
    > par(mfrow = c(3, 1))
    
    > # Also reduce margin and character sizes by half
    > par(mfrow = c(2, 1), mex = 0.5, cex = 0.5)

```{r}
# Plot two charts on same graphical window
par(mfrow = c(2 , 1))
plot(data[,1], main = "yahoo")
plot(data[,2], main = "microsoft")


# Replot with reduced margin and character sizes
par(mfrow = c(2 , 1), mex = 0.6, cex = 0.8)
plot(data[,1], main = "yahoo")
plot(data[,2], main = "microsoft")
```
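Settings made with par() stay in effect for later plots on the same device, so it is often convenient to save the previous values and restore them afterwards. The following is a minimal base R sketch of that pattern; the two random-walk series and their titles are hypothetical and only serve as something to draw.

```r
# Null PDF device so the sketch also runs non-interactively;
# omit this line (and dev.off()) to draw on screen instead
pdf(NULL)

# par() invisibly returns the previous values of the settings it changes
old_par <- par(mfrow = c(2, 1), mex = 0.6, cex = 0.8)

# Two hypothetical series drawn in the 2x1 layout
plot(cumsum(rnorm(100)), type = "l", main = "Series A")
plot(cumsum(rnorm(100)), type = "l", main = "Series B")

# Restore the saved settings so later plots use a single panel again
par(old_par)
layout_after <- par("mfrow")

dev.off()
```

After par(old_par), the layout is back to a single panel, so a subsequent plot() call fills the whole window.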
## Other useful visualizing functions

### Adding an extra series to an existing chart

A great way to visually compare two time series is to display them on the same chart with different scales.

Suppose you already have a plot of mydata. As you saw in the video, you can use lines(mydata2) to add a new time series mydata2 to this existing plot. If you want a scale for this time series on the right side of the plot with equally spaced tick marks, use axis(side, at), where side is an integer specifying which side of the plot the axis should be drawn on, and at is set equal to pretty(mydata2).

Finally, to distinguish these two time series, you can add a legend with the legend() function.

    > # x specifies location of legend in plot
    > legend(x = "bottomright",
             # legend specifies text label(s)
             legend = c("Stock X", "Stock Y"),
             # col specifies color(s)
             col = c("black", "red"),
             # lty specifies line type(s)
             lty = c(1, 1))

Since there are two time series in the plot, some options in legend() are set to a vector of length two.
```{r}
# Plot the "microsoft" series
plot(data$microsoft, main = "Stock prices since 2015")

# Add the "dow_chemical" series in red
lines(data$dow_chemical, col = "red")

# Add a Y axis on the right side of the chart
axis(side = 4, at = pretty(data$dow_chemical))

# Add a legend in the top left corner
legend(x = "topleft",
       legend = c("microsoft", "dow_chemical"),
       col = c("black", "red"),
       lty = c(1, 1))
```
### Highlighting events in a time series

You have also learned that it is possible to use the function abline() to add straight lines through an existing plot. Specifically, you can draw a horizontal line to identify a particular level by setting h to a specific Y value, and a vertical line to identify a particular date by setting v to a specific X value:

    > abline(h = NULL, v = NULL, ...)

Recall that the index of an xts object consists of dates, so the X values of a plot will also be dates. In this exercise, you will use indexing as well as as.Date("YYYY-MM-DD") and mean() to visually compare the average of the Citigroup stock market prices to its price on January 4, 2016, after it was affected by turbulence in the Chinese stock market.

```{r}
# Plot the "citigroup" time series
plot.zoo(data$citigroup, main = "Citigroup")

# Create vert_line to identify January 4th, 2016 in citigroup
vert_line <- as.Date("2016-01-04")

# Add a red vertical line using vert_line
abline(v = vert_line, col = "red")

# Create hori_line to identify average price of citigroup
hori_line <- mean(data$citigroup)

# Add a blue horizontal line using hori_line
abline(h = hori_line, col = "blue")
```
### Highlighting a specific period in a time series

To highlight a specific period in a time series, you can display it in the plot in a different background color. The chart.TimeSeries() function in the PerformanceAnalytics package offers a very easy and flexible way of doing this.

Let's examine some of the arguments of this function:

    chart.TimeSeries(R, period.areas, period.color)
    
R is an xts, time series, or zoo object of asset returns, period.areas are shaded areas specified by a start and end date in a vector of xts date ranges like c("1926-10/1927-11"), and period.color draws the shaded region in whichever color is specified.

```{r}
library(PerformanceAnalytics)
# Create period to hold the first 3 months of 2015
period <- c("2015-01/2015-03")

# Highlight the first three months of 2015 
chart.TimeSeries(data$citigroup, period.areas = period)

# Highlight the first three months of 2015 in light grey
chart.TimeSeries(data$citigroup, period.areas = period, period.color = "lightgrey")
```
### A fancy stock chart

```{r}
# Plot the microsoft series
plot(data$microsoft, main = "Dividend date and amount")

# Add the citigroup series
lines(data$citigroup, col = "orange", lwd = 2)

# Add a new y axis for the citigroup series
axis(side = 4, at = pretty(data$citigroup), col = "orange")
```
Next, you will add a legend to the chart that you just created, containing the names of the companies and the dates and values of their latest dividends.

Fill in the pre-written code with the following variables containing the dividend values and dates for both companies:

- citi_div_value
- citi_div_date
- micro_div_value
- micro_div_date


```{r}
citi_div_value <- "$0.16"
citi_div_date <- "13 Nov. 2016"
micro_div_value <- "$0.39"
micro_div_date <- "15 Nov. 2016"
```
Recall that the default color of a plotted line is black, and that the values for legend, col, and lty in legend() should be set to vectors of the same length as the number of time series plotted in your chart.
```{r}
# Same plot as the previous exercise
plot(data$microsoft, main = "Dividend date and amount")
lines(data$citigroup, col = "orange", lwd = 2)
axis(side = 4, at = pretty(data$citigroup), col = "orange")

# Create the two legend strings
micro <- paste0("Microsoft div. of ", micro_div_value, " on ", micro_div_date)
citi <- paste0("Citigroup div. of ", citi_div_value, " on ", citi_div_date)

# Create the legend in the bottom right corner
legend(x = "bottomright", legend = c(micro, citi), col = c("black", "orange"), lty = c(1, 1))
```
# Univariate Time Series

## Univariate time series analysis

### Representing a univariate time series

The very first step in the analysis of any time series is to check whether the series has the right mathematical properties to apply the standard statistical framework. If not, you must transform the time series first.

In finance, price series are often transformed to differenced data, making it a return series. In R, the ROC() (which stands for "Rate of Change") function from the TTR package does this automatically to a price or volume series x:

    > ROC(x)

In this exercise, you will compare plots of the Apple daily prices and Apple daily returns using the stock data contained in data.
```{r}
data <- read_table2("apple_daily_returns.csv")
data <- rename(data, index = `"Index"`, 
       apple = `"Apple"`)
data$index <- as.Date(data$index)
data <- as.xts(data [, -1], order.by = data$index)
```

```{r}
library(TTR)
# Plot Apple's stock price 
plot(data$apple, main = "Apple stock price")

# Create a time series called rtn
rtn <- ROC(data$apple)

# Plot Apple daily price and daily returns 
par(mfrow = c(1, 2))
plot(data$apple, main = "Apple stock price")
plot(rtn)
```
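To make the transformation concrete, here is a small base R sketch of the calculation. With its default settings, ROC() returns continuously compounded (log) returns and pads the first observation with NA. The price vector below is hypothetical, and the sketch does not use TTR itself.

```r
# Hypothetical closing prices, used only for illustration
price <- c(100, 102, 101, 105, 104)

# Continuously compounded (log) return: log(P_t) - log(P_{t-1})
log_rtn <- diff(log(price))

# Simple (discrete) return: P_t / P_{t-1} - 1
simple_rtn <- price[-1] / price[-length(price)] - 1

# The two are close for small daily moves
round(cbind(log_rtn, simple_rtn), 4)
```

Both vectors have one element fewer than the price series, which is why the first (NA) observation of a ROC() result is usually dropped before computing densities or other statistics.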
## Other visualization tools

### Histogram of returns

A simple chart of returns does not reveal much about the time series properties; often, data must be displayed in a different format to visualize interesting features.

The density function, represented by the histogram of returns, indicates the most common returns in a time series without taking time into account. In R, these are calculated with the hist() and density() functions.

To create a histogram with 20 buckets, a title, and no X axis label:

    > hist(amazon_stocks,
           breaks = 20,
           main = "AMAZON return distribution",
           xlab = "")

Recall that you can use the lines() function to add a new time series, even with different line properties like color and thickness, to an existing plot.

In this exercise, you will create a histogram of the Apple daily returns data for the last two years contained in rtn.

```{r}
# Create a histogram of Apple stock returns
hist(rtn,
main = "Apple stock return distribution",
probability = TRUE)

# Add a density line
lines(density(rtn[-1,]))

# Redraw a thicker, red density line
lines(density(rtn[-1,]), lwd = 2, col = "red")
```
It looks like Apple might have some extreme returns!

### Box and whisker plot

A box and whisker plot gives information regarding the shape, variability, and center (or median) of a data set. It is particularly useful for displaying skewed data.

By comparing the data set to a standard normal distribution, you can identify departures from normality (asymmetry, skewness, etc.). The lines extending from the box are known as whiskers; they indicate variability outside the upper and lower quartiles. Observations that fall beyond the whiskers are treated as outliers and are usually plotted as individual dots in line with the whiskers.

Use boxplot() to create a horizontal box and whisker plot:

    > boxplot(amazon_stocks,
              horizontal = TRUE,
              main = "Amazon return distribution")

In this exercise, you will draw a box and whisker plot for Apple stock returns in rtn.
```{r}
rtn <- as.data.frame(rtn[-1, ])
# Draw box and whisker plot for the Apple returns
boxplot(rtn,
horizontal = TRUE)

# Draw a box and whisker plot of a normal distribution
boxplot(rnorm(1000),
horizontal = TRUE)

# Redraw both plots on the same graphical window
par(mfrow = c(2, 1))
boxplot(rtn,
horizontal = TRUE)
boxplot(rnorm(1000),
horizontal = TRUE)
```
Boxplots are useful for quickly getting a feel of the location and variability in your data.
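
The numbers behind a box and whisker plot can also be inspected directly. The sketch below uses boxplot.stats() from base R on simulated returns (hypothetical data, with two extreme values added on purpose) to list the observations that would be drawn as individual dots.

```r
# Hypothetical daily returns with two extreme observations added
set.seed(1)
returns <- c(rnorm(250, mean = 0, sd = 0.01), -0.08, 0.09)

# boxplot.stats() computes the values that boxplot() draws
stats <- boxplot.stats(returns)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Observations beyond the whiskers, i.e. more than 1.5 times
# the length of the box away from the box
stats$out
```

The multiplier of 1.5 is the default value of the coef argument; a larger value flags fewer points as outliers.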

### Autocorrelation

Another important piece of information is the relationship between one point in the time series and points that come before it. This is called autocorrelation and it can be displayed as a chart which indicates the correlation between points separated by various time lags.

In R, you can plot the autocorrelation function using acf(). By default, it displays lags up to 10 * log10(N), where N is the number of observations (26 lags for 500 daily returns), i.e. the correlation between points n and n - 1, n and n - 2, n and n - 3, and so on. The autocorrelogram, or the autocorrelation chart, tells you how any point in the time series is related to its past as well as how significant this relationship is. The significance levels are given by 2 horizontal lines above and below 0.

    > acf(amazon_stocks,
          main = "AMAZON return autocorrelations")

```{r}
# Draw autocorrelation plot
acf(rtn, main = "Apple return autocorrelation")

# Redraw with a maximum lag of 10
acf(rtn, main = "Apple return autocorrelation", lag.max = 10)
```
Autocorrelation helps you understand time-lagged relationships in your data.
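
The values shown in an autocorrelogram can be retrieved without drawing it. The following sketch runs acf() with plot = FALSE on simulated white noise (hypothetical data standing in for a return series) and recomputes the approximate 95% bounds that appear as dashed horizontal lines on the chart.

```r
set.seed(42)
n <- 500
x <- rnorm(n)   # hypothetical white-noise "returns"

# Compute autocorrelations without drawing the chart
ac <- acf(x, plot = FALSE)

# Largest lag used by default for a single series: floor(10 * log10(n))
max_lag <- max(ac$lag)

# Approximate 95% bounds under the hypothesis of no autocorrelation
bound <- qnorm(0.975) / sqrt(n)

# Lags (excluding lag 0) whose autocorrelation falls outside the bounds
which(abs(ac$acf[-1]) > bound)
```

Even for pure noise, roughly one lag in twenty can be expected to fall outside the bounds by chance, so a single marginal spike is weak evidence of autocorrelation.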

### q-q plot

A q-q plot is a plot of the quantiles of one dataset against the quantiles of a second dataset. This is often used to understand if the data matches the standard statistical framework, or a normal distribution.

If the data is normally distributed, the points in the q-q plot follow a straight diagonal line. This is useful to check for normality at a glance but note that it is not an accurate statistical test.

To create a q-q plot, use the qqnorm() function; then use qqline() to add a reference line showing where the points would fall if the data were perfectly normally distributed:

    > qqnorm(amazon_stocks,
             main = "AMAZON return QQ-plot")
    
    > qqline(amazon_stocks,
             col = "red")

In the context of this course, the first dataset is Apple stock return and the second dataset is a standard normal distribution. In this exercise, you will check how Apple stock returns in rtn deviate from a normal distribution.
```{r}
# Create q-q plot
qqnorm(rtn[[1]],
main = "Apple return QQ-plot")

# Add a red line showing normality
qqline(rtn[[1]], col = "red")
```
It does not look like Apple returns fit a normal distribution very well in the tails.
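
To see what a normal q-q plot actually compares, the sketch below rebuilds its coordinates by hand in base R. The heavy-tailed sample is simulated (hypothetical data), and qqnorm() is called with plot.it = FALSE so that it only returns the coordinates.

```r
set.seed(7)
y <- rt(300, df = 3)   # hypothetical heavy-tailed sample

# qqnorm() can return the plotted coordinates instead of drawing them
qq <- qqnorm(y, plot.it = FALSE)

# Theoretical quantiles: standard normal quantiles at ppoints(n)
theoretical <- qnorm(ppoints(length(y)))

# Sample quantiles: simply the sorted data
sample_q <- sort(y)

# The same pairs that qqnorm() plots, here in sorted order
head(cbind(theoretical, sample_q))
```

In a heavy-tailed sample like this one, the smallest sample quantiles lie well below the theoretical ones and the largest lie well above, which is the pattern that bends the points away from the reference line at both ends.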

## How to use everything we learned so far?

### A comprehensive time series diagnostic

Each plotting function that you've learned so far provides a different piece of insight about a time series. By putting together the histogram, the box and whisker plot, the autocorrelogram, and the q-q plot, you can gather a lot of useful information about time series behavior.

In this exercise, you will explore the ExxonMobil return data in the rtn series available in your workspace.
Draw a histogram of rtn, scale it to a probability density, and add a red line to the plot showing the density of rtn

```{r}
rtn <- rtn[[1]]
# Draw histogram and add red density line
hist(rtn,
probability = TRUE)
lines(density(rtn),
col = "red")

# Draw box and whisker plot
boxplot(rtn)

# Draw autocorrelogram
acf(rtn)

# Draw q-q plot and add a red line for normality
qqnorm(rtn)
qqline(rtn, col = "red")

```
To allow a quick and efficient diagnostic, it is often more convenient to display the four charts above on the same graphical window.
```{r}
# Set up 2x2 graphical window
par(mfrow = c(2, 2))

# Recreate all four plots
hist(rtn, probability = TRUE)
lines(density(rtn), col = "red")

boxplot(rtn)

acf(rtn)

qqnorm(rtn)
qqline(rtn, col = "red")
```
A) The best suited tool to identify asymmetry in a time series is the histogram

B) If a time series is upward sloping, its distribution will be skewed to the right

C) Outliers in a time series are the points outside the whiskers in a box and whisker plot

# Multivariate Time Series

## Dealing with higher dimensions

### Two time series grouped or stacked

In the first chapter, you learned how to use axis() to plot two lines on the same graphic with different Y scales. Should you want to compare them, however, you may find other kinds of graphs to be more insightful. One solution is to plot both time series as bar charts. There are two types:

- Grouped bar chart: for a single period, there are as many bars as time series
- Stacked bar chart: for each period, there is a single bar, and each time series is represented by a portion of the bar proportional to the value of the time series at this date (i.e. the total at each period adds up to 100%)

You are provided with a dataset (portfolio) containing the weights of stocks A (stocka) and B (stockb) in your portfolio for each month in 2016. You will use the barplot() function to create both types of charts.
```{r}
startDate <- as.Date("2016-01-01")
endDate <- as.Date("2016-12-01")
date <- seq.Date(startDate, endDate, by = "month")
portfolio <- matrix(c(0.1, 0.4, 0.5, 0.5, 0.2, 0.3, 0.7, 0.8, 0.7, 0.2, 
0.1, 0.2, 0.9, 0.6, 0.5, 0.5, 0.8, 0.7, 0.3, 0.2, 0.3, 0.8, 0.9, 
0.8),
ncol = 2)
colnames(portfolio) <- c("stocka", "stockb")
portfolio <- xts(portfolio, order.by = date)
portfolio
```
```{r}
# Plot stacked barplot
barplot(portfolio)

# Plot grouped barplot
barplot(portfolio,
beside = TRUE)
```
The two types of barplot display the same information in very different ways.
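
When the weights are stored in a plain matrix rather than an xts object, the orientation of the matrix matters, because barplot() draws one bar per column. The sketch below assumes a base R matrix with one row per month and uses the same weights as the exercise; the month labels are an addition for readability.

```r
# Null PDF device so the sketch also runs non-interactively;
# omit this line (and dev.off()) to draw on screen instead
pdf(NULL)

# Monthly weights of the two stocks (each row sums to 1)
stocka <- c(0.1, 0.4, 0.5, 0.5, 0.2, 0.3, 0.7, 0.8, 0.7, 0.2, 0.1, 0.2)
weights <- cbind(stocka = stocka, stockb = 1 - stocka)
rownames(weights) <- month.abb

# barplot() draws one bar per COLUMN, so transpose to get one bar per month
stacked_mid <- barplot(t(weights), legend.text = TRUE)

# beside = TRUE places the two series next to each other within each month
grouped_mid <- barplot(t(weights), beside = TRUE, legend.text = TRUE)

dev.off()
```

barplot() returns the bar midpoints: a vector with one value per month for the stacked chart, and a 2 x 12 matrix for the grouped chart. These are handy for adding labels with text() or axis().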

### Visualizing bivariate relationships

If you want to go even further than simply plotting variables and instead investigate whether any relationship exists between 2 variables, you can draw a scatterplot. This is a graph where the values of two variables are plotted along two axes.

The pattern of the resulting points is used to reveal the presence of any correlation; usually, a regression line is added to identify the tendency, if there is any:

- An upward sloping regression line indicates a positive linear relationship between A and B (when A goes up, B tends to go up as well)
- A downward sloping regression line indicates a negative linear relationship between A and B

You can draw a scatterplot and then create a regression model with the following functions:

    > plot(x = A, y = B)
    > lm(B ~ A)

In this exercise, you will draw a scatterplot and regression line for the return series for the SP500 (sp500) and Citigroup (citi) from January 2015 to January 2017.

```{r}
library(quantmod)
getSymbols(c("^GSPC", "C"), from = "2015-01-01", to = "2017-01-01", src =  "yahoo", adjust =  TRUE)
```
```{r}
sp500 <- ROC(GSPC$GSPC.Adjusted)
citi <- ROC(C$C.Adjusted)
```

```{r}
# Draw the scatterplot
plot(x = coredata(sp500), y = coredata(citi))

# Draw a regression line
abline(reg = lm(citi ~ sp500),
lwd = 2,
col = "red")
```
It looks like there is definitely a positive linear relationship between these two variables.
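
The order of the variables in the formula matters: the variable on the y axis goes on the left of `~` and the variable on the x axis on the right. The following base R sketch illustrates this with simulated returns (hypothetical data, not the series downloaded above) and checks the fitted slope against its textbook formula.

```r
set.seed(123)
a <- rnorm(250, sd = 0.01)              # hypothetical market returns (x axis)
b <- 1.2 * a + rnorm(250, sd = 0.005)   # hypothetical stock returns (y axis)

# Response (y axis) on the left of ~, predictor (x axis) on the right
fit <- lm(b ~ a)

# The fitted slope equals cov(a, b) / var(a)
slope <- unname(coef(fit)["a"])

# Null PDF device so the sketch also runs non-interactively;
# omit this line (and dev.off()) to draw on screen instead
pdf(NULL)
plot(x = a, y = b)
abline(reg = fit, col = "red", lwd = 2)
dev.off()
```

Swapping the two sides of the formula fits a different line (the regression of a on b), which would not match a chart that has a on the x axis.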

## Multivariate time series

### Correlation matrix

What if you want to evaluate the relationship between multiple time series? The most common tool to use is a correlation matrix, which is a table showing correlation coefficients between pairs of variables. Several types of correlations exist but the most used ones are:

- Pearson correlation: measures the linear relationship between 2 variables
- Spearman rank correlation: measures the statistical dependency between the ranking of 2 variables (not necessarily linear)

The latter is used when there is no assumption made on the distribution of the data. All this is achieved in R using the function cor(). You can use the method argument to select the desired correlation type. "pearson" is the default method, but you can specify "spearman" as well.

In this exercise, you will calculate the correlation matrix of the data provided in the dataset my_data containing the returns for 5 stocks: ExxonMobil, Citigroup, Microsoft, Dow Chemical and Yahoo.

```{r}
library(readr)
library(xts)
my_data <- read_csv("stocks_02.csv")
#my_data$index <- as.Date(my_data$index)
my_data <- as.xts(my_data [, -1], order.by = my_data$Index)
```

```{r}

# Create correlation matrix using Pearson method
cor(my_data, method = "pearson")

# Create correlation matrix using Spearman method
cor(my_data, method = "spearman")

```
Notice how the two methods calculate different correlation values.
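
A small constructed example shows why the two methods can disagree. In the sketch below, y always increases with x but not along a straight line; the two vectors are hypothetical and chosen only to make the contrast visible.

```r
x <- 1:20
y <- exp(x / 4)   # hypothetical series: increases with x, but not linearly

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman is the Pearson correlation computed on the ranks
rank_based <- cor(rank(x), rank(y))

c(pearson = pearson, spearman = spearman)
```

Because the ranking of y matches the ranking of x exactly, the Spearman correlation is 1, while the Pearson correlation is lower because the relationship is curved.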

### Scatterplots for multiple pairs of data

In the previous exercise, you saw a numerical representation of the relationship between pairs of data through a correlation matrix. It's also possible to have a graphical representation of those relationships using scatterplots.

Specifically, the pairs() function represents the relationship between pairs of time series as a faceted scatterplot of all pairs at once. This is very convenient for a quick comparison between pairs of time series.

In this exercise, you will draw scatterplots of the stock data in my_data from the previous exercise.

```{r}
# Create scatterplot matrix
pairs(coredata(my_data))

# Create upper panel scatterplot matrix
pairs(coredata(my_data), lower.panel = NULL)
```
When you have a small number of time series to compare, a scatterplot matrix can be useful to visualize everything at once.

### Correlation plot

R offers other ways of displaying the correlation matrix. With the corrplot package, the visualization of correlations is made easier and more powerful by allowing you to represent the correlations with numbers, symbols, colors, and more.

In this exercise, you will use the provided correlation matrix cor_mat and the corrplot() function to draw some correlation charts.

```{r}
library(corrplot)
cor_mat <- cor(my_data, method = "pearson")
# Create correlation matrix
corrplot(cor_mat)

# Create correlation matrix with numbers
corrplot(cor_mat, method = "number")

# Create correlation matrix with colors
corrplot(cor_mat, method = "color")

# Create upper triangle correlation matrix with numbers
corrplot(cor_mat, method = "number", type = "upper")
```
## Higher dimension time series

### Correlation matrix as heatmap

Should you want to check correlations between hundreds of time series, representing correlations with numbers is not really helpful - for a dataset of 100 elements, you would have to analyze 10,000 (100 x 100) correlation numbers!

In this case, a heatmap is a better suited tool. A heatmap is a map or diagram in which data values are represented as colors. When using one, it might also be useful to reorder the correlation matrix to make it more readable. You can create heatmaps using corrplot(method = "color").

In this exercise, you will create some heatmaps with the same correlation matrix cor_mat as from the previous exercise.
```{r}
# Draw heatmap of cor_mat
corrplot(cor_mat, method = "color")

# Draw upper heatmap
corrplot(cor_mat,
type = "upper", method = "color")
```
Draw the upper heatmap ordering the matrix using hclust in the order argument
```{r}
# Draw upper heatmap with the matrix ordered by hierarchical clustering
corrplot(cor_mat,
type = "upper", method = "color", order = "hclust")
```
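The idea behind reordering by hierarchical clustering can be sketched in base R. The example below is an illustration only: it uses 1 - correlation as a distance and the default linkage of hclust(), which is not necessarily what corrplot does internally, and the four return series are simulated (hypothetical data).

```r
set.seed(99)

# Hypothetical returns: A and B share one driver, C and D share another
base1 <- rnorm(200)
base2 <- rnorm(200)
rets <- cbind(A = base1 + rnorm(200, sd = 0.5),
              C = base2 + rnorm(200, sd = 0.5),
              B = base1 + rnorm(200, sd = 0.5),
              D = base2 + rnorm(200, sd = 0.5))
cor_demo <- cor(rets)

# Use 1 - correlation as a distance, then cluster hierarchically
hc <- hclust(as.dist(1 - cor_demo))

# Reorder rows and columns so that similar series sit next to each other
cor_ordered <- cor_demo[hc$order, hc$order]
round(cor_ordered, 2)
```

In the reordered matrix, the strongly correlated pairs end up in adjacent rows and columns, so on a heatmap they form visible blocks along the diagonal.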
# Case study: Visually selecting a stock that improves your existing portfolio

## Case study presentation

### Current portfolio description

Your savings are invested in a portfolio made of 3 stocks: Yahoo, Apple and Microsoft. Each stock has the same weight in the portfolio, at about 33%. You have some extra cash to invest, but before going any further, you want to gather some information on your existing portfolio.

In this exercise, you are provided with a dataset data containing the value and the return of the portfolio over time, in value and return, respectively.
```{r}
data <- read_csv("existing_portfolio.csv")
data <- as.xts(data [, -1], order.by = data$Index)
```
```{r}
# Plot the portfolio value
plot(data$value, main = "Portfolio Value")

# Plot the portfolio return
plot(data$return, main = "Portfolio Return")

# Plot a histogram of portfolio return 
hist(data$return,
probability = TRUE)

# Add a density line
lines(density(data$return),
lwd = 2,
col = "red")
```
## New stocks

### New stocks description

In this exercise, you will review plotting multiple graphs on the same graphical window.

The new dataset data containing four new stocks is available in your workspace:

- Goldman Sachs (GS)
- Coca-Cola (KO)
- Walt Disney (DIS)
- Caterpillar (CAT)

```{r}
data <- read_csv("stocks_03.csv")
data <- as.xts(data [, -1], order.by = data$Index)
```
```{r}
# Plot the four stocks on the same graphical window
par(mfrow = c(2, 2),
mex = 0.8,
cex = 0.8)
plot(data$GS)
plot(data$KO)
plot(data$DIS)
plot(data$CAT)
```
Now that you know what the new stocks look like, you want to find out if any of them provide diversification benefits to your existing portfolio. You can do this by looking at the correlation of each stock to your portfolio, visualized through regression lines.

In this exercise, you are provided with four individual series containing the return of the same four stocks:

- Goldman Sachs (gs)
- Coca-Cola (ko)
- Walt Disney (dis)
- Caterpillar (cat)

The returns of your existing portfolio, stored in portfolio, are also available in your workspace. Now it's your turn to analyze the relationships!

```{r}
library(TTR)
portfolio <- read_csv("existing_portfolio.csv")
portfolio <- as.xts(portfolio [, -1], order.by = portfolio$Index)
gs <- ROC(coredata(data$GS))
gs <- gs[-1]
ko <- ROC(coredata(data$KO))
ko <- ko[-1]
dis <- ROC(coredata(data$DIS))
dis <- dis[-1]
cat <- ROC(coredata(data$CAT))
cat <- cat[-1]
```
```{r}
# Draw the scatterplot of gs against the portfolio
plot(x = gs, y = portfolio$return)

# Add a regression line in red (y-axis variable ~ x-axis variable)
abline(reg = lm(portfolio$return ~ gs),
       col = "red",
       lwd = 2)

# Plot scatterplots and regression lines to a 2x2 window
par(mfrow = c(2, 2))

plot(x = gs, y = portfolio$return)
abline(reg = lm(portfolio$return ~ gs),
       col = "red",
       lwd = 2)

plot(x = ko, y = portfolio$return)
abline(reg = lm(portfolio$return ~ ko),
       col = "red",
       lwd = 2)

plot(x = dis, y = portfolio$return)
abline(reg = lm(portfolio$return ~ dis),
       col = "red",
       lwd = 2)

plot(x = cat, y = portfolio$return)
abline(reg = lm(portfolio$return ~ cat),
       col = "red",
       lwd = 2)
```
Coca-Cola seems to provide the most diversification benefit based on low correlation to the portfolio.
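
The visual comparison can be backed up with numbers by computing the correlation of each candidate with the portfolio. The sketch below uses simulated returns only: the sensitivities assigned to each ticker are hypothetical and say nothing about the real stocks. It simply shows how the candidates could be ranked.

```r
set.seed(2016)
n <- 500
portfolio_rtn <- rnorm(n, sd = 0.01)   # hypothetical portfolio returns

# Hypothetical candidate returns with different sensitivities to the portfolio
candidates <- cbind(
  gs  = 1.0 * portfolio_rtn + rnorm(n, sd = 0.01),
  ko  = 0.1 * portfolio_rtn + rnorm(n, sd = 0.01),
  dis = 0.8 * portfolio_rtn + rnorm(n, sd = 0.01),
  cat = 0.9 * portfolio_rtn + rnorm(n, sd = 0.01)
)

# Correlation of each candidate with the portfolio, as a named vector
correlations <- cor(candidates, portfolio_rtn)[, 1]
round(sort(correlations), 2)

# Candidate with the lowest correlation to the portfolio
lowest <- names(which.min(correlations))
lowest
```

With real data, the same two lines (cor() followed by which.min()) would be applied to the actual return series, provided they cover the same dates and contain no missing values.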

### Compare old and new portfolios

Great work. You decide to buy stocks in Coca-Cola, and now your portfolio is made of equal proportions of four stocks: Yahoo, Microsoft, Apple and Coca-Cola.

In this exercise, you are given a dataset old.vs.new.portfolio with the following self-explanatory columns:

old.portfolio.value

new.portfolio.value

old.portfolio.rtn

new.portfolio.rtn

```{r}
old.vs.new.portfolio <- read_csv("old_vs_new_portfolio.csv")
old.vs.new.portfolio <- as.xts(old.vs.new.portfolio [, -1], order.by = old.vs.new.portfolio$Index)
```

```{r}
# Plot new and old portfolio values on same chart
plot(old.vs.new.portfolio$old.portfolio.value)
lines(old.vs.new.portfolio$new.portfolio.value, col = "red")


# Plot density of the new and old portfolio returns on same chart
plot(density(old.vs.new.portfolio$old.portfolio.rtn))
lines(density(old.vs.new.portfolio$new.portfolio.rtn), col = "red")
```
The new portfolio seems to have less variation based on the density lines.

### A more accurate comparison of portfolios

Looking at the value and distribution of returns of your portfolio is a good start, but it doesn't necessarily tell the whole story. You could obviously look at many other charts and metrics, but ultimately what matters is performance, and specifically periods of poor performance.

The PerformanceAnalytics package provides additional tools to get a finer view of your portfolio. In particular, the charts.PerformanceSummary() function provides a quick and easy way to display the portfolio value, returns, and periods of poor performance, also known as drawdowns.

In this exercise, you will use this new function on the same old and new portfolio data in old.vs.new.portfolio from the previous exercise.
```{r}
# Draw value, return, drawdowns of old portfolio
charts.PerformanceSummary(old.vs.new.portfolio$old.portfolio.rtn)

# Draw value, return, drawdowns of new portfolio
charts.PerformanceSummary(old.vs.new.portfolio$new.portfolio.rtn)

# Draw both portfolios on same chart
charts.PerformanceSummary(old.vs.new.portfolio[, c(3, 4)])
```
The new portfolio looks to have a higher cumulative return and lower drawdown for this period of time.
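
A drawdown measures how far the portfolio value sits below its previous peak. The following base R sketch illustrates that definition on three hypothetical period returns; it is not the PerformanceAnalytics implementation, only the underlying arithmetic.

```r
# Hypothetical period returns
r <- c(0.10, -0.50, 0.20)

# Growth of one unit of wealth
wealth <- cumprod(1 + r)

# Running peak, including the starting value of 1
peak <- cummax(c(1, wealth))[-1]

# Drawdown: relative distance below the running peak
drawdown <- wealth / peak - 1

# Worst drawdown over the whole period
max_drawdown <- min(drawdown)
```

Here the wealth path is 1.10, 0.55, 0.66, so the drawdowns are 0, -50% and -40%: the 20% gain in the last period recovers only part of the earlier loss.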

On what grounds should you add a new stock to your portfolio?

Correlation to your existing portfolio to assess diversification, return histogram to assess risk and box and whisker plot to assess average return

