• 1 Introduction
    • 1.1 Objectives
    • 1.2 Learning Outcomes
    • 1.3 Experience Level
  • 2 Modebar
  • 3 Installation
  • 4 The iris dataset
  • 5 Basic Charts
  • 6 Subplot
  • 7 Statistical Plots

1 Introduction

Welcome to the Plotly Graphing Library tutorial! Plotly is a powerful open-source library for creating interactive charts.

Here you will practice using Plotly in R hands-on, through step-by-step instructions. We will use iris, one of R's built-in datasets, so you can easily follow and recreate all of the examples with a simple call to the dataset.

1.1 Objectives

In this tutorial, we focus on some of the most commonly used functions in Plotly, on different types of transforms, and on creating subplots. You should be able to complete this tutorial within 25 minutes. Each exercise is followed by a section titled Exercise Solution that contains its answer key, so that you can check your work after completing the exercise.

1.2 Learning Outcomes

By the end of this tutorial you should be able to:

  1. Install Plotly in RStudio/Colab/GitHub. (It is the same command!)

  2. Initialize the library so that the library commands are runnable.

  3. Plot a simple and statistical chart.

  4. Use transform functions like filter, group and aggregate.

  5. Create subplots to show several plots in one area.

  6. Become familiar with using the different commands on the Modebar.

1.3 Experience Level

This tutorial is for beginners in R. Some knowledge of basic charting concepts and programming is needed, because the focus is on learning the library commands.

Watch the Video tutorial here!

2 Modebar

One of the advantages of Plotly is the modebar. The modebar enables you to download and work with your plot interactively.

2.1 Download Plot as a PNG

As shown in the picture, click the camera icon to download the plot in PNG format.

2.2 Zoom and Pan Buttons

Clicking and holding with your mouse allows you to zoom and pan. You can toggle between modes by clicking on the zoom or pan icons.

2.3 Zoom In/Out

You can zoom in and out by clicking the + and - buttons. The axis labels adjust automatically as you zoom in.

2.4 Autoscale and Reset Axes

The initial axis setting is called “autoscale”. After a graph is made, you can change the axes and then save the graph. Clicking Reset Axes returns the axes to this saved form.

2.5 Hover Options

One of these two buttons is selected at all times. Clicking ‘Show closest data on hover’ will display the data for just the one point under the cursor. Clicking ‘Compare data on hover’ will show you the data for all points with the same x-value.

3 Installation

Let’s start by installing plotly. Run the following commands to install and load the necessary libraries. Throughout this tutorial, we also need some operations provided by tidyverse.

install.packages('plotly')
library(plotly)

install.packages('tidyverse')
library(tidyverse)
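
Calling install.packages() every time the script runs downloads the package again even when it is already present. A minimal sketch that installs a package only when it is missing is shown below; the helper name install_if_missing is our own (hypothetical) and is not part of plotly or tidyverse.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers a download.
install_if_missing("stats")
```

In the same way, install_if_missing('plotly') followed by library(plotly) would prepare the session for the rest of the tutorial.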

4 The iris dataset

The iris dataset is built into R. It contains several columns of information about 3 classes of iris plant, with 50 instances each. These columns include:

  1. Sepal length (in cm)
  2. Sepal width (in cm)
  3. Petal length (in cm)
  4. Petal width (in cm)
  5. Species:
    • Iris Setosa
    • Iris Versicolor
    • Iris Virginica

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
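
As a quick check of the description above, the following base R sketch counts the rows per species and looks up the range of each numeric column. The variable names species_counts and column_ranges are ours.

```r
# Count the rows per species; each of the 3 classes should have 50 instances.
species_counts <- table(iris$Species)
species_counts

# Minimum and maximum of each numeric column, handy when reading axis ranges later.
column_ranges <- sapply(iris[, 1:4], range)
column_ranges
```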

5 Basic Charts

5.1 Scatter Plot

To visualize the relationship between Sepal.Length and Petal.Length, we can draw a scatter plot. By adding color = ~Species, the data is automatically grouped by Species.

plot_ly(
  data  = iris,
  x     = ~Sepal.Length,
  y     = ~Petal.Length,
  color = ~Species,
  type  = 'scatter',
  mode  = 'markers'
)
[Figure: scatter plot of Petal.Length (y-axis) against Sepal.Length (x-axis); legend: setosa, versicolor, virginica]
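
To make the automatic grouping more concrete, here is a rough sketch of doing the same thing by hand: split the data by species and add one trace per group. It assumes that one add_trace() call per group is a fair approximation of what color = ~Species does; the names groups and fig_manual are ours, and the figure is only built when plotly is installed.

```r
# Split iris into one data frame per species (base R).
groups <- split(iris, iris$Species)
names(groups)

# Assumption: adding one trace per group approximates color = ~Species.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig_manual <- plotly::plot_ly()
  for (sp in names(groups)) {
    fig_manual <- plotly::add_trace(
      fig_manual,
      data = groups[[sp]],
      x    = ~Sepal.Length,
      y    = ~Petal.Length,
      type = 'scatter',
      mode = 'markers',
      name = sp
    )
  }
}
```

Typing fig_manual at the console displays the figure. The color argument remains the shorter option; the loop is only useful when each group needs its own settings.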

5.1.1 Using groupby

Plotly allows you to apply transform functions like ‘groupby’ while building the plot. The example below groups the data by Species.

plot_ly(
  data  = iris,
  x     = ~Sepal.Length,
  y     = ~Petal.Length,
  type  = 'scatter',
  mode  = 'markers',
  transforms = list(
    list(
      type   = 'groupby',
      groups = ~Species,
      styles = list(
        list(target = 'setosa'    , value = list(marker = list(color = 'Red'))),
        list(target = 'versicolor', value = list(marker = list(color = 'Blue'))),
        list(target = 'virginica' , value = list(marker = list(color = 'Green')))
      )
    )
  )
)

5.1.2 Applying filter

In the example below, we filter on the y values so that only points with a petal length greater than 2 cm are shown. Try changing the filter to target the x-axis and notice how that changes the plot.

plot_ly(
  data  = iris,
  x     = ~Sepal.Length,
  y     = ~Petal.Length,
  color = ~Species,
  type  = 'scatter',
  mode  = 'markers',
  transforms = list(
    list(
      type      = 'filter',
      target    = 'y',
      operation = '>',
      value     = 2
    )
  )
)
[Figure: filtered scatter plot of Petal.Length (y-axis) against Sepal.Length (x-axis); legend: setosa, versicolor, virginica]
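
The same selection can also be made in R before plotting, which is an easy way to see how many points the filter keeps. This is a base R sketch of the equivalent condition, not a description of how the transform works internally; iris_filtered is our own name.

```r
# Keep only rows with a petal length greater than 2 cm.
iris_filtered <- subset(iris, Petal.Length > 2)

# How many points survive the filter in each species?
table(iris_filtered$Species)
```

Passing iris_filtered as the data argument of plot_ly() should then give the same points without a transform.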

5.2 Line Plot

We can also add a regression line by using add_trace(data, x, y).

model <- lm(Petal.Length~Sepal.Length, iris)

plot_ly(
  data = iris,
  x    = ~Sepal.Length,
  y    = ~Petal.Length,
  type = 'scatter',
  mode = 'markers',
  name = 'data'
) %>%
add_trace(
  data = iris,
  x    = ~Sepal.Length,
  y    = model$fitted.values,
  type = 'scatter',
  mode = 'lines',
  name = 'fit'
)
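
The fitted values above are in the same order as the rows of iris, which is fine for a straight line. For curved fits, a line drawn through unsorted x values would zigzag, so a common alternative is to predict on a sorted grid. The sketch below uses base R only; x_grid and the choice of 50 grid points are our assumptions.

```r
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Evaluate the fitted line on 50 evenly spaced, sorted x values.
x_grid <- data.frame(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 50)
)
x_grid$Petal.Length <- predict(model, newdata = x_grid)

# Intercept and slope of the fitted line.
coef(model)
```

The x_grid data frame can then be passed to add_trace() as data, with x = ~Sepal.Length and y = ~Petal.Length.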

5.3 Bar Chart

Here, we calculate the mean Sepal.Length of each species and show it using a bar chart.

data <- iris %>%
  group_by(Species) %>%
  summarise(mean.Sepal.Length=mean(Sepal.Length))

plot_ly(
  data  = data,
  x     = ~Species,
  y     = ~mean.Sepal.Length,
  color = ~Species,
  type  = 'bar'
)
[Figure: bar chart of mean.Sepal.Length (y-axis) by Species (x-axis); legend: setosa, versicolor, virginica]

5.3.1 Doing aggregate

This time, we calculate the standard deviation (stddev) of Sepal.Length for each species using an aggregate transform. You can also select one of the following aggregation functions:

  • count returns the quantity of items for each group.
  • sum returns the summation of all numeric values.
  • avg returns the average of all numeric values.
  • median returns the median of all numeric values.
  • mode returns the mode of all numeric values.
  • rms returns the rms of all numeric values.
  • stddev returns the standard deviation of all numeric values.
  • min returns the minimum numeric value for each group.
  • max returns the maximum numeric value for each group.
  • first returns the first numeric value for each group.
  • last returns the last numeric value for each group.

plot_ly(
  data  = iris,
  x     = ~Species,
  y     = ~Sepal.Length,
  color = ~Species,
  type  = 'bar',
  transforms = list(
    list(
      type         = 'aggregate',
      groups       = ~Species,
      aggregations = list(
        list(
          target = 'y',
          func   = 'stddev'
        )
      )
    )
  )
)
[Figure: bar chart of the standard deviation of Sepal.Length (y-axis) by Species (x-axis); legend: setosa, versicolor, virginica]
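
To cross-check the bar heights, the per-species standard deviation can be computed directly in base R. This sketch assumes the stddev aggregation uses the sample formula (dividing by n - 1), as R's sd() does; if it used the population formula instead, the bars would be slightly lower.

```r
# Sample standard deviation of Sepal.Length for each species.
sd_by_species <- tapply(iris$Sepal.Length, iris$Species, sd)
round(sd_by_species, 3)
```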

5.4 Pie Chart

Using a pie chart, we can visualize the ratio of the three classes of iris under a certain condition, e.g. Sepal.Length > 5.5.

data <- iris %>%
  filter(Sepal.Length > 5.5) %>%
  count(Species)

plot_ly(
  data   = data,
  labels = ~Species,
  values = ~n,
  type   = 'pie'
)
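
The pie chart converts the counts into percentages for us. As a small sketch of that arithmetic in base R (kept, counts and shares are our own names):

```r
# Rows that satisfy the condition, counted per species.
kept   <- subset(iris, Sepal.Length > 5.5)
counts <- table(kept$Species)

# Share of each species among the kept rows, in percent.
shares <- round(100 * prop.table(counts), 1)
shares
```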

6 Subplot

6.1 Individual plots

Recall the first scatter plot that we did? It showed a scatter plot of Petal.Length (y-axis) vs Sepal.Length (x-axis).

fig_xy <- plot_ly(
  data  = iris,
  x     = ~Sepal.Length,
  y     = ~Petal.Length,
  color = ~Species,
  type  = 'scatter',
  mode  = 'markers'
)

fig_xy
[Figure: scatter plot of Petal.Length (y-axis) against Sepal.Length (x-axis); legend: setosa, versicolor, virginica]

We can then look into the marginal distribution of Sepal.Length (x-axis) using a histogram.

fig_x <- plot_ly(
  data       = iris,
  x          = ~Sepal.Length,
  color      = ~Species,
  type       = 'histogram',
  alpha      = 0.75,
  showlegend = FALSE,
  nbinsx     = 20,
  bingroup   = 1
) %>%
layout(
  barmode = 'overlay',
  title   = 'Histogram of Sepal.Length',
  yaxis = list(title = 'Count')
)

fig_x
[Figure: Histogram of Sepal.Length; x-axis: Sepal.Length, y-axis: Count]

Similarly, we can plot the marginal distribution of Petal.Length (y-axis) using a histogram.

fig_y <- plot_ly(
  data       = iris,
  y          = ~Petal.Length,
  color      = ~Species,
  type       = 'histogram',
  alpha      = 0.75,
  showlegend = FALSE,
  nbinsy     = 20,
  bingroup   = 1
) %>%
layout(
  barmode = 'overlay',
  title   = 'Histogram of Petal.Length',
  xaxis = list(title = 'Count')
)

fig_y
[Figure: Histogram of Petal.Length; x-axis: Count, y-axis: Petal.Length]

6.2 Horizontal arrangement

First, the subplot command can be used to combine plots horizontally.

subplot(
  fig_xy,
  fig_y,
  nrows  = 1,
  widths = c(0.8,0.2),
  shareY = TRUE
) %>%
layout(
  title = '',
  xaxis = list(title = 'Sepal.Length'),
  yaxis = list(title = 'Petal.Length')
)
[Figure: scatter plot with the Petal.Length histogram to its right; x-axis: Sepal.Length, y-axis: Petal.Length; legend: setosa, versicolor, virginica]

6.3 Vertical arrangement

Secondly, the subplot command can also combine plots vertically.

subplot(
  fig_xy,
  fig_x,
  nrows   = 2,
  heights = c(0.8,0.2),
  shareX  = TRUE
) %>%
layout(
  title = '',
  xaxis = list(title = 'Sepal.Length'),
  yaxis = list(title = 'Petal.Length')
)
[Figure: scatter plot with the Sepal.Length histogram below it; x-axis: Sepal.Length, y-axis: Petal.Length; legend: setosa, versicolor, virginica]

6.4 Grid arrangement

Finally, the subplot command can combine all the plots using any grid arrangement, such as a joint plot.

subplot(
  fig_x,
  plotly_empty(),
  fig_xy,
  fig_y,
  nrows   = 2,
  widths  = c(0.8,0.2),
  heights = c(0.2,0.8),
  shareX  = TRUE,
  shareY  = TRUE
) %>%
layout(
  title  = 'Joint plot of Petal.Length vs Sepal.Length'
)
## Warning: No trace type specified and no positional attributes specified
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
[Figure: Joint plot of Petal.Length vs Sepal.Length; axes: Sepal.Length, Petal.Length, Count; legend: setosa, versicolor, virginica]
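
In the call above, the plots are placed row by row in the order they are passed, which is why the empty plot comes second. The base R sketch below only lays out the panel names as a matrix to make that order visible; it does not call plotly, and the row and column labels are ours.

```r
# Names of the four panels in the order they are passed to subplot().
panels <- c("fig_x", "plotly_empty()", "fig_xy", "fig_y")

# With nrows = 2, the panels fill the grid row by row.
grid <- matrix(
  panels,
  nrow     = 2,
  byrow    = TRUE,
  dimnames = list(c("top", "bottom"), c("left", "right"))
)
grid

# widths apply to the columns and heights to the rows; here each set sums to 1.
widths  <- c(0.8, 0.2)
heights <- c(0.2, 0.8)
```

Reading the matrix, the Sepal.Length histogram sits above the scatter plot and the Petal.Length histogram sits to its right, with the small top-right cell left empty.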

7 Statistical Plots

Sometimes you may need specific types of plots to depict certain information. In the following examples, you will learn how to use plotly to produce other interesting types of plots. Don’t forget to use the Modebar to manipulate your graph, and hover over the plot to see the values.

7.1 Box plot

This code draws a box plot of Sepal.Length for each Species.

plot_ly(y = iris$Sepal.Length,
        x = iris$Species,
        type = "box")
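
The numbers behind each box can be approximated in base R with fivenum(). This is only a sketch: plotly may compute its quartiles with a different method, so the hinges below can differ slightly from the values shown on hover. The name box_stats is ours.

```r
# Five-number summary (min, lower hinge, median, upper hinge, max) per species.
box_stats <- sapply(split(iris$Sepal.Length, iris$Species), fivenum)
rownames(box_stats) <- c("min", "lower", "median", "upper", "max")
box_stats
```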

7.2 Histogram

In this section we provide 3 types of histograms to be used in everyday analysis.

7.2.1 Histogram - basic

The following code draws a simple histogram. If you want to dig deeper, jump to the histogram comparison or the 2D histogram below.

plot_ly(x = iris$Sepal.Length,
        type = "histogram")

7.2.2 Histogram - comparison

An overlaid histogram helps you compare different categories using the same bin group.

First, we create one data frame each for the versicolor and virginica species with the following code.

versicolor <- filter(
  iris,
  Species=='versicolor'
  )

virginica <- filter(
  iris,
  Species=='virginica'
  )

Then, we only need to run the following code to get the result.

fig <- plot_ly(
  x        = virginica$Sepal.Length,
  type     = "histogram",
  bingroup = 1)

fig <- fig %>%
 add_trace(
   x        = versicolor$Sepal.Length,
   type     = "histogram",
   bingroup = 1,
   opacity  = 0.5
   )

fig <- fig %>%
 layout(
   barmode="overlay",
   bargap=0.1
   )

fig
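
Sharing a bin group means both histograms are counted over the same bins, which is what makes the bars comparable. The base R sketch below illustrates the idea with one set of break points for both species. The 0.5 cm bin width and the 4 to 8 cm range are our assumptions; plotly chooses its own bins.

```r
versicolor <- subset(iris, Species == 'versicolor')
virginica  <- subset(iris, Species == 'virginica')

# One set of break points for both groups: 0.5 cm wide bins from 4 to 8 cm.
breaks <- seq(4, 8, by = 0.5)

# Count the observations of each species in every bin.
counts <- rbind(
  versicolor = hist(versicolor$Sepal.Length, breaks = breaks, plot = FALSE)$counts,
  virginica  = hist(virginica$Sepal.Length,  breaks = breaks, plot = FALSE)$counts
)
colnames(counts) <- paste0(head(breaks, -1), "-", tail(breaks, -1))
counts
```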

7.2.3 2D Histogram

A heatmap, like a 2D histogram, shows the relationship between the variables you want to depict. The chart below displays a heatmap of the correlations among the 4 numeric variables, computed over all three species together.

First, we calculate the correlation between 4 variables in the dataset as below.

corrdata <- cor(iris[, c(1:4)])

corrdata
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

Then, we only need to run the following code to get the result.

plot_ly(
  x    = colnames(corrdata),
  y    = colnames(corrdata),
  z    = corrdata,
  type = "heatmap"
  )
[Figure: heatmap of the correlation matrix; both axes: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width; color bar ticks at 0, 0.5 and 1]
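
A heatmap is easier to read when you know which cell to look for. As a small base R sketch, the code below finds the most strongly correlated pair of different variables by ignoring the diagonal; off_diag, idx and strongest_pair are our own names.

```r
corrdata <- cor(iris[, 1:4])

# Blank out the diagonal, where every variable correlates perfectly with itself.
off_diag <- corrdata
diag(off_diag) <- NA

# Locate the largest remaining correlation (the matrix is symmetric, so take the first hit).
idx <- which(off_diag == max(off_diag, na.rm = TRUE), arr.ind = TRUE)[1, ]
strongest_pair <- c(rownames(corrdata)[idx["row"]], colnames(corrdata)[idx["col"]])
strongest_pair
```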

7.3 Error bars

Error bars are a great way to visualize the estimated error. The following example shows you how to use the error bars in plotly.

First, we calculate the mean and standard deviation of Sepal.Length for each species with the following code.

data <- iris %>%
  group_by(Species) %>%
  summarize(
    Mean = mean(Sepal.Length),
    STD  = sd(Sepal.Length)
  )

data
## # A tibble: 3 × 3
##   Species     Mean   STD
##   <fct>      <dbl> <dbl>
## 1 setosa      5.01 0.352
## 2 versicolor  5.94 0.516
## 3 virginica   6.59 0.636

Then, we only need to run the following code to get the result.

plot_ly(
  data,
  x       = ~Species,
  y       = ~Mean,
  error_y = list(array=~STD),
  type    = 'scatter'
  )
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
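
The message above appears because no scatter mode was given; passing mode = 'markers' explicitly avoids it. The sketch below also shows one possible variation, using the standard error (standard deviation divided by the square root of the group size) for the bars instead of the standard deviation. The names by_species, stats and fig_se are ours, and the figure is only built when plotly is installed.

```r
# Mean, standard deviation and standard error of Sepal.Length per species (base R).
by_species <- split(iris$Sepal.Length, iris$Species)
stats <- data.frame(
  Species = names(by_species),
  Mean    = sapply(by_species, mean),
  STD     = sapply(by_species, sd),
  SE      = sapply(by_species, function(v) sd(v) / sqrt(length(v)))
)
stats

# Same chart as above, with an explicit mode and standard-error bars.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig_se <- plotly::plot_ly(
    stats,
    x       = ~Species,
    y       = ~Mean,
    error_y = list(array = ~SE),
    type    = 'scatter',
    mode    = 'markers'
  )
}
```

Typing fig_se at the console displays the figure. Standard-error bars are much shorter than standard-deviation bars here because each group has 50 observations.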