iris dataset
Welcome to the Plotly Graphic Library tutorial! Plotly is a powerful, open-source library for creating rich, interactive charts.
Here you will practice, through step-by-step instructions, how to use Plotly in R. We will be using one of R's built-in datasets, iris, so you can easily follow and recreate all of the examples with a simple call to the dataset.
In this tutorial, we will focus on some of the most commonly used functions in Plotly, on different types of transforms, and on creating subplots. You should be able to complete this tutorial within 25 minutes. Each exercise has its answer key posted in the section below it, titled Exercise Solution, so that you can check your work after completing the exercise.
By the end of this tutorial you should be able to:
1. Install Plotly on RStudio/Colab/Github. (It is the same command!)
2. Initialize the library so that the library commands are runnable.
3. Plot a simple and statistical chart.
4. Use transform functions like filter, group and aggregate.
5. Create subplots to show several plots in one area.
6. Become familiar with using the different commands on the Modebar.
This tutorial is for beginners in R. Some knowledge of basic charting concepts and programming is needed, as the focus is on learning the library commands.
One of the advantages of Plotly is the modebar. The modebar enables you to download and work with your plot interactively.
Click the camera icon on the modebar to download the plot in PNG format.
You can zoom in and out by clicking the + and - buttons. Your axis labels will automatically adjust as you zoom in.
The initial axis setting is called “autoscale”. After a graph is made, you can change the axes and then save the graph. You can return to this saved form by clicking Reset Axes.
One of the two hover buttons is selected at all times. Clicking ‘Show closest data on hover’ displays the data for just the one point under the cursor. Clicking ‘Compare data on hover’ shows the data for all points with the same x-value.
Let’s start by installing plotly. Run the following commands to install and load the necessary libraries. Throughout this tutorial, we also need some operations provided by tidyverse.
install.packages('plotly')
library(plotly)
install.packages('tidyverse')
library(tidyverse)
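Calling install.packages() on every run reinstalls packages that may already be present. A minimal sketch of a guard is shown below. The helper name missing_packages is hypothetical and not part of plotly or tidyverse, and the install line is left commented out so that the snippet has no side effects.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("plotly", "tidyverse"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
}
```

The library() calls are still needed afterwards, because checking for a package does not attach it.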
iris dataset
The iris dataset is built into R. It contains several columns of information about 3 classes of iris plant, 50 instances each. These columns include:
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
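As a quick check of the description above, the following base R sketch counts the rows per species and lists the numeric columns. The variable names species_counts and numeric_cols are chosen here for illustration only.

```r
# Count the rows per species in the built-in iris data frame.
species_counts <- table(iris$Species)
print(species_counts)

# Names of the numeric measurement columns.
numeric_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]
print(numeric_cols)
```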
To visualize the relationship between Sepal.Length and Petal.Length, we can draw a scatter plot. By adding color = ~Species, the data is automatically grouped by Species.
plot_ly(
data = iris,
x = ~Sepal.Length,
y = ~Petal.Length,
color = ~Species,
type = 'scatter',
mode = 'markers'
)
groupby
Plotly allows you to apply transform functions like ‘groupby’ while building the plot. The example below groups the data by Species and assigns a marker color to each group.
plot_ly(
  data = iris,
  x = ~Sepal.Length,
  y = ~Petal.Length,
  type = 'scatter',
  mode = 'markers',
  transforms = list(
    list(
      type = 'groupby',
      groups = ~Species,
      styles = list(
        list(target = 'setosa',     value = list(marker = list(color = 'Red'))),
        list(target = 'versicolor', value = list(marker = list(color = 'Blue'))),
        list(target = 'virginica',  value = list(marker = list(color = 'Green')))
      )
    )
  )
)
filter
In the example below, we filter on the y values so that only points with a petal length greater than 2 cm are shown. Try changing the filter to target the x-axis and notice how that changes the plot.
plot_ly(
data = iris,
x = ~Sepal.Length,
y = ~Petal.Length,
color = ~Species,
type = 'scatter',
mode = 'markers',
transforms = list(
list(
type = 'filter',
target = 'y',
operation = '>',
value = 2
)
)
)
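To see which rows a filter like this keeps, the same condition can be applied in base R before plotting. This is only a sanity-check sketch, not how the transform works internally, and the names kept, before and after are illustrative.

```r
# Base R equivalent of the condition used above: keep rows with Petal.Length > 2.
kept <- iris[iris$Petal.Length > 2, ]

# Rows per species before and after applying the condition.
before <- table(iris$Species)
after <- table(kept$Species)
print(rbind(before, after))
```

In iris, no setosa flower has a petal longer than 2 cm, so that species drops out of the filtered plot entirely.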
We can also add a regression line by using add_trace(data, x, y).
model <- lm(Petal.Length~Sepal.Length, iris)
plot_ly(
data = iris,
x = ~Sepal.Length,
y = ~Petal.Length,
type = 'scatter',
mode = 'markers',
name = 'data'
) %>%
add_trace(
data = iris,
x = ~Sepal.Length,
y = model$fitted.values,
type = 'scatter',
mode = 'lines',
name = 'fit'
)
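The fitted values above are computed at the observed x values only. As an alternative sketch, the line can be evaluated on an evenly spaced grid with predict(). The grid size of 50 and the name line_grid are assumptions made for illustration.

```r
# Fit the same linear model used for the regression line.
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Hypothetical grid of 50 evenly spaced x values spanning the observed range.
line_grid <- data.frame(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 50)
)
line_grid$Petal.Length <- predict(model, newdata = line_grid)

print(coef(model))  # intercept and slope of the fitted line
head(line_grid)
```

A data frame like line_grid could then be passed to add_trace() with x = ~Sepal.Length, y = ~Petal.Length and mode = 'lines'.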
Here, we calculate the mean of Sepal.Length for each species and show it using a bar chart.
data <- iris %>%
  group_by(Species) %>%
  summarise(mean.Sepal.Length = mean(Sepal.Length))
plot_ly(
data = data,
x = ~Species,
y = ~mean.Sepal.Length,
color = ~Species,
type = 'bar'
)
aggregate
This time, we calculate the standard deviation (stddev) of Sepal.Length. You can also select one of the following aggregation functions:
count returns the quantity of items for each group.
sum returns the summation of all numeric values.
avg returns the average of all numeric values.
median returns the median of all numeric values.
mode returns the mode of all numeric values.
rms returns the rms of all numeric values.
stddev returns the standard deviation of all numeric values.
min returns the minimum numeric value for each group.
max returns the maximum numeric value for each group.
first returns the first numeric value for each group.
last returns the last numeric value for each group.
plot_ly(
  data = iris,
  x = ~Species,
  y = ~Sepal.Length,
  color = ~Species,
  type = 'bar',
  transforms = list(
    list(
      type = 'aggregate',
      groups = ~Species,
      aggregations = list(
        list(
          target = 'y',
          func = 'stddev'
        )
      )
    )
  )
)
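The bar heights can be cross-checked outside plotly. The sketch below uses base R's sd(), which applies the sample formula (dividing by n - 1). Whether the transform's stddev matches exactly depends on its settings, so treat this as a sanity check rather than a definition of what the transform computes.

```r
# Per-species sample standard deviation of Sepal.Length (base R).
sd_by_species <- tapply(iris$Sepal.Length, iris$Species, sd)
print(round(sd_by_species, 3))
```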
Using a pie chart, we can visualize the ratio of the three classes of iris under a certain condition, e.g. Sepal.Length > 5.5.
data <- iris %>%
filter(Sepal.Length > 5.5) %>%
count(Species)
plot_ly(
data = data,
labels = ~Species,
values = ~n,
type = 'pie'
)
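The pie chart converts the counts to percentages on its own. To see the underlying numbers, the shares can be computed in base R as in the following sketch, where the names subset_rows, counts and shares are illustrative.

```r
# Keep the flowers that satisfy the condition, then count them per species.
subset_rows <- iris[iris$Sepal.Length > 5.5, ]
counts <- table(subset_rows$Species)

# Convert the counts to percentage shares, rounded to one decimal place.
shares <- round(100 * prop.table(counts), 1)
print(counts)
print(shares)
```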
Recall the first scatter plot that we made. It showed Petal.Length (y-axis) vs Sepal.Length (x-axis).
fig_xy <- plot_ly(
data = iris,
x = ~Sepal.Length,
y = ~Petal.Length,
color = ~Species,
type = 'scatter',
mode = 'markers'
)
fig_xy
We can then look into the marginal distribution of Sepal.Length (x-axis) using a histogram.
fig_x <- plot_ly(
data = iris,
x = ~Sepal.Length,
color = ~Species,
type = 'histogram',
alpha = 0.75,
showlegend = FALSE,
nbinsx = 20,
bingroup = 1
) %>%
layout(
barmode = 'overlay',
title = 'Histogram of Sepal.Length',
yaxis = list(title = 'Count')
)
fig_x
Similarly, we can plot the marginal distribution of Petal.Length (y-axis) using a histogram.
fig_y <- plot_ly(
data = iris,
y = ~Petal.Length,
color = ~Species,
type = 'histogram',
alpha = 0.75,
showlegend = FALSE,
nbinsy = 20,
bingroup = 1
) %>%
layout(
barmode = 'overlay',
title = 'Histogram of Petal.Length',
xaxis = list(title = 'Count')
)
fig_y
Firstly, the subplot command can be used to combine them horizontally.
subplot(
fig_xy,
fig_y,
nrows = 1,
widths = c(0.8,0.2),
shareY = TRUE
) %>%
layout(
title = '',
xaxis = list(title = 'Sepal.Length'),
yaxis = list(title = 'Petal.Length')
)
Secondly, the subplot command can also combine plots vertically.
subplot(
fig_xy,
fig_x,
nrows = 2,
heights = c(0.8,0.2),
shareX = TRUE
) %>%
layout(
title = '',
xaxis = list(title = 'Sepal.Length'),
yaxis = list(title = 'Petal.Length')
)
Finally, the subplot command can combine all the plots using any grid arrangement, like a joint plot.
subplot(
fig_x,
plotly_empty(),
fig_xy,
fig_y,
nrows = 2,
widths = c(0.8,0.2),
heights = c(0.2,0.8),
shareX = TRUE,
shareY = TRUE
) %>%
layout(
title = 'Joint plot of Petal.Length vs Sepal.Length'
)
## Warning: No trace type specified and no positional attributes specified
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
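The order of the arguments passed to subplot() decides where each plot lands. The sketch below only illustrates that mapping with a character matrix and draws nothing. It assumes the grid is filled row by row, which is consistent with the widths and heights used in the joint plot.

```r
# Plot names in the order they are passed to subplot() in the joint plot.
plots <- c("fig_x", "plotly_empty", "fig_xy", "fig_y")

# With nrows = 2 the four plots form a 2 x 2 grid (assumed to fill row by row).
layout_grid <- matrix(
  plots,
  nrow = 2,
  byrow = TRUE,
  dimnames = list(
    c("top (height 0.2)", "bottom (height 0.8)"),
    c("left (width 0.8)", "right (width 0.2)")
  )
)
print(layout_grid)
```

Under that assumption, the empty plot fills the small top-right corner, which is why plotly_empty() is needed as a placeholder.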
Sometimes you may need specific types of plots to depict certain information. In the following examples you will learn how to use plotly to produce other interesting types of plots. Don’t forget to use the Modebar to manipulate your graph, and hover over the plot to see the values.
This code draws a box plot of Sepal.Length for each Species.
plot_ly(y = iris$Sepal.Length,
x = iris$Species,
type = "box")
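The numbers behind a box plot can be inspected with base R's fivenum(), as in the sketch below. Plotly may use a different quartile method, so the box edges in the chart could differ slightly from these hinges. The row labels are added here for readability.

```r
# Five-number summary of Sepal.Length for each species.
five_num <- sapply(split(iris$Sepal.Length, iris$Species), fivenum)
rownames(five_num) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five_num)
```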
In this section we provide 3 types of histograms to be used in everyday analysis.
The following code draws a simple histogram. If you are interested in digging deeper, jump to the histogram comparison or the 2D histogram below.
plot_ly(x = iris$Sepal.Length,
type = "histogram")
An overlaid histogram can help you compare different categories within the same bin group.
First, we create one data frame each for the versicolor and virginica species with the following code.
versicolor <- filter(
iris,
Species=='versicolor'
)
virginica <- filter(
iris,
Species=='virginica'
)
Then, we only need to run the following code to get the result.
fig <- plot_ly(
x = virginica$Sepal.Length,
type = "histogram",
bingroup = 1)
fig <- fig %>%
add_trace(
x = versicolor$Sepal.Length,
type = "histogram",
bingroup = 1,
opacity = 0.5
)
fig <- fig %>%
layout(
barmode="overlay",
bargap=0.1
)
fig
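Overlaid histograms are only comparable when both traces share the same bins. The base R sketch below counts both species against one set of bin edges. The 0.5 cm edges are a hypothetical choice and need not match the bins plotly picks for the shared bingroup.

```r
versicolor <- iris[iris$Species == "versicolor", ]
virginica <- iris[iris$Species == "virginica", ]

# Hypothetical bin edges, 0.5 cm wide, covering the full Sepal.Length range.
breaks <- seq(4, 8, by = 0.5)

# Count each species against the same edges so the rows line up bin by bin.
counts <- rbind(
  versicolor = hist(versicolor$Sepal.Length, breaks = breaks, plot = FALSE)$counts,
  virginica  = hist(virginica$Sepal.Length, breaks = breaks, plot = FALSE)$counts
)
colnames(counts) <- paste0(head(breaks, -1), "-", tail(breaks, -1))
print(counts)
```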
A 2D histogram, or heatmap, shows the relationship between the variables you want to depict. The chart below displays a heatmap of the correlations among the 4 numeric variables, computed over all species together.
First, we calculate the correlation between the 4 variables in the dataset as below.
corrdata <- cor(iris[, c(1:4)])
corrdata
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
## Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
## Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
## Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
Then, we only need to run the following code to get the result.
plot_ly(
x = colnames(corrdata),
y = colnames(corrdata),
z = corrdata,
type = "heatmap"
)
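The heatmap shows the pattern at a glance. To pull out the strongest relationship as a number, one option is to search the matrix while ignoring the diagonal, as in this sketch. The names off_diag, idx and strongest are illustrative.

```r
corrdata <- cor(iris[, 1:4])

# Blank out the diagonal (each variable correlates perfectly with itself).
off_diag <- corrdata
diag(off_diag) <- NA

# Locate the pair with the largest absolute correlation (first match if tied).
idx <- which(abs(off_diag) == max(abs(off_diag), na.rm = TRUE), arr.ind = TRUE)[1, ]
strongest <- c(rownames(corrdata)[idx["row"]], colnames(corrdata)[idx["col"]])
print(strongest)
print(round(corrdata[idx["row"], idx["col"]], 3))
```

The matrix is symmetric, so each pair appears twice and either cell identifies the same two variables.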
Error bars are a great way to visualize the estimated error. The following example shows you how to use error bars in plotly.
First, we calculate the mean and standard deviation of Sepal.Length for each species with the following code.
data <- iris %>%
group_by(Species) %>%
summarize(Mean = mean(Sepal.Length),
STD = sd(Sepal.Length)
)
data
## # A tibble: 3 × 3
## Species Mean STD
## <fct> <dbl> <dbl>
## 1 setosa 5.01 0.352
## 2 versicolor 5.94 0.516
## 3 virginica 6.59 0.636
Then, we only need to run the following code to get the result.
plot_ly(
data,
x = ~Species,
y = ~Mean,
error_y = list(array=~STD),
type = 'scatter'
)
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
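The example above uses the standard deviation as the bar length, which shows the spread of the data. If the goal is instead to show the uncertainty of the mean, a common alternative is the standard error, sd / sqrt(n). The sketch below swaps in that quantity. It is not part of the original example, and the SE column name is an assumption.

```r
n_by_species <- tapply(iris$Sepal.Length, iris$Species, length)
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
sd_by_species <- tapply(iris$Sepal.Length, iris$Species, sd)

# Standard error of the mean: sd / sqrt(n).
se_by_species <- sd_by_species / sqrt(n_by_species)

summary_table <- data.frame(
  Species = names(mean_by_species),
  Mean = as.numeric(mean_by_species),
  STD = as.numeric(sd_by_species),
  SE = as.numeric(se_by_species)
)
print(summary_table)
```

Passing error_y = list(array = ~SE) instead of ~STD would then draw the shorter standard-error bars.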