Functional Programming with purrr

DataCamp

Course Description

Lists can be difficult to both understand and manipulate, but they can pack a ton of information and are very powerful. In this course, you will learn to easily extract, summarize, and manipulate lists and how to export the data to your desired object, be it another list, a vector, or even something else! Throughout the course, you will work with the purrr package and a variety of datasets from the repurrrsive package, including data from Star Wars and Wes Anderson films and data collected about GitHub users and GitHub repos. Following this course, your list skills will be purrrfect!

1 Simplifying with purrr

Iteration is a powerful way to make the computer do the work for you. It can also be an area of coding where it is easy to make lots of typos and simple mistakes. The purrr package helps simplify iteration so you can focus on the next step, instead of finding typos.

1.1 The power of iteration

1.1.1 Introduction to iteration

Imagine that you need to read in hundreds of files with a similar structure and perform an action on each of them. You don’t want to write hundreds of repetitive lines of code to read in all the files or to perform the action. Instead, you want to iterate over them. Iteration is the process of performing the same action on multiple inputs. It keeps your code short and efficient, and it is especially powerful when working with lists.
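The same pattern, stripped of file reading, can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the inputs vector and the squaring step are hypothetical stand-ins for a set of files and the action you would perform on each one.

```r
# Hypothetical inputs standing in for file names
inputs <- c(2, 4, 6)

# Pre-allocate a list with one slot per input
results <- vector("list", length(inputs))

# Apply the same action to each input and store the result in its own slot
for (i in seq_along(inputs)) {
  results[[i]] <- inputs[[i]]^2
}

length(results)  # one result per input
```

Pre-allocating with vector("list", n) avoids growing the list on every pass, and seq_along() handles an empty input safely, where 1:length(x) would loop over c(1, 0).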

For this exercise, the names of 16 CSV files are stored in a vector called files. In your own work, you can build this vector with the list.files() function, as the code below does. The readr package is also loaded.

This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the tidyverse Cheat Sheet and keep it handy!

Create a for loop that iterates over files and passes each element to readr::read_csv(), which is another way of saying the read_csv() function from the readr package. Store each result in the newly created all_files list, so that each CSV file is read into a separate element.

library(readr)

# Initialize an empty list to hold the results
all_files <- list()

# Build the full path of every CSV file in the data folder
data_dir <- "/Users/apple/Documents/Rstudio/DataCamp/FoundationsofFunctionalProgrammingwithpurrr/simulated_data_from_1990_to_2005"
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# For loop to read each file into its own list element
for (i in seq_along(files)) {
  all_files[[i]] <- read_csv(files[[i]])
}

head(all_files)

## [[1]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1990  5.25  197.
##  2  1990  8.17  192.
##  3  1990  6.49  192.
##  4  1990  5.82  195.
##  5  1990  5.54  201.
##  6  1990  6.65  196.
##  7  1990 10.4   208.
##  8  1990  1.66  183.
##  9  1990  2.78  174.
## 10  1990  8.34  198.
## # … with 190 more rows
## 
## [[2]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1991  3.70  197.
##  2  1991  5.37  187.
##  3  1991  7.05  186.
##  4  1991  1.97  207.
##  5  1991  8.05  217.
##  6  1991  1.97  213.
##  7  1991  5.33  195.
##  8  1991  4.32  204.
##  9  1991  4.46  177.
## 10  1991  4.63  222.
## # … with 190 more rows
## 
## [[3]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1992  8.64  178.
##  2  1992  3.70  207.
##  3  1992  4.79  206.
##  4  1992  9.22  194.
##  5  1992  6.49  202.
##  6  1992  4.58  197.
##  7  1992  5.06  174.
##  8  1992  2.20  216.
##  9  1992  4.72  177.
## 10  1992 10.0   188.
## # … with 190 more rows
## 
## [[4]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1993  2.34  204.
##  2  1993  5.44  167.
##  3  1993  6.86  213.
##  4  1993  5.70  197.
##  5  1993  2.78  193.
##  6  1993  3.24  164.
##  7  1993  5.59  234.
##  8  1993  3.02  183.
##  9  1993  4.60  182.
## 10  1993  7.56  205.
## # … with 190 more rows
## 
## [[5]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1994  3.40  197.
##  2  1994  4.29  214.
##  3  1994  6.91  175.
##  4  1994  3.11  181.
##  5  1994  5.50  185.
##  6  1994  3.59  211.
##  7  1994  2.97  189.
##  8  1994  7.40  171.
##  9  1994  9.66  198.
## 10  1994  8.19  221.
## # … with 190 more rows
## 
## [[6]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1995  5.12  197.
##  2  1995  4.18  219.
##  3  1995  3.70  186.
##  4  1995  4.46  204.
##  5  1995  7.48  209.
##  6  1995  8.38  204.
##  7  1995  4.51  202.
##  8  1995  5.68  208.
##  9  1995  5.24  211.
## 10  1995  3.04  212.
## # … with 190 more rows

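Because head() on a list returns only its first six elements (each printed in full), mapping head() over the list is a quicker way to preview the first six rows of every tibble:

library(purrr)  # map() comes from purrr, introduced in the next exercise
map(all_files, ~head(.x))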

## [[1]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1990  5.25  197.
## 2  1990  8.17  192.
## 3  1990  6.49  192.
## 4  1990  5.82  195.
## 5  1990  5.54  201.
## 6  1990  6.65  196.
## 
## [[2]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1991  3.70  197.
## 2  1991  5.37  187.
## 3  1991  7.05  186.
## 4  1991  1.97  207.
## 5  1991  8.05  217.
## 6  1991  1.97  213.
## 
## [[3]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1992  8.64  178.
## 2  1992  3.70  207.
## 3  1992  4.79  206.
## 4  1992  9.22  194.
## 5  1992  6.49  202.
## 6  1992  4.58  197.
## 
## [[4]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1993  2.34  204.
## 2  1993  5.44  167.
## 3  1993  6.86  213.
## 4  1993  5.70  197.
## 5  1993  2.78  193.
## 6  1993  3.24  164.
## 
## [[5]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1994  3.40  197.
## 2  1994  4.29  214.
## 3  1994  6.91  175.
## 4  1994  3.11  181.
## 5  1994  5.50  185.
## 6  1994  3.59  211.
## 
## [[6]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1995  5.12  197.
## 2  1995  4.18  219.
## 3  1995  3.70  186.
## 4  1995  4.46  204.
## 5  1995  7.48  209.
## 6  1995  8.38  204.
## 
## [[7]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1996  7.90  185.
## 2  1996 10.2   178.
## 3  1996  7.28  210.
## 4  1996  5.51  189.
## 5  1996  4.47  209.
## 6  1996  7.29  207.
## 
## [[8]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1997  2.52  225.
## 2  1997  4.85  194.
## 3  1997  1.47  211.
## 4  1997  3.28  184.
## 5  1997  2.11  187.
## 6  1997  5.51  198.
## 
## [[9]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1998  5.26  190.
## 2  1998  2.84  184.
## 3  1998  4.81  238.
## 4  1998  5.79  201.
## 5  1998  5.97  196.
## 6  1998  7.01  180.
## 
## [[10]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  1999  3.71  188.
## 2  1999  4.37  216.
## 3  1999  2.78  157.
## 4  1999  9.02  192.
## 5  1999  4.11  204.
## 6  1999  6.34  204.
## 
## [[11]]
## # A tibble: 6 × 3
##   years      a     b
##   <dbl>  <dbl> <dbl>
## 1  2000  5.57   196.
## 2  2000  3.40   202.
## 3  2000 10.5    196.
## 4  2000  2.73   196.
## 5  2000 -0.410  189.
## 6  2000  2.61   218.
## 
## [[12]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  2001  5.33  213.
## 2  2001  2.27  201.
## 3  2001  3.23  200.
## 4  2001  6.00  191.
## 5  2001  6.41  194.
## 6  2001  3.11  223.
## 
## [[13]]
## # A tibble: 6 × 3
##   years      a     b
##   <dbl>  <dbl> <dbl>
## 1  2002  6.63   188.
## 2  2002 -0.778  216.
## 3  2002  3.16   193.
## 4  2002  7.62   198.
## 5  2002  2.08   209.
## 6  2002  5.14   212.
## 
## [[14]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  2003  5.59  173.
## 2  2003  4.58  207.
## 3  2003  6.27  201.
## 4  2003 -1.74  195.
## 5  2003  6.54  182.
## 6  2003  5.15  203.
## 
## [[15]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  2004  7.89  222.
## 2  2004  6.05  177.
## 3  2004  3.83  212.
## 4  2004  4.15  198.
## 5  2004  3.02  196.
## 6  2004  2.58  206.
## 
## [[16]]
## # A tibble: 6 × 3
##   years     a     b
##   <dbl> <dbl> <dbl>
## 1  2005  8.73  201.
## 2  2005  3.47  191.
## 3  2005  2.19  194.
## 4  2005  4.39  211.
## 5  2005  6.33  180.
## 6  2005 -1.58  219.

Output the size of the all_files list.

# Output size of list object
length(all_files)

## [1] 16

Good work! Now let’s see how to do it more easily with purrr.

1.1.2 Iteration with purrr

You’ve made a great for loop, but it uses a lot of code to do something as simple as reading a series of files into a list. This is where purrr comes in: purrr::map() does the same thing as the for loop in one line of code. map() iterates over a list (or vector) and applies the function supplied in its .f argument to each element.

map() takes two main arguments:

The first, .x, is the list (or vector) that will be iterated over
The second, .f, is the function that will be applied to each element of that list
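To see that map(.x, .f) really is a drop-in replacement for the for loop, here is a small self-contained sketch. It assumes purrr is installed, writes three made-up CSV files to a temporary folder, and uses base R's read.csv() instead of readr::read_csv() so that nothing else is required; the file names and contents are hypothetical.

```r
library(purrr)

# Write three small, hypothetical CSV files to a temporary folder
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
for (yr in 1990:1992) {
  df <- data.frame(years = yr, a = c(1.5, 2.5), b = c(10, 20))
  write.csv(df, file.path(tmp_dir, paste0(yr, ".csv")), row.names = FALSE)
}
files_demo <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Version 1: a for loop with a pre-allocated list
loop_result <- vector("list", length(files_demo))
for (i in seq_along(files_demo)) {
  loop_result[[i]] <- read.csv(files_demo[[i]])
}

# Version 2: map(.x, .f) does the same in one line
map_result <- map(files_demo, read.csv)

identical(loop_result, map_result)
```

Both versions produce the same list; map() simply hides the index bookkeeping and the pre-allocation.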

The readr library is already loaded.

Load the purrr library (note the 3 Rs).

# Load purrr library
library(purrr)

Replicate the for loop from the last exercise using map() instead. Use the same list files and the same function readr::read_csv().

# Use map to iterate
all_files_purrr <- map(files, read_csv)

head(all_files_purrr)

## [[1]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1990  5.25  197.
##  2  1990  8.17  192.
##  3  1990  6.49  192.
##  4  1990  5.82  195.
##  5  1990  5.54  201.
##  6  1990  6.65  196.
##  7  1990 10.4   208.
##  8  1990  1.66  183.
##  9  1990  2.78  174.
## 10  1990  8.34  198.
## # … with 190 more rows
## 
## [[2]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1991  3.70  197.
##  2  1991  5.37  187.
##  3  1991  7.05  186.
##  4  1991  1.97  207.
##  5  1991  8.05  217.
##  6  1991  1.97  213.
##  7  1991  5.33  195.
##  8  1991  4.32  204.
##  9  1991  4.46  177.
## 10  1991  4.63  222.
## # … with 190 more rows
## 
## [[3]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1992  8.64  178.
##  2  1992  3.70  207.
##  3  1992  4.79  206.
##  4  1992  9.22  194.
##  5  1992  6.49  202.
##  6  1992  4.58  197.
##  7  1992  5.06  174.
##  8  1992  2.20  216.
##  9  1992  4.72  177.
## 10  1992 10.0   188.
## # … with 190 more rows
## 
## [[4]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1993  2.34  204.
##  2  1993  5.44  167.
##  3  1993  6.86  213.
##  4  1993  5.70  197.
##  5  1993  2.78  193.
##  6  1993  3.24  164.
##  7  1993  5.59  234.
##  8  1993  3.02  183.
##  9  1993  4.60  182.
## 10  1993  7.56  205.
## # … with 190 more rows
## 
## [[5]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1994  3.40  197.
##  2  1994  4.29  214.
##  3  1994  6.91  175.
##  4  1994  3.11  181.
##  5  1994  5.50  185.
##  6  1994  3.59  211.
##  7  1994  2.97  189.
##  8  1994  7.40  171.
##  9  1994  9.66  198.
## 10  1994  8.19  221.
## # … with 190 more rows
## 
## [[6]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1995  5.12  197.
##  2  1995  4.18  219.
##  3  1995  3.70  186.
##  4  1995  4.46  204.
##  5  1995  7.48  209.
##  6  1995  8.38  204.
##  7  1995  4.51  202.
##  8  1995  5.68  208.
##  9  1995  5.24  211.
## 10  1995  3.04  212.
## # … with 190 more rows

Check the length of all_files_purrr.

# Output size of list object
length(all_files_purrr)

## [1] 16

Nice! You can see from the output here that 16 different files have been read into all_files_purrr.

1.1.3 More iteration with for loops

Iteration isn’t just for reading in files though; iteration can be used to perform other actions on objects. First, you will try iterating with a for loop.

You’re going to change each element of a list into a numeric data type and then put it back into the same element in the same list.

For this exercise, you will write a for loop over list_of_df, a list of character vectors whose values are actually numbers! To perform mathematical operations on them, you need to convert them to numeric, which you can do with the base R function as.numeric().

Check the class type of the first element of list_of_df.

# Create a stand-in list of 10 character vectors that hold the numbers 1 to 4
list_of_df <- lapply(1:10, function(x) as.character(1:4))
# Check the class type of the first element
class(list_of_df[[1]])

## [1] "character"

Build a for loop that takes each element of list_of_df, changes it into numeric data with as.numeric(), and adds it back into the same element of list_of_df.

# Change each element from a character to a number
for(i in seq_along(list_of_df)){
    list_of_df[[i]] <- as.numeric(list_of_df[[i]])
}

Check the class type of the first element of list_of_df.

# Check the class type of the first element
class(list_of_df[[1]])

## [1] "numeric"

Print list_of_df.

# Print out the list
head(list_of_df)

## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] 1 2 3 4
## 
## [[3]]
## [1] 1 2 3 4
## 
## [[4]]
## [1] 1 2 3 4
## 
## [[5]]
## [1] 1 2 3 4
## 
## [[6]]
## [1] 1 2 3 4

Nice! You can see from the output that we have a list of numbers now!

1.1.4 More iteration with purrr

Now you will change each element of a list into a numeric data type and then put it back into the same element in the same list, but instead of using a for loop, you’ll use map().

You can use the purrr function map() to more easily loop over a list, and turn the characters into numbers. Instead of having to build a whole for loop, you can use one line of code.
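The two approaches can be compared side by side on a small, hypothetical list of character vectors (chars below is an illustrative stand-in for list_of_df; the sketch assumes purrr is installed). One difference worth noting: the for loop overwrites elements in place, while map() returns a new list, which is why the course code assigns the result back to list_of_df.

```r
library(purrr)

# Hypothetical list of character vectors that hold numbers
chars <- list(c("1", "2.5"), c("10", "20"), c("-3", "4"))

# for loop version: overwrite each element of a copy in place
loop_version <- chars
for (i in seq_along(loop_version)) {
  loop_version[[i]] <- as.numeric(loop_version[[i]])
}

# map() version: returns a new list, so assign it to keep the result
map_version <- map(chars, as.numeric)

identical(loop_version, map_version)
class(map_version[[1]])
```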

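Recreate the character version of list_of_df, then check the class of its first element.

# Recreate the character version (the for loop above already converted list_of_df)
list_of_df <- lapply(1:10, function(x) as.character(1:4))
# Check the class type of the first element
class(list_of_df[[1]])

## [1] "character"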

Use map() to iterate over list_of_df and change each element of the list into numeric data.

# Change each character element to a number
list_of_df <- map(list_of_df, as.numeric)

Check the class of the first element of list_of_df.

# Check the class type of the first element again
class(list_of_df[[1]])

## [1] "numeric"

Print out list_of_df.

# Print out the list
head(list_of_df)

## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] 1 2 3 4
## 
## [[3]]
## [1] 1 2 3 4
## 
## [[4]]
## [1] 1 2 3 4
## 
## [[5]]
## [1] 1 2 3 4
## 
## [[6]]
## [1] 1 2 3 4

Good work! Now you can fix class type issues in your lists!

1.2 Subsetting lists

1.2.1 Subsetting lists

Often when working in R, you’ll use dataframes or vectors. Another kind of R object is a list. While lists can be complicated, lists are also incredibly powerful. Lists are like Hermione Granger’s bag of holding (from Harry Potter); they can hold a wide variety of things. The contents of a list don’t have to be the same data type, and as long as you know how the list is organized, you can pull out what you need by subsetting.

Both named and unnamed lists can be subset using double square brackets [[ ]] like this: listname[[ index ]]

If a list is named, you can also use $ for subsetting. The syntax list$elementname pulls out the named element from the list. Like any other kind of object in R, you can use str() to determine the structure of the list.
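As a quick sketch of these subsetting tools on a small, hypothetical named list (my_list and its contents are illustrative only), note that single brackets [ ] behave differently from [[ ]] and $: they return a smaller list rather than the element itself.

```r
# A small, hypothetical named list holding different types
my_list <- list(
  colors = c("#F1BB7B", "#FD6467"),
  count  = 2,
  info   = data.frame(x = 1:3, y = c("a", "b", "c"))
)

my_list[[1]]          # the first element itself: a character vector
my_list[["colors"]]   # the same element, selected by name
my_list$colors        # the same element, using $ on a named list
my_list[1]            # single brackets return a (shorter) list, not the element

str(my_list)          # structure of the whole list
```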

Load the repurrrsive package.

# Load repurrrsive package, to get access to the wesanderson dataset
library(repurrrsive)

Load the wesanderson dataset.

# Load wesanderson dataset
data(wesanderson)

Examine the structure of the first element in wesanderson.

# Get structure of first element in wesanderson
str(wesanderson[[1]])

##  chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"

Examine the structure of the GrandBudapest element in wesanderson.

# Get structure of GrandBudapest element in wesanderson
str(wesanderson$GrandBudapest)

##  chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"

Good work! Now you can subset and determine the structure of each part of a named or unnamed list!

1.2.2 Subsetting list elements

You can also subset within list elements using bracket notation, like this: ListName$ElementName[index]. If a list element is a dataframe, you can pull out a column like this: ListName$ElementName$ColumnName or ListName[[1]][,1].
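A minimal sketch of subsetting inside list elements, using a hypothetical list (movies is made up for illustration) that holds one character vector and one data frame:

```r
# Hypothetical list: one vector element and one data frame element
movies <- list(
  palette = c("#F1BB7B", "#FD6467", "#5B1A18", "#D67236"),
  scores  = data.frame(film = c("A", "B"), rating = c(7.1, 8.4),
                       stringsAsFactors = FALSE)
)

movies$palette[3]          # third value inside the palette vector
movies[[1]][3]             # the same value, by position
movies$scores$rating       # a column of the data frame element
movies[[2]][, 1]           # first column of the second element
movies[[2]][2, "rating"]   # a single cell: row 2 of the rating column
```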

In this exercise, you’ll examine the wesanderson and sw_films datasets from the repurrrsive package. wesanderson contains color palettes for each of Wes Anderson’s movies. These colors are recorded as hexadecimal codes: a # followed by six hexadecimal digits (0-9 and A-F) that identify a particular color. Here, you will use two ways of pulling out a particular hex color code.
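Because a hex code is just a string, you can check what its six digits mean with base R. This sketch assumes nothing beyond base R's grDevices package; each pair of digits after # is one color channel (red, green, blue) written in base 16, so 00 to FF covers 0 to 255.

```r
# First GrandBudapest color from the wesanderson output
hex <- "#F1BB7B"

# col2rgb() returns a 3-row matrix of channel values (red, green, blue)
rgb_values <- grDevices::col2rgb(hex)
rgb_values[, 1]   # red 241, green 187, blue 123

# Converting each digit pair from base 16 by hand gives the same numbers
strtoi(c("F1", "BB", "7B"), base = 16L)
```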

sw_films contains information about the films in the Star Wars franchise, such as title, director, producer, etc. You’ll use subsetting to explore this dataset.

Subset the third color from the first element of wesanderson. Then subset the fourth color from GrandBudapest.

# Third element of the first wesanderson vector
wesanderson[[1]][3]

## [1] "#5B1A18"

# Fourth element of the GrandBudapest wesanderson vector
wesanderson$GrandBudapest[4]

## [1] "#D67236"

Subset the first element from sw_films. Then subset the title element from the first element.

# Subset the first element of the sw_films data
sw_films[[1]]

## $title
## [1] "A New Hope"
## 
## $episode_id
## [1] 4
## 
## $opening_crawl
## [1] "It is a period of civil war.\r\nRebel spaceships, striking\r\nfrom a hidden base, have won\r\ntheir first victory against\r\nthe evil Galactic Empire.\r\n\r\nDuring the battle, Rebel\r\nspies managed to steal secret\r\nplans to the Empire's\r\nultimate weapon, the DEATH\r\nSTAR, an armored space\r\nstation with enough power\r\nto destroy an entire planet.\r\n\r\nPursued by the Empire's\r\nsinister agents, Princess\r\nLeia races home aboard her\r\nstarship, custodian of the\r\nstolen plans that can save her\r\npeople and restore\r\nfreedom to the galaxy...."
## 
## $director
## [1] "George Lucas"
## 
## $producer
## [1] "Gary Kurtz, Rick McCallum"
## 
## $release_date
## [1] "1977-05-25"
## 
## $characters
##  [1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/2/" 
##  [3] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/4/" 
##  [5] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/6/" 
##  [7] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/8/" 
##  [9] "http://swapi.co/api/people/9/"  "http://swapi.co/api/people/10/"
## [11] "http://swapi.co/api/people/12/" "http://swapi.co/api/people/13/"
## [13] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/15/"
## [15] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/18/"
## [17] "http://swapi.co/api/people/19/" "http://swapi.co/api/people/81/"
## 
## $planets
## [1] "http://swapi.co/api/planets/2/" "http://swapi.co/api/planets/3/"
## [3] "http://swapi.co/api/planets/1/"
## 
## $starships
## [1] "http://swapi.co/api/starships/2/"  "http://swapi.co/api/starships/3/" 
## [3] "http://swapi.co/api/starships/5/"  "http://swapi.co/api/starships/9/" 
## [5] "http://swapi.co/api/starships/10/" "http://swapi.co/api/starships/11/"
## [7] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/13/"
## 
## $vehicles
## [1] "http://swapi.co/api/vehicles/4/" "http://swapi.co/api/vehicles/6/"
## [3] "http://swapi.co/api/vehicles/7/" "http://swapi.co/api/vehicles/8/"
## 
## $species
## [1] "http://swapi.co/api/species/5/" "http://swapi.co/api/species/3/"
## [3] "http://swapi.co/api/species/2/" "http://swapi.co/api/species/1/"
## [5] "http://swapi.co/api/species/4/"
## 
## $created
## [1] "2014-12-10T14:23:31.880000Z"
## 
## $edited
## [1] "2015-04-11T09:46:52.774897Z"
## 
## $url
## [1] "http://swapi.co/api/films/1/"

# Subset the title element of the first sw_films element
sw_films[[1]]$title

## [1] "A New Hope"

Great work, now you should be very comfortable subsetting lists!

1.3 The many flavors of map()

1.3.1 map() argument alternatives

You can also use iteration to answer a question, such as how long each element of the wesanderson dataset is. You can do this by feeding map() a function like length(), using the map(list, function) syntax, and it works just fine. However, as future exercises get more complex, you will need to learn a second way, using:

map(list, ~function(.x))

This second way gives the same result as map(list, function). The ~ tells map() that what follows is a formula, which purrr turns into a function, and .x marks where each list element goes inside that function. Whenever you use .x this way, you need to put a ~ in front of the function in the second argument of map().
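The three ways of writing the function argument can be compared on a small, hypothetical stand-in for wesanderson (palettes below is made up; the first palette reuses the GrandBudapest colors shown earlier). This sketch assumes purrr is installed.

```r
library(purrr)

# Hypothetical stand-in for wesanderson: a named list of character vectors
palettes <- list(
  first  = c("#F1BB7B", "#FD6467", "#5B1A18", "#D67236"),
  second = c("#000000", "#FFFFFF", "#FF0000", "#00FF00", "#0000FF")
)

by_name    <- map(palettes, length)                  # pass the function itself
by_formula <- map(palettes, ~ length(.x))            # ~ builds a function; .x is each element
by_anon    <- map(palettes, function(p) length(p))   # the same thing, written out in full

identical(by_name, by_formula)

# The ~ form pays off when the element is not the only argument
first_two <- map(palettes, ~ head(.x, 2))   # first two colors of each palette
```

The ~ form earns its keep when you need to pass extra arguments or combine several steps, as in ~ head(.x, 2).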

Use map() on wesanderson and determine the length of each element in the “old” way.

# Map over wesanderson to get the length of each element
map(wesanderson, length)

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5

Use map() on wesanderson and determine the length of each element again, but this time using map(list, ~function(.x)).

# Map over wesanderson, and determine the length of each element
map(wesanderson, ~length(.x))

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5

Great Job! This new way of writing map_*() functions will come in handy in future exercises, so make a mental note of the ~ and the .x argument.

1.3.2 map_*

The map() function always returns its output as a list. However, there are several other map() functions: you can use the map_*() functions to tell purrr the type of output you want. The * in map_*() stands for different R data types. For instance, you might want the output to be a vector of numbers so that you can put it inside a dataframe. So, unless you want a list back, decide what type of output you need before you write your map() call.
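Here is a minimal sketch of the difference in output type, assuming purrr is installed and using a hypothetical two-palette list in place of wesanderson:

```r
library(purrr)

# Hypothetical stand-in for wesanderson
palettes <- list(
  first  = c("#F1BB7B", "#FD6467"),
  second = c("#000000", "#FFFFFF", "#FF0000")
)

as_list   <- map(palettes, length)       # a named list
as_double <- map_dbl(palettes, length)   # a named double vector

class(as_list)     # "list"
class(as_double)   # "numeric"

# A plain vector drops straight into a data frame column; names become row names
data.frame(numcolors = as_double)
```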

Determine the length of each element of the wesanderson dataset using our original map() function. Examine the output.

# Map over wesanderson, to determine the length of each element
map(wesanderson, length)

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5

Create a dataframe that has the number of colors from each movie, using map_dbl(). The dbl stands for double, a numeric type that can hold decimal values.

# Create a numcolors column and fill with length of each wesanderson element
data.frame(numcolors = map_dbl(wesanderson, ~length(.x)))

##                numcolors
## GrandBudapest          4
## Moonrise1              4
## Royal1                 4
## Moonrise2              4
## Cavalcanti             5
## Royal2                 5
## GrandBudapest2         4
## Moonrise3              5
## Chevalier              4
## Zissou                 5
## FantasticFox           5
## Darjeeling             5
## Rushmore               5
## BottleRocket           7
## Darjeeling2            5

Good work! Notice how much cleaner the output was using map_dbl()! It’s always worth thinking through which map_*() function will get you where you need to go before coding it out. In the next chapter, we’ll dive into more complex uses of purrr.

2 More complex iterations

purrr is much more than a for loop: it works well with pipes, and you can use it to run models, simulate data, and build nested loops!

2.1 Working with unnamed lists

2.1.1 Names & pipe refresher

It is easy to determine if a list has names using names(). Understanding the named elements of a list can make working with the list elements easier because you can pull out the information you need by name, instead of searching for the correct numbered element.

purrr is a part of the tidyverse, a system of packages designed to be used together, and used with pipes. Let’s do a quick refresh on how pipes work. A pipe %>% takes the output from the function that comes before it, and feeds it into the function that comes after the pipe as its first argument.

function_before() %>% 
    function_after()

You don’t need to use pipes with purrr functions, but these lessons will use them.
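A small sketch of the pipe in action. It loads magrittr, the package that provides %>% (purrr imports it, so it is installed alongside purrr); the vector x and the list unnamed are arbitrary examples.

```r
library(magrittr)   # provides %>%

x <- c(1, 4, 9)

# These two lines are equivalent: %>% passes x as the first argument of sqrt()
sqrt(x)
x %>% sqrt()

# Chains read left to right instead of inside out
x %>% sqrt() %>% sum()   # same as sum(sqrt(x)), i.e. 1 + 2 + 3

# names() returns NULL for a list without names
unnamed <- list(1, "a")
unnamed %>% names()
```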

Use pipes to check whether the sw_films list has named elements.

# Use pipes to check for names in sw_films
sw_films %>%
    names()

## NULL

Good work! Now that you know how to check to see if a list has names in a tidy way, you’re ready to dive in.

2.1.2 Setting names

If you have an unnamed list, you can, of course, name each element. Names let you call out specific elements regardless of their order, which is especially useful if the list may grow or change over time, or if you run the same code on several different lists. For instance, if a list contains a dataframe, a model, and a plot, calling out $plot is much easier than working out which numbered element holds the plot.
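The set_names() plus map_chr() pattern can be sketched on a tiny, hypothetical list shaped like sw_films (the titles and directors below are placeholders, not real data). It assumes purrr is installed.

```r
library(purrr)

# Hypothetical unnamed list of records, shaped like sw_films
films <- list(
  list(title = "Film One", director = "Director A"),
  list(title = "Film Two", director = "Director B")
)

names(films)   # NULL: no names yet

# Pull each title out as a character vector and use it as the names
films_named <- set_names(films, map_chr(films, "title"))

names(films_named)
films_named[["Film Two"]]$director   # select by name instead of position
```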

With a piped workflow:

Name each element of sw_films list and assign to a new list, sw_films_named.
Iterate over sw_films with map_chr() to pull out each title element, and use those titles as the names.

# Set names so each element of the list is named for the film title
sw_films_named <- sw_films %>% 
  set_names(map_chr(sw_films, "title"))

Check to make sure the new list has names.

# Check to see if the names worked/are correct
names(sw_films_named)

## [1] "A New Hope"              "Attack of the Clones"   
## [3] "The Phantom Menace"      "Revenge of the Sith"    
## [5] "Return of the Jedi"      "The Empire Strikes Back"
## [7] "The Force Awakens"

Good work! Naming lists makes working in purrr easier and more human-readable.

2.1.3 Pipes in map()

So you’ve refreshed your memory on how pipes can be used between functions. You can also use pipes inside a map() call to iterate a whole pipeline of tasks over a list of inputs.
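A minimal sketch, assuming purrr and magrittr are installed and using a short hypothetical list of perfect squares so the square roots are easy to check:

```r
library(purrr)
library(magrittr)   # for %>%

nums <- list(1, 4, 9)

# Inside the formula, .x is the current element; the pipe runs sqrt() then sin()
piped  <- map(nums, ~ .x %>% sqrt() %>% sin())
nested <- map(nums, ~ sin(sqrt(.x)))
identical(piped, nested)

# map_dbl() returns the same values as a numeric vector instead of a list
as_vector <- map_dbl(nums, ~ .x %>% sqrt() %>% sin())
```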

Here instead of using one of the repurrrsive datasets, you will be working with a list of numbers so that you can do a few mathematical operations.

Create a list that contains the values 1 through 10, each as a separate element.

# Create a list of values from 1 through 10
numlist <- list(1,2,3,4,5,6,7,8,9,10)

Create a pipeline within one map() function that takes the sqrt() of each element, and then the sin() of each element.

# Iterate over the numlist 
map(numlist, ~.x %>% sqrt() %>% sin()) %>% head()

## [[1]]
## [1] 0.841471
## 
## [[2]]
## [1] 0.9877659
## 
## [[3]]
## [1] 0.9870266
## 
## [[4]]
## [1] 0.9092974
## 
## [[5]]
## [1] 0.7867491
## 
## [[6]]
## [1] 0.6381576

Good work! Using pipes inside of map() makes iterating over multiple functions easy.

2.2 More map()

2.2.1 Simulating data with purrr

Often when trying to solve a problem with data, we first need to build some simulated data to see whether our idea is even possible. For example, you may want to test models with data that have known differences, to see if the models are working correctly.

In this exercise, you will see how this works in purrr by simulating data for two populations, a and b, from the sites: “north”, “east”, and “west”. The two populations will be randomly drawn from a normal distribution, with different means and standard deviations.
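A reproducible variant of this simulation can be sketched as below. The set.seed() call and the final do.call(rbind, ...) step are additions for illustration, not part of the exercise: the seed makes the random draws repeatable, and stacking the data frames makes it easy to check that each site's population a averages about 5. It assumes purrr is installed.

```r
library(purrr)

set.seed(42)   # fixes the random draws so the simulation is reproducible

sites <- list("north", "east", "west")

# One data frame per site; a ~ Normal(mean 5, sd 2.5) and b ~ Normal(mean 200, sd 15)
sim <- map(sites, ~ data.frame(
  sites = .x,
  a = rnorm(n = 200, mean = 5, sd = 5 / 2),
  b = rnorm(n = 200, mean = 200, sd = 15)
))

# Stack the per-site data frames into one for a quick check
all_sites <- do.call(rbind, sim)
nrow(all_sites)                               # 600
tapply(all_sites$a, all_sites$sites, mean)    # each should land near 5
```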

Create a list of site names, “north”, “east”, and “west”.

# List of sites north, east, and west
sites <- list("north","east","west")

Then use map() to create a list of dataframes with three columns. The first column is sites.

The second is population a, which has a mean of 5, a sample size n of 200, and an sd of (5/2).
The third is population b, which has a mean of 200, a sample size n of 200, and an sd of 15.

# Create a list of dataframes, each with a sites, a, and b column
list_of_df <- map(sites,
  ~data.frame(sites = .x,
              a = rnorm(mean = 5,   n = 200, sd = (5/2)),
              b = rnorm(mean = 200, n = 200, sd = 15)))

map(list_of_df,~head(.x))

## [[1]]
##   sites        a        b
## 1 north 4.614419 214.1417
## 2 north 4.362532 190.2344
## 3 north 1.858374 204.0246
## 4 north 6.746475 206.2674
## 5 north 0.405748 179.3074
## 6 north 7.383507 192.7544
## 
## [[2]]
##   sites         a        b
## 1  east  5.230285 188.8726
## 2  east  1.473186 179.3934
## 3  east -1.308601 203.1301
## 4  east  3.674980 215.1431
## 5  east  4.948778 209.1833
## 6  east  2.825842 214.3016
## 
## [[3]]
##   sites        a        b
## 1  west 1.982326 214.8582
## 2  west 3.490015 198.0501
## 3  west 5.558575 189.6605
## 4  west 1.867846 195.0652
## 5  west 2.367538 187.1540
## 6  west 3.542964 200.7730

Good work! Now you can simulate data with ease.

2.2.2 Run a linear model

You can use map() to do more than just take the square root of a number or simulate data. You can also use map() to loop over different inputs to run several models, each using the unique values of a given list element. You can also then iterate over the models you’ve run to create the model summaries and look at the results.
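Once the models are in a list, map_dbl() can pull a single number out of each one. The sketch below is an illustration under stated assumptions: it simulates hypothetical data with a known slope of 2 (so you can tell whether the models are working), and it uses base R's coef() and the r.squared component of summary() output for lm models. It assumes purrr is installed.

```r
library(purrr)

set.seed(1)

# Hypothetical data with a known slope: a = 2 * b + noise, for three groups
dfs <- map(1:3, function(i) {
  b <- rnorm(100, mean = 10, sd = 2)
  data.frame(b = b, a = 2 * b + rnorm(100, sd = 1))
})

models    <- map(dfs, ~ lm(a ~ b, data = .x))   # one model per data frame
summaries <- map(models, summary)               # one summary per model

# Pull a single number out of each model with map_dbl()
slopes    <- map_dbl(models, ~ coef(.x)[["b"]])
r_squared <- map_dbl(summaries, ~ .x$r.squared)
slopes   # each should be close to the true slope of 2
```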

The lists sites and list_of_df are preloaded.

Pipe list_of_df into map() along with the lm() linear model function, to compare a as the response and b as the predictor variable.
- Use the syntax: lm(response ~ predictor, data = )
Then pipe the linear model output into map() and generate the summary() of each model.

# Map over the models to look at the relationship of a vs b
list_of_df %>%
    map(~ lm(a ~ b, data = .)) %>%
    map(summary)

## [[1]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.660 -1.737 -0.003  1.536  5.950 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  5.749413   2.176642   2.641  0.00892 **
## b           -0.003124   0.010833  -0.288  0.77338   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.428 on 198 degrees of freedom
## Multiple R-squared:  0.0004197,  Adjusted R-squared:  -0.004629 
## F-statistic: 0.08314 on 1 and 198 DF,  p-value: 0.7734
## 
## 
## [[2]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3931  -1.9487   0.2521   1.7298   6.9519 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.18230    2.53465   0.861    0.390
## b            0.01415    0.01255   1.127    0.261
## 
## Residual standard error: 2.625 on 198 degrees of freedom
## Multiple R-squared:  0.006377,   Adjusted R-squared:  0.001359 
## F-statistic: 1.271 on 1 and 198 DF,  p-value: 0.261
## 
## 
## [[3]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2321 -1.9745  0.0907  1.9569  7.5872 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.45198    2.63640   0.551    0.582
## b            0.01647    0.01308   1.259    0.210
## 
## Residual standard error: 2.82 on 198 degrees of freedom
## Multiple R-squared:  0.007939,   Adjusted R-squared:  0.002928 
## F-statistic: 1.584 on 1 and 198 DF,  p-value: 0.2096

Good work! This will make running multiple models and summarizing their results much easier.

2.2.3 map_chr()

In this exercise, you’ll dive a bit deeper into the different map_*() variants. The map() function always outputs a list, while the map_*() variants return other kinds of output, such as the vector types listed below. Study the table below and make sure you’re clear on the type of output for each map_*() variant.

`map_*()`	Output
`map_chr()`	character vector
`map_lgl()`	logical vector [TRUE or FALSE]
`map_int()`	integer vector
`map_dbl()`	double vector
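To see the table in action without the course data, here is a minimal sketch on a small hand-made list. The `films` list is a hypothetical stand-in for sw_films, not the real dataset. Each typed variant returns an atomic vector of the type named in the table, and a string such as "director" works as a shortcut for `~ .x[["director"]]`.

```r
library(purrr)

# Hypothetical stand-in for sw_films: a list of lists with mixed field types
films <- list(
  list(title = "A New Hope",         episode_id = 4L, director = "George Lucas"),
  list(title = "Return of the Jedi", episode_id = 6L, director = "Richard Marquand")
)

map(films, "director")                                           # list
directors    <- map_chr(films, "director")                       # character vector
by_lucas     <- map_lgl(films, ~ .x[["director"]] == "George Lucas")  # logical vector
episodes     <- map_int(films, "episode_id")                     # integer vector
episodes_dbl <- map_dbl(films, "episode_id")                     # double vector
```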

Compare the results of map() and map_chr() when pulling out the director element of sw_films.

# Pull out the director element of sw_films in a list and character vector
map(sw_films, ~.x[["director"]])

## [[1]]
## [1] "George Lucas"
## 
## [[2]]
## [1] "George Lucas"
## 
## [[3]]
## [1] "George Lucas"
## 
## [[4]]
## [1] "George Lucas"
## 
## [[5]]
## [1] "Richard Marquand"
## 
## [[6]]
## [1] "Irvin Kershner"
## 
## [[7]]
## [1] "J. J. Abrams"

map_chr(sw_films, ~.x[["director"]])

## [1] "George Lucas"     "George Lucas"     "George Lucas"     "George Lucas"    
## [5] "Richard Marquand" "Irvin Kershner"   "J. J. Abrams"

Compare the map() and map_lgl() outputs on sw_films when checking whether the director is “George Lucas”.

# Compare outputs when checking if director is George Lucas
map(sw_films, ~.x[["director"]] == "George Lucas")

## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] FALSE
## 
## [[6]]
## [1] FALSE
## 
## [[7]]
## [1] FALSE

map_lgl(sw_films, ~.x[["director"]] == "George Lucas")

## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

Good work! Mastering the flavors of map_*() is key for success in purrr.

2.2.4 map_dbl() and map_int()

Some flavors of map_*() are very similar. map_dbl() and map_int() both output numbers: map_int() outputs integer vectors, which hold whole numbers only, while map_dbl() outputs double vectors, which can hold decimals. Take a closer look at how using different map_*() functions affects the output.

Here is the map_*() table again as a reference.

`map_*()`	Output
`map_chr()`	character vector
`map_lgl()`	logical vector [TRUE or FALSE]
`map_int()`	integer vector
`map_dbl()`	double vector
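A minimal sketch of the difference, using small hypothetical inputs rather than sw_films: integers pass through map_int() unchanged and are widened by map_dbl(), while a fractional value makes map_int() stop with an error instead of silently rounding.

```r
library(purrr)

ids <- list(4L, 2L, 1L)           # integer inputs
as_int <- map_int(ids, ~ .x)      # stays integer
as_dbl <- map_dbl(ids, ~ .x)      # integers are widened to double

typeof(as_int)
typeof(as_dbl)

# A fractional value cannot be stored in an integer vector, so map_int() errors
halves <- list(0.5, 1.5)
result <- tryCatch(map_int(halves, ~ .x), error = function(e) "error")
result
map_dbl(halves, ~ .x)             # works: doubles can hold decimals
```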

Compare the map() and map_dbl() outputs for pulling out the episode_id for each element of sw_films.

# Pull out episode_id element as list
map(sw_films, ~.x[["episode_id"]])

## [[1]]
## [1] 4
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 1
## 
## [[4]]
## [1] 3
## 
## [[5]]
## [1] 6
## 
## [[6]]
## [1] 5
## 
## [[7]]
## [1] 7

# Pull out episode_id element as double vector
map_dbl(sw_films, ~.x[["episode_id"]])

## [1] 4 2 1 3 6 5 7

Compare the map() and map_int() outputs for pulling out the episode_id for each element of sw_films.

# Pull out episode_id element as a list
map(sw_films, ~.x[["episode_id"]])

## [[1]]
## [1] 4
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 1
## 
## [[4]]
## [1] 3
## 
## [[5]]
## [1] 6
## 
## [[6]]
## [1] 5
## 
## [[7]]
## [1] 7

# Pull out episode_id element as integer vector
map_int(sw_films, ~.x[["episode_id"]])

## [1] 4 2 1 3 6 5 7

Good work! Now you can output numbers without decimals!

2.3 map2() and pmap()

2.3.1 Simulating data with multiple inputs using map2()

The map() function is great if you need to iterate over one list; however, you will often need to iterate over two lists at the same time. This is where map2() comes in. While map() takes a single list as its .x argument, map2() takes two lists as two arguments, .x and .y, and walks through them in parallel: the first elements are used together, then the second elements, and so on.
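As a quick illustration of that lockstep pairing, here is a small sketch using the same sites and means values as the exercise below; the label format is just an illustrative choice. map2_chr() is the character-returning flavor of map2().

```r
library(purrr)

sites <- list("north", "west", "east")
means <- list(1, 2, 3)

# .x walks through sites while .y walks through means, one pair at a time
labels <- map2_chr(sites, means, ~ paste0(.x, ": mean ", .y))
labels
```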

To test out map2(), you are going to create a simple dataset, with one list of numbers and one list of strings. You will put these two lists together and create some simulated data.

Create a list, means, of the values 1 through 3.

# List of 1, 2 and 3
means <- list(1,2,3)

Create a sites list with “north”, “west”, and “east”.

# Create sites list
sites <- list("north","west","east")

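map2() over the sites and means lists to create a list of three dataframes, one per site, each with two columns.

The first column, sites, holds the site name; the second column, a, holds 200 draws from rnorm() with its mean taken from the means list and a standard deviation of 2.5.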

# Map over two arguments: sites and means
list_of_files_map2 <- map2(sites, means, ~ data.frame(sites = .x,
                                                      a = rnorm(mean = .y, n = 200, sd = 5/2)))

# Preview the first rows of each dataframe
map(list_of_files_map2, ~ head(.x))

## [[1]]
##   sites         a
## 1 north  1.573448
## 2 north -1.371510
## 3 north -2.659407
## 4 north -3.406760
## 5 north  1.178419
## 6 north  4.325915
## 
## [[2]]
##   sites          a
## 1  west -0.3739373
## 2  west  2.0272414
## 3  west  6.0924047
## 4  west  0.2325518
## 5  west  3.3310114
## 6  west  4.0336696
## 
## [[3]]
##   sites         a
## 1  east 0.6694428
## 2  east 4.7842952
## 3  east 1.0683827
## 4  east 2.4413813
## 5  east 1.7298603
## 6  east 1.1281749

Good work! Now you can iterate over two lists together!

2.3.2 Simulating data with 3+ inputs using pmap()

What if you need to iterate over three lists? Is there a map3()? No: to iterate over more than two lists, whether it’s three, four, or even 20, you’ll need to use pmap(). However, pmap() requires you to supply the list arguments a bit differently.

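To use pmap(), you first need to create a master list of all the lists you want to iterate over; this master list is the input for pmap(). Instead of using .x or .y, give the function you pass to pmap() arguments with the same names as the elements of the master list, because pmap() passes each element to the argument of the same name.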
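A small sketch of that name matching, with made-up inputs x, y, and z: the function below deliberately lists its arguments in a different order from the master list, and pmap() still lines them up by name. Positional matching would give different totals, since each input is weighted differently.

```r
library(purrr)

# Hypothetical master list with three named inputs
inputs <- list(x = c(1, 2), y = c(10, 20), z = c(100, 200))

# Arguments are matched by name, not by position in the list
totals <- pmap_dbl(inputs, function(z, x, y) x + 2 * y + 3 * z)
totals   # 1 + 20 + 300 and 2 + 40 + 600
```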

You are going to simulate data one more time, using five lists as inputs instead of two. Using pmap() gives you complete control over your simulated dataset and lets you use two different means and two different standard deviations along with the different sites.

Create a named list containing the sites, means, means2, sigma, and sigma2 lists.

# Second set of means, plus standard deviations for both columns
means2 <- list(0.5, 1, 1.5)
sigma2 <- list(0.5, 1, 1.5)
sigma <- list(1, 2, 3)

# Create a master list, a list of lists
pmapinputs <- list(sites = sites, means = means, sigma = sigma,
                   means2 = means2, sigma2 = sigma2)

pmap() over the list of lists to create a list of dataframes with three columns; the first column is sites.

The second column is a, which is rnorm() with mean = means, and sd = sigma.
The third column is b, which is rnorm() with mean = means2, and sd = sigma2.

# Map over the master list
list_of_files_pmap <- pmap(pmapinputs,
  function(sites, means, sigma, means2, sigma2) {
    data.frame(sites = sites,
               a = rnorm(mean = means,  n = 200, sd = sigma),
               b = rnorm(mean = means2, n = 200, sd = sigma2))
  })

map(list_of_files_pmap,~head(.x))

## [[1]]
##   sites           a           b
## 1 north  2.06084408  0.06552980
## 2 north  0.81124501  0.04062135
## 3 north -0.09554456 -0.53030463
## 4 north  1.84663711 -0.08551129
## 5 north  2.28165089  0.29531221
## 6 north  0.90107028  0.47888973
## 
## [[2]]
##   sites           a           b
## 1  west  1.01714878  0.74688682
## 2  west  0.06912261 -1.39160128
## 3  west  2.05448629  0.75707038
## 4  west  2.50355372 -0.07958812
## 5  west -0.02759826  1.89485152
## 6  west  1.75296353  1.91032890
## 
## [[3]]
##   sites          a          b
## 1  east -0.9149364 -0.7953880
## 2  east  6.9436878  0.7516244
## 3  east  6.7850664 -0.7958258
## 4  east  4.7084648  1.4231632
## 5  east  1.2073573  4.6877686
## 6  east  5.5571927  2.5614575

Good work! With pmap() you now have all the power in purrr.

3 Troubleshooting lists with purrr

Like anything in R, understanding how to troubleshoot issues is an important skill set. This can be particularly important with lists, where finding the problem can be tricky.

3.1 How to purrr safely()

3.1.1 safely() replace with NA

If you map() over a list, and one of the elements does not have the right data type, you will not get the output you expect. Perhaps you are trying to do a mathematical operation on each element, and it turns out one of the elements is a character - it simply won’t work.

If you have a very large list, figuring out where things went wrong, and what exactly went wrong can be hard. That is where safely() comes in; it shows you both your results and where the errors occurred in your map() call.
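Here is a minimal sketch of what safely() returns, on a hypothetical three-element input. Each call yields a list with a result and an error slot. Note that log(-10) is not an error in R (it returns NaN with a warning, which safely() does not capture; purrr’s quietly() is the tool for warnings), whereas log("a") is a genuine error, so only that element gets the otherwise value.

```r
library(purrr)

safe_log <- safely(log, otherwise = NA_real_)

# log(-10) gives NaN plus a warning (suppressed here); log("a") is an error
out <- suppressWarnings(map(list(10, -10, "a"), safe_log))

out[[2]]$result                    # NaN, and out[[2]]$error is NULL
out[[3]]$result                    # NA, the otherwise value
conditionMessage(out[[3]]$error)   # the captured error message

# transpose() regroups into one list of results and one list of errors
by_part <- transpose(out)
names(by_part)
```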

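Use safely() with log(). Note that log(-10) does not throw an error: it returns NaN with a warning, so safely() records no error for it and the otherwise = NA_real_ fallback is never used. Pipe the output into transpose() so that all the results are grouped together first, followed by all the errors.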

# Map safely over log
a <- list(-10, 1, 10, 0) %>%
  map(safely(log, otherwise = NA_real_)) %>%
  # Transpose the result
  transpose()

## Warning in .f(...): NaNs produced

Print out a.

# Print the list
a

## $result
## $result[[1]]
## [1] NaN
## 
## $result[[2]]
## [1] 0
## 
## $result[[3]]
## [1] 2.302585
## 
## $result[[4]]
## [1] -Inf
## 
## 
## $error
## $error[[1]]
## NULL
## 
## $error[[2]]
## NULL
## 
## $error[[3]]
## NULL
## 
## $error[[4]]
## NULL

Print out the “result” element of a.

# Print the result element in the list
a[["result"]]

## [[1]]
## [1] NaN
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 2.302585
## 
## [[4]]
## [1] -Inf

Print out just the error messages from a.

# Print the error element in the list
a[["error"]]

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL

Good work! Now you have the power to start debugging your lists, and you can do it with simple element subsetting.

3.1.2 Convert data to numeric with purrr

In the sw_people dataset, some of the Star Wars characters have unknown heights. If you want to do some data exploration and determine how character height differs depending on their home planet, you need to write your code so that R understands the difference between heights and missing values. Currently, the missing values are entered as “unknown”, but you would like them as NA. In this exercise, you will combine map() and ifelse() to fix this issue.

Load the sw_people dataset.

# Load sw_people data
data(sw_people)

Map over sw_people and pull out “height”.

Then map over the output and if an element is labeled as “unknown” change it to NA, otherwise, convert the value into a number with as.numeric().

# Map over sw_people, pull out the height element, and convert it to a number
height_cm <- map(sw_people, "height") %>%
  map(function(x) {
    # Replace "unknown" with NA; otherwise convert the string to a number
    ifelse(x == "unknown", NA, as.numeric(x))
  })
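The same cleaning step can be sketched on a few hypothetical height strings, with map_dbl() returning a double vector instead of a list. Checking for the placeholder first means as.numeric() never sees "unknown"; converting it directly would also give NA, but with a coercion warning.

```r
library(purrr)

# Hypothetical toy heights stored as strings, as in sw_people
heights <- list("172", "unknown", "96")

height_num <- map_dbl(heights, function(x) {
  # Return a double NA for the placeholder, otherwise parse the number
  if (x == "unknown") NA_real_ else as.numeric(x)
})
height_num
```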

Good work! Now you can use purrr for data wrangling to help clean numeric data in lists.

3.1.3 Finding the problem areas

When you are working with a small list, it might not seem like a lot of work to go through things manually and figure out what element has an issue. But if you have a list with hundreds or thousands of elements, you want to automate that process.

Now you’ll look at a situation with a larger list, where you can see how the error message can be useful to check through the entire list for issues.

map() over sw_people and pull out the “height” element.

map() over safely() to convert the heights from centimeters into feet.

Set quiet = FALSE so that errors are printed.

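# Map over sw_people, pull out the height element, and try converting cm to feet
# (the heights are still character strings here, so each multiplication errors)
height_ft <- map(sw_people, "height") %>%
  map(safely(function(x) {
    x * 0.0328084
  }, quiet = FALSE)) %>%
  transpose()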

Piping into transpose() groups all the results together first, followed by all the errors.

# Print your list, the result element, and the error element
#height_ft
#height_ft[["result"]]
#height_ft[["error"]]
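To automate the search instead of reading the printed errors, one option is to ask which error slots are not NULL. This is a sketch on a hypothetical four-element input, not on sw_people; the to_feet helper is illustrative.

```r
library(purrr)

# Hypothetical toy input: two numbers and two strings that cannot be multiplied
heights <- list(172, "unknown", 96, "n/a")

to_feet <- safely(function(x) x * 0.0328084)
res <- transpose(map(heights, to_feet))

# An element failed if its error slot is not NULL
failed <- which(!map_lgl(res$error, is.null))
failed
heights[failed]   # the offending inputs
```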

Good work! Now you are ready to troubleshoot lists too large to check by hand.

3.2 Another way to possibly() purrr

3.2.1 Replace safely() with possibly()

Once you have figured out how to solve an issue with safely() (e.g., outputting an NA in place of an error), swap safely() for possibly(). possibly() runs your function and returns the otherwise value in place of any error, without printing error messages and without wrapping each output in a result/error pair.
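A minimal sketch of the difference, using a hypothetical doubling function: with possibly() the outputs are plain values, so they drop straight into map_dbl(), and the failing element simply becomes NA.

```r
library(purrr)

double_it <- possibly(function(x) x * 2, otherwise = NA_real_)

# No result/error pairs: each call returns the value, or NA_real_ on error
doubled <- map_dbl(list(1, "a", 3), double_it)
doubled
```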

You’ll now map() over log() again, but you will use possibly() instead of safely() since you already know how to resolve your errors.

Create a list with the values -10, 1, 10, and 0.
map() over this list to take the log() of each element, using possibly().
Use NA_real_ as the otherwise value for any element that throws an error.

# Take the log of each element in the list
a <- list(-10, 1, 10, 0) %>%
  map(possibly(function(x) {
    log(x)
  }, NA_real_))

## Warning in log(x): NaNs produced

Good work! Now you can solve issues in lists using safely(), and then continue with your analysis using possibly().

3.2.2 Convert values with possibly()

Let’s say you need to convert the Star Wars character heights in sw_people from centimeters to feet. Some of the heights are missing, so you will use possibly() to return NA_real_ for any element that cannot be converted instead of stopping with an error. Then you will multiply each of the existing values by 0.0328084 to convert them from centimeters into feet.

To get a feel for your data, print out height_cm in the console to check out the heights in centimeters.

Pipe the height_cm object into a map_*() function that returns double vectors.
Convert each element in height_cm into feet (multiply it by 0.0328084).
Since not all elements are numeric, use possibly() to replace instances that do not work with NA_real_.

# Create a piped workflow that returns double vectors
height_cm %>%
  map_dbl(possibly(function(x) {
    # Convert centimeters to feet
    x * 0.0328084
  }, NA_real_))

##  [1] 5.643045 5.479003 3.149606 6.627297 4.921260 5.839895 5.413386 3.182415
##  [9] 6.003937 5.971129 6.167979 5.905512 7.480315 5.905512 5.675853 5.741470
## [17] 5.577428 5.905512 2.165354 5.577428 6.003937 6.561680 6.233596 5.807087
## [25] 5.741470 5.905512 4.921260       NA 2.887139 5.249344 6.332021 6.266404
## [33] 5.577428 6.430446 7.349082 6.758530 6.003937 4.494751 3.674541 6.003937
## [41] 5.347769 5.741470 5.905512 5.839895 3.083990 4.002625 5.347769 6.167979
## [49] 6.496063 6.430446 5.610236 6.036746 6.167979 8.661418 6.167979 6.430446
## [57] 6.069554 5.150919 6.003937 6.003937 5.577428 5.446194 5.413386 6.332021
## [65] 6.266404 6.003937 5.511811 6.496063 7.513124 6.988189 5.479003 2.591864
## [73] 3.149606 6.332021 6.266404 5.839895 7.086614 7.677166 6.167979 5.839895
## [81] 6.758530       NA       NA       NA       NA       NA 5.413386

Good work! Using possibly() helps us work with problem data in a really clean and efficient way.

3.3 purrr is a walk() in the park

3.3.1 Comparing walk() vs no walk() outputs

Printing out lists with map() shows a lot of bracketed text in the console, which can be useful for understanding their structure, but this information is usually not important for communicating with your end users. walk() calls a function for its side effects, such as printing, and returns its input invisibly, so walk(x, print) prints each element without the [[1]], [[2]] index lines that map() adds. walk() is also great for printing out plots without printing anything else to the console.
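The difference is easy to check by capturing the console output. This sketch uses a hypothetical two-element list: walk() produces only the printed values, while map() also prints the list it returns.

```r
library(purrr)
x <- list(1, 2)

# walk() calls print() for its side effect and returns x invisibly,
# so only the two printed values are captured
walked <- capture.output(walk(x, print))

# map() prints the same values, but its visible return value (a list)
# is printed as well, adding the [[1]] and [[2]] index lines
mapped <- capture.output(map(x, print))

# The invisible return value is the unchanged input, handy in pipelines
identical(walk(x, invisible), x)
```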

Here, you’ll be using the people_by_film dataset, which is derived from sw_films and has the URL of each character and of each film they appear in.

Print people_by_film to the console.

# Load people_by_film from a published Google Sheet (CSV export)
people_by_film <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRObsvb_OQ7qeXRvkTEbWBbQcYfyebglhoxAt9cIdRzH7Exf5s-mMqSgtjkHC0qNgK4PVsku7Q0bwfS/pub?gid=0&single=true&output=csv")

# Print normally
people_by_film %>% head()

##                             url                     film_url
## 1 http://swapi.co/api/people/1/ http://swapi.co/api/films/6/
## 2 http://swapi.co/api/people/1/ http://swapi.co/api/films/3/
## 3 http://swapi.co/api/people/1/ http://swapi.co/api/films/2/
## 4 http://swapi.co/api/people/1/ http://swapi.co/api/films/1/
## 5 http://swapi.co/api/people/1/ http://swapi.co/api/films/7/
## 6 http://swapi.co/api/people/2/ http://swapi.co/api/films/5/

Print out people_by_film using walk() and print(). Because a dataframe is a list of columns, walk() prints each column in turn.

# Print with walk
walk(people_by_film, print)

##   [1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/1/" 
##   [3] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/1/" 
##   [5] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/2/" 
##   [7] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/2/" 
##   [9] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/2/" 
##  [11] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/3/" 
##  [13] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
##  [15] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
##  [17] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
##  [19] "http://swapi.co/api/people/4/"  "http://swapi.co/api/people/4/" 
##  [21] "http://swapi.co/api/people/4/"  "http://swapi.co/api/people/4/" 
##  [23] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/5/" 
##  [25] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/5/" 
##  [27] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/6/" 
##  [29] "http://swapi.co/api/people/6/"  "http://swapi.co/api/people/6/" 
##  [31] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/7/" 
##  [33] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/8/" 
##  [35] "http://swapi.co/api/people/9/"  "http://swapi.co/api/people/10/"
##  [37] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/10/"
##  [39] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/10/"
##  [41] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/11/"
##  [43] "http://swapi.co/api/people/11/" "http://swapi.co/api/people/11/"
##  [45] "http://swapi.co/api/people/12/" "http://swapi.co/api/people/12/"
##  [47] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/13/"
##  [49] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/13/"
##  [51] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/14/"
##  [53] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/14/"
##  [55] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/15/"
##  [57] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/16/"
##  [59] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/18/"
##  [61] "http://swapi.co/api/people/18/" "http://swapi.co/api/people/18/"
##  [63] "http://swapi.co/api/people/19/" "http://swapi.co/api/people/20/"
##  [65] "http://swapi.co/api/people/20/" "http://swapi.co/api/people/20/"
##  [67] "http://swapi.co/api/people/20/" "http://swapi.co/api/people/20/"
##  [69] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/21/"
##  [71] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/21/"
##  [73] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/22/"
##  [75] "http://swapi.co/api/people/22/" "http://swapi.co/api/people/22/"
##  [77] "http://swapi.co/api/people/23/" "http://swapi.co/api/people/24/"
##  [79] "http://swapi.co/api/people/25/" "http://swapi.co/api/people/25/"
##  [81] "http://swapi.co/api/people/26/" "http://swapi.co/api/people/27/"
##  [83] "http://swapi.co/api/people/27/" "http://swapi.co/api/people/28/"
##  [85] "http://swapi.co/api/people/29/" "http://swapi.co/api/people/30/"
##  [87] "http://swapi.co/api/people/31/" "http://swapi.co/api/people/32/"
##  [89] "http://swapi.co/api/people/33/" "http://swapi.co/api/people/33/"
##  [91] "http://swapi.co/api/people/33/" "http://swapi.co/api/people/34/"
##  [93] "http://swapi.co/api/people/36/" "http://swapi.co/api/people/36/"
##  [95] "http://swapi.co/api/people/37/" "http://swapi.co/api/people/38/"
##  [97] "http://swapi.co/api/people/39/" "http://swapi.co/api/people/40/"
##  [99] "http://swapi.co/api/people/40/" "http://swapi.co/api/people/41/"
## [101] "http://swapi.co/api/people/42/" "http://swapi.co/api/people/43/"
## [103] "http://swapi.co/api/people/43/" "http://swapi.co/api/people/44/"
## [105] "http://swapi.co/api/people/45/" "http://swapi.co/api/people/46/"
## [107] "http://swapi.co/api/people/46/" "http://swapi.co/api/people/46/"
## [109] "http://swapi.co/api/people/48/" "http://swapi.co/api/people/49/"
## [111] "http://swapi.co/api/people/50/" "http://swapi.co/api/people/51/"
## [113] "http://swapi.co/api/people/51/" "http://swapi.co/api/people/51/"
## [115] "http://swapi.co/api/people/52/" "http://swapi.co/api/people/52/"
## [117] "http://swapi.co/api/people/52/" "http://swapi.co/api/people/53/"
## [119] "http://swapi.co/api/people/53/" "http://swapi.co/api/people/53/"
## [121] "http://swapi.co/api/people/54/" "http://swapi.co/api/people/54/"
## [123] "http://swapi.co/api/people/55/" "http://swapi.co/api/people/55/"
## [125] "http://swapi.co/api/people/56/" "http://swapi.co/api/people/56/"
## [127] "http://swapi.co/api/people/57/" "http://swapi.co/api/people/58/"
## [129] "http://swapi.co/api/people/58/" "http://swapi.co/api/people/58/"
## [131] "http://swapi.co/api/people/59/" "http://swapi.co/api/people/59/"
## [133] "http://swapi.co/api/people/60/" "http://swapi.co/api/people/61/"
## [135] "http://swapi.co/api/people/62/" "http://swapi.co/api/people/63/"
## [137] "http://swapi.co/api/people/63/" "http://swapi.co/api/people/64/"
## [139] "http://swapi.co/api/people/64/" "http://swapi.co/api/people/65/"
## [141] "http://swapi.co/api/people/66/" "http://swapi.co/api/people/67/"
## [143] "http://swapi.co/api/people/67/" "http://swapi.co/api/people/68/"
## [145] "http://swapi.co/api/people/68/" "http://swapi.co/api/people/69/"
## [147] "http://swapi.co/api/people/70/" "http://swapi.co/api/people/71/"
## [149] "http://swapi.co/api/people/72/" "http://swapi.co/api/people/73/"
## [151] "http://swapi.co/api/people/74/" "http://swapi.co/api/people/47/"
## [153] "http://swapi.co/api/people/75/" "http://swapi.co/api/people/75/"
## [155] "http://swapi.co/api/people/76/" "http://swapi.co/api/people/77/"
## [157] "http://swapi.co/api/people/78/" "http://swapi.co/api/people/78/"
## [159] "http://swapi.co/api/people/79/" "http://swapi.co/api/people/80/"
## [161] "http://swapi.co/api/people/81/" "http://swapi.co/api/people/81/"
## [163] "http://swapi.co/api/people/82/" "http://swapi.co/api/people/82/"
## [165] "http://swapi.co/api/people/83/" "http://swapi.co/api/people/84/"
## [167] "http://swapi.co/api/people/85/" "http://swapi.co/api/people/86/"
## [169] "http://swapi.co/api/people/87/" "http://swapi.co/api/people/88/"
## [171] "http://swapi.co/api/people/35/" "http://swapi.co/api/people/35/"
## [173] "http://swapi.co/api/people/35/"
##   [1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##   [3] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##   [5] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/5/"
##   [7] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##   [9] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [11] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [13] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [15] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [17] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/7/"
##  [19] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [21] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [23] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [25] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [27] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/5/"
##  [29] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
##  [31] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
##  [33] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/1/"
##  [35] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [37] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [39] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [41] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [43] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [45] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
##  [47] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [49] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [51] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/3/"
##  [53] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [55] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/1/"
##  [57] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/3/"
##  [59] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/3/"
##  [61] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [63] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [65] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [67] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [69] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
##  [71] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [73] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/5/"
##  [75] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [77] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/2/"
##  [79] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [81] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/3/"
##  [83] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/3/"
##  [85] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/3/"
##  [87] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/4/"
##  [89] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
##  [91] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/4/"
##  [93] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
##  [95] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
##  [97] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
##  [99] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
## [101] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
## [103] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
## [105] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/5/"
## [107] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [109] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
## [111] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
## [113] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [115] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [117] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [119] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [121] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [123] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [125] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [127] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
## [129] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [131] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [133] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [135] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [137] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [139] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [141] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [143] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [145] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [147] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [149] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [151] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [153] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
## [155] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [157] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
## [159] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/6/"
## [161] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
## [163] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
## [165] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/7/"
## [167] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/7/"
## [169] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/7/"
## [171] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [173] "http://swapi.co/api/films/6/"

Good work! Now you can use walk() to make your outputs cleaner and more human-readable.

3.3.2 walk() for printing cleaner list outputs

Now you will try one more use of walk(), specifically creating plots using walk(). In the previous exercise, you printed some lists, and you saw that printing lists is much cleaner using walk() than using the base R way. You can also use walk() to display multiple plots sequentially.

Here, use your map() knowledge along with ggplot2 functions to create a graph for the first ten elements of gap_split and then display each graph with walk().

Load the gap_split dataset.

# Load the gap_split data
data(gap_split)

map2() over the first 10 elements of gap_split, and the first 10 names of gap_split.

library(ggplot2)

# Map over the first 10 elements of gap_split and their names
plots <- map2(gap_split[1:10],
              names(gap_split[1:10]),
              ~ ggplot(.x, aes(year, lifeExp)) +
                geom_line() +
                labs(title = .y))

Then walk() over the new plots object and supply print() as an argument to print all plots.

# Object name, then function name
walk(plots, print)

Good work! Now you can print out multiple plots easily using walk().

4 Problem solving with purrr

Now that you have the building blocks, we will start tackling some more complex data problems with purrr.

4.1 Using purrr in your workflow

4.1.1 Name review

Now, you’ll quickly review how to check if a list has names, and how to pull out a specific element from a list. Remember, you can use the names() function to see if a list is named; it returns NULL when it is not. There are several ways to extract a named element from a list, but the one to remember is the [[double bracket]] syntax, which returns the element itself rather than a one-element sub-list.
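A small sketch of the three behaviours on a hypothetical user record (not the real gh_users data): names() on an unnamed list, [[ ]] versus [ ], and the same [[ ]] extraction used inside map_chr().

```r
library(purrr)

user <- list(name = "Ann", public_repos = 3L)   # hypothetical named list
unnamed <- list("Ann", 3L)

names(unnamed)    # NULL: nothing to extract by name
user[["name"]]    # "Ann", the element itself
user["name"]      # a list of length 1 that still wraps "Ann"

# The same [[ ]] extraction, applied to every element of a list
map_chr(list(user, user), ~ .x[["name"]])
```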

Load the gh_users data.

# Load the data
data(gh_users)

Examine the names of gh_users.

# Check if data has names
names(gh_users)

## NULL

Extract the “name” element from each element of gh_users.

# Map over name element of list
map(gh_users, ~.x[["name"]])

## [[1]]
## [1] "Gábor Csárdi"
## 
## [[2]]
## [1] "Jennifer (Jenny) Bryan"
## 
## [[3]]
## [1] "Jeff L."
## 
## [[4]]
## [1] "Julia Silge"
## 
## [[5]]
## [1] "Thomas J. Leeper"
## 
## [[6]]
## [1] "Maëlle Salmon"

Good work! Now that we have refreshed the basics of named lists, we can dive into our next task.

4.1.2 Setting names

Setting list names makes working with lists much easier in many scenarios; it makes the code easier to read, which is especially important when reviewing code weeks or months later.

Here you are going to work with the gh_repos and gh_users datasets and set their names in two different ways. The two methods will give the same result: a list with named elements.
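The pattern in both methods is the same: compute one character vector of labels with map_chr(), then hand it to set_names(). A minimal sketch on a hypothetical two-user list:

```r
library(purrr)

# Hypothetical stand-in for gh_users: an unnamed list of user records
users <- list(
  list(name = "Ann", login = "ann1"),
  list(name = "Bo",  login = "bo2")
)

users_named <- set_names(users, map_chr(users, "login"))
names(users_named)
users_named[["bo2"]]$name   # elements can now be pulled out by login
```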

Set the names on gh_users using the “name” element and use the map_*() function that outputs a character vector.

# Name gh_users with the names of the users
gh_users_named <- gh_users %>% 
    set_names(map_chr(gh_users, "name"))

Explore the structure of gh_repos to see where the owner info is stored.

# Check gh_repos structure
#str(gh_repos)

Set the names of a new list gh_repos_named based on the login of the owner of the repo, using the set_names() and map_*() functions.

# Name gh_repos with the names of the repo owner
gh_repos_named <- gh_repos %>% 
    map_chr(~ .[[1]]$owner$login) %>% 
    set_names(gh_repos, .)

Good work! Sometimes list naming is tricky but purrr makes it simpler by easily extracting the element we want to use as the names.

4.1.3 Asking questions from a list

One of the great things about purrr is you can easily move from having a question about the data to an answer, with just a few lines of code. Here you are going to use the gh_users data to ask three questions:

Which user joined GitHub first?
Are all the repositories user-owned, rather than organization-owned?
Which user has the most public repositories?

In this exercise, your map_*() knowledge is really tested, so make sure to reflect on all the different flavors of map_*() and how they should be used.

Name gh_users with the “name” element and sort the “created_at” element to determine who joined GitHub first.

# Determine who joined GitHub first
map_chr(gh_users, ~ .x[["created_at"]]) %>%
  set_names(map_chr(gh_users, "name")) %>%
  sort()

## Jennifer (Jenny) Bryan           Gábor Csárdi                Jeff L. 
## "2011-02-03T22:37:41Z" "2011-03-09T17:29:25Z" "2012-03-24T18:16:43Z" 
##       Thomas J. Leeper          Maëlle Salmon            Julia Silge 
## "2013-02-07T21:07:00Z" "2014-08-05T08:10:04Z" "2015-05-19T02:51:23Z"

Output a vector that returns TRUE for each element where the “type” is “User”.

# Determine user versus organization
map_lgl(gh_users, ~.x[["type"]] == "User")

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

Output a named integer vector of the number of “public_repos”, sorted so that the user with the most public repositories comes last.

# Determine who has the most public repositories
map_int(gh_users, ~ .x[["public_repos"]]) %>%
    set_names(map_chr(gh_users, "name")) %>%
    sort()

##            Julia Silge          Maëlle Salmon           Gábor Csárdi 
##                     26                     31                     52 
##                Jeff L.       Thomas J. Leeper Jennifer (Jenny) Bryan 
##                     67                     99                    168

Good work! Now you can use functions you already know to ask any question of your data in just a few lines of code.
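As a rough sketch of how the three questions map onto map_chr(), map_lgl() and map_int(), here is the same workflow on a small hypothetical list shaped like gh_users (the records are invented). It also uses base R's which.max() to pick the top user directly instead of reading the end of a sorted vector.

```r
library(purrr)

# Hypothetical user records, shaped like gh_users elements
users <- list(
  list(name = "Ada", created_at = "2012-05-01T10:00:00Z", type = "User", public_repos = 12L),
  list(name = "Bob", created_at = "2010-01-15T08:30:00Z", type = "User", public_repos = 40L),
  list(name = "Org", created_at = "2014-07-20T12:00:00Z", type = "Organization", public_repos = 5L)
)
user_names <- map_chr(users, "name")

# Who joined first? ISO 8601 timestamps in the same format sort chronologically as text
joined <- set_names(map_chr(users, "created_at"), user_names)
first_user <- names(sort(joined))[1]

# Which accounts are users rather than organizations?
is_user <- set_names(map_lgl(users, ~ .x[["type"]] == "User"), user_names)

# Who has the most public repositories?
repos <- set_names(map_int(users, "public_repos"), user_names)
most_repos <- names(which.max(repos))
```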

4.2 Even more complex problems

Questions about gh_repos

You’re going to use gh_repos again, a list where each element is information about a GitHub repository. Here you will use map() and map_dbl() to answer the question:

Which repository is the largest?

The GitHub API reports repository size in kilobytes. Tracking it can be useful if you are working with a list-based dataset that changes over time and need to pull out information, such as the largest repository, from the most recent version of the data.

map() over gh_repos.
map_dbl() over the “size” element.
Then map() max() over the result to get the size of the largest repository for each user.

# Map over gh_repos to generate numeric output
map(gh_repos,
    ~map_dbl(.x, 
             ~.x[["size"]])) %>%
    # Grab the largest element
    map(~max(.x))

## [[1]]
## [1] 39461
## 
## [[2]]
## [1] 96325
## 
## [[3]]
## [1] 374812
## 
## [[4]]
## [1] 24070
## 
## [[5]]
## [1] 558176
## 
## [[6]]
## [1] 76455

Good work! You’re gaining great skills to be able to answer questions in a reproducible way with your datasets.
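The code above returns the largest size per user, not the repository itself. A minimal sketch on a hypothetical nested list shows how the same inner/outer map pattern can be extended with base R's unlist(recursive = FALSE) and which.max() to recover the name of the single largest repository; the owners, repository names and sizes are invented.

```r
library(purrr)

# Hypothetical nested list: one element per owner, each holding that owner's repos
repos_by_owner <- list(
  ada = list(list(name = "tiny", size = 120), list(name = "big", size = 5400)),
  bob = list(list(name = "docs", size = 800))
)

# Inner map_dbl() pulls "size" from every repo of one owner;
# outer map() repeats this for each owner, giving a list of numeric vectors
sizes <- map(repos_by_owner, ~ map_dbl(.x, "size"))

# Largest size per owner, as a named numeric vector
max_sizes <- map_dbl(sizes, max)

# Flatten one level to get a single list of repos, then find the biggest one
all_repos <- unlist(repos_by_owner, recursive = FALSE)
largest <- all_repos[[which.max(map_dbl(all_repos, "size"))]]$name
```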

4.3 Graphs in purrr

4.3.1 ggplot() refresher

You’ve already been introduced to the package ggplot2 in the prerequisite for this course, but let’s do a quick refresher.

geom_point() makes scatterplots
geom_histogram() makes histograms

In this exercise, you are going to use a dataframe created from the gh_users dataset, called gh_users_df, that has two columns: one for the number of public repositories a user has, and another for how many followers that user has. Each row is a different user. Then you will make it into a scatter plot, a plot where the data are displayed as points.

Create a scatterplot with public_repos on the x axis and followers on the y axis.

# Rebuild gh_users_df by hand: one row per user
gh_users_df <- tribble(
    ~public_repos, ~followers,
    52,   303,
    168,  780,
    67,   3958,
    26,   115,
    99,   213,
    31,   34
)
# Scatter plot of public repos and followers
ggplot(data = gh_users_df,
       aes(x = public_repos, y = followers)) +
    geom_point()

Create a histogram of followers by piping in gh_users_df.

# Histogram of followers
gh_users_df %>%
    ggplot(aes(x = followers)) +
    geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Good work! Isn’t making plots fun? Now let’s dive into how purrr can help make more of them!

4.3.2 purrr and scatterplots

Since ggplot() does not accept lists as an input, it can be paired up with purrr to go from a list to a dataframe to a ggplot() graph in just a few lines of code.

You will continue to work with the gh_users data for this exercise. You will use a map_*() function to pull out a few of the named elements and transform them into the correct datatype. Then you will create a scatterplot that compares each user’s number of followers to their number of public repositories.

map() over gh_users with the map_*() function that creates a dataframe, with four columns: “login”, “name”, “followers”, and “public_repos”.
Pipe that dataframe into a scatterplot, where the x axis is followers and y is public_repos.

# Create a dataframe with four columns
map_df(gh_users, `[`,
       c("login", "name", "followers", "public_repos")) %>%
    # Plot followers by public_repos
    ggplot(aes(x = followers, y = public_repos)) +
    # Create scatter plots
    geom_point()

Good work! Now you can go from list to plot using a tidy workflow!
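map_df() relies on dplyr to bind the rows together. If you only need a few scalar fields, one alternative, sketched here on hypothetical records, is to build each column with the typed map_*() variant and pass the columns to base R's data.frame(); the map_*() suffix then makes the intended type of each column explicit.

```r
library(purrr)

# Hypothetical records shaped like gh_users elements
users <- list(
  list(login = "ada", name = "Ada L.", followers = 120L, public_repos = 12L),
  list(login = "bob", name = "Bob K.", followers = 35L,  public_repos = 40L)
)

# One typed map_*() call per column
users_df <- data.frame(
  login        = map_chr(users, "login"),
  name         = map_chr(users, "name"),
  followers    = map_int(users, "followers"),
  public_repos = map_int(users, "public_repos"),
  stringsAsFactors = FALSE  # keep text columns as character in older R versions
)
str(users_df)
```

The resulting data frame can be piped into ggplot() exactly like the map_df() output above.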

4.3.3 purrr and histograms

Now you’re going to put together everything you’ve learned, starting with two different lists, which will be turned into a faceted histogram. You’re going to work again with the Star Wars data from the sw_films and sw_people datasets to answer a question:

What is the distribution of heights of characters in each of the Star Wars films?

Different movies take place on different sets of planets, so you might expect to see different distributions of heights from the characters. Your first task is to transform the two datasets into dataframes since ggplot() requires a dataframe input. Then you will join them together, and plot the result, a histogram with a different facet, or subplot, for each film.

Create a dataframe with the “title” of each film, and the “characters” from each film in the sw_films dataset.

# Turn data into correct dataframe format
film_by_character <- tibble(filmtitle = map_chr(sw_films, "title")) %>%
    mutate(characters = map(sw_films, "characters")) %>%
    unnest(cols = c(characters))

Create a dataframe with the “height”, “mass”, “name”, and “url” elements from sw_people.

# Pull out elements from sw_people
sw_characters <- map_df(sw_people, `[`, c("height","mass","name","url"))

Join the two dataframes together using the “characters” and “url” keys.

# Join our two new objects
character_data <- inner_join(film_by_character, sw_characters, by = c("characters" = "url")) %>%
    # Make sure the columns are numbers (entries such as "unknown" become NA)
    mutate(height = as.numeric(height), mass = as.numeric(mass))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

Create a ggplot() histogram with x = height, faceted by filmtitle.

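# Plot the heights, faceted by film title
# stat = "count" draws one bar per distinct height value, so ggplot2
# ignores (and warns about) the binning parameters binwidth, bins and pad
ggplot(character_data, aes(x = height)) +
  geom_histogram(stat = "count") +
  facet_wrap(~ filmtitle)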

## Warning: Ignoring unknown parameters: binwidth, bins, pad

## Warning: Removed 6 rows containing non-finite values (stat_count).

Good work! Now you’ve learned all the basics of how to use purrr to make tasks that require iteration and working with lists more manageable and human-readable!

Course Description

Have you ever wondered what the purrr description (“A functional programming toolkit for R”) refers to? Then you’ve come to the right place! This course will walk you through the functional programming part of purrr: in other words, you will learn how to take full advantage of the flexibility offered by the .f in map(.x, .f) to iterate over lists, vectors, and data.frames with robust, clean, and easy-to-maintain code. During this course, you will learn how to write your own mappers (or lambda functions), and how to use predicates and adverbs. Finally, you will apply this new knowledge to a concrete use case, a simple nested list, and see how to extract, keep, or discard elements, how to compose functions to manipulate and parse results from this list, how to integrate a purrr workflow inside other functions, and how to avoid copying and pasting with purrr’s functional tools.

5 Programming with purrr

Do lambda functions, mappers, and predicates sound scary to you? Fear no more! After refreshing your purrr memory, we will dive into functional programming 101, discover anonymous functions and predicates, and see how we can use them to clean and explore data.

5.1 purrr basics - a refresher

5.1.1 Refreshing your purrr memory

Let’s pretend you’re a data analyst working for a web agency. The web-design team has been running a weeklong A/B test that compares the performance of two design proposals for a website, and you’re now in charge of analyzing the results.

The team measured the number of visits to the Contact page to determine the design’s impact on the number of people contacting the company. These designs were presented to 2/3 of visitors.

visit_a contains the results from campaign A and visit_b the results from campaign B. Both are expressed as an average hourly number of visits. All the other stats you have are expressed as visits per day, so you need to convert these two vectors. Then, you’ll extract the mean of each vector.

Note that these are new data, not the data from the video.

Create the to_day() function, which multiplies x by 24.

# Create the to_day function
to_day <- function(x) {
  x*24
}

Create a list that contains visit_a and visit_b.

visit_a <- c(117, 147, 131,  73,  81, 134, 121)
visit_b <- c(180, 193, 116, 166, 131, 153, 146)
visit_c <- c( 57, 110,  68,  72,  87, 141,  67)
# Create a list containing both vectors: all_visits
all_visits <- list(visit_a, visit_b)

Turn your new list to the daily number of visits with map() and the to_day() function.

# Convert to daily number of visits: all_visits_day
all_visits_day <- map(all_visits, to_day)

Compare the mean of visits by mapping the mean() function on the results.

# Map the mean() function and output a numeric vector 
map_dbl(all_visits_day, mean)

## [1] 2756.571 3720.000

Well done! You’re mastering the basic syntax of iteration with purrr with the map() and map_dbl() functions. Let’s refresh your memory a little more!
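The difference between map() and map_dbl() is only the container that comes back. A minimal sketch with small made-up vectors: map() always returns a list, while map_dbl() simplifies to a double vector and keeps the names of the input list.

```r
library(purrr)

to_day <- function(x) x * 24
hourly <- list(a = c(1, 2, 3), b = c(4, 5, 6))   # made-up hourly rates

daily      <- map(hourly, to_day)     # list of numeric vectors
means_list <- map(daily, mean)        # still a list, one element per input
means_dbl  <- map_dbl(daily, mean)    # named double vector: a = 48, b = 120
```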

5.1.2 Another purrr refresher

You just received visit_c, the number of visits on the website during the same week, but with the old design, which was shown to 1/3 of website visitors. You now want to compare visit_c with the two proposed designs, visit_a and visit_b, to know which one led to more visits to the Contact page.

Again, you’ll need to convert all the vectors to daily numbers of visits.

You’ve been asked to provide two insights:

A plot for each element
The total number of visits for each day, regardless of design

You’ll test out both map() and walk() for plotting. Both produce “side effects,” that is to say, changes in the environment (drawing plots, downloading a file, changing the working directory…), but walk() returns its input invisibly, so it won’t print anything to the console.

Create a list containing the three vectors and turn these list elements to the daily number of visits.

# Create all_tests list  and modify with to_day() function
all_tests <- list(visit_a, visit_b, visit_c)
all_tests_day <- map(all_tests, to_day)

Create three bar plots of all_tests_day with one call using map().

# Plot all_tests_day with map
map(all_tests_day, barplot)

## [[1]]
##      [,1]
## [1,]  0.7
## [2,]  1.9
## [3,]  3.1
## [4,]  4.3
## [5,]  5.5
## [6,]  6.7
## [7,]  7.9
## 
## [[2]]
##      [,1]
## [1,]  0.7
## [2,]  1.9
## [3,]  3.1
## [4,]  4.3
## [5,]  5.5
## [6,]  6.7
## [7,]  7.9
## 
## [[3]]
##      [,1]
## [1,]  0.7
## [2,]  1.9
## [3,]  3.1
## [4,]  4.3
## [5,]  5.5
## [6,]  6.7
## [7,]  7.9

Create three plots with one call, without anything printed to the console.

# Plot all_tests_day
walk(all_tests_day, barplot)
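To see what each function actually returns, the sketch below swaps barplot() for cat(), another function called only for its side effect. map() collects the return values of cat() (which are NULL) into a list, while walk() hands back its input unchanged, invisibly, which is why nothing extra is printed.

```r
library(purrr)

# cat() is called only for its side effect: printing a line
out_map  <- map(1:2, ~ cat("item", .x, "\n"))
out_walk <- walk(1:2, ~ cat("item", .x, "\n"))

# out_map is list(NULL, NULL): the return values of cat()
# out_walk is 1:2: walk() returns its input unchanged
```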

Get the sum of all_tests_day element by element (the total number of visits for each day across the three designs), then check that you’ve got a numeric output by printing the class of the object.

# Get the sum, of the all_tests_day list, element by element, and check its class
sum_all <- pmap_dbl(all_tests_day, sum)
class(sum_all)

## [1] "numeric"

Congratulations! We first used map() because we wanted to apply the function to each element of the list. Then, we used pmap_dbl() because we needed to take the sub-elements in parallel, position by position. Now that we have seen the basics of iteration with purrr, let’s dive into programming!
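A small sketch with made-up numbers contrasts the two patterns: pmap_dbl() passes the i-th element of every vector to sum() at the same time, so it returns one total per position (per day, in the A/B test), whereas map_dbl() sums each vector on its own.

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)
z <- c(100, 200, 300)

# pmap_dbl() walks the vectors in parallel: sum(x[i], y[i], z[i])
totals <- pmap_dbl(list(x, y, z), sum)        # 111 222 333

# map_dbl() sums each vector separately
per_vector <- map_dbl(list(x, y, z), sum)     # 6 60 600
```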

5.2 Introduction to mappers

5.2.1 Creating lambda functions

Do you recall the three vectors visit_a, visit_b and visit_c from the A/B test from the last exercise? They are still available in your workspace.

Remember that these vectors contain the hourly visit rate for each day. Each of these vectors corresponds to one design of the website, randomly served to the visitors. We are going to turn these vectors into daily numbers of visits, but this time, we’ll use a mapper.

Using a mapper allows you to write reusable code: you will potentially be asked to redo this task, so if you have an already existing mapper, you will be able to reuse this object, instead of copying and pasting the same code again and again.

Get the daily number of visits by mapping an anonymous function on visit_a.

# Turn visit_a into daily number using an anonymous function
map(visit_a, function(x) {
  x * 24
})

## [[1]]
## [1] 2808
## 
## [[2]]
## [1] 3528
## 
## [[3]]
## [1] 3144
## 
## [[4]]
## [1] 1752
## 
## [[5]]
## [1] 1944
## 
## [[6]]
## [1] 3216
## 
## [[7]]
## [1] 2904

Make this code more concise by using a mapper.

# Turn visit_a into daily number of visits by using a mapper
map(visit_a, ~ .x * 24)

## [[1]]
## [1] 2808
## 
## [[2]]
## [1] 3528
## 
## [[3]]
## [1] 3144
## 
## [[4]]
## [1] 1752
## 
## [[5]]
## [1] 1944
## 
## [[6]]
## [1] 3216
## 
## [[7]]
## [1] 2904

Create a reusable mapper object called to_day.

# Create a mapper object called to_day
to_day <- as_mapper(~ .x * 24)

Call to_day on the three vectors (make three calls).

# Use it on the three vectors
map(visit_a, to_day)

## [[1]]
## [1] 2808
## 
## [[2]]
## [1] 3528
## 
## [[3]]
## [1] 3144
## 
## [[4]]
## [1] 1752
## 
## [[5]]
## [1] 1944
## 
## [[6]]
## [1] 3216
## 
## [[7]]
## [1] 2904

map(visit_b, to_day)

## [[1]]
## [1] 4320
## 
## [[2]]
## [1] 4632
## 
## [[3]]
## [1] 2784
## 
## [[4]]
## [1] 3984
## 
## [[5]]
## [1] 3144
## 
## [[6]]
## [1] 3672
## 
## [[7]]
## [1] 3504

map(visit_c, to_day)

## [[1]]
## [1] 1368
## 
## [[2]]
## [1] 2640
## 
## [[3]]
## [1] 1632
## 
## [[4]]
## [1] 1728
## 
## [[5]]
## [1] 2088
## 
## [[6]]
## [1] 3384
## 
## [[7]]
## [1] 1608

Well played! You now know a little bit more about lambda functions and mappers, and you’ve used them to transform your dataset. Let’s try again in a new exercise!
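A mapper created by as_mapper() is an ordinary function, so it can be called directly as well as passed to map(). In the sketch below, scale_by is a hypothetical two-argument mapper: in a formula, .x stands for the first argument and .y for the second, which is what map2_dbl() supplies.

```r
library(purrr)

to_day <- as_mapper(~ .x * 24)
to_day(2)                                  # 48

# Hypothetical two-argument mapper: .x is the first argument, .y the second
scale_by <- as_mapper(~ .x * .y)
scale_by(3, 24)                            # 72

# Mappers plug straight into map_*() and map2_*()
map_dbl(c(1, 2), to_day)                   # 24 48
map2_dbl(c(1, 2), c(24, 60), scale_by)     # 24 120
```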

5.2.2 Lambda functions

We are still working with the results of a weeklong A/B test on a website. The three vectors containing the number of visits for each design (visit_a, visit_b and visit_c) are available in your workspace.

One of your colleagues has asked you to send him the results, but he wants them rounded to the nearest ten. To do this, you will need to call the round() function with a negative number of digits:

Rounding to a negative number of digits means rounding to a power of ten, so for example round(x, digits = -2) rounds to the nearest hundred.

Definition taken from R documentation: see ?round

Make sure to use the right map_* for each call.

Round visit_a to the nearest ten with a mapper.

# Round visit_a to the nearest ten with a mapper
map_dbl(visit_a, ~ round(.x, -1))

## [1] 120 150 130  70  80 130 120

Create a reusable mapper object called to_ten, that rounds to the nearest ten.

# Create to_ten, a mapper that rounds to the nearest ten
to_ten <- as_mapper(~ round(.x, -1))

Map to_ten to visit_b.

# Map to_ten on visit_b
map_dbl(visit_b, to_ten)

## [1] 180 190 120 170 130 150 150

Map to_ten to visit_c.

# Map to_ten on visit_c
map_dbl(visit_c, to_ten)

## [1]  60 110  70  70  90 140  70

Purrrfect! Are you starting to like mappers? ;) In this exercise, you’ve seen how to build a reusable mapper. Using reusable elements (like mappers here) allows you to write code that is easier to use and to maintain in the long run.
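The digits argument can also vary. As a sketch, the hypothetical mapper to_power_of_ten below takes the power of ten as its second argument (.y), and map2_dbl() pairs each value with its own rounding level.

```r
library(purrr)

# digits = -1 rounds to tens, -2 to hundreds, -3 to thousands
round(1234, -1)   # 1230
round(1234, -2)   # 1200

# Hypothetical helper: the power of ten becomes a second argument (.y)
to_power_of_ten <- as_mapper(~ round(.x, -.y))
rounded <- map2_dbl(c(1234, 1234, 1234), c(1, 2, 3), to_power_of_ten)
# rounded is 1230 1200 1000
```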

5.3 Using mappers to clean data

5.3.1 Clean up your data with keep()

Since the beginning of this course, we have been using the results of a weeklong A/B test.

We have put these results in a list called all_visits, which contains visit_a and visit_b. These vectors are unnamed, and each contains seven numbers, one for each day of the week.

The first question we want to ask is: which days reached more than 100 visits an hour on average? We will use the keep() function. But the answer would not be readable with an unnamed vector: you would have the numbers, but you would not know to which day these numbers correspond.

The good news is: you can use the set_names() function to solve this issue. This is what we’ll do in this exercise: first, use keep() on unnamed vectors, then on named ones.

Create a mapper that will test if .x is more than 100. You’ll use it twice.

# Create a mapper that tests if .x is more than 100
is_more_than_hundred <- as_mapper(~ .x > 100)

Combine this mapper with keep(), and map it over the unnamed list all_visits. As the result is unnamed, you don’t know which days you have kept.

# Use this mapper with keep() on the all_visits object 
map(all_visits, ~ keep(.x, is_more_than_hundred))

## [[1]]
## [1] 117 147 131 134 121
## 
## [[2]]
## [1] 180 193 116 166 131 153 146

Name each vector by combining map() and the set_names() functions, using the vector of names we have provided.

# Use the day vector to set names on each element of all_visits
day <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
full_visits_named <- map(all_visits, ~ set_names(.x, day))

Map the previously created mapper on the newly named list. As you can see, it’s more readable now!

# Use this mapper with keep() 
map(full_visits_named, ~ keep(.x, is_more_than_hundred))

## [[1]]
## mon tue wed sat sun 
## 117 147 131 134 121 
## 
## [[2]]
## mon tue wed thu fri sat sun 
## 180 193 116 166 131 153 146

Great! In this exercise, you’ve learned how to name vectors, and how to construct a reusable mapper to answer questions about your data.
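keep() works on atomic vectors too, and the names travel with the kept values. A minimal sketch with a made-up named vector of hourly visits:

```r
library(purrr)

is_more_than_hundred <- as_mapper(~ .x > 100)

# Made-up hourly visits for four days
hourly <- c(mon = 95, tue = 130, wed = 101, thu = 80)

kept <- keep(hourly, is_more_than_hundred)
names(kept)   # "tue" "wed": you can tell which days passed the test
```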

5.3.2 Split up with keep() and discard()

We want to split our results into two groups: the days over 100, and the days at or under 100. We’ll combine keep() and discard() to do so.

Why two functions? Couldn’t we use one function? Couldn’t we create a mapper called is_less_than_hundred?

We could, but that would be more error-prone: it’s easier to switch from keep() to discard() than to copy and paste. By combining both functions, we only need one mapper. That means that if we want to change the threshold, we’ll only need to do it once, not twice, as we would have to do if we had two mappers.

This is a rule you should endeavor to apply when coding: write code so that if you need to change one thing, you will have to change it just once.

all_visits is still available in your workspace.

Map the set_names() function on all_visits to add the name of the days: all_visits_named.

# Set the name of each subvector
day <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
all_visits_named <- map(all_visits, ~ set_names(.x, day))

Create a mapper called threshold that will test if .x is over 100.

# Create a mapper that will test if .x is over 100
threshold <- as_mapper(~ .x > 100)

Create group_over by keeping the elements that are over 100.

# Run this mapper on the all_visits_named object
group_over <- map(all_visits_named, ~ keep(.x, threshold))

Create group_under by discarding the elements that are over 100.

# Run this mapper on the all_visits_named object
group_under <- map(all_visits_named, ~ discard(.x, threshold))

Well done! As you can see in this code, if you want to change the threshold, you only have to change it once. This is an important feature of good code: do not write code in such a way that changing a parameter means changing it in several places.
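Here is the same one-threshold idea as a self-contained sketch, with a hypothetical limit variable and made-up visit counts. Because keep() and discard() share a single mapper, every element ends up in exactly one of the two groups, and changing limit moves both at once.

```r
library(purrr)

limit <- 100
over_limit <- as_mapper(~ .x > limit)   # the threshold lives in one place

hourly <- c(mon = 95, tue = 130, wed = 101, thu = 80)
over  <- keep(hourly, over_limit)       # tue, wed
under <- discard(hourly, over_limit)    # mon, thu

# Every day lands in exactly one group
length(over) + length(under) == length(hourly)
```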

5.4 Predicates

5.4.1 What is a predicate?

A predicate function is “a function that either returns TRUE or FALSE,” while a predicate functional “takes a vector and a predicate function and do something useful.”

In other words, predicate functionals take in .x, which is a vector, a dataframe, or a list, and test the predicate on every element of .x. For example, you can test if every element is numeric with the is.numeric() predicate from base R, or if the mean of some elements is under 5 with this mapper: ~ mean(.x) < 5.

Which of these functions is NOT a predicate?

is.character()
function(x) x < 5

~ .x * 100

~ .x < 5

Right! Here we are doing multiplication; the result is not TRUE or FALSE!
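The predicate functionals from purrr all take a predicate, either a function such as is.numeric() or a formula mapper. A short sketch on a made-up mixed list: keep() and discard() filter, every() and some() summarize to a single TRUE or FALSE, and detect() returns the first element that passes.

```r
library(purrr)

mixed <- list(1, "a", 2.5, TRUE)

keep(mixed, is.numeric)      # list(1, 2.5)
discard(mixed, is.numeric)   # list("a", TRUE)
every(mixed, is.numeric)     # FALSE
some(mixed, is.character)    # TRUE
detect(mixed, is.character)  # "a"

# A formula predicate works the same way
keep(list(c(1, 2), c(8, 9)), ~ mean(.x) < 5)   # list(c(1, 2))
```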

5.4.2 Exploring data with predicates

We will continue our exploration of A/B test data. Your manager is not interested in which days reached the threshold; he wants to know whether every day reached the threshold, or whether some days did. We’ll use purrr predicates to answer these questions.

You have received several thresholds and decided to write a script that starts with a threshold definition and answers, for each design, whether all the days reached the threshold and, if not, whether some did.

The results from this A/B test are in the all_visits list.

Create a variable called threshold, that contains the number 160.

# Create a threshold variable that contains 160
threshold <- 160

Create a new mapper, that will test if .x is over threshold.

# Create a mapper that tests if .x is over threshold
over_threshold <- as_mapper(~ .x > threshold)

Combine map() and every() to test if all elements are over the threshold.

# Are all elements over the defined threshold? 
map(all_visits, ~ every(.x, over_threshold))

## [[1]]
## [1] FALSE
## 
## [[2]]
## [1] FALSE

Combine map() and some() to test if some elements are over the threshold.

# Are some elements over the defined threshold? 
map(all_visits, ~ some(.x, over_threshold))

## [[1]]
## [1] FALSE
## 
## [[2]]
## [1] TRUE
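map() returns these answers as a list. If a plain logical vector is more convenient, the same check can be sketched with map_lgl() on a named list; the numbers below are just the first three days of visit_a and visit_b.

```r
library(purrr)

threshold <- 160
over_threshold <- as_mapper(~ .x > threshold)

# First three days of visit_a and visit_b, in a named list
visits <- list(a = c(117, 147, 131), b = c(180, 193, 116))

# map_lgl() returns a named logical vector instead of a list
all_over  <- map_lgl(visits, ~ every(.x, over_threshold))   # a = FALSE, b = FALSE
some_over <- map_lgl(visits, ~ some(.x, over_threshold))    # a = FALSE, b = TRUE
```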

Well done! You’ve completed the first chapter of the course. We’ve played a lot with lists in this first chapter. You may think you won’t need this purrr knowledge because you’re only dealing with data frames. But good news: as data.frames are lists of same-length vectors, you can apply all these purrr methods to a data.frame. We’ll also see in the next chapter how to use purrr inside data.frames with list-columns. Starting to feel addicted to purrr? See you in the next chapter for more magic!

6 FP: from theory to practice

Ready to go deeper with functional programming and purrr? In this chapter, we’ll discover the concept of functional programming, explore error handling with safely() and possibly(), and introduce the function compact() for cleaning your code.

6.1 Functional programming in R

6.1.1 Everything that happens is a function call

When you are using R, every computation happens because of a call to a function.

In other words, every operation performed on an object goes through a function. And you’ve been using functions from the very first day you started using R: <- is a function, as is [.

What do you think would be the output of this code?

class(`$`)

“object”

“function”

“character”
“operator”

Well done! $ is a function of a special type called an ‘infix operator’: infix operators are placed between their two arguments and can be called without parentheses.
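Because operators are functions, they can be called in the usual prefix form by wrapping their names in backticks, and passed to functionals such as map_dbl(). A quick sketch:

```r
library(purrr)

# Infix operators called in prefix form
`+`(1, 2)                       # 3
`[`(c(10, 20, 30), 2)           # 20
`[[`(list(a = 1, b = 2), "b")   # 2
class(`<-`)                     # "function"

# ...and passed to a functional like any other function:
# extra arguments after the function (here 1) go to `[`
map_dbl(list(c(5, 6), c(7, 8)), `[`, 1)   # 5 7
```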

6.1.2 Identifying pure functions

A pure function satisfies two properties:

Its output only depends on its inputs: when you input a value, the output is always the same.
It has no side-effect, that is to say, no effect outside the function.

A lot of functions in R are not pure, yet they are vital for day-to-day use of R: when doing an analysis, you need to download files, create plots, save results…

When programming, you should aim to make your functions either as pure as possible or as impure as possible (for example, a function that downloads a file should only download this file). But for that, you first need to be able to tell a pure function from an impure one.

This is what we’ll do in this exercise: run functions which are either pure or impure, and see what their outputs are.

Run Sys.time(), then Sys.sleep(1), then Sys.time() again, to see how two calls to the same function can lead to different results.

# Launch Sys.time(), Sys.sleep(1), & Sys.time()
Sys.time()

## [1] "2022-02-18 14:07:38 +07"

Sys.sleep(1)
Sys.time()

## [1] "2022-02-18 14:07:39 +07"

Run nrow(iris), then Sys.sleep(1), then nrow(iris) again, to see how these two calls return the same thing, regardless of time.

# Launch nrow(iris), Sys.sleep(1), & nrow(iris)
nrow(iris)

## [1] 150

Sys.sleep(1)
nrow(iris)

## [1] 150

Run ls(), which lists the objects in the environment. Create a new object called this, which contains 12, then run ls() again.

# Launch ls(), create an object, then rerun the ls() function
ls()

##  [1] "a"                    "all_files"            "all_files_purrr"     
##  [4] "all_tests"            "all_tests_day"        "all_visits"          
##  [7] "all_visits_day"       "all_visits_named"     "character_data"      
## [10] "day"                  "files"                "film_by_character"   
## [13] "full_visits_named"    "gap_split"            "gh_repos_named"      
## [16] "gh_users"             "gh_users_df"          "gh_users_named"      
## [19] "group_over"           "group_under"          "height_cm"           
## [22] "height_ft"            "i"                    "is_more_than_hundred"
## [25] "list_of_df"           "list_of_files_map2"   "list_of_files_pmap"  
## [28] "means"                "means2"               "numlist"             
## [31] "over_threshold"       "people_by_film"       "plots"               
## [34] "pmapinputs"           "sigma"                "sigma2"              
## [37] "sites"                "sum_all"              "sw_characters"       
## [40] "sw_films_named"       "sw_people"            "threshold"           
## [43] "to_day"               "to_ten"               "visit_a"             
## [46] "visit_b"              "visit_c"              "wesanderson"

this <- 12
ls()

##  [1] "a"                    "all_files"            "all_files_purrr"     
##  [4] "all_tests"            "all_tests_day"        "all_visits"          
##  [7] "all_visits_day"       "all_visits_named"     "character_data"      
## [10] "day"                  "files"                "film_by_character"   
## [13] "full_visits_named"    "gap_split"            "gh_repos_named"      
## [16] "gh_users"             "gh_users_df"          "gh_users_named"      
## [19] "group_over"           "group_under"          "height_cm"           
## [22] "height_ft"            "i"                    "is_more_than_hundred"
## [25] "list_of_df"           "list_of_files_map2"   "list_of_files_pmap"  
## [28] "means"                "means2"               "numlist"             
## [31] "over_threshold"       "people_by_film"       "plots"               
## [34] "pmapinputs"           "sigma"                "sigma2"              
## [37] "sites"                "sum_all"              "sw_characters"       
## [40] "sw_films_named"       "sw_people"            "this"                
## [43] "threshold"            "to_day"               "to_ten"              
## [46] "visit_a"              "visit_b"              "visit_c"             
## [49] "wesanderson"

Run plot(iris), which creates a basic plot of the iris dataset. See how nothing is printed to the console, and only a side-effect is produced.

# Create a plot of the iris dataset
plot(iris)

Sys.time() is an extremely impure function, as it returns a different output depending on when you run it, and so is ls(), whose output depends on what is in your environment. nrow() is pure, as its output only depends on the object you use as an input, and it has no side effect. Other examples of impure functions include read.csv(), which depends on an external source (if the file changes, the output changes), and plot(), which is by definition called for its side effects.
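A minimal sketch of the two kinds of functions, using hypothetical names: add_tax() depends only on its arguments, while next_id() reads and modifies a global counter with <<-, so two identical calls return different values.

```r
# Pure (hypothetical helper): output depends only on the inputs, nothing outside changes
add_tax <- function(price, rate) price * (1 + rate)

# Impure (hypothetical helper): reads and modifies state outside the function
counter <- 0
next_id <- function() {
  counter <<- counter + 1   # side effect: changes a global variable
  counter
}

add_tax(100, 0.25)   # 125, every time
next_id()            # 1
next_id()            # 2: same call, different result
```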

6.2 Tools for FP in purrr

6.2.1 Safe iterations

As in the previous chapter, let’s pretend you are a data analyst working for a web agency. This time, you’ve been asked to do some web scraping.

(Note: don’t be afraid if you don’t know how to do web scraping, we’ll start simple, and all the functions will be explained).

You have received a list of URLs, but you suspect that some are not real addresses. The first thing you will do is test if you can connect to these URLs. For this, we’ll use a simple function from the readr package, read_lines(), which we will wrap inside safely(). When given a URL, read_lines() reads the HTML, or returns an error if the URL is not reachable.

The urls vector is available in your workspace. Print it in the console if you want to know what is inside.

Create a safe version of the read_lines() function.

# Create a safe version of read_lines()
safe_read <- safely(read_lines)

Map this newly created function on the provided vector called urls.

urls <- c("https://thinkr.fr",
          "https://colinfay.me",
          "https://en.wikipedia.org",
          "http://cran.r-project.org/")
# Map it on the urls vector
res <- map(urls, safe_read)

Set the names of the results with the set_names() function.

# Set the name of the results to `urls`
named_res <- set_names(res, urls)

Extract the “error” element of each sublist.

# Extract only the "error" part of each sublist
map(named_res, "error")

## $`https://thinkr.fr`
## NULL
## 
## $`https://colinfay.me`
## NULL
## 
## $`https://en.wikipedia.org`
## NULL
## 
## $`http://cran.r-project.org/`
## NULL

Purrrfect. Thanks to safely(), you were able to iterate over the list of URLs, even if some return errors.
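The structure that safely() returns can be explored without a network connection. In this sketch, log() stands in for read_lines() and the string "a" plays the role of an unreachable URL: each result is a list with $result and $error, and exactly one of the two is NULL.

```r
library(purrr)

safe_log <- safely(log)

# log("a") fails, just like read_lines() on a bad URL
res <- map(list(100, "a"), safe_log)
str(res[[2]], max.level = 1)

map(res, "error")                          # NULL, then the error condition
ok <- map_lgl(res, ~ is.null(.x$error))    # TRUE FALSE
```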

6.2.2 Create a function

In the previous exercise, we saw how to use safely() to identify unreachable URLs: we wrote a small process that called a safe version of read_lines() and returned a list of $error elements.

In this exercise, we’ll try another approach, as we won’t focus on errors only. Instead of mapping a safe function and extracting the “error” elements from the results, we will write a helper function that will immediately discard() the NULL elements of the output of safe_read().

This way, instead of extracting the $error or $result part of the output, we’ll know directly whether each element is reachable (the content is returned in $result) or not (the error is returned in $error).

The urls vector has been provided for you.

Create a safe version of read_lines().

# Create a safe version of read_lines()
safe_read <- safely(read_lines)

Create a function called safe_read_discard() that will run the safe version of read_lines() and discard() the NULL elements.

# Code a function that discards the NULL elements from safe_read()
safe_read_discard <- function(url){
  safe_read(url) %>%
    discard(is.null)
}

Map this function on the url list that has been provided for you.

# Map this function on the url list
res <- map(urls, safe_read_discard)

Nice! You now have a simple function that can tell you if a URL is reachable, or if it returns an error.
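The same discard(is.null) trick can be sketched offline, again using log() as a stand-in for read_lines(). After discarding the NULL slot, the name of what remains tells you directly whether the call succeeded. safe_log_discard is a hypothetical helper mirroring safe_read_discard().

```r
library(purrr)

safe_log <- safely(log)

# Hypothetical helper mirroring safe_read_discard(): drop the NULL slot
safe_log_discard <- function(x) {
  discard(safe_log(x), is.null)
}

res <- map(list(100, "a"), safe_log_discard)
map_chr(res, names)   # "result" "error"
```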

6.3 Using possibly()

6.3.1 A possibly() version of read_lines()

We are still working with the series of URLs you were given to scrape. We are trying several methods to identify URLs that can’t be accessed. Why are we doing that? Because the first step of web scraping is analyzing if you can access the URL or not. This is what the code we are writing will be useful for.

In the previous exercise, we wrapped the read_lines() function inside a safely() function. In this exercise, we will use the possibly() function.

In web terminology, a 404 indicates that a web page is not available. This number will be used as the otherwise argument.

Also, as read_lines() returns a character vector of length n (one element per line) when reading a web page, we’ll collapse these lines into a single string with the paste() function.

The urls vector has been provided for you.

Wrap the read_lines() function in a possibly() call that would otherwise return 404.

# Create a possibly() version of read_lines()
possible_read <- possibly(read_lines, otherwise = 404)

Map this newly created function on the URL list, and pipe it straight into set_names().

# Map this function on urls, pipe it into set_names()
res <- map(urls, possible_read) %>% set_names(urls)

Turn each element of this list into a character vector of length one by using the paste() function, with the collapse argument set to " ".

# Paste each element of the list 
res_pasted <- map(res, paste, collapse = " ")

Keep only the elements which are equal to 404.

# Keep only the elements which are equal to 404
keep(res_pasted, ~ .x == 404)

## named list()

Well done! We have now explored another way to detect which URLs are not available.
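The same idea can be tried offline. This minimal sketch assumes base R’s log() as a stand-in for read_lines() (the name possible_log is hypothetical): possibly() returns the otherwise value for the failing input, so map() keeps going instead of stopping at the first error.

```r
library(purrr)

# Hypothetical stand-in for read_lines(): log() errors on character input
possible_log <- possibly(log, otherwise = 404)

inputs <- list(first = 1, second = "oops", third = 100)
res <- map(inputs, possible_log)

# The failing input got the fallback value instead of stopping map()
keep(res, ~ .x == 404)
```

Compared with safely(), possibly() gives up the error message in exchange for a simpler, flat output.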

6.3.2 Everything in one call

In order to make this code even more reusable, we are going to create a function that does it in one call. We have already provided you with a skeleton for this function; now it’s your turn to complete it!

In the previous exercises, we wrote the process in several steps. Now, we want it done in just one call: we’ll write a function that takes a list of URLs and returns the names of the elements that are reachable.

Once you have written this function, you can save it and reuse it whenever you need to clean a list of URLs. And maybe put it into a package ;)

The urls list from the previous exercise is available in your workspace.

Create, inside the map() call, a possibly() version of read_lines() that will otherwise return 404.
Set the names of the output.
Use the paste() function with the collapse argument set to " " to turn each element into a character vector of length one.
Remove the elements which are equal to 404.

url_tester <- function(url_list){
  url_list %>%
    # Map a version of read_lines() that otherwise returns 404
    map(possibly(read_lines, otherwise = 404)) %>%
    # Name each result after its URL (use the argument, not the global urls)
    set_names(url_list) %>%
    # paste() and collapse each element into a single string
    map(paste, collapse = " ") %>%
    # Remove the 404s
    discard(~ .x == 404) %>%
    # Return the names of the reachable URLs
    names()
}

# Try this function on the urls object
url_tester(urls)

## [1] "https://thinkr.fr"          "https://colinfay.me"       
## [3] "https://en.wikipedia.org"   "http://cran.r-project.org/"

Perfect! If you have a process that you tend to repeat, it’s better to write a function to do it.
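One design choice worth considering: passing the reader as an argument makes the one-call function testable without a network. The sketch below is hypothetical (fake_read, url_tester_offline and the example.com-style URLs are made up for illustration), but the pipeline is the same as url_tester() above.

```r
library(purrr)

# Hypothetical reader that mimics read_lines(): fails for "bad" URLs
fake_read <- function(url) {
  if (grepl("bad", url)) stop("cannot open URL")
  c("<html>", "<body>ok</body>", "</html>")
}

# Same pipeline as url_tester(), with the reader passed as an argument
url_tester_offline <- function(url_list, reader = fake_read) {
  url_list %>%
    map(possibly(reader, otherwise = 404)) %>%
    set_names(url_list) %>%
    map(paste, collapse = " ") %>%
    discard(~ .x == 404) %>%
    names()
}

url_tester_offline(c("https://good.example", "https://bad.example"))
```

Note that after paste(), the fallback becomes the string "404"; the comparison `.x == 404` still works because R coerces the number to a string before comparing.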

6.4 Handling adverb results

6.4.1 Purrrfecting our function

We are still perfecting our function to detect if a list of URLs contains elements that are not available.

Let’s review what we have coded so far:

An error extractor, by combining safely() and map(.x, "error").
A “non-null” extractor, by combining safely() and discard(.x, is.null).
A 404 generator, by using possibly(.x, otherwise = 404), which was turned into a function.

We’ll change the behavior of this function a bit: you now want to be able to choose between returning either the results or the errors.

This will allow you to answer two questions with just one function: which URLs are unreachable, and which are reachable? To do this, you’ll add a type argument to this function.

The urls vector and safe_read() are available in your workspace.

Complete the function definition.

Map safe_read() to the list of URLs.
Set the names of the result to the list of URLs.
Transpose the result into a list of $result and $error.
Use pluck() to extract the type element.

# Complete the function definition
url_tester <- function(url_list, type = c("result", "error")) {
  type <- match.arg(type)
  url_list %>%
    # Apply safe_read to each URL
    map(safe_read) %>%
    # Set the names to the URLs
    set_names(url_list) %>%
    # Transpose 
    transpose() %>%
    # Pluck the type element
    pluck(type) 
}

# Try this function on the urls object
url_tester(urls, type = "error")

## $`https://thinkr.fr`
## NULL
## 
## $`https://colinfay.me`
## NULL
## 
## $`https://en.wikipedia.org`
## NULL
## 
## $`http://cran.r-project.org/`
## NULL

By combining safely() and transpose(), you’ve written a flexible function: here you can focus either on the results or on the errors.
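To see what transpose() does to the output of a safe function, here is a minimal offline sketch. It assumes a hypothetical safe_log() (a safely() version of base R’s log()) in place of safe_read(); the inputs a and b are made up.

```r
library(purrr)

# Hypothetical stand-in for safe_read(): a safe version of log()
safe_log <- safely(log)

res <- map(list(a = 1, b = "x"), safe_log)

# Before: a list of (result, error) pairs, one per input.
# After transpose(): a pair of lists, $result and $error, each indexed by input
flipped <- transpose(res)
names(flipped)

# pluck() then selects one side, as url_tester() does with its type argument
errors <- pluck(flipped, "error")
```

Inputs that succeeded keep a NULL in the error list (and vice versa), so both sides stay aligned with the original names.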

6.4.2 Extracting status codes with GET()

For this last exercise, we’ll switch from the read_lines() function to the GET() function from httr.

We’ll first create a possibly() version of GET(), in order to test whether some of the URLs you’ve got return an error. If you can access the URL, a response object will be returned. In it, you’ll find a status_code element.

Don’t focus on the results; just remember that if GET() returns an error, it’s because the URL is not available. The status code numbers we are returning may look like web jargon, but we’ll talk about them in more depth in the next chapter. For now, just remember that 200 means everything went as expected.

The urls vector is available in your workspace; purrr and httr have been loaded for you.

Create a version of GET() that would return NULL in case of error.
Set the names of the results.
Remove the NULL.
Extract the "status_code" of each element.

library(purrr)
library(httr)
url_tester <- function(url_list){
  url_list %>%
    # Create a possibly() version of GET() that would otherwise return NULL
    map(possibly(GET, otherwise = NULL)) %>%
    # Set the names of the result (use the argument, not the global urls)
    set_names(url_list) %>%
    # Remove the NULL
    compact() %>%
    # Extract all the "status_code" elements
    map("status_code")
}

# Try this function on the urls object
url_tester(urls)

## $`https://thinkr.fr`
## [1] 200
## 
## $`https://colinfay.me`
## [1] 200
## 
## $`https://en.wikipedia.org`
## [1] 200
## 
## $`http://cran.r-project.org/`
## [1] 200

Great! We have seen in this chapter how to write custom functions that can help you in data analysis: for example, when you are doing web scraping, it’s crucial to ensure that the URLs you want to scrape are reachable. Now you know how to do this ;)
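The possibly() + compact() + map("status_code") chain can also be exercised offline. In this sketch fake_get() and urls_demo are hypothetical: fake_get() mimics GET() by failing for "down" hosts and otherwise returning a plain list with a status_code field (an httr response is also a list with a status_code element, which is why map("status_code") works on real responses).

```r
library(purrr)

# Hypothetical stand-in for GET(): errors for "down" hosts, otherwise
# returns a list carrying a status_code field like an httr response
fake_get <- function(url) {
  if (grepl("down", url)) stop("could not resolve host")
  list(url = url, status_code = 200L)
}

urls_demo <- c("https://up.example", "https://down.example")

codes <- urls_demo %>%
  map(possibly(fake_get, otherwise = NULL)) %>%
  set_names(urls_demo) %>%
  compact() %>%          # drop the NULL left by the failing URL
  map("status_code")     # extract the field by name

codes
```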

7 Better code with purrr

In this chapter, we’ll use purrr to write code that is clearer, cleaner, and easier to maintain. We’ll learn how to write clean functions with compose() and negate(). We’ll also use partial() to create new functions by “prefilling” arguments of existing functions. Lastly, we’ll introduce list-columns, a convenient data structure that helps us write clean code using the tidyverse.

7.1 Why cleaner code?

7.1.1 How to write compose()

When you use compose(), the functions are applied from right to left, that is to say in the same order as in a nested call in base R: the first function to be executed is the one on the right.

In other words, if you are used to the pipe, the order is the opposite:

```r
# With the pipe
1:28 %>% mean() %>% round()

# In base R
round(mean(1:28))

# With compose
rounded_mean <- compose(round, mean)
rounded_mean(1:28)
```

So, what’s the correct way to write a function that counts the number of missing values (NA)?

compose(is.na, sum)

compose(sum, is.na)

compose(is.na(), sum())
compose(sum(), is.na())

Well done! You’re composing a new function by passing in functions in reverse order: the first function in compose() will be the last to run.
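A quick way to convince yourself of the order is to build both versions and compare. This is a minimal sketch (count_na, wrong_order and count_na_fwd are illustrative names); the last part assumes purrr 0.3.0 or later, where compose() gained a .dir argument.

```r
library(purrr)

# compose() applies its functions right to left: is.na() runs first, then sum()
count_na <- compose(sum, is.na)
count_na(c(1, NA, 3, NA))   # 2

# Reversing the order computes is.na(sum(x)) instead, which is not what we want
wrong_order <- compose(is.na, sum)
wrong_order(c(1, NA, 3, NA))   # TRUE, because sum() returns NA

# compose() can also read left to right, like the pipe
count_na_fwd <- compose(is.na, sum, .dir = "forward")
count_na_fwd(c(1, NA, 3, NA))   # 2
```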

7.1.2 Back to the office

You are still working as a data analyst for a web agency, and you’ve been asked to do web scraping. You have been given a list of URLs to analyze, an analysis you’ve already started in the previous chapter.

You expect this task to be recurrent: no doubt you’ll be asked to do it again in a few weeks. In order to make your future work easier, you’ve decided to try and write clean code today, so that it will be easier to come back to it later.

We’ll start by combining the two functions from httr we’ve seen in the previous chapter: GET(), for retrieving the webpage, and status_code(), to extract the status code, in order to create a status code extractor.

The urls vector is still available in your workspace. We have kept only the URLs that are reachable.

Load purrr and httr.

# Load purrr and httr
library(purrr)
library(httr)

Compose a status extractor with GET() and status_code().

# Compose a status extractor 
status_extract <- compose(status_code, GET)

Try this new function on “https://thinkr.fr” and “https://en.wikipedia.org”.

# Try with "https://thinkr.fr" & "https://en.wikipedia.org"
status_extract("https://thinkr.fr")

## [1] 200

status_extract("https://en.wikipedia.org")

## [1] 200

Map this function directly on the vector urls.

# Map it on the urls vector, return a vector of numbers
map_dbl(urls, status_extract)

## [1] 200 200 200 200

Nice! We have used purrr to quickly create a combination of two functions! And good news: all the websites we have tried to reach returned a 200 status code, meaning we were able to connect to all of them without any problem.

7.2 compose() and negate()

7.2.1 Build a function

You’re still trying to perfect your web scraping tools so that you can be as efficient as possible in your job as a data analyst for a web agency.

In this exercise, you will make the extractor function from the previous exercise a little stricter: if the code returned by the status extractor is not between 200 and 203, the function will return a missing value (NA). Otherwise, the status code will be returned.

purrr and httr have been loaded for you.

Negate the %in% operator, which tests whether each element on the left is contained in the vector on the right.

# Negate the %in% function 
`%not_in%` <- negate(`%in%`)
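Here is a small, offline check of what the negated operator returns. The status codes below are made-up example values.

```r
library(purrr)

# negate() wraps a predicate and returns a function giving the opposite answer
`%not_in%` <- negate(`%in%`)

c(200, 404, 203) %in% 200:203       # TRUE FALSE TRUE
c(200, 404, 203) %not_in% 200:203   # FALSE TRUE FALSE
```

This is equivalent to writing `!(x %in% y)`, but the named operator reads closer to the intent inside an if() condition.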

Compose an extract_status() function, which will be a combination of GET() and status_code().

# Compose a status extractor 
extract_status <- compose(status_code, GET)

Complete the given function: the URL’s status code should be extracted and assigned to a code variable. Then, if this code is not in 200:203, a missing value is returned. Otherwise, the status code is returned.

# Complete the function definition
strict_code <- function(url) {
  # Extract the status of the URL
  code <- extract_status(url)
  # If code is not in the acceptable range ...
  if (code %not_in% 200:203) {
    # then return NA
    return(NA)
  } 
  code
}

Good work! We now have a stricter version of our status code extractor. Let’s try it on a vector of URLs!
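The range check inside strict_code() can be tried without any network call by feeding it status codes directly. strict_value() is a hypothetical variant written only for illustration; it returns NA_real_ rather than NA so that map_dbl() always receives a double.

```r
library(purrr)

`%not_in%` <- negate(`%in%`)

# Hypothetical variant of strict_code() that takes a status code instead of
# a URL, so the range check can be tried offline
strict_value <- function(code) {
  if (code %not_in% 200:203) {
    return(NA_real_)   # typed NA so map_dbl() gets a double
  }
  code
}

map_dbl(c(200, 301, 203, 404), strict_value)   # 200  NA 203  NA
```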

7.2.2 Count the NA

Now that you have a stricter version of the status code extractor, we’ll try it on our list of URLs.

What we want to do here is see which websites from our list return a status code that is not between 200 and 203. To achieve this, we’ll flip the is.na() function: instead of returning TRUE when a value is missing, it will return FALSE (and TRUE when the value is present).

The urls vector and the strict_code() function are available in your workspace. httr and purrr have been loaded for you.

Run strict_code() on the vector of urls.

# Map the strict_code function on urls
res <- map_dbl(urls, strict_code)

Set the names of the results with the set_names() function, using the urls vector.

# Set the names of the results
res_named <- set_names(res, urls)

“Flip” the is.na() function by negating its behavior.

# Negate the is.na function
is_not_na <- negate(is.na)

Use the is_not_na() function on the vector of results.

# Run is_not_na on the results
is_not_na(res_named)

##          https://thinkr.fr        https://colinfay.me 
##                       TRUE                       TRUE 
##   https://en.wikipedia.org http://cran.r-project.org/ 
##                       TRUE                       TRUE

See how clear this code is? There aren’t many lines of code, and the intent of each line is easy to read.
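The flipped predicate is also handy for filtering, not just for printing. A minimal sketch with hypothetical, made-up status codes (site_a, site_b, site_c), where NA marks a site strict_code() would reject:

```r
library(purrr)

is_not_na <- negate(is.na)

# Hypothetical status codes, with NA where strict_code() would reject a site
codes <- c(site_a = 200, site_b = NA, site_c = 203)

is_not_na(codes)          # named logical: TRUE FALSE TRUE
codes[is_not_na(codes)]   # keeps only the acceptable sites
```

Names are preserved because negate() simply applies `!` to the output of is.na(), which keeps the names of its input.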

7.3 Prefilling functions

7.3.1 A content extractor

In the previous exercises, you have established that all the elements from the URLs vector you were given return a 200 status code. Now that you know that they are accessible, you will dig deeper into the web scraping, by doing some content extraction.

To do this, we’ll use functions from the rvest package, which will be prefilled with partial(). The functions we will write in this exercise will extract all the H2 HTML nodes from a page — on a webpage, these H2 nodes correspond to the level 2 headers. Once we have extracted these titles, the html_text() function will be used to extract the text content from the raw HTML.

purrr and rvest have been loaded for you, and the urls vector is available in your workspace.

Start by prefilling html_nodes() with css = "h2".

# Prefill html_nodes() with the css param set to h2
get_h2 <- partial(html_nodes, css = "h2")

Compose this newly created function with read_html() and html_text() to create a text extractor for H2 headers.

# Combine the html_text, get_h2 and read_html functions
get_content <- compose(html_text, get_h2, read_html)

Run this function on the urls vector, and name the result.

# Map get_content to the urls list
res <- map(urls, get_content) %>%
  set_names(urls)

Print the result to see what it looks like.

# Print the results to the console
res

## $`https://thinkr.fr`
##  [1] "\n"                                                                                        
##  [2] "\n"                                                                                        
##  [3] "Nos formations Certifiantes à R sont finançables à 100% via le CPF"                        
##  [4] "R niveau 3 – Développeur – Conception d’interfaces Shiny – Formation certifiante mars 2022"
##  [5] "Comment faire ses templates RMarkdown et Shiny ?"                                          
##  [6] "\nAfficher le numéro01 85 09 14 03\n"                                                      
##  [7] "Des formateurs amouReux"                                                                   
##  [8] "Bénéficiez d'une formation sur-mesure pour vous et votre équipe"                           
##  [9] "Les différents moyens de faire financer votre formation."                                  
## [10] "“De la Création au Déploiement d’Applications {shiny} avec {golem}”"                       
## 
## $`https://colinfay.me`
## character(0)
## 
## $`https://en.wikipedia.org`
##  [1] "From today's featured article" "Did you know ..."             
##  [3] "In the news"                   "On this day"                  
##  [5] "From today's featured list"    "Today's featured picture"     
##  [7] "Other areas of Wikipedia"      "Wikipedia's sister projects"  
##  [9] "Wikipedia languages"           "Navigation menu"              
## 
## $`http://cran.r-project.org/`
## character(0)

Well played! You now have a nice process to extract content from a webpage.
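In the exercise, get_h2(page) behaves like html_nodes(page, css = "h2"). The same prefilling mechanism can be seen offline with base R’s mean(); mean_na is a hypothetical name used only for this sketch.

```r
library(purrr)

# partial() returns a new function with some arguments already filled in
mean_na <- partial(mean, na.rm = TRUE)

mean_na(c(1, NA, 3))   # 2, same as mean(c(1, NA, 3), na.rm = TRUE)
map_dbl(list(c(1, NA, 3), c(NA, 10, 20)), mean_na)   # 2 15
```

Because the fixed argument is named, the remaining input is still passed positionally, which is what lets the prefilled function slot straight into map() or compose().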

7.3.2 Another extractor

In the previous exercise, we built a function that was able to extract the text content from H2 headers.

We’ll try something else here: we want to extract all the links that exist on a specific page. To do this, we will need to call two rvest functions: html_nodes(), with the css argument set to "a" (a is the HTML tag for links), and html_attr(), which extracts a given attribute from a node. In our case, this attribute will be "href", which is the link address.

purrr and rvest have been loaded for you. You can still find the urls vector in your workspace.

Prefill html_nodes() with the css argument set to "a".

# Create a partial version of html_nodes(), with the css param set to "a"
get_a <- partial(html_nodes, css = "a")

Create the href() function, which will be a version of html_attr() prefilled with name = "href".

# Create href(), a partial version of html_attr()
href <- partial(html_attr, name = "href")

Compose a new combination of href(), get_a() and read_html().

# Combine href(), get_a(), and read_html()
get_links <- compose(href, get_a, read_html)

Map this new function on the urls vector.

# Map get_links() to the urls list
res <- map(urls, get_links) %>%
  set_names(urls)

# See the first few links of each page
map(res, ~ head(.x))

## $`https://thinkr.fr`
## [1] "https://thinkr.fr"                              
## [2] "#"                                              
## [3] "https://thinkr.fr/notre-vision-de-la-formation/"
## [4] "https://thinkr.fr/notre-vision-de-la-formation/"
## [5] "https://thinkr.fr/equipe/"                      
## [6] "https://thinkr.fr/blog/"                        
## 
## $`https://colinfay.me`
## [1] "https://thinkr.fr/"   "/"                    "/categories/"        
## [4] "/about/"              "/talks-publications/" "/open-source/"       
## 
## $`https://en.wikipedia.org`
## [1] NA                   "#mw-head"           "#searchInput"      
## [4] "/wiki/Wikipedia"    "/wiki/Free_content" "/wiki/Encyclopedia"
## 
## $`http://cran.r-project.org/`
## [1] "navbar.html"

Well played! See how easy it is to write a web mining function with just a few lines of code?
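The get_links() pattern (prefill arguments with partial(), then chain the steps with compose()) is not specific to web pages. This offline sketch uses hypothetical text helpers (dash_to_underscore, clean_id) built from base R functions to show the same structure.

```r
library(purrr)

# Hypothetical text helpers, built the same way as get_links():
# prefill arguments with partial(), then chain the steps with compose()
dash_to_underscore <- partial(gsub, pattern = "-", replacement = "_")
clean_id <- compose(toupper, dash_to_underscore, trimws)

clean_id("  user-id-42 ")              # "USER_ID_42"
map_chr(c(" a-b", "c-d "), clean_id)   # "A_B" "C_D"
```

As with get_links(), the rightmost function (trimws here, read_html() there) receives the raw input, and each step hands its output to the next one on the left.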

7.4 List columns

7.4.1 About list-columns

You’ve been introduced in the video to a new kind of data structure: list-columns. List-columns are, as their name suggests, columns that behave like lists, stored inside a special kind of dataframe: a tibble, the tidyverse implementation of the dataframe.

Nested dataframes, i.e. dataframes with list-columns, look like standard dataframes, but the cells of a list-column are not restricted to length 1 and can contain any kind of element, just like a list.

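library(tibble)

# tibble() keeps each element of the list as one cell of the list-column
# (data.frame() would instead try to split it into separate columns)
df <- tibble(
  classic = c("a", "b", "c"),
  list = list(
    c("a", "b", "c"),
    c("a", "b", "c", "d"),
    c("a", "b", "c", "d", "e")
  )
)
df
## # A tibble: 3 x 2
##   classic list     
##   <chr>   <list>   
## 1 a       <chr [3]>
## 2 b       <chr [4]>
## 3 c       <chr [5]>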

But why is this a useful format?

To sound cool on Twitter.
They print pretty in the console.

To combine tools like dplyr and the flexibility of lists.

Indeed! List-columns allow you to build a tidyverse workflow while never leaving the dataframe structure.

7.4.2 Create a list-column data.frame

Let’s end our chapter with an implementation of our links extractor, but using a list-column. The idea when using a nested dataframe (i.e., dataframe with a list column) is to keep everything inside a dataframe so that the workflow stays tidy.

You have been provided a tibble called df, which has a column urls with the four URLs you’ve been using since the beginning of this chapter. If you want to have a look at this dataframe, feel free to print it in the console.

We are going to create a new column called links, which contains the results of the get_links() function (available in your workspace). As the outputs of this function have different lengths, the output will be a list column that you will then need to unnest() to get back a standard dataframe.
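Here is a minimal offline sketch of the mutate() + map() + unnest() round trip. It assumes tidyr 1.0.0 or later (for the cols argument of unnest()), and fake_links() is a hypothetical stand-in for get_links() that returns vectors of varying length.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for get_links(): returns a character vector whose
# length varies from one input to the next
fake_links <- function(page) strsplit(page, ",")[[1]]

df <- tibble(page = c("a,b", "c,d,e"))

df2 <- df %>%
  mutate(links = map(page, fake_links))   # list-column: one vector per row

# unnest() expands the list-column to one row per link
df_long <- df2 %>%
  unnest(cols = links)

df_long
```

The two input rows become five output rows, with page repeated alongside each of its links.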

Load the three necessary packages: dplyr, tidyr, and purrr.

# Load dplyr, tidyr, and purrr
library(dplyr)
library(tidyr)
library(purrr)

Take the df element, and run mutate() on it. mutate() will map the get_links() function on the urls column.

# Rebuild df locally: a data.frame with a single urls column
df <- data.frame(urls = urls)
# Create a column named links with mutate(), that maps get_links() on urls
df2 <- df %>%
  mutate(links = map(urls, get_links))

Print the result.

# Print df2 to see what it looks like
df2

##                         urls
## 1          https://thinkr.fr
## 2        https://colinfay.me
## 3   https://en.wikipedia.org
## 4 http://cran.r-project.org/
##                         links
## 1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
https://thinkr.fr, #, https://thinkr.fr/notre-vision-de-la-formation/, https://thinkr.fr/notre-vision-de-la-formation/, https://thinkr.fr/equipe/, https://thinkr.fr/blog/, https://thinkr.fr/faq/, https://thinkr.fr/recrutement/, https://thinkr.fr/equipe/, https://thinkr.fr/diagnostic-et-accompagnement-a-lexploitation-de-la-donnee/, https://thinkr.fr/analyse-de-donnees/, https://thinkr.fr/visualisation-et-communication/, https://thinkr.fr/collecte-de-donnees/, https://thinkr.fr/bases-de-donnees/, https://thinkr.fr/formation-au-logiciel-r/, https://thinkr.fr/formation-au-logiciel-r/rs5073-analyse-statistique-de-donnees-avec-le-langage-r/, https://thinkr.fr/formation-au-logiciel-r/creation-de-packages-r/, https://thinkr.fr/formation-au-logiciel-r/formation-shiny/, https://thinkr.fr/formation-au-logiciel-r/formation-sur-mesure/, https://thinkr.fr/formation-au-logiciel-r/introduction-et-remise-a-niveau-langage-r/, https://thinkr.fr/formation-au-logiciel-r/developper-package-r/, https://thinkr.fr/formation-au-logiciel-r/suivi-de-version-avec-git-et-rstudio/, https://thinkr.fr/formation-au-logiciel-r/creation-de-graphiques-avec-ggplot2/, https://thinkr.fr/formation-au-logiciel-r/cartographie-et-sig-avec-r/, https://rtask.thinkr.fr/fr/, https://rtask.thinkr.fr/fr/usecase/, https://thinkr.fr/blog/, https://rtask.thinkr.fr, https://thinkr.fr/formation-r/, https://thinkr.fr/formation-au-logiciel-r/, https://thinkr.fr/formation-au-logiciel-r/analyser-des-donnees-avec-r/, #, https://thinkr.fr/formation-au-logiciel-r/, https://www.moncompteformation.gouv.fr/espace-prive/html/#/formation/recherche/81006451900020_certifR_1/81006451900020_certifR_1d_042020, https://thinkr.fr/formation-au-logiciel-r/rs5073-analyse-statistique-de-donnees-avec-le-langage-r/, https://thinkr.fr/formation-au-logiciel-r/conception-dinterfaces-shiny/, https://thinkr.fr/formation-au-logiciel-r/creation-de-packages-r/, https://thinkr.fr/contact/, https://thinkr.fr/formation-au-logiciel-r/, 
https://thinkr.fr/formation-au-logiciel-r/formation-shiny/, https://thinkr.fr/formation-au-logiciel-r/, https://thinkr.fr/comment-faire-ses-templates-rmarkdown-et-shiny/, https://thinkr.fr/ressources/, tel:0185091403, #, https://thinkr.fr/notre-vision-de-la-formation/, https://thinkr.fr/faq/, https://thinkr.fr/calendrier/formation/, https://thinkr.fr/equipe/, https://thinkr.fr/faq/, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, #, https://rtask.thinkr.fr, https://twitter.com/thinkR_fr, https://www.meetup.com/fr-FR/R-Lille/, https://thinkr.fr/calendrier/conference/, https://thinkr.fr, https://thinkr.fr/notre-vision-de-la-formation/, https://thinkr.fr/blog/, https://thinkr.fr/faq/, https://rtask.thinkr.fr/fr/contributions-open-source/, https://thinkr.fr/equipe/, https://thinkr.fr/recrutement/, https://thinkr.fr/formation-au-logiciel-r/, https://rtask.thinkr.fr, https://twitter.com/thinkR_fr, https://github.com/ThinkR-open, https://www.meetup.com/fr-FR/rparis/, https://thinkr.fr/contact/, tel:0185091403, https://thinkr.fr/mentions-legales/, https://thinkr.fr/qualiopi, https://www.avis-verifies.com/avis-clients/thinkr.fr, /cdn-cgi/l/email-protection
## 2
https://thinkr.fr/, /, /categories/, /about/, /talks-publications/, /open-source/, https://engineering-shiny.org/, /search/, /aoc-2021-02/, /aoc-2021-01/, /engineering-shiny-print/, /post-request-shiny-app-brochure/, /brochure-r-package/, /aoc-2020-09/, /aoc-2020-08/, /aoc-2020-07/, /aoc-2020-06/, /aoc-2020-05/, /aoc-2020-04/, /aoc-2020-03/, /aoc-2020-02/, /aoc-2020-01/, /we-run-rladies/, /run-rladies/, /r-package-npm/, /hexmake-shiny-contest/, /clients-db/, /hello-hordes/, #, #, /page2/, /page3/, #, /page7/, /page2/, https://twitter.com/_ColinFay, https://github.com/ColinFay, https://www.linkedin.com/in/colinfay, mailto:, /feed.xml, https://jekyllrb.com, https://mademistakes.com/work/minimal-mistakes-jekyll-theme/, https://www.r-bloggers.com/, http://www.rweekly.org, https://creativecommons.org/licenses/by-nc-sa/4.0//, https://opensource.org/licenses/mit-license.php
## 3 NA, #mw-head, #searchInput, /wiki/Wikipedia, /wiki/Free_content, /wiki/Encyclopedia, /wiki/Help:Introduction_to_Wikipedia, /wiki/Special:Statistics, /wiki/English_language, /wiki/Portal:The_arts, /wiki/Portal:Biography, /wiki/Portal:Geography, /wiki/Portal:History, /wiki/Portal:Mathematics, /wiki/Portal:Science, /wiki/Portal:Society, /wiki/Portal:Technology, /wiki/Wikipedia:Contents/Portals, /wiki/File:Richard_II_of_England.jpg, /wiki/Wonderful_Parliament, /wiki/Legislative_session, /wiki/Parliament_of_England, /wiki/Westminster_Abbey, /wiki/Richard_II_of_England, /wiki/Favourite, /wiki/Hundred_Years%27_War, /wiki/Lord_Chancellor, /wiki/Michael_de_la_Pole,_1st_Earl_of_Suffolk, /wiki/Impeachment, /wiki/Wonderful_Parliament, /wiki/Ur-Quan, /wiki/SS_Choctaw, /wiki/David_Berman_(musician), /wiki/Wikipedia:Today%27s_featured_article/February_2022, https://lists.wikimedia.org/postorius/lists/daily-article-l.lists.wikimedia.org/, /wiki/Wikipedia:Featured_articles, /wiki/File:Cairo-citadel-1800s.jpg, /wiki/Baha_al-Din_Qaraqush, /wiki/Cairo_Citadel, /wiki/Saladin, /wiki/Canada_v_United_States_(2012_Summer_Olympics), /wiki/Canada_women%27s_national_soccer_team, /wiki/United_States_women%27s_national_soccer_team, /wiki/Henry_Fitzcount, /wiki/Fifth_Crusade, /wiki/Svalbard_Minute_by_Minute, /wiki/Economy_of_Svalbard#Tourism, /wiki/Dominion:_An_Anthology_of_Speculative_Fiction_From_Africa_and_the_African_Diaspora, /wiki/The_1619_Project, /wiki/Balbuena_metro_station, /wiki/Mexico_City_Metro_PCCI_fire, /wiki/Nathan_Safir, /wiki/KXTN_(AM), /wiki/Squatting_in_Hamburg, /wiki/Erotic_Art_Museum_(Hamburg), /wiki/Wikipedia:Recent_additions, /wiki/Help:Your_first_article, /wiki/Template_talk:Did_you_know, /wiki/File:Cooper_Kupp.jpg, /wiki/American_football, /wiki/Los_Angeles_Rams, /wiki/Cincinnati_Bengals, /wiki/Super_Bowl_LVI, /wiki/Super_Bowl_Most_Valuable_Player_Award, /wiki/Cooper_Kupp, /wiki/Cyclone_Batsirai, /wiki/Association_football, /wiki/2021_Africa_Cup_of_Nations, 
/wiki/Senegal_national_football_team, /wiki/Egypt_national_football_team, /wiki/2021_Africa_Cup_of_Nations_Final, /wiki/Playback_singer, /wiki/Lata_Mangeshkar, /wiki/Portal:Current_events, /wiki/COVID-19_pandemic, /wiki/2021%E2%80%932022_Russo-Ukrainian_crisis, /wiki/2022_Winter_Olympics, /wiki/Deaths_in_2022, /wiki/Ronald_Lou-Poy, /wiki/Gail_Halvorsen, /wiki/Luigi_De_Magistris_(cardinal), /wiki/Aled_Roberts, /wiki/Valerie_Boyd, /wiki/Raees_Mohammad, /wiki/Wikipedia:In_the_news/Candidates, /wiki/February_18, /wiki/File:Pajol.jpg, /wiki/Pierre_Claude_Pajol, /wiki/1766, /wiki/Malagasy_people, /wiki/Dutch_East_India_Company, /wiki/Meermin, /wiki/Meermin_slave_mutiny, /wiki/Cape_Agulhas, /wiki/1814, /wiki/War_of_the_Sixth_Coalition, /wiki/Napoleon, /wiki/Battle_of_Montereau, /wiki/1942, /wiki/World_War_II, /wiki/Imperial_Japanese_Army, /wiki/Sook_Ching, /wiki/Chinese_Singaporeans, /wiki/2007, /wiki/2007_Samjhauta_Express_bombings, /wiki/Samjhauta_Express, /wiki/Panipat, /wiki/Michelangelo, /wiki/George_Henschel, /wiki/Sergo_Ordzhonikidze, /wiki/February_17, /wiki/February_18, /wiki/February_19, /wiki/Wikipedia:Selected_anniversaries/February, https://lists.wikimedia.org/postorius/lists/daily-article-l.lists.wikimedia.org/, /wiki/List_of_days_of_the_year, /wiki/File:Daresbury_church_tower.jpg, /wiki/All_Saints%27_Church,_Daresbury, /wiki/Listed_buildings_in_Runcorn_(rural_area), /wiki/Runcorn, /wiki/Borough_of_Halton, /wiki/Cheshire, /wiki/Listed_building, /wiki/Rocksavage, /wiki/Telephone_booth, /wiki/Daresbury, /wiki/Court, /wiki/Listed_buildings_in_Runcorn_(rural_area), /wiki/Robert_Bathurst_filmography, /wiki/List_of_alumni_of_Jesus_College,_Oxford, /wiki/Andre_Norton_Award, /wiki/Wikipedia:Today%27s_featured_list/February_2022, /wiki/Wikipedia:Featured_lists, /wiki/File:Giraffa_camelopardalis_head_(Profil).jpg, /wiki/Northern_giraffe, /wiki/North_Africa, /wiki/Ossicone, /wiki/Zoo_d%27Amn%C3%A9ville, https://commons.wikimedia.org/wiki/User:Ritchyblack, 
/wiki/Template:POTD/2022-02-17, /wiki/Template:POTD/2022-02-16, /wiki/Template:POTD/2022-02-15, /wiki/Wikipedia:Picture_of_the_day/Archive, /wiki/Wikipedia:Featured_pictures, /wiki/Wikipedia:Community_portal, /wiki/Wikipedia:Help_desk, /wiki/Wikipedia:Reference_desk, /wiki/Wikipedia:News, /wiki/Wikipedia:Teahouse, /wiki/Wikipedia:Village_pump, /wiki/Wikimedia_Foundation, https://wikimediafoundation.org/our-work/wikimedia-projects/, https://commons.wikimedia.org/wiki/, https://commons.wikimedia.org/wiki/, https://www.mediawiki.org/wiki/, https://www.mediawiki.org/wiki/, https://meta.wikimedia.org/wiki/, https://meta.wikimedia.org/wiki/, https://en.wikibooks.org/wiki/, https://en.wikibooks.org/wiki/, https://www.wikidata.org/wiki/, https://www.wikidata.org/wiki/, https://en.wikinews.org/wiki/, https://en.wikinews.org/wiki/, https://en.wikiquote.org/wiki/, https://en.wikiquote.org/wiki/, https://en.wikisource.org/wiki/, https://en.wikisource.org/wiki/, https://species.wikimedia.org/wiki/, https://species.wikimedia.org/wiki/, https://en.wikiversity.org/wiki/, https://en.wikiversity.org/wiki/, https://en.wikivoyage.org/wiki/, https://en.wikivoyage.org/wiki/, https://en.wiktionary.org/wiki/, https://en.wiktionary.org/wiki/, /wiki/English_language, https://meta.wikimedia.org/wiki/List_of_Wikipedias, https://ar.wikipedia.org/wiki/, https://de.wikipedia.org/wiki/, https://es.wikipedia.org/wiki/, https://fr.wikipedia.org/wiki/, https://it.wikipedia.org/wiki/, https://nl.wikipedia.org/wiki/, https://ja.wikipedia.org/wiki/, https://pl.wikipedia.org/wiki/, https://pt.wikipedia.org/wiki/, https://ru.wikipedia.org/wiki/, https://sv.wikipedia.org/wiki/, https://uk.wikipedia.org/wiki/, https://vi.wikipedia.org/wiki/, https://zh.wikipedia.org/wiki/, https://id.wikipedia.org/wiki/, https://ms.wikipedia.org/wiki/, https://zh-min-nan.wikipedia.org/wiki/, https://bg.wikipedia.org/wiki/, https://ca.wikipedia.org/wiki/, https://cs.wikipedia.org/wiki/, https://da.wikipedia.org/wiki/, 
https://eo.wikipedia.org/wiki/, https://eu.wikipedia.org/wiki/, https://fa.wikipedia.org/wiki/, https://he.wikipedia.org/wiki/, https://ko.wikipedia.org/wiki/, https://hu.wikipedia.org/wiki/, https://no.wikipedia.org/wiki/, https://ro.wikipedia.org/wiki/, https://sr.wikipedia.org/wiki/, https://sh.wikipedia.org/wiki/, https://fi.wikipedia.org/wiki/, https://tr.wikipedia.org/wiki/, https://ast.wikipedia.org/wiki/, https://bn.wikipedia.org/wiki/, https://bs.wikipedia.org/wiki/, https://et.wikipedia.org/wiki/, https://el.wikipedia.org/wiki/, https://simple.wikipedia.org/wiki/, https://gl.wikipedia.org/wiki/, https://hr.wikipedia.org/wiki/, https://lv.wikipedia.org/wiki/, https://lt.wikipedia.org/wiki/, https://ml.wikipedia.org/wiki/, https://mk.wikipedia.org/wiki/, https://nn.wikipedia.org/wiki/, https://sq.wikipedia.org/wiki/, https://sk.wikipedia.org/wiki/, https://sl.wikipedia.org/wiki/, https://th.wikipedia.org/wiki/, https://en.wikipedia.org/w/index.php?title=Main_Page&oldid=1069328725, /wiki/Special:MyTalk, /wiki/Special:MyContributions, /w/index.php?title=Special:CreateAccount&returnto=Main+Page, /w/index.php?title=Special:UserLogin&returnto=Main+Page, /wiki/Main_Page, /wiki/Talk:Main_Page, /wiki/Main_Page, /w/index.php?title=Main_Page&action=edit, /w/index.php?title=Main_Page&action=history, /wiki/Main_Page, /wiki/Main_Page, /wiki/Wikipedia:Contents, /wiki/Portal:Current_events, /wiki/Special:Random, /wiki/Wikipedia:About, //en.wikipedia.org/wiki/Wikipedia:Contact_us, https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en, /wiki/Help:Contents, /wiki/Help:Introduction, /wiki/Wikipedia:Community_portal, /wiki/Special:RecentChanges, /wiki/Wikipedia:File_Upload_Wizard, /wiki/Special:WhatLinksHere/Main_Page, /wiki/Special:RecentChangesLinked/Main_Page, /wiki/Wikipedia:File_Upload_Wizard, /wiki/Special:SpecialPages, /w/index.php?title=Main_Page&oldid=1069328725, 
/w/index.php?title=Main_Page&action=info, /w/index.php?title=Special:CiteThisPage&page=Main_Page&id=1069328725&wpFormIdentifier=titleform, https://www.wikidata.org/wiki/Special:EntityPage/Q5296, /w/index.php?title=Special:DownloadAsPdf&page=Main_Page&action=show-download-screen, /w/index.php?title=Main_Page&printable=yes, https://commons.wikimedia.org/wiki/Main_Page, https://www.mediawiki.org/wiki/MediaWiki, https://meta.wikimedia.org/wiki/Main_Page, https://wikisource.org/wiki/Main_Page, https://species.wikimedia.org/wiki/Main_Page, https://en.wikibooks.org/wiki/Main_Page, https://www.wikidata.org/wiki/Wikidata:Main_Page, https://wikimania.wikimedia.org/wiki/Wikimania, https://en.wikinews.org/wiki/Main_Page, https://en.wikiquote.org/wiki/Main_Page, https://en.wikisource.org/wiki/Main_Page, https://en.wikiversity.org/wiki/Wikiversity:Main_Page, https://en.wikivoyage.org/wiki/Main_Page, https://en.wiktionary.org/wiki/Wiktionary:Main_Page, https://ar.wikipedia.org/wiki/, https://bn.wikipedia.org/wiki/, https://bg.wikipedia.org/wiki/, https://bs.wikipedia.org/wiki/, https://ca.wikipedia.org/wiki/, https://cs.wikipedia.org/wiki/, https://da.wikipedia.org/wiki/, https://de.wikipedia.org/wiki/, https://et.wikipedia.org/wiki/, https://el.wikipedia.org/wiki/, https://es.wikipedia.org/wiki/, https://eo.wikipedia.org/wiki/, https://eu.wikipedia.org/wiki/, https://fa.wikipedia.org/wiki/, https://fr.wikipedia.org/wiki/, https://gl.wikipedia.org/wiki/, https://ko.wikipedia.org/wiki/, https://hr.wikipedia.org/wiki/, https://id.wikipedia.org/wiki/, https://it.wikipedia.org/wiki/, https://he.wikipedia.org/wiki/, https://ka.wikipedia.org/wiki/, https://lv.wikipedia.org/wiki/, https://lt.wikipedia.org/wiki/, https://hu.wikipedia.org/wiki/, https://mk.wikipedia.org/wiki/, https://ms.wikipedia.org/wiki/, https://nl.wikipedia.org/wiki/, https://ja.wikipedia.org/wiki/, https://no.wikipedia.org/wiki/, https://nn.wikipedia.org/wiki/, https://pl.wikipedia.org/wiki/, 
https://pt.wikipedia.org/wiki/, https://ro.wikipedia.org/wiki/, https://ru.wikipedia.org/wiki/, https://simple.wikipedia.org/wiki/, https://sk.wikipedia.org/wiki/, https://sl.wikipedia.org/wiki/, https://sr.wikipedia.org/wiki/, https://sh.wikipedia.org/wiki/, https://fi.wikipedia.org/wiki/, https://sv.wikipedia.org/wiki/, https://th.wikipedia.org/wiki/, https://tr.wikipedia.org/wiki/, https://uk.wikipedia.org/wiki/, https://vi.wikipedia.org/wiki/, https://zh.wikipedia.org/wiki/, //en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License, //creativecommons.org/licenses/by-sa/3.0/, //foundation.wikimedia.org/wiki/Terms_of_Use, //foundation.wikimedia.org/wiki/Privacy_policy, //www.wikimediafoundation.org/, https://foundation.wikimedia.org/wiki/Privacy_policy, /wiki/Wikipedia:About, /wiki/Wikipedia:General_disclaimer, //en.wikipedia.org/wiki/Wikipedia:Contact_us, //en.m.wikipedia.org/w/index.php?title=Main_Page&mobileaction=toggle_view_mobile, https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_contribute, https://stats.wikimedia.org/#/en.wikipedia.org, https://foundation.wikimedia.org/wiki/Cookie_statement, https://wikimediafoundation.org/, https://www.mediawiki.org/
## 4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                navbar.html

Unnest the result.

# unnest() df2 to have a tidy dataframe;
# since tidyr 1.0.0, the list column(s) to unnest must be named in `cols`
df2 %>%
  unnest(cols = c(links))

## # A tibble: 484 × 2
##    urls              links                                                      
##    <chr>             <chr>                                                      
##  1 https://thinkr.fr https://thinkr.fr                                          
##  2 https://thinkr.fr #                                                          
##  3 https://thinkr.fr https://thinkr.fr/notre-vision-de-la-formation/            
##  4 https://thinkr.fr https://thinkr.fr/notre-vision-de-la-formation/            
##  5 https://thinkr.fr https://thinkr.fr/equipe/                                  
##  6 https://thinkr.fr https://thinkr.fr/blog/                                    
##  7 https://thinkr.fr https://thinkr.fr/faq/                                     
##  8 https://thinkr.fr https://thinkr.fr/recrutement/                             
##  9 https://thinkr.fr https://thinkr.fr/equipe/                                  
## 10 https://thinkr.fr https://thinkr.fr/diagnostic-et-accompagnement-a-lexploita…
## # … with 474 more rows
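To see what `unnest()` does to a list column without scraping any pages, here is a minimal sketch on a small hypothetical tibble. The names `toy` and `tidy_links` and the example URLs are made up for illustration and are not part of the course data. Each element of the `links` list column becomes its own row, and the matching `urls` value is repeated alongside it, which is how the 2-column, 484-row result above is produced.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: two pages, each with a character vector of links
toy <- tibble(
  urls  = c("https://a.example", "https://b.example"),
  links = list(c("/home", "/blog", "/faq"), c("/about"))
)

# One row per link: each `urls` value is repeated once for each of its links
tidy_links <- unnest(toy, cols = c(links))
tidy_links
```

Naming the column explicitly in `cols` avoids the warning and makes the intent clear when a data frame holds several list columns.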

Well done, you’ve aced this chapter on programming with purrr! Just imagine how many more lines of code you would have needed to get the list of all links without the tools from purrr. Now that you have a good grasp of what purrr can do, we’ll end this course with a case study using a real-life dataset.