Functional Programming with purrr

DataCamp

Course Description

Lists can be difficult to both understand and manipulate, but they can pack a ton of information and are very powerful. In this course, you will learn to easily extract, summarize, and manipulate lists and how to export the data to your desired object, be it another list, a vector, or even something else! Throughout the course, you will work with the purrr package and a variety of datasets from the repurrrsive package, including data from Star Wars and Wes Anderson films and data collected about GitHub users and GitHub repos. Following this course, your list skills will be purrrfect!

1 Simplifying with purrr

Iteration is a powerful way to make the computer do the work for you. It can also be an area of coding where it is easy to make lots of typos and simple mistakes. The purrr package helps simplify iteration so you can focus on the next step, instead of finding typos.

1.1 The power of iteration

1.1.1 Introduction to iteration

Imagine that you need to read in hundreds of files with a similar structure and perform an action on them. You don’t want to write hundreds of repetitive lines of code to read in all the files or to perform the action. Instead, you want to iterate over them. Iteration means applying the same operation to each of multiple inputs. Being able to iterate makes your code more efficient, and it is especially powerful when working with lists.

For this exercise, the names of 16 CSV files have been loaded into a list called files. In your own work, you could use the list.files() function to create this list. The readr library is also already loaded.
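As a minimal sketch of how such a list of file names might be built, the snippet below writes two small CSV files to a temporary directory and collects their paths with list.files(). The directory name, file names, and data are made up for illustration, and base R's write.csv() and read.csv() stand in for readr so that the example runs without extra packages.

```r
# Hypothetical setup: a temporary folder holding two tiny CSV files
demo_dir <- file.path(tempdir(), "purrr_demo_csv")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(years = 1990, a = 1),
          file.path(demo_dir, "data_1990.csv"), row.names = FALSE)
write.csv(data.frame(years = 1991, a = 2),
          file.path(demo_dir, "data_1991.csv"), row.names = FALSE)

# pattern is a regular expression; full.names = TRUE returns usable paths
demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own list element with a for loop
demo_data <- list()
for (i in seq_along(demo_files)) {
  demo_data[[i]] <- read.csv(demo_files[[i]])
}

length(demo_data)
```

Using full.names = TRUE avoids pasting the directory onto each file name by hand.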

This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the tidyverse Cheat Sheet and keep it handy!

Create a for loop that iterates over the files list and passes each element to readr::read_csv(), that is, the read_csv() function from the readr package.

# Initialize list
all_files <- list()

Store each result so that every CSV file is read into a separate element of the newly created all_files list.

# Collect the full paths of the CSV files (pattern is a regular expression, not a glob)
files <- list.files("/Users/apple/Documents/Rstudio/DataCamp/FoundationsofFunctionalProgrammingwithpurrr/simulated_data_from_1990_to_2005", pattern = "\\.csv$", full.names = TRUE)
# For loop to read files into a list
for(i in seq_along(files)){
  all_files[[i]] <- read_csv(files[[i]])
}

head(all_files)

## [[1]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1990  5.25  197.
##  2  1990  8.17  192.
##  3  1990  6.49  192.
##  4  1990  5.82  195.
##  5  1990  5.54  201.
##  6  1990  6.65  196.
##  7  1990 10.4   208.
##  8  1990  1.66  183.
##  9  1990  2.78  174.
## 10  1990  8.34  198.
## # … with 190 more rows
## 
## [[2]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1991  3.70  197.
##  2  1991  5.37  187.
##  3  1991  7.05  186.
##  4  1991  1.97  207.
##  5  1991  8.05  217.
##  6  1991  1.97  213.
##  7  1991  5.33  195.
##  8  1991  4.32  204.
##  9  1991  4.46  177.
## 10  1991  4.63  222.
## # … with 190 more rows
## 
## [[3]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1992  8.64  178.
##  2  1992  3.70  207.
##  3  1992  4.79  206.
##  4  1992  9.22  194.
##  5  1992  6.49  202.
##  6  1992  4.58  197.
##  7  1992  5.06  174.
##  8  1992  2.20  216.
##  9  1992  4.72  177.
## 10  1992 10.0   188.
## # … with 190 more rows
## 
## [[4]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1993  2.34  204.
##  2  1993  5.44  167.
##  3  1993  6.86  213.
##  4  1993  5.70  197.
##  5  1993  2.78  193.
##  6  1993  3.24  164.
##  7  1993  5.59  234.
##  8  1993  3.02  183.
##  9  1993  4.60  182.
## 10  1993  7.56  205.
## # … with 190 more rows
## 
## [[5]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1994  3.40  197.
##  2  1994  4.29  214.
##  3  1994  6.91  175.
##  4  1994  3.11  181.
##  5  1994  5.50  185.
##  6  1994  3.59  211.
##  7  1994  2.97  189.
##  8  1994  7.40  171.
##  9  1994  9.66  198.
## 10  1994  8.19  221.
## # … with 190 more rows
## 
## [[6]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1995  5.12  197.
##  2  1995  4.18  219.
##  3  1995  3.70  186.
##  4  1995  4.46  204.
##  5  1995  7.48  209.
##  6  1995  8.38  204.
##  7  1995  4.51  202.
##  8  1995  5.68  208.
##  9  1995  5.24  211.
## 10  1995  3.04  212.
## # … with 190 more rows

Output the size of the all_files list.

# Output size of list object
length(all_files)

## [1] 16

Good work! Now let’s see how to do it more easily with purrr.

1.1.2 Iteration with purrr

You’ve made a great for loop, but it uses a lot of code to do something as simple as reading a series of files into a list. This is where purrr comes in. We can do the same thing as a for loop in one line of code with purrr::map(). The function map() iterates over a list and applies another function, which can be specified with the .f argument.

map() takes two arguments:

The first is the list that will be iterated over
The second is a function that will act on each element of the list
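The two arguments can be illustrated with a small sketch. The scores list below is made up for this example; .x and .f are the argument names that map() uses for the list and the function.

```r
library(purrr)

# A small made-up list of numeric vectors
scores <- list(c(1, 2, 3), c(10, 20), c(5, 5, 5, 5))

# First argument: the list; second argument: the function applied to each element
totals <- map(scores, sum)

# Spelling out the argument names gives the same result
totals_named <- map(.x = scores, .f = sum)

identical(totals, totals_named)
```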

The readr library is already loaded.

Load the purrr library (note the 3 Rs).

# Load purrr library
library(purrr)

Replicate the for loop from the last exercise using map() instead. Use the same list files and the same function readr::read_csv().

# Use map to iterate
all_files_purrr <- map(files, read_csv)

head(all_files_purrr)

## [[1]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1990  5.25  197.
##  2  1990  8.17  192.
##  3  1990  6.49  192.
##  4  1990  5.82  195.
##  5  1990  5.54  201.
##  6  1990  6.65  196.
##  7  1990 10.4   208.
##  8  1990  1.66  183.
##  9  1990  2.78  174.
## 10  1990  8.34  198.
## # … with 190 more rows
## 
## [[2]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1991  3.70  197.
##  2  1991  5.37  187.
##  3  1991  7.05  186.
##  4  1991  1.97  207.
##  5  1991  8.05  217.
##  6  1991  1.97  213.
##  7  1991  5.33  195.
##  8  1991  4.32  204.
##  9  1991  4.46  177.
## 10  1991  4.63  222.
## # … with 190 more rows
## 
## [[3]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1992  8.64  178.
##  2  1992  3.70  207.
##  3  1992  4.79  206.
##  4  1992  9.22  194.
##  5  1992  6.49  202.
##  6  1992  4.58  197.
##  7  1992  5.06  174.
##  8  1992  2.20  216.
##  9  1992  4.72  177.
## 10  1992 10.0   188.
## # … with 190 more rows
## 
## [[4]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1993  2.34  204.
##  2  1993  5.44  167.
##  3  1993  6.86  213.
##  4  1993  5.70  197.
##  5  1993  2.78  193.
##  6  1993  3.24  164.
##  7  1993  5.59  234.
##  8  1993  3.02  183.
##  9  1993  4.60  182.
## 10  1993  7.56  205.
## # … with 190 more rows
## 
## [[5]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1994  3.40  197.
##  2  1994  4.29  214.
##  3  1994  6.91  175.
##  4  1994  3.11  181.
##  5  1994  5.50  185.
##  6  1994  3.59  211.
##  7  1994  2.97  189.
##  8  1994  7.40  171.
##  9  1994  9.66  198.
## 10  1994  8.19  221.
## # … with 190 more rows
## 
## [[6]]
## # A tibble: 200 × 3
##    years     a     b
##    <dbl> <dbl> <dbl>
##  1  1995  5.12  197.
##  2  1995  4.18  219.
##  3  1995  3.70  186.
##  4  1995  4.46  204.
##  5  1995  7.48  209.
##  6  1995  8.38  204.
##  7  1995  4.51  202.
##  8  1995  5.68  208.
##  9  1995  5.24  211.
## 10  1995  3.04  212.
## # … with 190 more rows

Check the length of all_files_purrr.

# Output size of list object
length(all_files_purrr)

## [1] 16

Nice! You can see from the output here that 16 different files have been read into all_files_purrr.

1.1.3 More iteration with for loops

Iteration isn’t just for reading in files though; iteration can be used to perform other actions on objects. First, you will try iterating with a for loop.

You’re going to change each element of a list into a numeric data type and then put it back into the same element in the same list.

For this exercise, you will iterate using a for loop that takes list_of_df, which is a list of character vectors, but the characters are actually numbers! You need to change the character vectors to numeric so that you can perform mathematical operations on them; you can use the base R function as.numeric() to do that.
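A minimal sketch of the conversion, using a made-up list (chr_list is not part of the course data), might look like this:

```r
# Made-up character vectors whose contents are really numbers
chr_list <- list(c("1", "2", "3"), c("4.5", "6"))

# Arithmetic on character data fails, so convert each element in place
for (i in seq_along(chr_list)) {
  chr_list[[i]] <- as.numeric(chr_list[[i]])
}

class(chr_list[[1]])
sum(chr_list[[2]])
```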

Check the class type of the first element of list_of_df.

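# Stand-in for the course data: ten character vectors whose values are really numbers
list_of_df <- lapply(1:10, function(x) as.character(1:4))
# Check the class type of the first element
class(list_of_df[[1]])

## [1] "character"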

Build a for loop that takes each element of list_of_df, changes it into numeric data with as.numeric(), and adds it back into the same element of list_of_df.

# Change each element from a character to a number
for(i in seq_along(list_of_df)){
    list_of_df[[i]] <- as.numeric(list_of_df[[i]])
}

Check the class type of the first element of list_of_df.

# Check the class type of the first element
class(list_of_df[[1]])

## [1] "numeric"

Print list_of_df.

# Print out the list
head(list_of_df)

## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] 1 2 3 4
## 
## [[3]]
## [1] 1 2 3 4
## 
## [[4]]
## [1] 1 2 3 4
## 
## [[5]]
## [1] 1 2 3 4
## 
## [[6]]
## [1] 1 2 3 4

Nice! You can see from the output that we have a list of numbers now!

1.1.4 More iteration with purrr

Now you will change each element of a list into a numeric data type and then put it back into the same element in the same list, but instead of using a for loop, you’ll use map().

You can use the purrr function map() to more easily loop over a list, and turn the characters into numbers. Instead of having to build a whole for loop, you can use one line of code.

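Check the class of the first element of list_of_df. Because the for loop in the previous exercise already converted the list in these notes, the class is numeric both before and after map() runs.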

# Check the class type of the first element
class(list_of_df[[1]])

## [1] "numeric"

Use map() to iterate over list_of_df and change each element of the list into numeric data.

# Change each character element to a number
list_of_df <- map(list_of_df, as.numeric)

Check the class of the first element of list_of_df.

# Check the class type of the first element again
class(list_of_df[[1]])

## [1] "numeric"

Print out list_of_df.

# Print out the list
head(list_of_df)

## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] 1 2 3 4
## 
## [[3]]
## [1] 1 2 3 4
## 
## [[4]]
## [1] 1 2 3 4
## 
## [[5]]
## [1] 1 2 3 4
## 
## [[6]]
## [1] 1 2 3 4

Good work! Now you can fix class type issues in your lists!

1.2 Subsetting lists

1.2.1 Subsetting lists

Often when working in R, you’ll use dataframes or vectors. Another kind of R object is a list. While lists can be complicated, lists are also incredibly powerful. Lists are like Hermione Granger’s bag of holding (from Harry Potter); they can hold a wide variety of things. The contents of a list don’t have to be the same data type, and as long as you know how it’s organized, you can grab out what you need by subsetting.

Both named and unnamed lists can be subset using double square brackets [[ ]] like this: listname[[ index ]]

If a list is named, you can also use $ for subsetting. The syntax list$elementname pulls out the named element from the list. As with any other kind of object in R, you can use str() to determine the structure of the list.
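The subsetting options above can be tried on a small list. The bag list and its element names are made up for this sketch.

```r
# A made-up named list holding different kinds of objects
bag <- list(numbers = c(1, 2, 3), label = "wand", flags = c(TRUE, FALSE))

# Double brackets work by position or by name
bag[[1]]
bag[["label"]]

# $ works only because the list is named
bag$flags

# str() summarizes the structure of the whole list
str(bag)
```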

Load the repurrrsive package.

# Load repurrrsive package, to get access to the wesanderson dataset
library(repurrrsive)

Load the wesanderson dataset.

# Load wesanderson dataset
data(wesanderson)

Examine the structure of the first element in wesanderson.

# Get structure of first element in wesanderson
str(wesanderson[[1]])

##  chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"

Examine the structure of the GrandBudapest element in wesanderson.

# Get structure of GrandBudapest element in wesanderson
str(wesanderson$GrandBudapest)

##  chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"

Good work! Now you can subset and determine the structure of each part of a named or unnamed list!

1.2.2 Subsetting list elements

You can also subset within list elements using bracket notation like this: ListName$ElementName[VectorNumber]. If a list element is a dataframe, you can pull out a column like this: ListName$ElementName$ColumnName or ListName[[1]][,1].
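As a small sketch of both patterns, the made-up shelf list below holds a character vector and a dataframe; none of these names come from repurrrsive.

```r
# A made-up list with a character vector and a dataframe
shelf <- list(
  colors = c("#FF0000", "#00FF00", "#0000FF"),
  films  = data.frame(title = c("Film A", "Film B"),
                      year = c(2001, 2004),
                      stringsAsFactors = FALSE)
)

# Second value of the colors vector
shelf$colors[2]

# The title column of the dataframe, by name and by position
shelf$films$title
shelf[[2]][, 1]
```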

In this exercise, you’ll examine the wesanderson and sw_films datasets from the repurrrsive package. wesanderson contains color palettes for each of Wes Anderson’s movies. These colors are recorded as hexadecimal codes, that is, a # followed by six hexadecimal digits (0-9 and A-F) that indicate a particular color. Here, you will use two ways of pulling out a particular hex code.

sw_films contains information about the films in the Star Wars franchise, such as title, director, producer, etc. You’ll use subsetting to explore this dataset.

Subset the third color from the first element of wesanderson. Then subset the fourth color from GrandBudapest.

# Third element of the first wesanderson vector
wesanderson[[1]][3]

## [1] "#5B1A18"

# Fourth element of the GrandBudapest wesanderson vector
wesanderson$GrandBudapest[4]

## [1] "#D67236"

Subset the first element from sw_films. Then subset the title element from the first element.

# Subset the first element of the sw_films data
sw_films[[1]]

## $title
## [1] "A New Hope"
## 
## $episode_id
## [1] 4
## 
## $opening_crawl
## [1] "It is a period of civil war.\r\nRebel spaceships, striking\r\nfrom a hidden base, have won\r\ntheir first victory against\r\nthe evil Galactic Empire.\r\n\r\nDuring the battle, Rebel\r\nspies managed to steal secret\r\nplans to the Empire's\r\nultimate weapon, the DEATH\r\nSTAR, an armored space\r\nstation with enough power\r\nto destroy an entire planet.\r\n\r\nPursued by the Empire's\r\nsinister agents, Princess\r\nLeia races home aboard her\r\nstarship, custodian of the\r\nstolen plans that can save her\r\npeople and restore\r\nfreedom to the galaxy...."
## 
## $director
## [1] "George Lucas"
## 
## $producer
## [1] "Gary Kurtz, Rick McCallum"
## 
## $release_date
## [1] "1977-05-25"
## 
## $characters
##  [1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/2/" 
##  [3] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/4/" 
##  [5] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/6/" 
##  [7] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/8/" 
##  [9] "http://swapi.co/api/people/9/"  "http://swapi.co/api/people/10/"
## [11] "http://swapi.co/api/people/12/" "http://swapi.co/api/people/13/"
## [13] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/15/"
## [15] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/18/"
## [17] "http://swapi.co/api/people/19/" "http://swapi.co/api/people/81/"
## 
## $planets
## [1] "http://swapi.co/api/planets/2/" "http://swapi.co/api/planets/3/"
## [3] "http://swapi.co/api/planets/1/"
## 
## $starships
## [1] "http://swapi.co/api/starships/2/"  "http://swapi.co/api/starships/3/" 
## [3] "http://swapi.co/api/starships/5/"  "http://swapi.co/api/starships/9/" 
## [5] "http://swapi.co/api/starships/10/" "http://swapi.co/api/starships/11/"
## [7] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/13/"
## 
## $vehicles
## [1] "http://swapi.co/api/vehicles/4/" "http://swapi.co/api/vehicles/6/"
## [3] "http://swapi.co/api/vehicles/7/" "http://swapi.co/api/vehicles/8/"
## 
## $species
## [1] "http://swapi.co/api/species/5/" "http://swapi.co/api/species/3/"
## [3] "http://swapi.co/api/species/2/" "http://swapi.co/api/species/1/"
## [5] "http://swapi.co/api/species/4/"
## 
## $created
## [1] "2014-12-10T14:23:31.880000Z"
## 
## $edited
## [1] "2015-04-11T09:46:52.774897Z"
## 
## $url
## [1] "http://swapi.co/api/films/1/"

# Subset the title element of the first element of sw_films
sw_films[[1]]$title

## [1] "A New Hope"

Great work, now you should be very comfortable subsetting lists!

1.3 The many flavors of map()

1.3.1 map() argument alternatives

You can also use iteration to answer a question, like how long each element in the wesanderson dataset is. You can do this by feeding map() a function like length(). The map(list, function) syntax works just fine for this. However, as future exercises get more complex, you will need to learn a second way of writing it:

map(list, ~function(.x))

This second way gives the same result as map(list, function). Here .x is a placeholder that marks where each list element goes inside the function call. Whenever you use .x, you need to put a ~ in front of the expression in the second argument of map().
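One reason the ~ form becomes useful is that .x can sit anywhere inside a larger expression, including next to extra arguments. A minimal sketch, using a made-up palettes list:

```r
library(purrr)

# Made-up list of character vectors
palettes <- list(warm = c("red", "orange"), cool = c("blue", "green", "teal"))

# Passing the function by name works when no extra arguments are needed
map(palettes, length)

# The ~ form places .x inside a call that also has other arguments
joined <- map(palettes, ~ paste(.x, collapse = " / "))
joined$cool
```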

Use map() on wesanderson and determine the length of each element in the “old” way.

# Map over wesanderson to get the length of each element
map(wesanderson, length)

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5

Use map() on wesanderson and determine the length of each element again, but this time using map(list, ~function(.x)).

# Map over wesanderson, and determine the length of each element
map(wesanderson, ~length(.x))

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5

Great Job! This new way of writing map_*() functions will come in handy in future exercises, so make a mental note of the ~ and the .x argument.

1.3.2 map_*

The map() function returns its output as a list. However, there are several variants of map(); you can use the map_*() functions to tell purrr the type of output you want. The * in map_*() stands for an R data type. For instance, you might want the output to be a vector of numbers so that you can put it inside a dataframe. So, unless you want a list back, decide what type of output you need before you write your map() call.
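The difference between a list and a typed vector can be seen in a short sketch. The readings list is made up for this example.

```r
library(purrr)

# Made-up list of numeric vectors
readings <- list(a = c(1, 2, 3), b = c(4, 5), c = 6)

# map() returns a list with one element per input
as_list <- map(readings, mean)

# map_dbl() returns a named double vector, ready to use as a column
as_vector <- map_dbl(readings, mean)

is.list(as_list)
is.double(as_vector)
data.frame(avg = as_vector)
```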

Determine the length of each element of the wesanderson dataset using our original map() function. Examine the output.

# Map over wesanderson, to determine the length of each element
map(wesanderson, length)

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5

Create a dataframe that has the number of colors from each movie, using map_dbl(). The dbl means a double or a number that can have a decimal.

# Create a numcolors column and fill with length of each wesanderson element
data.frame(numcolors = map_dbl(wesanderson, ~length(.x)))

##                numcolors
## GrandBudapest          4
## Moonrise1              4
## Royal1                 4
## Moonrise2              4
## Cavalcanti             5
## Royal2                 5
## GrandBudapest2         4
## Moonrise3              5
## Chevalier              4
## Zissou                 5
## FantasticFox           5
## Darjeeling             5
## Rushmore               5
## BottleRocket           7
## Darjeeling2            5

Good work! Notice how much cleaner the output was using map_dbl()! It’s always worth thinking through which map_*() function will get you where you need to go before coding it out. In our next chapter, we’ll dive into more complex uses of purrr.

2 More complex iterations

purrr is much more than a for loop: it works well with pipes, and you can use it to run models, simulate data, and build nested loops!

2.1 Working with unnamed lists

2.1.1 Names & pipe refresher

It is easy to determine if a list has names using names(). Understanding the named elements of a list can make working with the list elements easier because you can pull out the information you need by name, instead of searching for the correct numbered element.

purrr is a part of the tidyverse, a system of packages designed to be used together, and used with pipes. Let’s do a quick refresh on how pipes work. A pipe %>% takes the output from the function that comes before it, and feeds it into the function that comes after the pipe as its first argument.

function_before() %>% 
    function_after()

You don’t need to use pipes when you use purrr functions, but for the purposes of these lessons, you will be.
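As a quick sketch of the equivalence between a nested call and a piped call (the values vector is made up for this example):

```r
library(purrr)  # purrr re-exports the %>% pipe from magrittr

values <- c(1, 4, 9)

# Nested call: read from the inside out
nested <- sum(sqrt(values))

# Piped call: read from left to right
piped <- values %>% sqrt() %>% sum()

identical(nested, piped)
```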

Check to see if the sw_films list has named elements with pipes.

# Use pipes to check for names in sw_films
sw_films %>%
    names()

## NULL

Good work! Now that you know how to check to see if a list has names in a tidy way, you’re ready to dive in.

2.1.2 Setting names

If you have an unnamed list, you can, of course, name each element. This is useful for calling out certain elements in a list regardless of their order, especially if you are working with a list that may grow or change over time, or if you use the same code on several different lists. For instance, if you have a list that contains a dataframe, a model, and a plot, calling out $plot is much easier than searching to figure out which numbered element holds the plot.
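The dataframe, model, and plot scenario can be sketched as follows. The results list is made up, it uses the built-in mtcars data, and a string stands in for the plot to keep the example small.

```r
library(purrr)

# A made-up unnamed list with three different kinds of objects
results <- list(head(mtcars), lm(mpg ~ wt, data = mtcars), "placeholder for a plot")

# set_names() attaches a name to each element, in order
results_named <- results %>% set_names(c("data", "model", "plot"))

# Elements can now be pulled out by name instead of position
results_named$plot
names(results_named)
```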

With a piped workflow:

Name each element of sw_films list and assign to a new list, sw_films_named.
Iterate over the title element.

# Set names so each element of the list is named for the film title
sw_films_named <- sw_films %>% 
  set_names(map_chr(sw_films, "title"))

Check to make sure the new list has names.

# Check to see if the names worked/are correct
names(sw_films_named)

## [1] "A New Hope"              "Attack of the Clones"   
## [3] "The Phantom Menace"      "Revenge of the Sith"    
## [5] "Return of the Jedi"      "The Empire Strikes Back"
## [7] "The Force Awakens"

Good work! Naming lists makes working in purrr easier and more human-readable.

2.1.3 Pipes in map()

So you’ve refreshed your memory on how pipes can be used between functions. You can also use pipes inside a map() call to iterate a pipeline of tasks over a list of inputs.

Here instead of using one of the repurrrsive datasets, you will be working with a list of numbers so that you can do a few mathematical operations.

Create a list that contains the values 1 through 10, each as a separate element.

# Create a list of values from 1 through 10
numlist <- list(1,2,3,4,5,6,7,8,9,10)

Create a pipeline within one map() function that takes the sqrt() of each element, and then the sin() of each element.

# Iterate over the numlist 
map(numlist, ~.x %>% sqrt() %>% sin()) %>% head()

## [[1]]
## [1] 0.841471
## 
## [[2]]
## [1] 0.9877659
## 
## [[3]]
## [1] 0.9870266
## 
## [[4]]
## [1] 0.9092974
## 
## [[5]]
## [1] 0.7867491
## 
## [[6]]
## [1] 0.6381576

Good work! Using pipes inside of map() makes iterating over multiple functions easy.

2.2 More map()

2.2.1 Simulating Data with Purrr

Often when trying to solve a problem with data we first need to build some simulated data to see if our idea is even possible. For example, you may want to test models with data that have known differences, to see if the models are working correctly.

In this exercise, you will see how this works in purrr by simulating data for two populations, a and b, from the sites: “north”, “east”, and “west”. The two populations will be randomly drawn from a normal distribution, with different means and standard deviations.
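Because rnorm() draws random values, simulated output differs from run to run unless the random seed is fixed. The sketch below assumes an arbitrary seed of 42, which is not part of the course exercise, and shows that resetting the seed reproduces the same draws.

```r
# Fixing the seed makes the random draws repeatable (42 is an arbitrary choice)
set.seed(42)
draw_1 <- rnorm(n = 5, mean = 5, sd = 5 / 2)

set.seed(42)
draw_2 <- rnorm(n = 5, mean = 5, sd = 5 / 2)

# Same seed, same values
identical(draw_1, draw_2)
```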

Create a list of site names, “north”, “east”, and “west”.

# List of sites north, east, and west
sites <- list("north","east","west")

Then use map() to create a list of dataframes with three columns. The first column is sites.

The second is population a, which has a mean of 5, a sample size n of 200, and an sd of (5/2).
The third is population b, which has a mean of 200, a sample size n of 200, and an sd of 15.

# Create a list of dataframes, each with a sites, a, and b column
list_of_df <-  map(sites,  
  ~data.frame(sites = .x,
              a = rnorm(mean = 5,   n = 200, sd = (5/2)),
              b = rnorm(mean = 200, n = 200, sd = 15)))

map(list_of_df,~head(.x))

## [[1]]
##   sites         a        b
## 1 north  6.671339 197.9598
## 2 north 10.090051 212.0460
## 3 north  4.785466 185.1392
## 4 north  4.930145 233.7320
## 5 north  4.438924 205.4188
## 6 north -1.017154 212.5341
## 
## [[2]]
##   sites        a        b
## 1  east 7.099261 160.7708
## 2  east 0.112617 185.9312
## 3  east 3.535525 186.0787
## 4  east 2.920502 205.4623
## 5  east 6.971227 183.9190
## 6  east 6.388077 195.9712
## 
## [[3]]
##   sites        a        b
## 1  west 9.524335 186.3719
## 2  west 5.210117 199.9047
## 3  west 6.457159 201.0256
## 4  west 2.181551 215.0170
## 5  west 4.436093 195.9054
## 6  west 8.011461 189.4125

Good work! Now you can simulate data with ease.

2.2.2 Run a linear model

You can use map() to do more than just take the square root of a number or simulate data. You can also use map() to loop over different inputs to run several models, each using the unique values of a given list element. You can also then iterate over the models you’ve run to create the model summaries and look at the results.
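A compact sketch of the fit, summarize, and extract pattern is shown below. It uses the built-in mtcars data rather than the simulated course data, and pulls out R-squared by passing the element name to map_dbl().

```r
library(purrr)

# Split the built-in mtcars data into one dataframe per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Fit one model per group, summarize each, then pull out R-squared
r_squared <- by_cyl %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")

r_squared
```

Ending the pipeline with a typed map_*() call gives one number per model, which is easier to compare than three full summaries.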

The lists sites and list_of_df are preloaded.

Pipe list_of_df into map() along with the lm() linear model function, to compare a as the response and b as the predictor variable.
- Use the syntax: lm(response ~ predictor, data = )
Then pipe the linear model output into map() and generate the summary() of each model.

# Map over the models to look at the relationship of a vs b
list_of_df %>%
    map(~ lm(a ~ b, data = .)) %>%
    map(summary)

## [[1]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9401 -1.9836 -0.1301  1.6425  5.7177 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  6.86981    2.27909   3.014  0.00291 **
## b           -0.00916    0.01139  -0.804  0.42211   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.445 on 198 degrees of freedom
## Multiple R-squared:  0.003258,   Adjusted R-squared:  -0.001776 
## F-statistic: 0.6471 on 1 and 198 DF,  p-value: 0.4221
## 
## 
## [[2]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.261 -1.462  0.050  1.651  6.573 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.73189    2.19967  -0.333   0.7397  
## b            0.02786    0.01104   2.524   0.0124 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.479 on 198 degrees of freedom
## Multiple R-squared:  0.03117,    Adjusted R-squared:  0.02627 
## F-statistic:  6.37 on 1 and 198 DF,  p-value: 0.01239
## 
## 
## [[3]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8803 -1.7816  0.1036  1.8021  5.6971 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  4.895e+00  2.294e+00   2.134   0.0341 *
## b           -9.945e-05  1.146e-02  -0.009   0.9931  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.407 on 198 degrees of freedom
## Multiple R-squared:  3.803e-07,  Adjusted R-squared:  -0.00505 
## F-statistic: 7.53e-05 on 1 and 198 DF,  p-value: 0.9931

Good work! This will make running multiple models and summarizing their results much easier.

2.2.3 map_chr()

In this exercise, you’ll dive a bit deeper into the different map_*() variants. The map() function always outputs a list. The map_*() variants output vectors of a specific type instead. Study the table below and make sure you’re clear on the type of output for each map_*() variant.

`map_*()`	Output
`map_chr()`	character vector
`map_lgl()`	logical vector [TRUE or FALSE]
`map_int()`	integer vector
`map_dbl()`	double vector
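The table can be checked against a small sketch. The films list below is made up and only loosely mimics the shape of sw_films.

```r
library(purrr)

# Made-up list of records, similar in shape to sw_films
films <- list(
  list(title = "Film A", director = "Director X", episode = 1L),
  list(title = "Film B", director = "Director Y", episode = 2L)
)

# Each variant returns a vector of the type named in its suffix
titles   <- map_chr(films, ~ .x[["title"]])
is_dir_x <- map_lgl(films, ~ .x[["director"]] == "Director X")
episodes <- map_int(films, ~ .x[["episode"]])

typeof(titles)
typeof(is_dir_x)
typeof(episodes)
```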

Compare the results of map() and map_chr() for the director element of each film in sw_films.

# Pull out the director element of sw_films in a list and character vector
map(sw_films, ~.x[["director"]])

## [[1]]
## [1] "George Lucas"
## 
## [[2]]
## [1] "George Lucas"
## 
## [[3]]
## [1] "George Lucas"
## 
## [[4]]
## [1] "George Lucas"
## 
## [[5]]
## [1] "Richard Marquand"
## 
## [[6]]
## [1] "Irvin Kershner"
## 
## [[7]]
## [1] "J. J. Abrams"

map_chr(sw_films, ~.x[["director"]])

## [1] "George Lucas"     "George Lucas"     "George Lucas"     "George Lucas"    
## [5] "Richard Marquand" "Irvin Kershner"   "J. J. Abrams"

Compare the map() and map_lgl() outputs on sw_films for director == George Lucas.

# Compare outputs when checking if director is George Lucas
map(sw_films, ~.x[["director"]] == "George Lucas")

## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] FALSE
## 
## [[6]]
## [1] FALSE
## 
## [[7]]
## [1] FALSE

map_lgl(sw_films, ~.x[["director"]] == "George Lucas")

## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

Good work! Mastering the flavors of map_*() is key for success in purrr.

2.2.4 map_dbl() and map_int()

Some flavors of map_*() are very similar. map_dbl() and map_int() both output numbers. map_int() outputs integer vectors, which hold whole numbers with no decimals. map_dbl() outputs double vectors, which hold numbers that can have decimals. Take a closer look at how using different map_*() functions affects the output.

Here is the map_*() table again as a reference.

`map_*()`	Output
`map_chr()`	character vector
`map_lgl()`	logical vector [TRUE or FALSE]
`map_int()`	integer vector
`map_dbl()`	double vector
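Integer and double vectors often print identically, so typeof() is a handy way to tell them apart. A minimal sketch with a made-up words list:

```r
library(purrr)

# Made-up list of character vectors
words <- list(c("a", "b"), c("c", "d", "e"))

# length() returns an integer, so both variants work here
as_int <- map_int(words, length)
as_dbl <- map_dbl(words, length)

# The printed values look the same, but the storage types differ
typeof(as_int)
typeof(as_dbl)
```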

Compare the map() and map_dbl() outputs for pulling out the episode_id for each element of sw_films.

# Pull out episode_id element as list
map(sw_films, ~.x[["episode_id"]])

## [[1]]
## [1] 4
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 1
## 
## [[4]]
## [1] 3
## 
## [[5]]
## [1] 6
## 
## [[6]]
## [1] 5
## 
## [[7]]
## [1] 7

# Pull out episode_id element as double vector
map_dbl(sw_films, ~.x[["episode_id"]])

## [1] 4 2 1 3 6 5 7

Compare the map() and map_int() outputs for pulling out the episode_id for each element of sw_films.

# Pull out episode_id element as a list
map(sw_films, ~.x[["episode_id"]])

## [[1]]
## [1] 4
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 1
## 
## [[4]]
## [1] 3
## 
## [[5]]
## [1] 6
## 
## [[6]]
## [1] 5
## 
## [[7]]
## [1] 7

# Pull out episode_id element as integer vector
map_int(sw_films, ~.x[["episode_id"]])

## [1] 4 2 1 3 6 5 7

Good work! Now you can output numbers without decimals!

2.3 map2() and pmap()

2.3.1 Simulating data with multiple inputs using map2()

The map() function is great if you need to iterate over one list. However, you will often need to iterate over two lists at the same time. This is where map2() comes in. While map() takes one list as the .x argument, map2() takes two lists as two arguments: .x and .y.
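Before simulating data, the pairing of .x and .y can be seen in a small arithmetic sketch. The two lists are made up, and map2_dbl() is the double-vector variant of map2().

```r
library(purrr)

# Two made-up lists of equal length
base_price <- list(10, 20, 30)
discount   <- list(1, 2, 3)

# .x comes from the first list, .y from the second, paired by position
final_price <- map2_dbl(base_price, discount, ~ .x - .y)

final_price
```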

To test out map2(), you are going to create a simple dataset, with one list of numbers and one list of strings. You will put these two lists together and create some simulated data.

Create a list, means, of the values 1 through 3.

# List of 1, 2 and 3
means <- list(1,2,3)

Create a sites list with “north”, “west”, and “east”.

# Create sites list
sites <- list("north","west","east")

map2() over the sites and means lists to create a dataframe with two columns.

First column is sites; second column is generated by rnorm() with mean from the means list.

# Map over two arguments: sites and means
list_of_files_map2 <- map2(sites, means, ~data.frame(sites = .x,
                             a = rnorm(mean = .y, n = 200, sd = (5/2))))


map(list_of_files_map2,~head(.x))

## [[1]]
##   sites         a
## 1 north  3.449187
## 2 north  2.893941
## 3 north -2.361453
## 4 north  1.442438
## 5 north  1.414757
## 6 north  2.054845
## 
## [[2]]
##   sites          a
## 1  west  2.1773931
## 2  west  1.8438938
## 3  west  4.9336391
## 4  west  3.2757952
## 5  west -0.2904645
## 6  west  2.6134759
## 
## [[3]]
##   sites        a
## 1  east 2.297837
## 2  east 2.864035
## 3  east 3.616742
## 4  east 8.251796
## 5  east 3.199242
## 6  east 1.196774

Good work! Now you can use two lists together!

2.3.2 Simulating data with 3+ inputs using pmap()

What if you need to iterate over three lists? Is there a map3()? To iterate over more than two lists, whether it’s three, four, or even 20, you’ll need to use pmap(). However, pmap() requires you to supply the list arguments a bit differently.

To use pmap(), you first need to create a master list of all the lists you want to iterate over. The master list is the input for pmap(). Instead of using .x or .y, use the list names as the argument names.
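
The naming rule can be seen in a small sketch. The master list inputs_demo and its element names site and n are hypothetical, chosen only for illustration. It uses pmap_chr(), the flavor of pmap() that returns a character vector.

```r
library(purrr)

# Hypothetical master list; the names "site" and "n" become argument names
inputs_demo <- list(
  site = list("north", "west"),
  n    = list(2, 3)
)

# The function's arguments must match the names in the master list
labels <- pmap_chr(inputs_demo, function(site, n) {
  paste(site, "has", n, "samples")
})
labels
```

Because the arguments are matched by name, the order of the lists inside the master list does not matter.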

You are going to simulate data one more time, using five lists as inputs instead of two. Using pmap() gives you complete control over the simulated dataset, and allows you to use two different means and two different standard deviations along with the different sites.

Create a named list containing the sites, means, means2, sigma, and sigma2 lists.

# Additional means and standard deviations for the simulation
means2 <- list(0.5, 1, 1.5)
sigma2 <- list(0.5, 1, 1.5)
sigma <- list(1, 2, 3)
# Create a master list, a list of lists
pmapinputs <- list(sites = sites, means = means, sigma = sigma, 
                   means2 = means2, sigma2 = sigma2)

pmap() over the list of lists, to create a list of dataframes with three columns; the first column is sites.

The second column is a, which is rnorm() with mean = means, and sd = sigma.
The third column is b, which is rnorm() with mean = means2, and sd = sigma2.

# Create a master list, a list of lists
pmapinputs <- list(sites = sites, means = means, sigma = sigma, 
                   means2 = means2, sigma2 = sigma2)

# Map over the master list
list_of_files_pmap <- pmap(pmapinputs, 
  function(sites, means, sigma, means2, sigma2){
    data.frame(sites = sites,
        a = rnorm(mean = means,  n = 200, sd = sigma),
        b = rnorm(mean = means2, n = 200, sd = sigma2))})

map(list_of_files_pmap,~head(.x))

## [[1]]
##   sites          a          b
## 1 north  0.8789700  0.3855860
## 2 north -0.2245231  1.0029900
## 3 north  0.6417973  0.6355501
## 4 north  1.8780409  0.9760013
## 5 north  1.5165513 -0.1304455
## 6 north  2.4963962  1.1369883
## 
## [[2]]
##   sites           a           b
## 1  west -0.09834419  1.30693846
## 2  west -0.64468010  0.57628770
## 3  west  4.81134596 -0.01585508
## 4  west -0.85907440 -0.18470665
## 5  west  0.47639746  0.11106034
## 6  west  2.02665430  1.06197220
## 
## [[3]]
##   sites         a         b
## 1  east  5.537842 2.2437003
## 2  east  2.314830 1.1598322
## 3  east  1.287959 2.8198972
## 4  east  9.464502 1.1001475
## 5  east -1.857650 2.6695855
## 6  east  4.580386 0.1986446

Good work! With pmap() you now have all the power in purrr.

3 Troubleshooting lists with purrr

Like anything in R, understanding how to troubleshoot issues is an important skill set. This can be particularly important with lists, where finding the problem can be tricky.

3.1 How to purrr safely()

3.1.1 safely() replace with NA

If you map() over a list, and one of the elements does not have the right data type, you will not get the output you expect. Perhaps you are trying to do a mathematical operation on each element, and it turns out one of the elements is a character - it simply won’t work.

If you have a very large list, figuring out where things went wrong, and what exactly went wrong can be hard. That is where safely() comes in; it shows you both your results and where the errors occurred in your map() call.
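
As a minimal sketch of what safely() returns when a function really does throw an error, the made-up list below contains a character element, which log() cannot handle. The object name checked is illustrative.

```r
library(purrr)

# Made-up input: the second element is a character, so log() throws an error
checked <- list(10, "a", 1) %>%
  map(safely(log, otherwise = NA_real_)) %>%
  transpose()

# Results, with NA_real_ substituted where log() failed
checked[["result"]]

# Errors: NULL where log() succeeded, an error object where it failed
checked[["error"]]
```

The position of the non-NULL entry in the error element tells you which input caused the problem.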

Use safely() with log(). Taking the log of -10 does not give a usable value (R returns NaN with a warning rather than an error), so pipe the output into transpose() to put the results first.

# Map safely over log
a <- list(-10, 1, 10, 0) %>% 
      map(safely(log, otherwise = NA_real_)) %>%
    # Transpose the result
      transpose()

## Warning in .f(...): NaNs produced

Print out a.

# Print the list
a

## $result
## $result[[1]]
## [1] NaN
## 
## $result[[2]]
## [1] 0
## 
## $result[[3]]
## [1] 2.302585
## 
## $result[[4]]
## [1] -Inf
## 
## 
## $error
## $error[[1]]
## NULL
## 
## $error[[2]]
## NULL
## 
## $error[[3]]
## NULL
## 
## $error[[4]]
## NULL

Print out the “result” element of a.

# Print the result element in the list
a[["result"]]

## [[1]]
## [1] NaN
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 2.302585
## 
## [[4]]
## [1] -Inf

Print out just the error messages from a.

# Print the error element in the list
a[["error"]]

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL

Good work! Now you have the power to start debugging your lists, and you can do it with simple element subsetting.

3.1.2 Convert data to numeric with purrr

In the sw_people dataset, some of the Star Wars characters have unknown heights. If you want to do some data exploration and determine how character height differs depending on their home planet, you need to write your code so that R understands the difference between heights and missing values. Currently, the missing values are entered as “unknown”, but you would like them as NA. In this exercise, you will combine map() and ifelse() to fix this issue.

Load the sw_people dataset.

# Load sw_people data
data(sw_people)

Map over sw_people and pull out “height”.

Then map over the output and if an element is labeled as “unknown” change it to NA, otherwise, convert the value into a number with as.numeric().

# Map over sw_people and pull out the height element
height_cm <- map(sw_people, "height") %>%
  map(function(x){
    ifelse(x == "unknown",NA,
    as.numeric(x))
})
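
To see what this pattern produces without loading the full dataset, here is a small sketch on a made-up list. The people_demo object is a hypothetical stand-in for sw_people, assuming heights are stored as character strings.

```r
library(purrr)

# Hypothetical stand-in for sw_people; heights are stored as character strings
people_demo <- list(
  list(name = "Person A", height = "172"),
  list(name = "Person B", height = "unknown")
)

heights_demo <- map(people_demo, "height") %>%
  map(function(x) {
    # "unknown" becomes NA; everything else is converted to a number
    ifelse(x == "unknown", NA, as.numeric(x))
  })
heights_demo
```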

Good work! Now you can use purrr for data wrangling to help clean numeric data in lists.

3.1.3 Finding the problem areas

When you are working with a small list, it might not seem like a lot of work to go through things manually and figure out what element has an issue. But if you have a list with hundreds or thousands of elements, you want to automate that process.

Now you’ll look at a situation with a larger list, where you can see how the error message can be useful to check through the entire list for issues.

map() over sw_people and pull out the “height” element.

Use map() with safely() to convert the heights from centimeters into feet.

Set quiet = FALSE so that errors are printed.

# Map over sw_people and pull out the height element
height_ft <- map(sw_people, "height")  %>% 
  map(safely(function(x){
    x * 0.0328084
  }, quiet = FALSE)) %>%
transpose()

Pipe into transpose(), to print the results first.

# Print your list, the result element, and the error element
#height_ft
#height_ft[["result"]]
#height_ft[["error"]]
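
With a long list, reading the error element by eye is impractical. One possible approach, sketched here on a made-up list (heights_mixed is hypothetical), is to test which error entries are not NULL and return their positions.

```r
library(purrr)

# Made-up heights; the second element is a character and cannot be multiplied
heights_mixed <- list(172, "unknown", 96)

converted <- heights_mixed %>%
  map(safely(function(x) x * 0.0328084)) %>%
  transpose()

# Positions where an error was captured (the error element is not NULL)
failed <- which(map_lgl(converted[["error"]], ~ !is.null(.x)))
failed
```

The same check should work on a list of any length, since it never requires printing the whole result.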

Good work! Now you are ready to troubleshoot lists too large to check by hand.

3.2 Another way to possibly() purrr

3.2.1 Replace safely() with possibly()

Once you have figured out how to solve an issue with safely() (e.g., output an NA in place of an error), swap out safely() for possibly(). possibly() will run through your code and implement your desired changes without printing out the error messages.
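
A minimal sketch of the swap, using a made-up list in which the character element makes log() throw an error. Because possibly() returns only the value (or the fallback), the output can go straight into map_dbl().

```r
library(purrr)

# Made-up input: "a" makes log() throw an error
logs <- list(10, "a", 1) %>%
  map_dbl(possibly(log, otherwise = NA_real_))

# A plain double vector, with NA where log() failed
logs
```

Unlike safely(), there is no separate error element to inspect, which is why possibly() suits the stage where you already know what goes wrong.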

You’ll now map() over log() again, but you will use possibly() instead of safely() since you already know how to resolve your errors.

Create a list with the values -10, 1, 10, and 0.
map() over this list to take the log() of each element, using possibly().
Use NA_real_ to fix any elements that are not the right data type.

# Take the log of each element in the list
a <- list(-10, 1, 10, 0) %>% 
  map(possibly(function(x){
    log(x)
},NA_real_))

## Warning in log(x): NaNs produced

Good work! Now you can solve issues in lists using safely(), and then continue with your analysis using possibly().

3.2.2 Convert values with possibly()

Let’s say you need to convert the Star Wars character heights in sw_people from centimeters to feet. You already know that some of the heights have missing data, so you will use possibly() to convert missing values into NA. Then you will multiply each of the existing values by 0.0328084 to convert them from centimeters into feet.

To get a feel for your data, print out height_cm in the console to check out the heights in centimeters.

Pipe the height_cm object into a map_*() function that returns double vectors.
Convert each element in height_cm into feet (multiply it by 0.0328084).
Since not all elements are numeric, use possibly() to replace instances that do not work with NA_real_.

# Create a piped workflow that returns double vectors
height_cm %>%  
  map_dbl(possibly(function(x){
  # Convert centimeters to feet
  x * 0.0328084
}, NA_real_))

##  [1] 5.643045 5.479003 3.149606 6.627297 4.921260 5.839895 5.413386 3.182415
##  [9] 6.003937 5.971129 6.167979 5.905512 7.480315 5.905512 5.675853 5.741470
## [17] 5.577428 5.905512 2.165354 5.577428 6.003937 6.561680 6.233596 5.807087
## [25] 5.741470 5.905512 4.921260       NA 2.887139 5.249344 6.332021 6.266404
## [33] 5.577428 6.430446 7.349082 6.758530 6.003937 4.494751 3.674541 6.003937
## [41] 5.347769 5.741470 5.905512 5.839895 3.083990 4.002625 5.347769 6.167979
## [49] 6.496063 6.430446 5.610236 6.036746 6.167979 8.661418 6.167979 6.430446
## [57] 6.069554 5.150919 6.003937 6.003937 5.577428 5.446194 5.413386 6.332021
## [65] 6.266404 6.003937 5.511811 6.496063 7.513124 6.988189 5.479003 2.591864
## [73] 3.149606 6.332021 6.266404 5.839895 7.086614 7.677166 6.167979 5.839895
## [81] 6.758530       NA       NA       NA       NA       NA 5.413386

Good work! Using possibly() helps us work with problem data in a really clean and efficient way.

3.3 purrr is a walk() in the park

3.3.1 Comparing walk() vs no walk() outputs

Printing out lists with map() shows a lot of bracketed text in the console, which can be useful for understanding their structure, but this information is usually not important for communicating with your end users. If you need to print, using walk() prints out lists in a more compact and human-readable way, without all those brackets. walk() is also great for printing out plots without printing anything to the console.
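
The difference comes from what each function returns: map() returns a new list of results, while walk() calls the function only for its side effect and invisibly returns its input. A small sketch on a made-up list (letters_demo is an illustrative name):

```r
library(purrr)

letters_demo <- list("a", "b", "c")

# map() calls print() on each element and returns a new list of the results
mapped <- map(letters_demo, print)

# walk() calls print() on each element and invisibly returns its input
walked <- walk(letters_demo, print)
identical(walked, letters_demo)
```

At the console, an unassigned map() call would print each element and then the bracketed list of results, whereas an unassigned walk() call prints each element only.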

Here, you’ll be using the people_by_film dataset, which is a dataset derived from sw_films that has the url of each character and the film they appear in.

Print people_by_film to the console.

# Print normally
people_by_film=read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRObsvb_OQ7qeXRvkTEbWBbQcYfyebglhoxAt9cIdRzH7Exf5s-mMqSgtjkHC0qNgK4PVsku7Q0bwfS/pub?gid=0&single=true&output=csv")
people_by_film %>% head()

##                             url                     film_url
## 1 http://swapi.co/api/people/1/ http://swapi.co/api/films/6/
## 2 http://swapi.co/api/people/1/ http://swapi.co/api/films/3/
## 3 http://swapi.co/api/people/1/ http://swapi.co/api/films/2/
## 4 http://swapi.co/api/people/1/ http://swapi.co/api/films/1/
## 5 http://swapi.co/api/people/1/ http://swapi.co/api/films/7/
## 6 http://swapi.co/api/people/2/ http://swapi.co/api/films/5/

Print out people_by_film using walk() and print().

# Print with walk
walk(people_by_film, print)

##   [1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/1/" 
##   [3] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/1/" 
##   [5] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/2/" 
##   [7] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/2/" 
##   [9] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/2/" 
##  [11] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/3/" 
##  [13] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
##  [15] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
##  [17] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
##  [19] "http://swapi.co/api/people/4/"  "http://swapi.co/api/people/4/" 
##  [21] "http://swapi.co/api/people/4/"  "http://swapi.co/api/people/4/" 
##  [23] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/5/" 
##  [25] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/5/" 
##  [27] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/6/" 
##  [29] "http://swapi.co/api/people/6/"  "http://swapi.co/api/people/6/" 
##  [31] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/7/" 
##  [33] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/8/" 
##  [35] "http://swapi.co/api/people/9/"  "http://swapi.co/api/people/10/"
##  [37] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/10/"
##  [39] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/10/"
##  [41] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/11/"
##  [43] "http://swapi.co/api/people/11/" "http://swapi.co/api/people/11/"
##  [45] "http://swapi.co/api/people/12/" "http://swapi.co/api/people/12/"
##  [47] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/13/"
##  [49] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/13/"
##  [51] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/14/"
##  [53] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/14/"
##  [55] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/15/"
##  [57] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/16/"
##  [59] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/18/"
##  [61] "http://swapi.co/api/people/18/" "http://swapi.co/api/people/18/"
##  [63] "http://swapi.co/api/people/19/" "http://swapi.co/api/people/20/"
##  [65] "http://swapi.co/api/people/20/" "http://swapi.co/api/people/20/"
##  [67] "http://swapi.co/api/people/20/" "http://swapi.co/api/people/20/"
##  [69] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/21/"
##  [71] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/21/"
##  [73] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/22/"
##  [75] "http://swapi.co/api/people/22/" "http://swapi.co/api/people/22/"
##  [77] "http://swapi.co/api/people/23/" "http://swapi.co/api/people/24/"
##  [79] "http://swapi.co/api/people/25/" "http://swapi.co/api/people/25/"
##  [81] "http://swapi.co/api/people/26/" "http://swapi.co/api/people/27/"
##  [83] "http://swapi.co/api/people/27/" "http://swapi.co/api/people/28/"
##  [85] "http://swapi.co/api/people/29/" "http://swapi.co/api/people/30/"
##  [87] "http://swapi.co/api/people/31/" "http://swapi.co/api/people/32/"
##  [89] "http://swapi.co/api/people/33/" "http://swapi.co/api/people/33/"
##  [91] "http://swapi.co/api/people/33/" "http://swapi.co/api/people/34/"
##  [93] "http://swapi.co/api/people/36/" "http://swapi.co/api/people/36/"
##  [95] "http://swapi.co/api/people/37/" "http://swapi.co/api/people/38/"
##  [97] "http://swapi.co/api/people/39/" "http://swapi.co/api/people/40/"
##  [99] "http://swapi.co/api/people/40/" "http://swapi.co/api/people/41/"
## [101] "http://swapi.co/api/people/42/" "http://swapi.co/api/people/43/"
## [103] "http://swapi.co/api/people/43/" "http://swapi.co/api/people/44/"
## [105] "http://swapi.co/api/people/45/" "http://swapi.co/api/people/46/"
## [107] "http://swapi.co/api/people/46/" "http://swapi.co/api/people/46/"
## [109] "http://swapi.co/api/people/48/" "http://swapi.co/api/people/49/"
## [111] "http://swapi.co/api/people/50/" "http://swapi.co/api/people/51/"
## [113] "http://swapi.co/api/people/51/" "http://swapi.co/api/people/51/"
## [115] "http://swapi.co/api/people/52/" "http://swapi.co/api/people/52/"
## [117] "http://swapi.co/api/people/52/" "http://swapi.co/api/people/53/"
## [119] "http://swapi.co/api/people/53/" "http://swapi.co/api/people/53/"
## [121] "http://swapi.co/api/people/54/" "http://swapi.co/api/people/54/"
## [123] "http://swapi.co/api/people/55/" "http://swapi.co/api/people/55/"
## [125] "http://swapi.co/api/people/56/" "http://swapi.co/api/people/56/"
## [127] "http://swapi.co/api/people/57/" "http://swapi.co/api/people/58/"
## [129] "http://swapi.co/api/people/58/" "http://swapi.co/api/people/58/"
## [131] "http://swapi.co/api/people/59/" "http://swapi.co/api/people/59/"
## [133] "http://swapi.co/api/people/60/" "http://swapi.co/api/people/61/"
## [135] "http://swapi.co/api/people/62/" "http://swapi.co/api/people/63/"
## [137] "http://swapi.co/api/people/63/" "http://swapi.co/api/people/64/"
## [139] "http://swapi.co/api/people/64/" "http://swapi.co/api/people/65/"
## [141] "http://swapi.co/api/people/66/" "http://swapi.co/api/people/67/"
## [143] "http://swapi.co/api/people/67/" "http://swapi.co/api/people/68/"
## [145] "http://swapi.co/api/people/68/" "http://swapi.co/api/people/69/"
## [147] "http://swapi.co/api/people/70/" "http://swapi.co/api/people/71/"
## [149] "http://swapi.co/api/people/72/" "http://swapi.co/api/people/73/"
## [151] "http://swapi.co/api/people/74/" "http://swapi.co/api/people/47/"
## [153] "http://swapi.co/api/people/75/" "http://swapi.co/api/people/75/"
## [155] "http://swapi.co/api/people/76/" "http://swapi.co/api/people/77/"
## [157] "http://swapi.co/api/people/78/" "http://swapi.co/api/people/78/"
## [159] "http://swapi.co/api/people/79/" "http://swapi.co/api/people/80/"
## [161] "http://swapi.co/api/people/81/" "http://swapi.co/api/people/81/"
## [163] "http://swapi.co/api/people/82/" "http://swapi.co/api/people/82/"
## [165] "http://swapi.co/api/people/83/" "http://swapi.co/api/people/84/"
## [167] "http://swapi.co/api/people/85/" "http://swapi.co/api/people/86/"
## [169] "http://swapi.co/api/people/87/" "http://swapi.co/api/people/88/"
## [171] "http://swapi.co/api/people/35/" "http://swapi.co/api/people/35/"
## [173] "http://swapi.co/api/people/35/"
##   [1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##   [3] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##   [5] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/5/"
##   [7] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##   [9] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [11] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [13] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [15] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [17] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/7/"
##  [19] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [21] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [23] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [25] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [27] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/5/"
##  [29] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
##  [31] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
##  [33] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/1/"
##  [35] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [37] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [39] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [41] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [43] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [45] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
##  [47] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [49] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [51] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/3/"
##  [53] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [55] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/1/"
##  [57] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/3/"
##  [59] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/3/"
##  [61] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
##  [63] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
##  [65] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
##  [67] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [69] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
##  [71] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
##  [73] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/5/"
##  [75] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [77] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/2/"
##  [79] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
##  [81] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/3/"
##  [83] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/3/"
##  [85] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/3/"
##  [87] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/4/"
##  [89] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
##  [91] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/4/"
##  [93] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
##  [95] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
##  [97] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
##  [99] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
## [101] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
## [103] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
## [105] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/5/"
## [107] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [109] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
## [111] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
## [113] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [115] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [117] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [119] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [121] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [123] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [125] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [127] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
## [129] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
## [131] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [133] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [135] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [137] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [139] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [141] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [143] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [145] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
## [147] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [149] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [151] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [153] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
## [155] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
## [157] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
## [159] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/6/"
## [161] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
## [163] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
## [165] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/7/"
## [167] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/7/"
## [169] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/7/"
## [171] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
## [173] "http://swapi.co/api/films/6/"

Good work! Now you can use walk() to make your outputs cleaner and more human-readable.

3.3.2 walk() for printing cleaner list outputs

Now you will try one more use of walk(), specifically creating plots using walk(). In the previous exercise, you printed some lists, and you saw that printing lists is much cleaner using walk() than using the base R way. You can also use walk() to display multiple plots sequentially.

Here, use your map() knowledge along with ggplot2 functions to create a graph for the first ten elements of gap_split and then display each graph with walk().

Load the gap_split dataset.

# Load the gap_split data
data(gap_split)

map2() over the first 10 elements of gap_split, and the first 10 names of gap_split.

# Map over the first 10 elements of gap_split
plots <- map2(gap_split[1:10], 
              names(gap_split[1:10]), 
              ~ ggplot(.x, aes(year, lifeExp)) + 
                geom_line() +
                labs(title = .y))

Then walk() over the new plots object and supply print() as an argument to print all plots.

# Object name, then function name
walk(plots, print)

Good work! Now you can print out multiple plots easily using walk().

4 Problem solving with purrr

Now that you have the building blocks, we will start tackling some more complex data problems with purrr.

4.1 Using purrr in your workflow

4.1.1 Name review

Now, you’ll quickly review how to check if a list has names, and how to pull out a specific element from a list. Remember, you can use the names() function to see if a list is named. There are several ways to extract a named element from a list, but the key thing to remember, compared with working with dataframes, is the [[double bracket]] syntax.
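
The different ways of extracting an element can be compared in a short base R sketch. The user_demo list is hypothetical, standing in for a single element of gh_users.

```r
# Hypothetical named list standing in for one element of gh_users
user_demo <- list(login = "jdoe", name = "Jane Doe")

# names() returns the element names, or NULL for an unnamed list
names(user_demo)
names(list(1, 2))

# Double brackets (and $) return the element itself
user_demo[["name"]]
user_demo$name

# Single brackets return a shorter list that still wraps the element
user_demo["name"]
```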

Load the gh_users data.

# Load the data
data(gh_users)

Examine the names of gh_users.

# Check if data has names
names(gh_users)

## NULL

Extract the names for each element of gh_users.

# Map over name element of list
map(gh_users, ~.x[["name"]])

## [[1]]
## [1] "Gábor Csárdi"
## 
## [[2]]
## [1] "Jennifer (Jenny) Bryan"
## 
## [[3]]
## [1] "Jeff L."
## 
## [[4]]
## [1] "Julia Silge"
## 
## [[5]]
## [1] "Thomas J. Leeper"
## 
## [[6]]
## [1] "Maëlle Salmon"

Good work! Now that you have refreshed the basics of named lists, you can dive into the next task.

4.1.2 Setting names

Setting list names makes working with lists much easier in many scenarios; it makes the code easier to read, which is especially important when reviewing code weeks or months later.

Here you are going to work with the gh_repos and gh_users datasets and set their names in two different ways. The two methods will give the same result: a list with named elements.
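
As a sketch of the two approaches on a made-up list (users_demo is hypothetical), the first computes the names inside set_names(), and the second computes them first and passes them in through the pipe's "." placeholder.

```r
library(purrr)

# Hypothetical unnamed list standing in for gh_users
users_demo <- list(
  list(login = "jdoe", name = "Jane Doe"),
  list(login = "rroe", name = "Richard Roe")
)

# Way 1: pipe the list into set_names() and compute the names inside the call
named_1 <- users_demo %>%
  set_names(map_chr(users_demo, "name"))

# Way 2: compute the names first, then pass them in through the "." placeholder
named_2 <- users_demo %>%
  map_chr("name") %>%
  set_names(users_demo, .)

names(named_1)
identical(named_1, named_2)
```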

Set the names on gh_users using the “name” element and use the map_*() function that outputs a character vector.

# Name gh_users with the names of the users
gh_users_named <- gh_users %>% 
    set_names(map_chr(gh_users, "name"))

Explore the structure of gh_repos to see where the owner info is stored.

# Check gh_repos structure
#str(gh_repos)

Set the names of a new list gh_repos_named based on the login of the owner of the repo, using the set_names() and map_*() functions.

# Name gh_repos with the names of the repo owner
gh_repos_named <- gh_repos %>% 
    map_chr(~ .[[1]]$owner$login) %>% 
    set_names(gh_repos, .)

Good work! Sometimes list naming is tricky but purrr makes it simpler by easily extracting the element we want to use as the names.

4.1.3 Asking questions from a list

One of the great things about purrr is you can easily move from having a question about the data to an answer, with just a few lines of code. Here you are going to use the gh_users data to ask three questions:

Which user joined GitHub first?
Are all the repositories user-owned, rather than organization-owned?
Which user has the most public repositories?

In this exercise, your map_*() knowledge is really tested, so make sure to reflect on all the different flavors of map_*() and how they should be used.

Name gh_users with the “name” element and sort the “created_at” element to determine who joined GitHub first.

# Determine who joined github first
map_chr(gh_users, ~.x[["created_at"]]) %>%
      set_names(map_chr(gh_users, "name")) %>%
    sort()

## Jennifer (Jenny) Bryan           Gábor Csárdi                Jeff L. 
## "2011-02-03T22:37:41Z" "2011-03-09T17:29:25Z" "2012-03-24T18:16:43Z" 
##       Thomas J. Leeper          Maëlle Salmon            Julia Silge 
## "2013-02-07T21:07:00Z" "2014-08-05T08:10:04Z" "2015-05-19T02:51:23Z"
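
Sorting works here even though the dates are character strings: timestamps in this year-month-day format sort alphabetically in the same order as chronologically. A small base R sketch with made-up users and dates:

```r
# Made-up join dates in the same timestamp format as "created_at"
joined_demo <- c(
  "User B" = "2012-03-24T18:16:43Z",
  "User A" = "2011-02-03T22:37:41Z",
  "User C" = "2015-05-19T02:51:23Z"
)

# Sorting the strings alphabetically also sorts them chronologically
sorted_demo <- sort(joined_demo)

# The first name is the earliest account
names(sorted_demo)[1]
```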

Output a vector that returns TRUE for each element where the “type” is “User”.

# Determine user versus organization
map_lgl(gh_users, ~.x[["type"]] == "User")

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

Output a named numeric vector of the number of “public_repos”.

# Determine who has the most public repositories
map_int(gh_users, ~.x[["public_repos"]]) %>%
      set_names(map_chr(gh_users, "name")) %>%
    sort()

##            Julia Silge          Maëlle Salmon           Gábor Csárdi 
##                     26                     31                     52 
##                Jeff L.       Thomas J. Leeper Jennifer (Jenny) Bryan 
##                     67                     99                    168

Good work! Now you can use functions you already know to ask any question of your data in just a few lines of code.

4.2 Even more complex problems

4.2.1 Questions about gh_repos

You’re going to use gh_repos again, a list where each element holds the repositories of one GitHub user, and each repository is itself a list of information. Here you will use map() and map_dbl() to answer the question:

Which repository is the largest?

The repository size reported by the GitHub API is measured in kilobytes. This information could be useful to document if you are working with a list-based dataset that changes over time, and need to be able to pull out information, like the largest repository, in the most recent dataset.

map() over gh_repos.
map_dbl() over the “size” element.
Then map() to determine which repo is the largest.

# Map over gh_repos to generate numeric output
map(gh_repos,
    ~map_dbl(.x, 
             ~.x[["size"]])) %>%
    # Grab the largest element
    map(~max(.x))

## [[1]]
## [1] 39461
## 
## [[2]]
## [1] 96325
## 
## [[3]]
## [1] 374812
## 
## [[4]]
## [1] 24070
## 
## [[5]]
## [1] 558176
## 
## [[6]]
## [1] 76455
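
The nested pattern can be tried on a small made-up structure. The repos_demo object is a hypothetical stand-in for gh_repos, with one inner list of repositories per user. As a variation on the code above, the last step uses map_dbl() instead of map(), so the result is a numeric vector rather than a list.

```r
library(purrr)

# Hypothetical stand-in for gh_repos: one inner list of repos per user
repos_demo <- list(
  list(list(name = "a", size = 10), list(name = "b", size = 40)),
  list(list(name = "c", size = 7))
)

# Inner map_dbl() collects each user's repo sizes into a numeric vector
sizes_demo <- map(repos_demo, ~ map_dbl(.x, ~ .x[["size"]]))

# map_dbl() with max() returns one number per user instead of a list
largest_demo <- map_dbl(sizes_demo, max)
largest_demo
```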

Good work! You’re gaining great skills to be able to answer questions in a reproducible way with your datasets.

4.3 Graphs in purrr

4.3.1 ggplot() refresher

You’ve already been introduced to the package ggplot2 in the prerequisite for this course, but let’s do a quick refresher.

geom_point() makes scatterplots
geom_histogram() makes histograms

In this exercise, you are going to use a dataframe created from the gh_users dataset, called gh_users_df, that has two columns: one for the number of public repositories a user has and another for how many followers that user has. Each row is a different user. Then you will make it into a scatter plot, a plot where the data are displayed with points.

Create a scatterplot with public_repos on the x axis and followers on the y axis.

# Build a small dataframe of public repos and followers, one row per user
gh_users_df <- tribble(
  ~public_repos, ~followers,
  52,            303,
  168,           780,
  67,            3958,
  26,            115,
  99,            213,
  31,            34
)
# Scatter plot of public repos and followers
ggplot(data = gh_users_df, 
       aes(x = public_repos, y = followers))+
    geom_point()

Create a histogram of followers by piping in gh_users_df.

# Histogram of followers    
gh_users_df %>%
    ggplot(aes(x = followers))+
        geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
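
The message above appears because no bin width was chosen, so ggplot2 falls back to 30 bins. A short sketch of setting the width yourself, using the follower counts from gh_users_df; the value 500 is only an illustrative assumption, not a recommendation from the course:

```r
library(ggplot2)

# Follower counts copied from gh_users_df above
followers_df <- data.frame(followers = c(303, 780, 3958, 115, 213, 34))

# Setting binwidth explicitly silences the stat_bin() message;
# 500 is an arbitrary illustrative width.
p <- ggplot(followers_df, aes(x = followers)) +
    geom_histogram(binwidth = 500)

p
```

Try a few widths and keep the one that best shows the shape of your data.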

Good work! Isn’t making plots fun? Now let’s dive into how purrr can help make more of them!

4.3.2 purrr and scatterplots

ggplot() does not accept lists as input, so you can pair it with purrr to go from a list to a dataframe to a ggplot() graph in just a few lines of code.

You will continue to work with the gh_users data for this exercise. You will use a map_*() function to pull out a few of the named elements and transform them into the correct datatype. Then create a scatterplot that compares the user’s number of followers to the user’s number of public repositories.

Map over gh_users with the map_*() function that creates a dataframe, keeping four columns named “login”, “name”, “followers” and “public_repos”.
Pipe that dataframe into a scatterplot with followers on the x axis and public_repos on the y axis.

# Create a dataframe with four columns
map_df(gh_users, `[`, 
       c("login","name","followers","public_repos")) %>%
  # Plot followers by public_repos
  ggplot(., 
         aes(x = followers, y = public_repos)) + 
      # Create scatter plots
      geom_point()
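
If passing `[` to map_df() looks mysterious, here is a small sketch of what it does. toy_users is a hypothetical stand-in for gh_users with made-up values; map_df() also needs the dplyr package installed to bind the rows together:

```r
library(purrr)

# Hypothetical stand-in for gh_users: one named list per user,
# with more elements than we want to keep.
toy_users <- list(
  list(login = "user_a", followers = 10L, public_repos = 3L, site_admin = FALSE),
  list(login = "user_b", followers = 25L, public_repos = 7L, site_admin = FALSE)
)

# `[` is the subsetting function: for each user it keeps only the
# named elements listed, and map_df() binds the results row by row.
users_df <- map_df(toy_users, `[`, c("login", "followers", "public_repos"))

users_df
```

Each user becomes one row, and the element site_admin is dropped because it was not requested.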

Good work! Now you can go from list to plot using a tidy workflow!

4.3.3 purrr and histograms

Now you’re going to put together everything you’ve learned, starting with two different lists and ending with a faceted histogram. You’re going to work again with the Star Wars data from the sw_films and sw_people datasets to answer a question:

What is the distribution of heights of characters in each of the Star Wars films?

Different movies take place on different sets of planets, so you might expect to see different distributions of heights from the characters. Your first task is to transform the two datasets into dataframes since ggplot() requires a dataframe input. Then you will join them together, and plot the result, a histogram with a different facet, or subplot, for each film.

Create a dataframe with the “title” of each film, and the “characters” from each film in the sw_films dataset.

# Turn data into correct dataframe format
film_by_character <- tibble(filmtitle = map_chr(sw_films, "title")) %>%
    mutate(characters = map(sw_films, "characters")) %>%
    # Name the list-column explicitly, as current tidyr expects
    unnest(cols = c(characters))

Create a dataframe with the “height”, “mass”, “name”, and “url” elements from sw_people.

# Pull out elements from sw_people
sw_characters <- map_df(sw_people, `[`, c("height","mass","name","url"))

Join the two dataframes together using the “characters” and “url” keys.

# Join our two new objects
character_data <- inner_join(film_by_character, sw_characters, by = c("characters" = "url")) %>%
    # Make sure the columns are numbers
    mutate(height = as.numeric(height), mass = as.numeric(mass))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
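
The two warnings above come from as.numeric() meeting values it cannot parse, once for height and once for mass. A minimal sketch of the behaviour, assuming the missing measurements are stored as a string such as "unknown"; the vector below is hypothetical, not taken from sw_people:

```r
# Hypothetical height values stored as text, with one missing
# measurement recorded as the string "unknown".
heights <- c("172", "96", "unknown")

# as.numeric() cannot parse "unknown", so that entry becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
heights_num <- suppressWarnings(as.numeric(heights))

heights_num
# [1] 172  96  NA

# Count how many values were lost to coercion
sum(is.na(heights_num))
# [1] 1
```

Counting the NA values after coercion is a quick check that you are only losing the entries you expect to lose.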

Create a ggplot() histogram with x = height, faceted by filmtitle.

# Plot the heights, faceted by film title
ggplot(character_data, aes(x = height)) +
  geom_histogram(stat = "count") +
  facet_wrap(~ filmtitle)

## Warning: Ignoring unknown parameters: binwidth, bins, pad

## Warning: Removed 6 rows containing non-finite values (stat_count).

Good work! Now you’ve learned the basics of how purrr can make tasks that require iteration and working with lists more manageable and human readable!