Functional Programming with purrr

DataCamp


Course Description

Lists can be difficult to both understand and manipulate, but they can pack a ton of information and are very powerful. In this course, you will learn to easily extract, summarize, and manipulate lists and how to export the data to your desired object, be it another list, a vector, or even something else! Throughout the course, you will work with the purrr package and a variety of datasets from the repurrrsive package, including data from Star Wars and Wes Anderson films and data collected about GitHub users and GitHub repos. Following this course, your list skills will be purrrfect!

1 Simplifying with purrr

Iteration is a powerful way to make the computer do the work for you. It can also be an area of coding where it is easy to make lots of typos and simple mistakes. The purrr package helps simplify iteration so you can focus on the next step, instead of finding typos.

1.1 The power of iteration

1.1.1 Introduction to iteration

Imagine that you need to read in hundreds of files with a similar structure and perform an action on them. You don’t want to write hundreds of repetitive lines of code to read in all the files or to perform the action. Instead, you want to iterate over them. Iteration means applying the same operation to each of several inputs. Being able to iterate keeps your code short and efficient, and it is especially powerful when working with lists.

For this exercise, the names of 16 CSV files have been loaded into a list called files. In your own work, you could use the list.files() function to create this list. The readr library is also already loaded.
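As a minimal sketch of how a list like files could be built and looped over in your own work, the example below writes three tiny CSV files to a temporary folder and reads them back. The folder name, file names, and columns are made up for illustration, and base R's read.csv() stands in for readr::read_csv() so the sketch needs no extra packages.

```r
# Hypothetical setup: write three tiny CSV files to a temporary folder
csv_dir <- file.path(tempdir(), "iteration_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (year in 1990:1992) {
  write.csv(data.frame(years = year, a = 1:2),
            file.path(csv_dir, paste0("data_", year, ".csv")),
            row.names = FALSE)
}

# pattern is a regular expression; full.names = TRUE returns complete paths
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own element of a pre-sized list
all_files <- vector("list", length(files))
for (i in seq_along(files)) {
  all_files[[i]] <- read.csv(files[[i]])
}
length(all_files)
```

Asking list.files() for full paths up front avoids pasting the folder name onto each file name afterwards.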

This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the tidyverse Cheat Sheet and keep it handy!

  • Create a for loop, which iterates over the files list, and gives each element as an input for readr::read_csv(), which is another way of saying the read_csv() function from the readr package.
  • # Initialize list
    all_files <- list()
  • Then use that input, so the result is a list where each CSV file has been read into a separate element of the newly created all_files list.
  • # Build the full path to every CSV file in the data folder (pattern is a regular expression)
    files <- list.files("/Users/apple/Documents/Rstudio/DataCamp/FoundationsofFunctionalProgrammingwithpurrr/simulated_data_from_1990_to_2005", pattern = "\\.csv$", full.names = TRUE)
    # For loop to read files into a list
    for(i in seq_along(files)){
      all_files[[i]] <- read_csv(files[[i]])
    }
    head(all_files)
    ## [[1]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1990  5.25  197.
    ##  2  1990  8.17  192.
    ##  3  1990  6.49  192.
    ##  4  1990  5.82  195.
    ##  5  1990  5.54  201.
    ##  6  1990  6.65  196.
    ##  7  1990 10.4   208.
    ##  8  1990  1.66  183.
    ##  9  1990  2.78  174.
    ## 10  1990  8.34  198.
    ## # … with 190 more rows
    ## 
    ## [[2]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1991  3.70  197.
    ##  2  1991  5.37  187.
    ##  3  1991  7.05  186.
    ##  4  1991  1.97  207.
    ##  5  1991  8.05  217.
    ##  6  1991  1.97  213.
    ##  7  1991  5.33  195.
    ##  8  1991  4.32  204.
    ##  9  1991  4.46  177.
    ## 10  1991  4.63  222.
    ## # … with 190 more rows
    ## 
    ## [[3]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1992  8.64  178.
    ##  2  1992  3.70  207.
    ##  3  1992  4.79  206.
    ##  4  1992  9.22  194.
    ##  5  1992  6.49  202.
    ##  6  1992  4.58  197.
    ##  7  1992  5.06  174.
    ##  8  1992  2.20  216.
    ##  9  1992  4.72  177.
    ## 10  1992 10.0   188.
    ## # … with 190 more rows
    ## 
    ## [[4]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1993  2.34  204.
    ##  2  1993  5.44  167.
    ##  3  1993  6.86  213.
    ##  4  1993  5.70  197.
    ##  5  1993  2.78  193.
    ##  6  1993  3.24  164.
    ##  7  1993  5.59  234.
    ##  8  1993  3.02  183.
    ##  9  1993  4.60  182.
    ## 10  1993  7.56  205.
    ## # … with 190 more rows
    ## 
    ## [[5]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1994  3.40  197.
    ##  2  1994  4.29  214.
    ##  3  1994  6.91  175.
    ##  4  1994  3.11  181.
    ##  5  1994  5.50  185.
    ##  6  1994  3.59  211.
    ##  7  1994  2.97  189.
    ##  8  1994  7.40  171.
    ##  9  1994  9.66  198.
    ## 10  1994  8.19  221.
    ## # … with 190 more rows
    ## 
    ## [[6]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1995  5.12  197.
    ##  2  1995  4.18  219.
    ##  3  1995  3.70  186.
    ##  4  1995  4.46  204.
    ##  5  1995  7.48  209.
    ##  6  1995  8.38  204.
    ##  7  1995  4.51  202.
    ##  8  1995  5.68  208.
    ##  9  1995  5.24  211.
    ## 10  1995  3.04  212.
    ## # … with 190 more rows
  • Output the size of the all_files list.
  • # Output size of list object
    length(all_files)
    ## [1] 16

    Good work! Now let’s see how to do it more easily with purrr.

    1.1.2 Iteration with purrr

    You’ve made a great for loop, but it uses a lot of code to do something as simple as reading a series of files into a list. This is where purrr comes in. We can do the same thing as a for loop in one line of code with purrr::map(). The function map() iterates over a list and applies another function, which can be specified with the .f argument, to each element.

    map() takes two arguments:

    • The first is the list that will be iterated over
    • The second is a function that will act on each element of the list

    The readr library is already loaded.
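To make the two arguments concrete, here is a small sketch that compares a for loop with map() on a made-up list of numbers. The names inputs, roots_loop, and roots_map are illustrative only.

```r
library(purrr)

inputs <- list(1, 4, 9)

# The for loop version: create the list, then fill it element by element
roots_loop <- list()
for (i in seq_along(inputs)) {
  roots_loop[[i]] <- sqrt(inputs[[i]])
}

# The map() version: first argument is the list, second is the function
roots_map <- map(inputs, sqrt)

identical(roots_loop, roots_map)
```

Both versions return a list with one result per input; map() simply handles the bookkeeping for you.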

  • Load the purrr library (note the 3 Rs).
  • # Load purrr library
    library(purrr)
  • Replicate the for loop from the last exercise using map() instead. Use the same list files and the same function readr::read_csv().
  • # Use map to iterate
    all_files_purrr <- map(files, read_csv)
    head(all_files_purrr)
    ## [[1]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1990  5.25  197.
    ##  2  1990  8.17  192.
    ##  3  1990  6.49  192.
    ##  4  1990  5.82  195.
    ##  5  1990  5.54  201.
    ##  6  1990  6.65  196.
    ##  7  1990 10.4   208.
    ##  8  1990  1.66  183.
    ##  9  1990  2.78  174.
    ## 10  1990  8.34  198.
    ## # … with 190 more rows
    ## 
    ## [[2]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1991  3.70  197.
    ##  2  1991  5.37  187.
    ##  3  1991  7.05  186.
    ##  4  1991  1.97  207.
    ##  5  1991  8.05  217.
    ##  6  1991  1.97  213.
    ##  7  1991  5.33  195.
    ##  8  1991  4.32  204.
    ##  9  1991  4.46  177.
    ## 10  1991  4.63  222.
    ## # … with 190 more rows
    ## 
    ## [[3]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1992  8.64  178.
    ##  2  1992  3.70  207.
    ##  3  1992  4.79  206.
    ##  4  1992  9.22  194.
    ##  5  1992  6.49  202.
    ##  6  1992  4.58  197.
    ##  7  1992  5.06  174.
    ##  8  1992  2.20  216.
    ##  9  1992  4.72  177.
    ## 10  1992 10.0   188.
    ## # … with 190 more rows
    ## 
    ## [[4]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1993  2.34  204.
    ##  2  1993  5.44  167.
    ##  3  1993  6.86  213.
    ##  4  1993  5.70  197.
    ##  5  1993  2.78  193.
    ##  6  1993  3.24  164.
    ##  7  1993  5.59  234.
    ##  8  1993  3.02  183.
    ##  9  1993  4.60  182.
    ## 10  1993  7.56  205.
    ## # … with 190 more rows
    ## 
    ## [[5]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1994  3.40  197.
    ##  2  1994  4.29  214.
    ##  3  1994  6.91  175.
    ##  4  1994  3.11  181.
    ##  5  1994  5.50  185.
    ##  6  1994  3.59  211.
    ##  7  1994  2.97  189.
    ##  8  1994  7.40  171.
    ##  9  1994  9.66  198.
    ## 10  1994  8.19  221.
    ## # … with 190 more rows
    ## 
    ## [[6]]
    ## # A tibble: 200 × 3
    ##    years     a     b
    ##    <dbl> <dbl> <dbl>
    ##  1  1995  5.12  197.
    ##  2  1995  4.18  219.
    ##  3  1995  3.70  186.
    ##  4  1995  4.46  204.
    ##  5  1995  7.48  209.
    ##  6  1995  8.38  204.
    ##  7  1995  4.51  202.
    ##  8  1995  5.68  208.
    ##  9  1995  5.24  211.
    ## 10  1995  3.04  212.
    ## # … with 190 more rows
  • Check the length of all_files_purrr.
  • # Output size of list object
    length(all_files_purrr)
    ## [1] 16

    Nice! You can see from the output here that 16 different files have been read into all_files_purrr.

    1.1.3 More iteration with for loops

    Iteration isn’t just for reading in files though; iteration can be used to perform other actions on objects. First, you will try iterating with a for loop.

    You’re going to change each element of a list into a numeric data type and then put it back into the same element in the same list.

    For this exercise, you will iterate using a for loop that takes list_of_df, which is a list of character vectors, but the characters are actually numbers! You need to change the character vectors to numeric so that you can perform mathematical operations on them; you can use the base R function as.numeric() to do that.

  • Check the class type of the first element of list_of_df.
  • # Local stand-in for the course's list_of_df; these vectors start out as integer, not character
    list_of_df <- lapply(1:10, function(x) 1:4)
    # Check the class type of the first element
    class(list_of_df[[1]])
    ## [1] "integer"
  • Build a for loop that takes each element of list_of_df, changes it into numeric data with as.numeric(), and adds it back into the same element of list_of_df.
  • # Change each element from a character to a number
    for(i in seq_along(list_of_df)){
        list_of_df[[i]] <- as.numeric(list_of_df[[i]])
    }
  • Check the class type of the first element of list_of_df.
  • # Check the class type of the first element
    class(list_of_df[[1]])
    ## [1] "numeric"
  • Print list_of_df.
  • # Print out the list
    head(list_of_df)
    ## [[1]]
    ## [1] 1 2 3 4
    ## 
    ## [[2]]
    ## [1] 1 2 3 4
    ## 
    ## [[3]]
    ## [1] 1 2 3 4
    ## 
    ## [[4]]
    ## [1] 1 2 3 4
    ## 
    ## [[5]]
    ## [1] 1 2 3 4
    ## 
    ## [[6]]
    ## [1] 1 2 3 4

    Nice! You can see from the output that we have a list of numbers now!

    1.1.4 More iteration with purrr

    Now you will change each element of a list into a numeric data type and then put it back into the same element in the same list, but instead of using a for loop, you’ll use map().

    You can use the purrr function map() to more easily loop over a list, and turn the characters into numbers. Instead of having to build a whole for loop, you can use one line of code.
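As a small sketch of the same idea on data that really does start out as character, the example below uses a made-up list called char_list; it is not the course's list_of_df.

```r
library(purrr)

# Hypothetical list of character vectors that hold numbers as text
char_list <- list(c("1", "2"), c("3.5", "4"))
class(char_list[[1]])

# One line replaces the whole for loop
num_list <- map(char_list, as.numeric)
class(num_list[[1]])
```

The list keeps its shape (two elements of length two), and only the type of each element changes.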

  • Check the class of the first element of list_of_df.
  • # Check the class type of the first element
    class(list_of_df[[1]])  
    ## [1] "numeric"
  • Use map() to iterate over list_of_df and change each element of the list into numeric data.
  • # Change each character element to a number
    list_of_df <- map(list_of_df, as.numeric)
  • Check the class of the first element of list_of_df.
  • # Check the class type of the first element again
    class(list_of_df[[1]]) 
    ## [1] "numeric"
  • Print out list_of_df.
  • # Print out the list
    head(list_of_df)
    ## [[1]]
    ## [1] 1 2 3 4
    ## 
    ## [[2]]
    ## [1] 1 2 3 4
    ## 
    ## [[3]]
    ## [1] 1 2 3 4
    ## 
    ## [[4]]
    ## [1] 1 2 3 4
    ## 
    ## [[5]]
    ## [1] 1 2 3 4
    ## 
    ## [[6]]
    ## [1] 1 2 3 4

    Good work! Now you can fix class type issues in your lists!

    1.2 Subsetting lists

    1.2.1 Subsetting lists

    Often when working in R, you’ll use dataframes or vectors. Another kind of R object is a list. While lists can be complicated, lists are also incredibly powerful. Lists are like Hermione Granger’s bag of holding (from Harry Potter); they can hold a wide variety of things. The contents of a list don’t have to be the same data type, and as long as you know how it’s organized, you can grab out what you need by subsetting.

    Both named and unnamed lists can be subset using double square brackets [[ ]] like this: listname[[ index ]]

    If a list is named, you can also use $ for subsetting. The syntax list$elementname pulls out the named element from the list. As with any other kind of object in R, you can use str() to determine the structure of the list.
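A minimal sketch of the two subsetting styles, using a small made-up list (mixed_bag and its element names are illustrative, not from repurrrsive):

```r
# A named list whose elements have different types
mixed_bag <- list(colors = c("#F1BB7B", "#FD6467"),
                  count  = 2,
                  ok     = TRUE)

# Subset by position with double square brackets
by_position <- mixed_bag[[1]]

# Subset by name with $ (only works on named lists)
by_name <- mixed_bag$colors

identical(by_position, by_name)

# Inspect the structure of the whole list
str(mixed_bag)
```

Single brackets, as in mixed_bag[1], would instead return a list of length one rather than the element itself.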

  • Load the repurrrsive package.
  • # Load repurrrsive package, to get access to the wesanderson dataset
    library(repurrrsive)
  • Load the wesanderson dataset.
  • # Load wesanderson dataset
    data(wesanderson)
  • Examine the structure of the first element in wesanderson.
  • # Get structure of first element in wesanderson
    str(wesanderson[[1]])
    ##  chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"
  • Examine the structure of the GrandBudapest element in wesanderson.
  • # Get structure of GrandBudapest element in wesanderson
    str(wesanderson$GrandBudapest)
    ##  chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"

    Good work! Now you can subset and determine the structure of each part of a named or unnamed list!

    1.2.2 Subsetting list elements

    You can also subset within list elements using bracket notation like this: ListName$ElementName[VectorNumber]. If a list element is a dataframe, you can pull out a column like this: ListName$ElementName$ColumnName or ListName[[1]][,1].
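The notations above can be sketched on a small made-up list that holds one vector and one dataframe (shop, colors, and stock are hypothetical names):

```r
shop <- list(
  colors = c("red", "green", "blue"),
  stock  = data.frame(item = c("pen", "cup"), price = c(1.5, 4))
)

# ListName$ElementName[VectorNumber]: second value of the colors vector
second_color <- shop$colors[2]

# ListName$ElementName$ColumnName: the price column of the dataframe
prices <- shop$stock$price

# ListName[[2]][, 1]: first column of the second element, by position
first_column <- shop[[2]][, 1]
```

Positional and named subsetting can be mixed freely, as long as you know how the list is organized.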

    In this exercise, you’ll examine the wesanderson and sw_films datasets from the repurrrsive package. wesanderson contains color palettes for each of Wes Anderson’s movies. These colors are recorded as hexadecimal codes, that is, a # followed by six hexadecimal digits (0-9 and A-F) that indicate a particular color. Here, you will use two ways of pulling out a particular hexadecimal color code.

    sw_films contains information about the films in the Star Wars franchise, such as title, director, producer, etc. You’ll use subsetting to explore this dataset.

    Subset the third color from the first element of wesanderson. Then subset the fourth color from GrandBudapest.

    # Third element of the first wesanderson vector
    wesanderson[[1]][3]
    ## [1] "#5B1A18"
    # Fourth element of the GrandBudapest wesanderson vector
    wesanderson$GrandBudapest[4]
    ## [1] "#D67236"

    Subset the first element from sw_films. Then subset the title element from the first element.

    # Subset the first element of the sw_films data
    sw_films[[1]]
    ## $title
    ## [1] "A New Hope"
    ## 
    ## $episode_id
    ## [1] 4
    ## 
    ## $opening_crawl
    ## [1] "It is a period of civil war.\r\nRebel spaceships, striking\r\nfrom a hidden base, have won\r\ntheir first victory against\r\nthe evil Galactic Empire.\r\n\r\nDuring the battle, Rebel\r\nspies managed to steal secret\r\nplans to the Empire's\r\nultimate weapon, the DEATH\r\nSTAR, an armored space\r\nstation with enough power\r\nto destroy an entire planet.\r\n\r\nPursued by the Empire's\r\nsinister agents, Princess\r\nLeia races home aboard her\r\nstarship, custodian of the\r\nstolen plans that can save her\r\npeople and restore\r\nfreedom to the galaxy...."
    ## 
    ## $director
    ## [1] "George Lucas"
    ## 
    ## $producer
    ## [1] "Gary Kurtz, Rick McCallum"
    ## 
    ## $release_date
    ## [1] "1977-05-25"
    ## 
    ## $characters
    ##  [1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/2/" 
    ##  [3] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/4/" 
    ##  [5] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/6/" 
    ##  [7] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/8/" 
    ##  [9] "http://swapi.co/api/people/9/"  "http://swapi.co/api/people/10/"
    ## [11] "http://swapi.co/api/people/12/" "http://swapi.co/api/people/13/"
    ## [13] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/15/"
    ## [15] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/18/"
    ## [17] "http://swapi.co/api/people/19/" "http://swapi.co/api/people/81/"
    ## 
    ## $planets
    ## [1] "http://swapi.co/api/planets/2/" "http://swapi.co/api/planets/3/"
    ## [3] "http://swapi.co/api/planets/1/"
    ## 
    ## $starships
    ## [1] "http://swapi.co/api/starships/2/"  "http://swapi.co/api/starships/3/" 
    ## [3] "http://swapi.co/api/starships/5/"  "http://swapi.co/api/starships/9/" 
    ## [5] "http://swapi.co/api/starships/10/" "http://swapi.co/api/starships/11/"
    ## [7] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/13/"
    ## 
    ## $vehicles
    ## [1] "http://swapi.co/api/vehicles/4/" "http://swapi.co/api/vehicles/6/"
    ## [3] "http://swapi.co/api/vehicles/7/" "http://swapi.co/api/vehicles/8/"
    ## 
    ## $species
    ## [1] "http://swapi.co/api/species/5/" "http://swapi.co/api/species/3/"
    ## [3] "http://swapi.co/api/species/2/" "http://swapi.co/api/species/1/"
    ## [5] "http://swapi.co/api/species/4/"
    ## 
    ## $created
    ## [1] "2014-12-10T14:23:31.880000Z"
    ## 
    ## $edited
    ## [1] "2015-04-11T09:46:52.774897Z"
    ## 
    ## $url
    ## [1] "http://swapi.co/api/films/1/"
    # Subset the first element of the sw_films data, the title column 
    sw_films[[1]]$title
    ## [1] "A New Hope"

    Great work, now you should be very comfortable subsetting lists!

    1.3 The many flavors of map()

    1.3.1 map() argument alternatives

    You can also use iteration to answer a question, like how long each element in the wesanderson dataset is. You can do this by feeding map() a function like length(), using the map(list, function) syntax, and it works just fine. However, as future exercises get more complex, you will need to learn a second way, using:

    map(list, ~function(.x))

    This second way gives the same result as map(list, function). To specify how the list is used in the function, use the argument .x to denote where the list element goes inside the function. When you want to use .x to show where the element goes in the function, you need to put a ~ in front of the function in the second argument of map().
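A short sketch of the two syntaxes side by side, using a made-up list called words:

```r
library(purrr)

words <- list(first = "purrr", second = "map")

# Pass the function by name
by_name <- map(words, nchar)

# Formula syntax: ~ starts the function body, .x is the current element
by_formula <- map(words, ~ nchar(.x))

identical(by_name, by_formula)

# The formula form makes it easy to do more with each element
doubled <- map(words, ~ nchar(.x) * 2)
```

The last line shows why the formula form is worth learning: the extra arithmetic would otherwise require writing a separate helper function.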

    Use map() on wesanderson and determine the length of each element in the “old” way.

    # Map over wesanderson to get the length of each element
    map(wesanderson, length)
    ## $GrandBudapest
    ## [1] 4
    ## 
    ## $Moonrise1
    ## [1] 4
    ## 
    ## $Royal1
    ## [1] 4
    ## 
    ## $Moonrise2
    ## [1] 4
    ## 
    ## $Cavalcanti
    ## [1] 5
    ## 
    ## $Royal2
    ## [1] 5
    ## 
    ## $GrandBudapest2
    ## [1] 4
    ## 
    ## $Moonrise3
    ## [1] 5
    ## 
    ## $Chevalier
    ## [1] 4
    ## 
    ## $Zissou
    ## [1] 5
    ## 
    ## $FantasticFox
    ## [1] 5
    ## 
    ## $Darjeeling
    ## [1] 5
    ## 
    ## $Rushmore
    ## [1] 5
    ## 
    ## $BottleRocket
    ## [1] 7
    ## 
    ## $Darjeeling2
    ## [1] 5

    Use map() on wesanderson and determine the length of each element again, but this time using map(list, ~function(.x)).

    # Map over wesanderson, and determine the length of each element
    map(wesanderson, ~length(.x))
    ## $GrandBudapest
    ## [1] 4
    ## 
    ## $Moonrise1
    ## [1] 4
    ## 
    ## $Royal1
    ## [1] 4
    ## 
    ## $Moonrise2
    ## [1] 4
    ## 
    ## $Cavalcanti
    ## [1] 5
    ## 
    ## $Royal2
    ## [1] 5
    ## 
    ## $GrandBudapest2
    ## [1] 4
    ## 
    ## $Moonrise3
    ## [1] 5
    ## 
    ## $Chevalier
    ## [1] 4
    ## 
    ## $Zissou
    ## [1] 5
    ## 
    ## $FantasticFox
    ## [1] 5
    ## 
    ## $Darjeeling
    ## [1] 5
    ## 
    ## $Rushmore
    ## [1] 5
    ## 
    ## $BottleRocket
    ## [1] 7
    ## 
    ## $Darjeeling2
    ## [1] 5

    Great Job! This new way of writing map_*() functions will come in handy in future exercises, so make a mental note of the ~ and the .x argument.

    1.3.2 map_*

    The map() function returns its output as a list. However, there are several variants of map(); you can use the map_*() functions to tell purrr the type of output you want. The * in map_*() stands for different R data types. For instance, you might want the output to be a vector of numbers so that you can put it inside a dataframe. So, unless you want a list back, decide what type the output should be before you write your map() call.
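To illustrate the difference in output type, here is a small sketch on a made-up list called scores:

```r
library(purrr)

scores <- list(a = c(1, 2, 3), b = c(10, 20))

# map() always returns a list
as_list <- map(scores, mean)

# map_dbl() returns a named double vector
as_dbl <- map_dbl(scores, mean)

# map_chr() returns a named character vector
as_chr <- map_chr(scores, ~ paste(.x, collapse = "-"))
```

The vector results drop straight into a dataframe column, whereas the list result would need to be flattened first.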

    • Determine the length of each element of the wesanderson dataset using our original map() function. Examine the output.
    # Map over wesanderson, to determine the length of each element
    map(wesanderson, length)
    ## $GrandBudapest
    ## [1] 4
    ## 
    ## $Moonrise1
    ## [1] 4
    ## 
    ## $Royal1
    ## [1] 4
    ## 
    ## $Moonrise2
    ## [1] 4
    ## 
    ## $Cavalcanti
    ## [1] 5
    ## 
    ## $Royal2
    ## [1] 5
    ## 
    ## $GrandBudapest2
    ## [1] 4
    ## 
    ## $Moonrise3
    ## [1] 5
    ## 
    ## $Chevalier
    ## [1] 4
    ## 
    ## $Zissou
    ## [1] 5
    ## 
    ## $FantasticFox
    ## [1] 5
    ## 
    ## $Darjeeling
    ## [1] 5
    ## 
    ## $Rushmore
    ## [1] 5
    ## 
    ## $BottleRocket
    ## [1] 7
    ## 
    ## $Darjeeling2
    ## [1] 5
    • Create a dataframe that has the number of colors from each movie, using map_dbl(). The dbl means a double or a number that can have a decimal.
    # Create a numcolors column and fill with length of each wesanderson element
    data.frame(numcolors = map_dbl(wesanderson, ~length(.x)))
    ##                numcolors
    ## GrandBudapest          4
    ## Moonrise1              4
    ## Royal1                 4
    ## Moonrise2              4
    ## Cavalcanti             5
    ## Royal2                 5
    ## GrandBudapest2         4
    ## Moonrise3              5
    ## Chevalier              4
    ## Zissou                 5
    ## FantasticFox           5
    ## Darjeeling             5
    ## Rushmore               5
    ## BottleRocket           7
    ## Darjeeling2            5

    Good work! Notice how much cleaner the output was using map_dbl()! It’s always worth thinking through which map_*() function will get you where you need to go before coding it out. In the next chapter, we’ll dive into more complex uses of purrr.

    2 More complex iterations

    purrr is much more than a for loop; it works well with pipes, and you can use it to run models, simulate data, and build nested loops!

    2.1 Working with unnamed lists

    2.1.1 Names & pipe refresher

    It is easy to determine if a list has names using names(). Understanding the named elements of a list can make working with the list elements easier because you can pull out the information you need by name, instead of searching for the correct numbered element.

    purrr is a part of the tidyverse, a system of packages designed to be used together, and used with pipes. Let’s do a quick refresh on how pipes work. A pipe %>% takes the output from the function that comes before it, and feeds it into the function that comes after the pipe as its first argument.

    function_before() %>% 
        function_after()
    

    You don’t need to use pipes when you use purrr functions, but for the purposes of these lessons, you will be.
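As a minimal sketch of what the pipe does, the example below computes the same value with nested calls and with a pipeline. It loads magrittr directly, which is where %>% is defined; the vector values is made up.

```r
library(magrittr)

values <- c(1, 4, 9)

# Nested calls read inside-out
nested <- sum(sqrt(values))

# The pipe reads left to right: values goes into sqrt(), its result into sum()
piped <- values %>%
  sqrt() %>%
  sum()

identical(nested, piped)
```

Both forms do exactly the same work; the pipe only changes the order in which you read it.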

    • Check to see if the sw_films list has named elements with pipes.
    # Use pipes to check for names in sw_films
    sw_films %>%
        names()
    ## NULL

    Good work! Now that you know how to check to see if a list has names in a tidy way, you’re ready to dive in.

    2.1.2 Setting names

    If you have an unnamed list, you can, of course, name each element. This is useful for calling out certain elements in a list regardless of their order, especially if you are working with a list that may grow or change over time, or if you use the same code on several different lists. For instance, if you have a list that contains a dataframe, a model, and a plot, calling out $plot is much easier than searching to figure out which numbered element holds the plot.
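A small sketch of naming a list from one of its own fields, assuming a made-up list called films in which every element has a title:

```r
library(purrr)

# Hypothetical unnamed list; each element is itself a named list
films <- list(
  list(title = "Film A", year = 2001),
  list(title = "Film B", year = 2005)
)
names(films)  # NULL, because the outer list is unnamed

# Pull each title out as a character vector and use it as the names
films_named <- set_names(films, map_chr(films, "title"))

# Elements can now be reached by name, regardless of position
films_named[["Film B"]]$year
```

Passing the string "title" to map_chr() is purrr shorthand for extracting the element with that name from each list entry.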

  • With a piped workflow:
    • Name each element of sw_films list and assign to a new list, sw_films_named.
    • Iterate over the title element.
  • # Set names so each element of the list is named for the film title
    sw_films_named <- sw_films %>% 
      set_names(map_chr(sw_films, "title"))
  • Check to make sure the new list has names.
  • # Check to see if the names worked/are correct
    names(sw_films_named)
    ## [1] "A New Hope"              "Attack of the Clones"   
    ## [3] "The Phantom Menace"      "Revenge of the Sith"    
    ## [5] "Return of the Jedi"      "The Empire Strikes Back"
    ## [7] "The Force Awakens"

    Good work! Naming lists makes working in purrr easier and more human-readable.

    2.1.3 Pipes in map()

    So you’ve refreshed your memory on how pipes can be used between functions. You can also use pipes inside a map() function to help you iterate a pipeline of tasks over a list of inputs.

    Here instead of using one of the repurrrsive datasets, you will be working with a list of numbers so that you can do a few mathematical operations.

  • Create a list that contains the values 1 through 10, each as a separate element.
  • # Create a list of values from 1 through 10
    numlist <- list(1,2,3,4,5,6,7,8,9,10)
  • Create a pipeline within one map() function that takes the sqrt() of each element, and then the sin() of each element.
  • # Iterate over the numlist 
    map(numlist, ~.x %>% sqrt() %>% sin()) %>% head()
    ## [[1]]
    ## [1] 0.841471
    ## 
    ## [[2]]
    ## [1] 0.9877659
    ## 
    ## [[3]]
    ## [1] 0.9870266
    ## 
    ## [[4]]
    ## [1] 0.9092974
    ## 
    ## [[5]]
    ## [1] 0.7867491
    ## 
    ## [[6]]
    ## [1] 0.6381576

    Good work! Using pipes inside of map() makes iterating over multiple functions easy.

    2.2 More map()

    2.2.1 Simulating data with purrr

    Often when trying to solve a problem with data we first need to build some simulated data to see if our idea is even possible. For example, you may want to test models with data that have known differences, to see if the models are working correctly.

    In this exercise, you will see how this works in purrr by simulating data for two populations, a and b, from the sites: “north”, “east”, and “west”. The two populations will be randomly drawn from a normal distribution, with different means and standard deviations.
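The general pattern can be sketched on a smaller scale. The example below assumes two made-up groups and five draws each, and it fixes the random seed (an addition for reproducibility, not part of the exercise):

```r
library(purrr)

set.seed(42)  # assumption: fixed seed so repeated runs give the same draws

groups <- list("north", "east")

# One dataframe per group; .x is the current group name
sim <- map(groups, ~ data.frame(
  site = .x,
  a = rnorm(n = 5, mean = 5, sd = 2.5),
  b = rnorm(n = 5, mean = 200, sd = 15)
))

length(sim)
nrow(sim[[1]])
```

Because the single group name is recycled down the column, every row of each dataframe carries the label of the group it was simulated for.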

  • Create a list of site names, “north”, “east”, and “west”.
  • # List of sites north, east, and west
    sites <- list("north","east","west")
  • Then use map() to create a list of dataframes, each with three columns. The first column is sites.
    • The second is population a, which has a mean of 5, a sample size n of 200, and an sd of (5/2).
    • The third is population b, which has a mean of 200, a sample size n of 200, and an sd of 15.
  • # Create a list of dataframes, each with a sites, a, and b column
    list_of_df <-  map(sites,  
      ~data.frame(sites = .x,
                  a = rnorm(mean = 5,   n = 200, sd = (5/2)),
                  b = rnorm(mean = 200, n = 200, sd = 15)))
    
    map(list_of_df,~head(.x))
    ## [[1]]
    ##   sites         a        b
    ## 1 north  6.671339 197.9598
    ## 2 north 10.090051 212.0460
    ## 3 north  4.785466 185.1392
    ## 4 north  4.930145 233.7320
    ## 5 north  4.438924 205.4188
    ## 6 north -1.017154 212.5341
    ## 
    ## [[2]]
    ##   sites        a        b
    ## 1  east 7.099261 160.7708
    ## 2  east 0.112617 185.9312
    ## 3  east 3.535525 186.0787
    ## 4  east 2.920502 205.4623
    ## 5  east 6.971227 183.9190
    ## 6  east 6.388077 195.9712
    ## 
    ## [[3]]
    ##   sites        a        b
    ## 1  west 9.524335 186.3719
    ## 2  west 5.210117 199.9047
    ## 3  west 6.457159 201.0256
    ## 4  west 2.181551 215.0170
    ## 5  west 4.436093 195.9054
    ## 6  west 8.011461 189.4125

    Good work! Now you can simulate data with ease.

    2.2.2 Run a linear model

    You can use map() to do more than just take the square root of a number or simulate data. You can also use map() to loop over different inputs to run several models, each using the unique values of a given list element. You can also then iterate over the models you’ve run to create the model summaries and look at the results.
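As a sketch of the fit-then-summarize pattern, the example below uses R's built-in mtcars data as a stand-in for the simulated sites, splitting it by number of cylinders. The variable names are illustrative.

```r
library(purrr)

# One dataframe per cylinder count (names "4", "6", "8")
by_cyl <- split(mtcars, mtcars$cyl)

# Fit one linear model per dataframe; .x is the current dataframe
models <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Summarize each model, then pull out one number per summary
summaries <- map(models, summary)
r_squared <- map_dbl(summaries, "r.squared")
r_squared
```

Keeping the models, the summaries, and the extracted statistic in separate objects makes each step easy to inspect on its own.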

    The lists sites and list_of_df are preloaded.

    • Pipe list_of_df into map() along with the lm() linear model function, to compare a as the response and b as the predictor variable.
      • Use the syntax: lm(response ~ predictor, data = )
    • Then pipe the linear model output into map() and generate the summary() of each model.
    # Map over the models to look at the relationship of a vs b
    list_of_df %>%
        map(~ lm(a ~ b, data = .)) %>%
        map(summary)
    ## [[1]]
    ## 
    ## Call:
    ## lm(formula = a ~ b, data = .)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -5.9401 -1.9836 -0.1301  1.6425  5.7177 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)   
    ## (Intercept)  6.86981    2.27909   3.014  0.00291 **
    ## b           -0.00916    0.01139  -0.804  0.42211   
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 2.445 on 198 degrees of freedom
    ## Multiple R-squared:  0.003258,   Adjusted R-squared:  -0.001776 
    ## F-statistic: 0.6471 on 1 and 198 DF,  p-value: 0.4221
    ## 
    ## 
    ## [[2]]
    ## 
    ## Call:
    ## lm(formula = a ~ b, data = .)
    ## 
    ## Residuals:
    ##    Min     1Q Median     3Q    Max 
    ## -6.261 -1.462  0.050  1.651  6.573 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)  
    ## (Intercept) -0.73189    2.19967  -0.333   0.7397  
    ## b            0.02786    0.01104   2.524   0.0124 *
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 2.479 on 198 degrees of freedom
    ## Multiple R-squared:  0.03117,    Adjusted R-squared:  0.02627 
    ## F-statistic:  6.37 on 1 and 198 DF,  p-value: 0.01239
    ## 
    ## 
    ## [[3]]
    ## 
    ## Call:
    ## lm(formula = a ~ b, data = .)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -4.8803 -1.7816  0.1036  1.8021  5.6971 
    ## 
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)  
    ## (Intercept)  4.895e+00  2.294e+00   2.134   0.0341 *
    ## b           -9.945e-05  1.146e-02  -0.009   0.9931  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 2.407 on 198 degrees of freedom
    ## Multiple R-squared:  3.803e-07,  Adjusted R-squared:  -0.00505 
    ## F-statistic: 7.53e-05 on 1 and 198 DF,  p-value: 0.9931

    Good work! This will make running multiple models and summarizing their results much easier.

    2.2.3 map_chr()

    In this exercise, you’ll dive a bit deeper into the different map_*() variants. The map() function always outputs a list. The map_*() variants output a vector of a specific type instead. Study the table below and make sure you’re clear on the type of output for each map_*() variant.

    map_*()      Output
    map_chr()    character vector
    map_lgl()    logical vector [TRUE or FALSE]
    map_int()    integer vector
    map_dbl()    double vector
    • Compare the results of map() and map_chr() for the named director element of sw_films.
    # Pull out the director element of sw_films in a list and character vector
    map(sw_films, ~.x[["director"]])
    ## [[1]]
    ## [1] "George Lucas"
    ## 
    ## [[2]]
    ## [1] "George Lucas"
    ## 
    ## [[3]]
    ## [1] "George Lucas"
    ## 
    ## [[4]]
    ## [1] "George Lucas"
    ## 
    ## [[5]]
    ## [1] "Richard Marquand"
    ## 
    ## [[6]]
    ## [1] "Irvin Kershner"
    ## 
    ## [[7]]
    ## [1] "J. J. Abrams"
    map_chr(sw_films, ~.x[["director"]])
    ## [1] "George Lucas"     "George Lucas"     "George Lucas"     "George Lucas"    
    ## [5] "Richard Marquand" "Irvin Kershner"   "J. J. Abrams"
    • Compare the map() and map_lgl() outputs on sw_films when checking whether director == "George Lucas".
    # Compare outputs when checking if director is George Lucas
    map(sw_films, ~.x[["director"]] == "George Lucas")
    ## [[1]]
    ## [1] TRUE
    ## 
    ## [[2]]
    ## [1] TRUE
    ## 
    ## [[3]]
    ## [1] TRUE
    ## 
    ## [[4]]
    ## [1] TRUE
    ## 
    ## [[5]]
    ## [1] FALSE
    ## 
    ## [[6]]
    ## [1] FALSE
    ## 
    ## [[7]]
    ## [1] FALSE
    map_lgl(sw_films, ~.x[["director"]] == "George Lucas")
    ## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

    Good work! Mastering the flavors of map_*() is key for success in purrr.
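
    The difference in output type can be checked directly. The following is a minimal sketch, assuming a small hand-made list called films (a hypothetical stand-in for sw_films, not the real data):

```r
library(purrr)

# Hypothetical stand-in for sw_films: two small film records
films <- list(
  list(title = "A New Hope", director = "George Lucas"),
  list(title = "The Empire Strikes Back", director = "Irvin Kershner")
)

# map() always returns a list, one element per input
directors_list <- map(films, ~ .x[["director"]])

# map_chr() returns a character vector of the same length
directors_chr <- map_chr(films, ~ .x[["director"]])

# map_lgl() returns a logical vector
is_lucas <- map_lgl(films, ~ .x[["director"]] == "George Lucas")

class(directors_list)  # "list"
class(directors_chr)   # "character"
class(is_lucas)        # "logical"
```

    A vector output is usually easier to pass on to functions such as table() or sum() than a list of length-one elements.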

    2.2.4 map_dbl() and map_int()

    Some flavors of map_*() are very similar. map_dbl() and map_int() both output numbers. map_int() outputs integer vectors, which hold whole numbers only. map_dbl() outputs double vectors, which can hold numbers with decimals. Take a closer look at how using different map_*() functions affects the output.

    Here is the map_*() table again as a reference.

    map_*() Output
    map_chr() character vector
    map_lgl() logical vector [TRUE or FALSE]
    map_int() integer vector
    map_dbl() double vector

    Compare the map() and map_dbl() outputs for pulling out the episode_id for each element of sw_films.

    # Pull out episode_id element as list
    map(sw_films, ~.x[["episode_id"]])
    ## [[1]]
    ## [1] 4
    ## 
    ## [[2]]
    ## [1] 2
    ## 
    ## [[3]]
    ## [1] 1
    ## 
    ## [[4]]
    ## [1] 3
    ## 
    ## [[5]]
    ## [1] 6
    ## 
    ## [[6]]
    ## [1] 5
    ## 
    ## [[7]]
    ## [1] 7
    # Pull out episode_id element as double vector
    map_dbl(sw_films, ~.x[["episode_id"]])
    ## [1] 4 2 1 3 6 5 7

    Compare the map() and map_int() outputs for pulling out the episode_id for each element of sw_films.

    # Pull out episode_id element as a list
    map(sw_films, ~.x[["episode_id"]])
    ## [[1]]
    ## [1] 4
    ## 
    ## [[2]]
    ## [1] 2
    ## 
    ## [[3]]
    ## [1] 1
    ## 
    ## [[4]]
    ## [1] 3
    ## 
    ## [[5]]
    ## [1] 6
    ## 
    ## [[6]]
    ## [1] 5
    ## 
    ## [[7]]
    ## [1] 7
    # Pull out episode_id element as integer vector
    map_int(sw_films, ~.x[["episode_id"]])
    ## [1] 4 2 1 3 6 5 7

    Good work! Now you can output numbers without decimals!
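
    Because both vectors print the same way, typeof() is a quick way to see the difference. This is a small sketch, assuming a hypothetical list of integer ids rather than the real sw_films data:

```r
library(purrr)

# Hypothetical integer episode ids (the L suffix marks integers)
ids <- list(4L, 2L, 1L)

as_int <- map_int(ids, ~ .x)      # integer vector
as_dbl <- map_dbl(ids, ~ .x)      # same values, stored as doubles
halves <- map_dbl(ids, ~ .x / 2)  # division can produce decimals, so use map_dbl()

typeof(as_int)  # "integer"
typeof(as_dbl)  # "double"
```

    When the function may return fractional values, map_dbl() is the safer choice.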

    2.3 map2() and pmap()

    2.3.1 Simulating data with multiple inputs using map2()

    The map() function is great if you need to iterate over one list; however, you will often need to iterate over two lists at the same time. This is where map2() comes in. While map() takes a single list as the .x argument, map2() takes two lists as two arguments: .x and .y.

    To test out map2(), you are going to create a simple dataset, with one list of numbers and one list of strings. You will put these two lists together and create some simulated data.

  • Create a list, means, of the values 1 through 3.
  • # List of 1, 2 and 3
    means <- list(1,2,3)
  • Create a sites list with “north”, “west”, and “east”.
  • # Create sites list
    sites <- list("north","west","east")
  • map2() over the sites and means lists to create a dataframe with two columns.
    • First column is sites; second column is generated by rnorm() with mean from the means list.
  • # Map over two arguments: sites and means
    list_of_files_map2 <- map2(sites, means, ~data.frame(sites = .x,
                                 a = rnorm(mean = .y, n = 200, sd = (5/2))))
    
    
    map(list_of_files_map2,~head(.x))
    ## [[1]]
    ##   sites         a
    ## 1 north  3.449187
    ## 2 north  2.893941
    ## 3 north -2.361453
    ## 4 north  1.442438
    ## 5 north  1.414757
    ## 6 north  2.054845
    ## 
    ## [[2]]
    ##   sites          a
    ## 1  west  2.1773931
    ## 2  west  1.8438938
    ## 3  west  4.9336391
    ## 4  west  3.2757952
    ## 5  west -0.2904645
    ## 6  west  2.6134759
    ## 
    ## [[3]]
    ##   sites        a
    ## 1  east 2.297837
    ## 2  east 2.864035
    ## 3  east 3.616742
    ## 4  east 8.251796
    ## 5  east 3.199242
    ## 6  east 1.196774

    Good work! Now you can use two lists together!
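
    To see how .x and .y are paired, here is a smaller, reproducible sketch of the same idea. The seed value and the sample size of 5 are arbitrary choices for illustration, not part of the exercise:

```r
library(purrr)

set.seed(42)  # arbitrary seed so the random draws are reproducible

sites <- list("north", "west", "east")
means <- list(1, 2, 3)

# .x is the current site, .y is the mean at the same position
sim <- map2(sites, means,
            ~ data.frame(sites = .x, a = rnorm(n = 5, mean = .y)))

length(sim)      # one data frame per pair of inputs
names(sim[[1]])  # column names of the first data frame
```

    Both lists must have the same length (or one of them length 1), because map2() walks through them in parallel.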

    2.3.2 Simulating data with 3+ inputs using pmap()

    What if you need to iterate over three lists? Is there a map3()? To iterate over more than two lists, whether it’s three, four, or even 20, you’ll need to use pmap(). However, pmap() requires you to supply the list arguments a bit differently.

    To use pmap(), you first need to create a master list of all the lists you want to iterate over. The master list is the input for pmap(). Instead of using .x or .y, use the list names as the argument names.

    You are going to simulate data one more time, using five lists as inputs, instead of two. Using pmap() gives you complete control over your simulated dataset, and will allow you to use two different means and two different standard deviations along with the different sites.

  • Create a named list containing the sites, means, means2, sigma, and sigma2 lists.
  • means2=list(0.5,1,1.5)
    sigma2=list(0.5,1,1.5)
    sigma=list(1,2,3)
    # Create a master list, a list of lists
    pmapinputs <- list(sites = sites, means = means, sigma = sigma, 
                       means2 = means2, sigma2 = sigma2)
  • pmap() over the list of lists, to create a list of dataframes with three columns; the first column is sites.
    • The second column is a, which is rnorm() with mean = means, and sd = sigma.
    • The third column is b, which is rnorm() with mean = means2, and sd = sigma2.
  • # Map over the master list
    list_of_files_pmap <- pmap(pmapinputs, 
      function(sites, means, sigma, means2, sigma2){
        data.frame(sites = sites,
            a = rnorm(mean = means,  n = 200, sd = sigma),
            b = rnorm(mean = means2, n = 200, sd = sigma2))})
    
    map(list_of_files_pmap,~head(.x))
    ## [[1]]
    ##   sites          a          b
    ## 1 north  0.8789700  0.3855860
    ## 2 north -0.2245231  1.0029900
    ## 3 north  0.6417973  0.6355501
    ## 4 north  1.8780409  0.9760013
    ## 5 north  1.5165513 -0.1304455
    ## 6 north  2.4963962  1.1369883
    ## 
    ## [[2]]
    ##   sites           a           b
    ## 1  west -0.09834419  1.30693846
    ## 2  west -0.64468010  0.57628770
    ## 3  west  4.81134596 -0.01585508
    ## 4  west -0.85907440 -0.18470665
    ## 5  west  0.47639746  0.11106034
    ## 6  west  2.02665430  1.06197220
    ## 
    ## [[3]]
    ##   sites         a         b
    ## 1  east  5.537842 2.2437003
    ## 2  east  2.314830 1.1598322
    ## 3  east  1.287959 2.8198972
    ## 4  east  9.464502 1.1001475
    ## 5  east -1.857650 2.6695855
    ## 6  east  4.580386 0.1986446

    Good work! With pmap() you now have all the power in purrr.
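
    The link between the names in the master list and the function arguments is easier to see with plain arithmetic. This is a minimal sketch; the list names base, rate, and offset are hypothetical and chosen only for illustration:

```r
library(purrr)

# Master list: each named element is itself a list of the same length
inputs <- list(
  base   = list(1, 2, 3),
  rate   = list(10, 20, 30),
  offset = list(0.5, 0.5, 0.5)
)

# The function's argument names match the names in the master list,
# so pmap_dbl() passes the i-th element of each list by name
totals <- pmap_dbl(inputs, function(base, rate, offset) base * rate + offset)

totals
```

    If the names in the master list do not match the function's arguments, the call fails with an unused-argument error, which is a common first stumbling block with pmap().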

    3 Troubleshooting lists with purrr

    As with anything in R, knowing how to troubleshoot issues is an important skill. This is particularly true with lists, where finding the problem can be tricky.

    3.1 How to purrr safely()

    3.1.1 safely() replace with NA

    If you map() over a list, and one of the elements does not have the right data type, you will not get the output you expect. Perhaps you are trying to do a mathematical operation on each element, and it turns out one of the elements is a character - it simply won’t work.

    If you have a very large list, figuring out where things went wrong, and what exactly went wrong, can be hard. That is where safely() comes in; it shows you both your results and where the errors occurred in your map() call.

    • Use safely() with log(), then pipe the output into transpose() to put the results first. Note that log(-10) returns NaN with a warning rather than an error, so every error element in the output is NULL.
    # Map safely over log
    a <- list(-10, 1, 10, 0) %>% 
          map(safely(log, otherwise = NA_real_)) %>%
        # Transpose the result
          transpose()
    ## Warning in .f(...): NaNs produced
    • Print out a.
    # Print the list
    a
    ## $result
    ## $result[[1]]
    ## [1] NaN
    ## 
    ## $result[[2]]
    ## [1] 0
    ## 
    ## $result[[3]]
    ## [1] 2.302585
    ## 
    ## $result[[4]]
    ## [1] -Inf
    ## 
    ## 
    ## $error
    ## $error[[1]]
    ## NULL
    ## 
    ## $error[[2]]
    ## NULL
    ## 
    ## $error[[3]]
    ## NULL
    ## 
    ## $error[[4]]
    ## NULL
    • Print out the “result” element of a.
    # Print the result element in the list
    a[["result"]]
    ## [[1]]
    ## [1] NaN
    ## 
    ## [[2]]
    ## [1] 0
    ## 
    ## [[3]]
    ## [1] 2.302585
    ## 
    ## [[4]]
    ## [1] -Inf
    • Print out just the error messages from a.
    # Print the error element in the list
    a[["error"]]
    ## [[1]]
    ## NULL
    ## 
    ## [[2]]
    ## NULL
    ## 
    ## [[3]]
    ## NULL
    ## 
    ## [[4]]
    ## NULL

    Good work! Now you have the power to start debugging your lists, and you can do it with simple element subsetting.
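
    In the exercise above every error element is NULL, because log() only warns on negative input. The sketch below uses a hypothetical list containing a character element, which does make log() throw an error, to show what safely() captures in that case:

```r
library(purrr)

# Hypothetical input: the second element is a character, so log() errors on it
out <- list(1, "a", 10) %>%
  map(safely(log, otherwise = NA_real_)) %>%
  transpose()

# The failing element gets the `otherwise` value as its result
out[["result"]][[2]]

# TRUE where an error was captured, FALSE where the call succeeded
failed <- map_lgl(out[["error"]], ~ !is.null(.x))
failed
```

    Checking which error elements are not NULL gives a logical vector that can be used to subset the original input.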

    3.1.2 Convert data to numeric with purrr

    In the sw_people dataset, some of the Star Wars characters have unknown heights. If you want to do some data exploration and determine how character height differs depending on their home planet, you need to write your code so that R understands the difference between heights and missing values. Currently, the missing values are entered as “unknown”, but you would like them as NA. In this exercise, you will combine map() and ifelse() to fix this issue.

  • Load the sw_people dataset.
  • # Load sw_people data
    data(sw_people)
  • Map over sw_people and pull out “height”.
  • Then map over the output and if an element is labeled as “unknown” change it to NA, otherwise, convert the value into a number with as.numeric().
  • # Map over sw_people and pull out the height element
    height_cm <- map(sw_people, "height") %>%
      map(function(x){
        ifelse(x == "unknown",NA,
        as.numeric(x))
    })

    Good work! Now you can use purrr for data wrangling to help clean numeric data in lists.
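
    The same cleaning step can be tried on a tiny example. This sketch assumes a hypothetical list raw_heights in place of the heights pulled from sw_people, and returns a double vector with map_dbl() instead of a list:

```r
library(purrr)

# Hypothetical heights stored as strings, with one missing value
raw_heights <- list("172", "unknown", "96")

# Replace "unknown" with NA, otherwise convert the string to a number
height_num <- map_dbl(raw_heights, function(x) {
  if (x == "unknown") NA_real_ else as.numeric(x)
})

height_num
```

    Using NA_real_ rather than a bare NA keeps every element a double, which map_dbl() expects.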

    3.1.3 Finding the problem areas

    When you are working with a small list, it might not seem like a lot of work to go through things manually and figure out what element has an issue. But if you have a list with hundreds or thousands of elements, you want to automate that process.

    Now you’ll look at a situation with a larger list, where you can see how the error message can be useful to check through the entire list for issues.

  • map() over sw_people and pull out the “height” element.
  • map() over safely() to convert the heights from centimeters into feet.
  • Set quiet = FALSE so that errors are printed.
  • # Map over sw_people and pull out the height element
    height_ft <- map(sw_people, "height")  %>% 
      map(safely(function(x){
        x * 0.0328084
      }, quiet = FALSE)) %>%
      transpose()
  • Pipe into transpose(), to print the results first.
  • # Print your list, the result element, and the error element
    #height_ft
    #height_ft[["result"]]
    #height_ft[["error"]]

    Good work! Now you are ready to troubleshoot lists too large to check by hand.
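
    With a long list, the positions of the failing elements matter more than the full printout. One possible way to locate them is sketched below, assuming a hypothetical mixed list named values:

```r
library(purrr)

# Hypothetical input mixing numbers and strings
values <- list(172, 167, "unknown", 96, "n/a")

checked <- values %>%
  map(safely(function(x) x * 0.0328084)) %>%
  transpose()

# Positions where an error object was captured
bad_index <- which(map_lgl(checked[["error"]], ~ !is.null(.x)))
bad_index

# The error messages for just those positions
map_chr(checked[["error"]][bad_index], conditionMessage)
```

    The index vector can then be used to inspect the offending inputs, for example with values[bad_index].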

    3.2 Another way to possibly() purrr

    3.2.1 Replace safely() with possibly()

    Once you have figured out how to solve an issue with safely() (e.g., output an NA in place of an error), swap out safely() for possibly(). possibly() will run through your code and apply your chosen replacement value without printing out the error messages.

    You’ll now map() over log() again, but you will use possibly() instead of safely() since you already know how to resolve your errors.

    • Create a list with the values -10, 1, 10, and 0.
    • map() over this list to take the log() of each element, using possibly().
    • Use NA_real_ to fix any elements that are not the right data type.
    # Take the log of each element in the list
    a <- list(-10, 1, 10, 0) %>% 
      map(possibly(function(x){
        log(x)
    },NA_real_))
    ## Warning in log(x): NaNs produced

    Good work! Now you can solve issues in lists using safely(), and then continue with your analysis using possibly().
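
    The practical difference from safely() is the shape of the output: possibly() returns only the result or the fallback value, so it works directly with the typed map_*() variants. A short sketch, assuming a hypothetical list with one non-numeric element:

```r
library(purrr)

# Wrap log() so that any error is replaced by NA_real_
safe_log <- possibly(log, otherwise = NA_real_)

# The character element would normally stop the whole map_dbl() call
logs <- map_dbl(list(1, "a", 10), safe_log)

logs
```

    No transpose() step is needed here, because there is no separate error element to split out.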

    3.2.2 Convert values with possibly()

    Let’s say you need to convert the Star Wars character heights in sw_people from centimeters to feet. You already know that some of the heights have missing data, so you will use possibly() to convert missing values into NA. Then you will multiply each of the existing values by 0.0328084 to convert them from centimeters into feet.

    To get a feel for your data, print out height_cm in the console to check out the heights in centimeters.

    • Pipe the height_cm object into a map_*() function that returns double vectors.
    • Convert each element in height_cm into feet (multiply it by 0.0328084).
    • Since not all elements are numeric, use possibly() to replace instances that do not work with NA_real_.
    # Create a piped workflow that returns double vectors
    height_cm %>%  
      map_dbl(possibly(function(x){
      # Convert centimeters to feet
      x * 0.0328084
    }, NA_real_)) 
    ##  [1] 5.643045 5.479003 3.149606 6.627297 4.921260 5.839895 5.413386 3.182415
    ##  [9] 6.003937 5.971129 6.167979 5.905512 7.480315 5.905512 5.675853 5.741470
    ## [17] 5.577428 5.905512 2.165354 5.577428 6.003937 6.561680 6.233596 5.807087
    ## [25] 5.741470 5.905512 4.921260       NA 2.887139 5.249344 6.332021 6.266404
    ## [33] 5.577428 6.430446 7.349082 6.758530 6.003937 4.494751 3.674541 6.003937
    ## [41] 5.347769 5.741470 5.905512 5.839895 3.083990 4.002625 5.347769 6.167979
    ## [49] 6.496063 6.430446 5.610236 6.036746 6.167979 8.661418 6.167979 6.430446
    ## [57] 6.069554 5.150919 6.003937 6.003937 5.577428 5.446194 5.413386 6.332021
    ## [65] 6.266404 6.003937 5.511811 6.496063 7.513124 6.988189 5.479003 2.591864
    ## [73] 3.149606 6.332021 6.266404 5.839895 7.086614 7.677166 6.167979 5.839895
    ## [81] 6.758530       NA       NA       NA       NA       NA 5.413386

    Good work! Using possibly() helps us work with problem data in a really clean and efficient way.

    3.3 purrr is a walk() in the park

    3.3.1 Comparing walk() vs no walk() outputs

    Printing out lists with map() shows a lot of bracketed text in the console, which can be useful for understanding their structure, but this information is usually not important for communicating with your end users. If you need to print, walk() prints lists in a more compact and human-readable way, without all those brackets. walk() is also great for displaying plots without printing anything extra to the console.

    Here, you’ll be using the people_by_film dataset, which is a dataset derived from sw_films that has the url of each character and the film they appear in.

    Print people_by_film to the console.

    # Print normally
    people_by_film=read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRObsvb_OQ7qeXRvkTEbWBbQcYfyebglhoxAt9cIdRzH7Exf5s-mMqSgtjkHC0qNgK4PVsku7Q0bwfS/pub?gid=0&single=true&output=csv")
    people_by_film %>% head()
    ##                             url                     film_url
    ## 1 http://swapi.co/api/people/1/ http://swapi.co/api/films/6/
    ## 2 http://swapi.co/api/people/1/ http://swapi.co/api/films/3/
    ## 3 http://swapi.co/api/people/1/ http://swapi.co/api/films/2/
    ## 4 http://swapi.co/api/people/1/ http://swapi.co/api/films/1/
    ## 5 http://swapi.co/api/people/1/ http://swapi.co/api/films/7/
    ## 6 http://swapi.co/api/people/2/ http://swapi.co/api/films/5/

    Print out people_by_film using walk() and print().

    # Print with walk
    walk(people_by_film, print)
    ##   [1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/1/" 
    ##   [3] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/1/" 
    ##   [5] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/2/" 
    ##   [7] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/2/" 
    ##   [9] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/2/" 
    ##  [11] "http://swapi.co/api/people/2/"  "http://swapi.co/api/people/3/" 
    ##  [13] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
    ##  [15] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
    ##  [17] "http://swapi.co/api/people/3/"  "http://swapi.co/api/people/3/" 
    ##  [19] "http://swapi.co/api/people/4/"  "http://swapi.co/api/people/4/" 
    ##  [21] "http://swapi.co/api/people/4/"  "http://swapi.co/api/people/4/" 
    ##  [23] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/5/" 
    ##  [25] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/5/" 
    ##  [27] "http://swapi.co/api/people/5/"  "http://swapi.co/api/people/6/" 
    ##  [29] "http://swapi.co/api/people/6/"  "http://swapi.co/api/people/6/" 
    ##  [31] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/7/" 
    ##  [33] "http://swapi.co/api/people/7/"  "http://swapi.co/api/people/8/" 
    ##  [35] "http://swapi.co/api/people/9/"  "http://swapi.co/api/people/10/"
    ##  [37] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/10/"
    ##  [39] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/10/"
    ##  [41] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/11/"
    ##  [43] "http://swapi.co/api/people/11/" "http://swapi.co/api/people/11/"
    ##  [45] "http://swapi.co/api/people/12/" "http://swapi.co/api/people/12/"
    ##  [47] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/13/"
    ##  [49] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/13/"
    ##  [51] "http://swapi.co/api/people/13/" "http://swapi.co/api/people/14/"
    ##  [53] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/14/"
    ##  [55] "http://swapi.co/api/people/14/" "http://swapi.co/api/people/15/"
    ##  [57] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/16/"
    ##  [59] "http://swapi.co/api/people/16/" "http://swapi.co/api/people/18/"
    ##  [61] "http://swapi.co/api/people/18/" "http://swapi.co/api/people/18/"
    ##  [63] "http://swapi.co/api/people/19/" "http://swapi.co/api/people/20/"
    ##  [65] "http://swapi.co/api/people/20/" "http://swapi.co/api/people/20/"
    ##  [67] "http://swapi.co/api/people/20/" "http://swapi.co/api/people/20/"
    ##  [69] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/21/"
    ##  [71] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/21/"
    ##  [73] "http://swapi.co/api/people/21/" "http://swapi.co/api/people/22/"
    ##  [75] "http://swapi.co/api/people/22/" "http://swapi.co/api/people/22/"
    ##  [77] "http://swapi.co/api/people/23/" "http://swapi.co/api/people/24/"
    ##  [79] "http://swapi.co/api/people/25/" "http://swapi.co/api/people/25/"
    ##  [81] "http://swapi.co/api/people/26/" "http://swapi.co/api/people/27/"
    ##  [83] "http://swapi.co/api/people/27/" "http://swapi.co/api/people/28/"
    ##  [85] "http://swapi.co/api/people/29/" "http://swapi.co/api/people/30/"
    ##  [87] "http://swapi.co/api/people/31/" "http://swapi.co/api/people/32/"
    ##  [89] "http://swapi.co/api/people/33/" "http://swapi.co/api/people/33/"
    ##  [91] "http://swapi.co/api/people/33/" "http://swapi.co/api/people/34/"
    ##  [93] "http://swapi.co/api/people/36/" "http://swapi.co/api/people/36/"
    ##  [95] "http://swapi.co/api/people/37/" "http://swapi.co/api/people/38/"
    ##  [97] "http://swapi.co/api/people/39/" "http://swapi.co/api/people/40/"
    ##  [99] "http://swapi.co/api/people/40/" "http://swapi.co/api/people/41/"
    ## [101] "http://swapi.co/api/people/42/" "http://swapi.co/api/people/43/"
    ## [103] "http://swapi.co/api/people/43/" "http://swapi.co/api/people/44/"
    ## [105] "http://swapi.co/api/people/45/" "http://swapi.co/api/people/46/"
    ## [107] "http://swapi.co/api/people/46/" "http://swapi.co/api/people/46/"
    ## [109] "http://swapi.co/api/people/48/" "http://swapi.co/api/people/49/"
    ## [111] "http://swapi.co/api/people/50/" "http://swapi.co/api/people/51/"
    ## [113] "http://swapi.co/api/people/51/" "http://swapi.co/api/people/51/"
    ## [115] "http://swapi.co/api/people/52/" "http://swapi.co/api/people/52/"
    ## [117] "http://swapi.co/api/people/52/" "http://swapi.co/api/people/53/"
    ## [119] "http://swapi.co/api/people/53/" "http://swapi.co/api/people/53/"
    ## [121] "http://swapi.co/api/people/54/" "http://swapi.co/api/people/54/"
    ## [123] "http://swapi.co/api/people/55/" "http://swapi.co/api/people/55/"
    ## [125] "http://swapi.co/api/people/56/" "http://swapi.co/api/people/56/"
    ## [127] "http://swapi.co/api/people/57/" "http://swapi.co/api/people/58/"
    ## [129] "http://swapi.co/api/people/58/" "http://swapi.co/api/people/58/"
    ## [131] "http://swapi.co/api/people/59/" "http://swapi.co/api/people/59/"
    ## [133] "http://swapi.co/api/people/60/" "http://swapi.co/api/people/61/"
    ## [135] "http://swapi.co/api/people/62/" "http://swapi.co/api/people/63/"
    ## [137] "http://swapi.co/api/people/63/" "http://swapi.co/api/people/64/"
    ## [139] "http://swapi.co/api/people/64/" "http://swapi.co/api/people/65/"
    ## [141] "http://swapi.co/api/people/66/" "http://swapi.co/api/people/67/"
    ## [143] "http://swapi.co/api/people/67/" "http://swapi.co/api/people/68/"
    ## [145] "http://swapi.co/api/people/68/" "http://swapi.co/api/people/69/"
    ## [147] "http://swapi.co/api/people/70/" "http://swapi.co/api/people/71/"
    ## [149] "http://swapi.co/api/people/72/" "http://swapi.co/api/people/73/"
    ## [151] "http://swapi.co/api/people/74/" "http://swapi.co/api/people/47/"
    ## [153] "http://swapi.co/api/people/75/" "http://swapi.co/api/people/75/"
    ## [155] "http://swapi.co/api/people/76/" "http://swapi.co/api/people/77/"
    ## [157] "http://swapi.co/api/people/78/" "http://swapi.co/api/people/78/"
    ## [159] "http://swapi.co/api/people/79/" "http://swapi.co/api/people/80/"
    ## [161] "http://swapi.co/api/people/81/" "http://swapi.co/api/people/81/"
    ## [163] "http://swapi.co/api/people/82/" "http://swapi.co/api/people/82/"
    ## [165] "http://swapi.co/api/people/83/" "http://swapi.co/api/people/84/"
    ## [167] "http://swapi.co/api/people/85/" "http://swapi.co/api/people/86/"
    ## [169] "http://swapi.co/api/people/87/" "http://swapi.co/api/people/88/"
    ## [171] "http://swapi.co/api/people/35/" "http://swapi.co/api/people/35/"
    ## [173] "http://swapi.co/api/people/35/"
    ##   [1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
    ##   [3] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
    ##   [5] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/5/"
    ##   [7] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ##   [9] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
    ##  [11] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
    ##  [13] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ##  [15] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
    ##  [17] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/7/"
    ##  [19] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
    ##  [21] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
    ##  [23] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
    ##  [25] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
    ##  [27] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/5/"
    ##  [29] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
    ##  [31] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
    ##  [33] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/1/"
    ##  [35] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
    ##  [37] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ##  [39] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
    ##  [41] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
    ##  [43] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ##  [45] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
    ##  [47] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
    ##  [49] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
    ##  [51] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/3/"
    ##  [53] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
    ##  [55] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/1/"
    ##  [57] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/3/"
    ##  [59] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/3/"
    ##  [61] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
    ##  [63] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
    ##  [65] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ##  [67] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
    ##  [69] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ##  [71] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
    ##  [73] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/5/"
    ##  [75] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
    ##  [77] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/2/"
    ##  [79] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/"
    ##  [81] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/3/"
    ##  [83] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/3/"
    ##  [85] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/3/"
    ##  [87] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/4/"
    ##  [89] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ##  [91] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/4/"
    ##  [93] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ##  [95] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
    ##  [97] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
    ##  [99] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
    ## [101] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
    ## [103] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
    ## [105] "http://swapi.co/api/films/3/" "http://swapi.co/api/films/5/"
    ## [107] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [109] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/4/"
    ## [111] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
    ## [113] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [115] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ## [117] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
    ## [119] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [121] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [123] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [125] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [127] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/5/"
    ## [129] "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/"
    ## [131] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ## [133] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
    ## [135] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
    ## [137] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
    ## [139] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
    ## [141] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
    ## [143] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
    ## [145] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
    ## [147] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
    ## [149] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
    ## [151] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ## [153] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
    ## [155] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/5/"
    ## [157] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
    ## [159] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/6/"
    ## [161] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/1/"
    ## [163] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
    ## [165] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/7/"
    ## [167] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/7/"
    ## [169] "http://swapi.co/api/films/7/" "http://swapi.co/api/films/7/"
    ## [171] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/"
    ## [173] "http://swapi.co/api/films/6/"

    Good work! Now you can use walk() to make your outputs cleaner and more human-readable.
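
    The reason walk() produces less clutter is that it calls the function only for its side effect and returns its input invisibly, instead of building a new list of results. A minimal sketch with a hypothetical list x:

```r
library(purrr)

x <- list(a = 1:3, b = letters[1:2])

# map() collects the return values into a new list
lens <- map(x, length)

# walk() prints each element as a side effect and returns x invisibly
returned <- walk(x, print)

identical(returned, x)  # TRUE
```

    Because the input is returned unchanged, walk() can sit in the middle of a pipeline without altering the data flowing through it.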

    3.3.2 walk() for printing cleaner list outputs

    Now you will try one more use of walk(), specifically creating plots using walk(). In the previous exercise, you printed some lists, and you saw that printing lists is much cleaner using walk() than using the base R way. You can also use walk() to display multiple plots sequentially.

    Here, use your map() knowledge along with ggplot2 functions to create a graph for the first ten elements of gap_split and then display each graph with walk().

  • Load the gap_split dataset.
  • # Load the gap_split data
    data(gap_split)
  • map2() over the first 10 elements of gap_split, and the first 10 names of gap_split.
  • # Map over the first 10 elements of gap_split
    plots <- map2(gap_split[1:10], 
                  names(gap_split[1:10]), 
                  ~ ggplot(.x, aes(year, lifeExp)) + 
                    geom_line() +
                    labs(title = .y))
  • Then walk() over the new plots object and supply print() as an argument to print all plots.
  • # Object name, then function name
    walk(plots, print)

    Good work! Now you can print out multiple plots easily using walk().

    4 Problem solving with purrr

    Now that you have the building blocks, we will start tackling some more complex data problems with purrr.

    4.1 Using purrr in your workflow

    4.1.1 Name review

    Now, you’ll quickly review how to check if a list has names, and how to pull out a specific element from a list. Remember, you can use the names() function to see if a list is named. There are several ways to extract a named element from a list, but the key difference from working with dataframes is to remember the [[double bracket]] syntax.

    • Load the gh_users data.
    # Load the data
    data(gh_users)
    • Examine the names of gh_users.
    # Check if data has names
    names(gh_users)
    ## NULL
    • Extract the names for each element of gh_users.
    # Map over name element of list
    map(gh_users, ~.x[["name"]])
    ## [[1]]
    ## [1] "Gábor Csárdi"
    ## 
    ## [[2]]
    ## [1] "Jennifer (Jenny) Bryan"
    ## 
    ## [[3]]
    ## [1] "Jeff L."
    ## 
    ## [[4]]
    ## [1] "Julia Silge"
    ## 
    ## [[5]]
    ## [1] "Thomas J. Leeper"
    ## 
    ## [[6]]
    ## [1] "Maëlle Salmon"

    Good work! Now that we have refreshed the basics of named lists, we can dive into our next task.

    4.1.2 Setting names

    Setting list names makes working with lists much easier in many scenarios; it makes the code easier to read, which is especially important when reviewing code weeks or months later.

    Here you are going to work with the gh_repos and gh_users datasets and set their names in two different ways. The two methods will give the same result: a list with named elements.

  • Set the names on gh_users using the “name” element and use the map_*() function that outputs a character vector.
  • # Name gh_users with the names of the users
    gh_users_named <- gh_users %>% 
        set_names(map_chr(gh_users, "name"))
  • Explore the structure of gh_repos to see where the owner info is stored.
  • # Check gh_repos structure
    #str(gh_repos)
  • Set the names of a new list gh_repos_named based on the login of the owner of the repo, using the set_names() and map_*() functions.
  • # Name gh_repos with the names of the repo owner
    gh_repos_named <- gh_repos %>% 
        map_chr(~ .[[1]]$owner$login) %>% 
        set_names(gh_repos, .)

    Good work! Sometimes list naming is tricky but purrr makes it simpler by easily extracting the element we want to use as the names.
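
The two naming styles used above can be compared side by side on a small example. This is a sketch under assumptions: `repos_by_owner` is an invented list that only mimics the nesting of gh_repos (one element per owner, each holding that owner's repositories).

```r
library(purrr)

# Hypothetical nested list shaped like gh_repos
repos_by_owner <- list(
  list(list(name = "r1", owner = list(login = "user_a")),
       list(name = "r2", owner = list(login = "user_a"))),
  list(list(name = "r3", owner = list(login = "user_b")))
)

# Way 1: compute the names inside set_names()
named_1 <- repos_by_owner %>%
  set_names(map_chr(repos_by_owner, ~ .x[[1]]$owner$login))

# Way 2: build the names first, then pass them on through the dot;
# the list to be named has to be given explicitly
named_2 <- repos_by_owner %>%
  map_chr(~ .x[[1]]$owner$login) %>%
  set_names(repos_by_owner, .)
```

Both results are the same list. The first style reads more directly, while the second shows how the dot placeholder redirects the piped value to a later argument.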

    4.1.3 Asking questions from a list

    One of the great things about purrr is you can easily move from having a question about the data to an answer, with just a few lines of code. Here you are going to use the gh_users data to ask three questions:

    • Which user joined GitHub first?
    • Are all the accounts owned by users, rather than organizations?
    • Which user has the most public repositories?

    In this exercise, your map_*() knowledge is really tested, so make sure to reflect on all the different flavors of map_*() and how they should be used.

    Name gh_users with the “name” element and sort the “created_at” element to determine who joined GitHub first.

    # Determine who joined github first
    map_chr(gh_users, ~.x[["created_at"]]) %>%
          set_names(map_chr(gh_users, "name")) %>%
        sort()
    ## Jennifer (Jenny) Bryan           Gábor Csárdi                Jeff L. 
    ## "2011-02-03T22:37:41Z" "2011-03-09T17:29:25Z" "2012-03-24T18:16:43Z" 
    ##       Thomas J. Leeper          Maëlle Salmon            Julia Silge 
    ## "2013-02-07T21:07:00Z" "2014-08-05T08:10:04Z" "2015-05-19T02:51:23Z"

    Output a vector that returns TRUE for each element where the “type” is “User”.

    # Determine user versus organization
    map_lgl(gh_users, ~.x[["type"]] == "User")
    ## [1] TRUE TRUE TRUE TRUE TRUE TRUE

    Output a named numeric vector of the number of “public_repos”.

    # Determine who has the most public repositories
    map_int(gh_users, ~.x[["public_repos"]]) %>%
          set_names(map_chr(gh_users, "name")) %>%
        sort()
    ##            Julia Silge          Maëlle Salmon           Gábor Csárdi 
    ##                     26                     31                     52 
    ##                Jeff L.       Thomas J. Leeper Jennifer (Jenny) Bryan 
    ##                     67                     99                    168

    Good work! Now you can use functions you already know to ask any question of your data in just a few lines of code.
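
As a compact recap of which map_*() flavor fits which question, the sketch below runs all three on invented data. The `users` list and its values are assumptions for illustration, not taken from gh_users.

```r
library(purrr)

# Hypothetical users; the timestamps use the same ISO 8601 format as created_at
users <- list(
  list(name = "Ada", type = "User", public_repos = 40L,
       created_at = "2012-05-01T10:00:00Z"),
  list(name = "Ben", type = "Organization", public_repos = 12L,
       created_at = "2010-01-15T08:30:00Z")
)

user_names <- map_chr(users, "name")

# Character output: timestamps in this format sort chronologically as text
joined <- map_chr(users, "created_at") %>%
  set_names(user_names) %>%
  sort()

# Logical output: one TRUE/FALSE per element
is_user <- map_lgl(users, ~ .x[["type"]] == "User")

# Integer output: sort() puts the largest count last
repo_counts <- map_int(users, "public_repos") %>%
  set_names(user_names) %>%
  sort()

earliest   <- names(joined)[1]
most_repos <- names(repo_counts)[length(repo_counts)]
```

Wrapping the logical vector in all() would reduce the second question to a single TRUE or FALSE.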

    4.2 Even more complex problems

    4.2.1 Questions about gh_repos

    You’re going to use gh_repos again, a list where each element holds the repositories of one GitHub user. Here you will use map() and map_dbl() to answer the question:

    • Which repository is the largest?

    GitHub reports repository size in kilobytes. This information could be useful to document if you are working with a list-based dataset that changes over time and need to be able to pull out information, like the largest repository, from the most recent dataset.

    • map() over gh_repos.
    • map_dbl() over the “size” element.
    • Then map() to determine which repo is the largest.
    # Map over gh_repos to generate numeric output
    map(gh_repos,
        ~map_dbl(.x, 
                 ~.x[["size"]])) %>%
        # Grab the largest element
        map(~max(.x))
    ## [[1]]
    ## [1] 39461
    ## 
    ## [[2]]
    ## [1] 96325
    ## 
    ## [[3]]
    ## [1] 374812
    ## 
    ## [[4]]
    ## [1] 24070
    ## 
    ## [[5]]
    ## [1] 558176
    ## 
    ## [[6]]
    ## [1] 76455

    Good work! You’re gaining great skills to be able to answer questions in a reproducible way with your datasets.
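
The nested map above returns only the largest size per user. One possible variation, sketched here on a made-up `repos` list (an assumption, shaped like gh_repos), collapses the result into a named numeric vector and also recovers the name of the largest repository.

```r
library(purrr)

# Hypothetical list: one element per user, each a list of repositories
repos <- list(
  user_a = list(list(name = "alpha", size = 120),
                list(name = "beta",  size = 950)),
  user_b = list(list(name = "gamma", size = 40))
)

# The inner map_dbl() pulls every size for one user;
# the outer map_dbl() keeps only the maximum
largest_size <- map_dbl(repos, ~ max(map_dbl(.x, "size")))

# which.max() gives the position of the largest repo, so its name can be read off
largest_name <- map_chr(repos, ~ {
  sizes <- map_dbl(.x, "size")
  .x[[which.max(sizes)]][["name"]]
})
```

Using map_dbl() on the outside instead of map() gives a vector rather than a list, which is easier to sort or compare.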

    4.3 Graphs in purrr

    4.3.1 ggplot() refresher

    You’ve already been introduced to the package ggplot2 in the prerequisite for this course, but let’s do a quick refresher.

    • geom_point() makes scatterplots
    • geom_histogram() makes histograms

    In this exercise, you are going to use a dataframe created from the gh_users dataset, called gh_users_df, that has two columns: one for the number of public repositories a user has and another for how many followers that user has. Each row is a different user. Then you will turn it into a scatterplot, a plot where the data are displayed as points.

    Create a scatterplot with public_repos on the x axis and followers on the y axis.

    # Build the example dataframe: one row per user
    gh_users_df <- tribble(
        ~public_repos, ~followers,
        52,  303,
        168, 780,
        67,  3958,
        26,  115,
        99,  213,
        31,  34
    )
    # Scatter plot of public repos and followers
    ggplot(data = gh_users_df, 
           aes(x = public_repos, y = followers))+
        geom_point()

    Create a histogram of followers by piping in gh_users_df.

    # Histogram of followers    
    gh_users_df %>%
        ggplot(aes(x = followers))+
            geom_histogram()
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

    Good work! Isn’t making plots fun? Now let’s dive into how purrr can help make more of them!

    4.3.2 purrr and scatterplots

    Since ggplot() does not accept lists as an input, it can be paired up with purrr to go from a list to a dataframe to a ggplot() graph in just a few lines of code.

    You will continue to work with the gh_users data for this exercise. You will use a map_*() function to pull out a few of the named elements and transform them into the correct datatype. Then create a scatterplot that compares the user’s number of followers to the user’s number of public repositories.

    • Map over gh_users using the map_*() function that creates a dataframe with four columns, named “login”, “name”, “followers” and “public_repos”.
    • Pipe that dataframe into a scatterplot, where the x axis is followers and y is public_repos.
    # Create a dataframe with four columns
    map_df(gh_users, `[`, 
           c("login","name","followers","public_repos")) %>%
      # Plot followers by public_repos
      ggplot(., 
             aes(x = followers, y = public_repos)) + 
          # Create scatter plots
          geom_point()

    Good work! Now you can go from list to plot using a tidy workflow!
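
To see why `[` works as the mapped function here, consider this minimal sketch. The `users` list is invented for illustration, and the example assumes the dplyr package is installed, since map_df() relies on it to bind the rows.

```r
library(purrr)

# Hypothetical users with one extra field ("bio") that should be dropped
users <- list(
  list(login = "user_a", name = "Ada", followers = 10L, public_repos = 3L, bio = "x"),
  list(login = "user_b", name = "Ben", followers = 25L, public_repos = 7L, bio = "y")
)

# `[` with a character vector keeps just those elements as a named list;
# map_df() then binds each named list into one row of a dataframe
users_df <- map_df(users, `[`,
                   c("login", "name", "followers", "public_repos"))
```

Each selected element must have length one for this to give one row per user.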

    4.3.3 purrr and histograms

    Now you’re going to put together everything you’ve learned, starting with two different lists, which will be turned into a faceted histogram. You’re going to work again with the Star Wars data from the sw_films and sw_people datasets to answer a question:

    • What is the distribution of heights of characters in each of the Star Wars films?

    Different movies take place on different sets of planets, so you might expect to see different distributions of heights from the characters. Your first task is to transform the two datasets into dataframes since ggplot() requires a dataframe input. Then you will join them together, and plot the result, a histogram with a different facet, or subplot, for each film.

  • Create a dataframe with the “title” of each film, and the “characters” from each film in the sw_films dataset.
  • # Turn data into correct dataframe format
    film_by_character <- tibble(filmtitle = map_chr(sw_films, "title")) %>%
        mutate(characters = map(sw_films, "characters")) %>%
        # Name the list-column explicitly, as current tidyr requires
        unnest(cols = c(characters))
  • Create a dataframe with the “height”, “mass”, “name”, and “url” elements from sw_people.
  • # Pull out elements from sw_people
    sw_characters <- map_df(sw_people, `[`, c("height","mass","name","url"))
  • Join the two dataframes together using the “characters” and “url” keys.
  • # Join our two new objects
    character_data <- inner_join(film_by_character, sw_characters, by = c("characters" = "url")) %>%
        # Make sure the columns are numbers
        mutate(height = as.numeric(height), mass = as.numeric(mass))
    ## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
    
    ## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
  • Create a ggplot() histogram with x = height, faceted by filmtitle.
  • # Plot the heights, faceted by film title
    ggplot(character_data, aes(x = height)) +
      geom_histogram(stat = "count") +
      facet_wrap(~ filmtitle)
    ## Warning: Ignoring unknown parameters: binwidth, bins, pad
    ## Warning: Removed 6 rows containing non-finite values (stat_count).

    Good work! Now you’ve learned all the basics of how you can use purrr to make tasks that require iteration and working with lists more manageable and human-readable!
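
The list-to-dataframe-to-join steps from this last exercise can be rehearsed on a tiny example before running them on the real data. Everything below is a sketch: the `films` and `people` lists and their values are invented, and the dplyr and tidyr packages are assumed to be installed.

```r
library(purrr)
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for sw_films and sw_people
films <- list(
  list(title = "Film One", characters = c("url/1", "url/2")),
  list(title = "Film Two", characters = c("url/2"))
)
people <- list(
  list(name = "Ada", height = "170",     url = "url/1"),
  list(name = "Ben", height = "unknown", url = "url/2")
)

# One row per film, with the characters kept as a list-column,
# then unnested to one row per film-character pair
film_by_character <- tibble(
  filmtitle  = map_chr(films, "title"),
  characters = map(films, "characters")
) %>%
  unnest(cols = c(characters))

people_df <- map_df(people, `[`, c("name", "height", "url"))

# Join on the character URL; "unknown" heights become NA when converted
character_data <- inner_join(film_by_character, people_df,
                             by = c("characters" = "url")) %>%
  mutate(height = suppressWarnings(as.numeric(height)))
```

The NA values produced by the conversion are what ggplot2 later reports as removed rows when drawing the histogram.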