Chapter 9 Lists

In terms of ‘data types’ (objects containing data) we have so far only been working with atomic vectors and matrices. Both are homogeneous objects: they can only contain data of one specific type.

Figure 9.1: As we will learn, data frames are typically lists of atomic vectors of the same length.

In this chapter we will learn about lists, which allow us to store heterogeneous data, e.g., integers and characters in the same object. This chapter does not cover all details of lists, but we will learn all the important aspects we need in the next chapter on data frames, as data frames are based on lists (similar to matrices being based on vectors).

Quick reminder: Frequently used data types and how they can be distinguished by their dimensionality and by whether they are homogeneous (all elements of the same type) or heterogeneous (elements can be of different types).

Dimension   Homogeneous      Heterogeneous
1           Atomic vectors   Lists
2           Matrix           Data frame
\(\ge 1\)   Array

9.1 List introduction

Like (atomic) vectors, lists are sequences of elements. However, in contrast to (atomic) vectors, lists allow for heterogeneity. Technically a list is a generic vector, but it is usually just called a ‘list’. We will use the same terms in this book (atomic vectors: vectors; generic vectors: lists).

Basis: Lists serve as the basis for most complex objects in R.

  • Data frames: Lists of variables of the same length, often (but not necessarily) atomic vectors.
  • Fitted regression models: Lists of different elements such as the model parameters, covariance matrix, residuals, but also more technical information such as the regression terms or certain matrix decompositions.

Difference: The difference between vectors and lists.

  • Vector: All elements must have the same basic type.
  • List: Different elements can have different types, including vectors, matrices, lists or more complex objects.

As lists are ‘generic vectors’, empty lists can also be created using the vector() function (compare Creating vectors).
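
The call that produced the output shown here is not part of the text. A minimal sketch, assuming that `vector()` was called with mode `"list"` and a length of 2 (the object name `empty` is made up for illustration):

```r
# Assumed call: create an empty list with two elements using vector()
empty <- vector("list", length = 2)
empty           # prints the two (still empty) elements
length(empty)   # number of elements in the list
```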

## [[1]]
## NULL
## 
## [[2]]
## NULL

Elements of (unnamed) lists are indexed by [[...]], while we had [...] for vectors. The empty list above has two elements (first [[1]] and second [[2]]), both of which contain NULL.

9.2 Creating lists

We start right away with constructing a few simple lists for illustration. While vectors can be created using c(), lists are most often constructed using the function list().

A vector: Vector of length 2.

## [1] 3 5

A list: List containing the same values.

## [[1]]
## [1] 3
## 
## [[2]]
## [1] 5

This list with two elements ([[1]], [[2]]) contains two vectors, each of length one, indicated by [1] (first vector element).
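
The two objects just described can be created as follows. This is a sketch; the object names `v` and `l` are only chosen for illustration.

```r
v <- c(3, 5)      # atomic vector of length 2
l <- list(3, 5)   # list with two elements, each a numeric vector of length 1
v
l
```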

Another list: Store objects of different types/classes into a list.

## [[1]]
## [1] 3
## 
## [[2]]
## [1] "five" "six" 
## 
## [[3]]
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

The last example shows a list of length 3 which contains a numeric vector of length \(1\) (first element; [[1]]), a character vector of length \(2\) (second element; [[2]]), and a matrix of dimension \(2 \times 3\) (third element; [[3]]).
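
A list like this one could be created as sketched here. The exact call is an assumption reconstructed from the printed output, and the name `mixed` is hypothetical.

```r
# Three elements of different types: numeric, character, and a matrix
mixed <- list(3,
              c("five", "six"),
              matrix(1:6, nrow = 2))
mixed
length(mixed)   # number of list elements
```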

The two functions c() and list() work very similarly, except that c(a, b) where a and b are vectors performs coercion to convert all values into one specific type (see Vectors: Coercion).
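
To make the difference concrete, here is a small sketch combining a numeric and a character value with both functions (the names `a` and `b` are only used for illustration):

```r
a <- c(3, "five")      # c() coerces both values to character
b <- list(3, "five")   # list() keeps each element's own type
typeof(a)
sapply(b, typeof)      # type of each list element
```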

9.3 Recursive structure

A more practical example: We have some information about Peter Falk, a famous actor who played “Lieutenant Columbo” in the long-running TV series Columbo from 1968 to 2003.

We would like to store all the information in one single object. As we have to deal with both characters (name; month of birth) and numeric values (year and day of birth), we need an object which allows for heterogeneity: a list.
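
The call that created the object is not shown in the text. A sketch that would yield the structure printed here, reconstructed from the output (so the exact code is an assumption):

```r
# Named character vector for the name, nested named list for the date of birth
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
person
```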

## $name
##   given  family 
## "Peter"  "Falk" 
## 
## $date_of_birth
## $date_of_birth$year
## [1] 1927
## 
## $date_of_birth$month
## [1] "September"
## 
## $date_of_birth$day
## [1] 16

For named lists, the representation (print) changes again compared to unnamed lists. The elements are now indicated by, e.g., $name or $date_of_birth$year, which we can use to access specific elements. We will come back to this in the section Subsetting lists.

Our new object person is a list which contains a named vector ($name) and a second element ($date_of_birth) which itself is another named list. This is called a recursive structure: a list can contain lists (which can contain lists (which can …)). The figure below shows the structure of the object person with the two lists in green, and the different (numeric/character) vectors in blue.

Figure 9.2: Graphical representation of the recursive list object person.

A nice way to get an overview of potentially complex (list) objects is the function str() (structure), which we have already seen in the vectors chapter. str() prints a text representation similar to the image shown above.
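
As a sketch, assuming person was created as reconstructed from the printed output earlier in this section:

```r
# Assumed definition of person (reconstructed from the printed output)
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
str(person)   # compact overview of the nested structure
```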

## List of 2
##  $ name         : Named chr [1:2] "Peter" "Falk"
##   ..- attr(*, "names")= chr [1:2] "given" "family"
##  $ date_of_birth:List of 3
##   ..$ year : num 1927
##   ..$ month: chr "September"
##   ..$ day  : num 16

How to read: the (first-level) list has two elements, one called name, one called date_of_birth. The name element contains a named character vector of length \(2\); the second element date_of_birth is again a list with \(3\) named elements (year, month, day) containing unnamed (plain) vectors of length \(1\) (numeric, character, numeric). The indentation shows the recursive structure: the further to the right the $, the deeper a specific entry is in the list.

9.4 List attributes

Like all other objects, lists always have a specific type and length (object properties). Besides these mandatory properties, the default attributes for lists are:

  • Class: Objects of class "list".
  • Names: Can have names (optional; just like vectors).

Let us investigate the person object from above:
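
The four values shown here are consistent with the following calls. The order is an assumption, as is the definition of person (reconstructed from the earlier output).

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
class(person)    # class of the object
length(person)   # number of first-level elements
typeof(person)   # internal type
names(person)    # names of the first-level elements
```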

## [1] "list"
## [1] 2
## [1] "list"
## [1] "name"          "date_of_birth"

The is.*() function family can be used to check the object.

##      is.list    is.vector is.character   is.logical   is.numeric   is.integer 
##         TRUE         TRUE        FALSE        FALSE        FALSE        FALSE

  • is.list(): Always returns TRUE for lists.
  • is.vector(): As lists are generic vectors they also count as vectors, as long as they only have class, length, type and names (optional).
  • A list is never numeric, integer, character, or logical, no matter what the list contains.
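
One way to run these checks in one go is sketched here. The original call is not shown, so this is only one possible way to obtain such a named logical vector; `checks` is a hypothetical name.

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
# Apply several is.*() functions to the same object
checks <- c(is.list      = is.list(person),
            is.vector    = is.vector(person),
            is.character = is.character(person),
            is.logical   = is.logical(person),
            is.numeric   = is.numeric(person),
            is.integer   = is.integer(person))
checks
```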

9.5 Subsetting lists

Subsetting lists works slightly differently than subsetting vectors and matrices. The basic concepts stay the same; however, due to the more complex structure of the object, additional subsetting operators become available.

Operator   Return    Description
[i]        a list    Select sub-list containing one or more elements. The index vector i can be integer (possibly negative), character, or logical.
[[i]]      content   Select the content of a single list element if i is a single (positive) integer or a single character.
$name      content   Select the content of a single list element using the name of the element (without quotes).
[[j]]      content   If j is a vector, nested/recursive subsetting takes place.

Note that there is a distinct difference between single brackets ([...]) and double brackets ([[...]]). The first always returns a list which is a subset of the original object to be subsetted, while the latter returns the content of these elements.

Using [i]: The method is similar to vector subsetting except that the result will not be the content of the elements specified, but a sub-list. Let us call person[1] and see what we get:
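
A sketch of the calls that would give the output shown here. The definition of person is reconstructed from the earlier output, and `sub` is a hypothetical name.

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
sub <- person[1]   # single brackets: result is a sub-list
sub
class(sub)
names(sub)
```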

## $name
##   given  family 
## "Peter"  "Falk"
## [1] "list"
## [1] "name"

The result of person[1] is again a list, but it only contains the first entry of the original object person. In the same way we can use person["name"] or person[c(TRUE, FALSE)]. This also works with index vectors of length greater than one, e.g., extracting elements 2:1 (both elements, but in reverse order):
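
The variants mentioned above can be sketched as follows (person is assumed to be defined as reconstructed from the earlier output):

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
by_name    <- person["name"]            # subsetting by name
by_logical <- person[c(TRUE, FALSE)]    # subsetting by logical vector
reversed   <- person[2:1]               # both elements in reverse order
str(reversed)
```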

## List of 2
##  $ date_of_birth:List of 3
##   ..$ year : num 1927
##   ..$ month: chr "September"
##   ..$ day  : num 16
##  $ name         : Named chr [1:2] "Peter" "Falk"
##   ..- attr(*, "names")= chr [1:2] "given" "family"

A negative index (person[-1]) can be used to get all but the first element (does not work with characters). The result is again a sub-list (like for positive indices).

Using [[i]], single value: This subsetting type is most comparable to vector subsetting. Instead of a sub-list, we will get the content of the element, in this case a named character vector of length \(2\).
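
A sketch of this call, wrapped in str() to match the output shown here (person is assumed to be defined as reconstructed from the earlier output):

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
str(person[[1]])   # double brackets: content of the first element
```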

##  Named chr [1:2] "Peter" "Falk"
##  - attr(*, "names")= chr [1:2] "given" "family"

The same can be achieved using person[["name"]] on named lists. Note that negative indices (e.g., person[[-1]]) should not be used with double brackets, as [[...]] has to select exactly one element.

Using the $ operator: Most commonly used when working with named lists is the $ operator. This operator is called the dollar operator. Instead of calling person[["name"]] we can also call person$name as long as the name does not contain blanks or special characters. Note that there is no blank before/after the $ operator (not as shown in the output of str()).
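
A sketch of the corresponding calls (person is assumed to be defined as reconstructed from the earlier output):

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
person$name          # content of the element "name"
class(person$name)   # a character vector, not a list
```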

##   given  family 
## "Peter"  "Falk"
## [1] "character"

Multiple $ operators can also be combined. If we are interested in the month of birth (month) which is stored within date_of_birth we can use:
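
As a sketch (person is assumed to be defined as reconstructed from the earlier output):

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
person$date_of_birth$month   # chain two $ operators
```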

## [1] "September"

How to read:

  • Right to left: Return month from date_of_birth of the object person.
  • Left to right: Inside object person access the element date_of_birth, inside date_of_birth access element month.

Using [[j]] with vectors: Take care, something unexpected happens. As an example, let us call person[[c(1, 2)]]. We could think this returns us person[[1]] and person[[2]], but that’s not the case. Instead, nested or recursive subsetting is performed. The two indices (c(1, 2)) are used as indices for different depths of the recursive list.

## [1] "Falk"

What happened: The first element of the index vector (here 1) is used for the top-level list and selects our first entry, the vector name. The second element of the index vector (2) is then used to extract the second element of whatever name contains. It is the same as:
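
Both variants side by side, as a sketch (person is assumed to be defined as reconstructed from the earlier output; `nested` and `stepwise` are hypothetical names):

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
nested   <- person[[c(1, 2)]]   # recursive subsetting with an index vector
stepwise <- person[[1]][[2]]    # the same in two explicit steps
nested
stepwise
```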

## [1] "Falk"

Note: We will not use this often, but keep it in mind if you run into surprising results when subsetting lists or data frames. The same happens if you use a character vector, e.g., person[[c("name", "family")]] or person[[c("date_of_birth", "month")]].

Exercise 9.1 Practicing subsetting on lists: The following object demo is a list with information about two persons, Frank and Petra (simply copy&paste it into your R session).
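
The code creating demo is not included here. A sketch reconstructed from the str() output shown for this exercise (so the exact original call is an assumption):

```r
# List with information about Petra and Frank
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
str(demo)
```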

## List of 2
##  $ Petra:List of 3
##   ..$ location: chr "Birmingham"
##   ..$ kids    : NULL
##   ..$ job     : chr "Programmer"
##  $ Frank:List of 2
##   ..$ location: chr "Kufstein"
##   ..$ kids    : chr [1:2] "Peter" "Paul"

  • How do we get Frank's location?
  • Try demo["Frank"]$location (will return NULL). Why doesn’t this work?
  • How many kids does Frank have (use code to answer)?

Solution. Frank's location: demo is a list with two elements, one called "Frank". Thus, we can access this list element using demo$Frank. This returns the content of the list element which itself is, again, a list. In there, we have the location we are looking for.

Thus, one option to get Frank's location is to use:

## [1] "Kufstein"

Alternatively, we could use brackets and subsetting by name. Warning: we need double brackets to access the content.
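
The two options just described can be sketched as follows. demo is reconstructed from the str() output of the exercise; `loc1` and `loc2` are hypothetical names.

```r
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
loc1 <- demo$Frank$location             # using the $ operator
loc2 <- demo[["Frank"]][["location"]]   # using double brackets and names
loc1
loc2
```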

## [1] "Kufstein"

Try demo["Frank"]$location: This will not work. The reason is that we only use single brackets!

## NULL

demo["Frank"] returns a sub-list, which is still a list of length one. This list does not contain an element called location. Thus, we get NULL in return. Let's try:
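
A sketch that makes this visible (demo is reconstructed from the str() output of the exercise; `sub` is a hypothetical name):

```r
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
sub <- demo["Frank"]   # single brackets: a list of length one
names(sub)             # its only element is called "Frank"
sub$location           # no element called "location" on this level
sub
```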

## $Frank
## $Frank$location
## [1] "Kufstein"
## 
## $Frank$kids
## [1] "Peter" "Paul"

How many kids does Frank have? To answer this question, we need to find out how long the kids vector is. Again, we can access this specific element in different ways; the easiest one is:
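
A sketch using the $ operator (demo is reconstructed from the str() output of the exercise):

```r
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
demo$Frank$kids           # the kids vector
length(demo$Frank$kids)   # number of kids
```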

## [1] "Peter" "Paul"
## [1] 2

Petra moves to Vienna: You can use any subsetting technique and assign a new value. I will stick to the $ operator and do the following:

## List of 2
##  $ Petra:List of 3
##   ..$ location: chr "Vienna"
##   ..$ kids    : NULL
##   ..$ job     : chr "Programmer"
##  $ Frank:List of 2
##   ..$ location: chr "Kufstein"
##   ..$ kids    : chr [1:2] "Peter" "Paul"

9.6 Replacing/deleting elements

Replacement functions are available for all subset types above which can be used to overwrite elements in an existing list or add new elements to a list.

As an example, let us replace the element $name in the person object. We use subsetting with the $ operator and assign (store) a new object. As person$name exists, it will be replaced.
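
A sketch of such a replacement. The new values are taken from the str() output shown here, and the starting object person is reconstructed from the earlier output.

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
# Overwrite the existing element name with a new named character vector
person$name <- c(given_name  = "Max",
                 middle_name = "Maximilian",
                 family_name = "Mustermann")
str(person)
```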

## List of 2
##  $ name         : Named chr [1:3] "Max" "Maximilian" "Mustermann"
##   ..- attr(*, "names")= chr [1:3] "given_name" "middle_name" "family_name"
##  $ date_of_birth:List of 3
##   ..$ year : num 1927
##   ..$ month: chr "September"
##   ..$ day  : num 16

Or replace the month in the date of birth with an integer 9L instead of "September":
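
As a sketch, starting from the original person object (so the name element differs from the output shown here, which already contains the replaced name):

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
# Replace the nested element month with an integer
person$date_of_birth$month <- 9L
str(person$date_of_birth)
```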

## List of 2
##  $ name         : Named chr [1:3] "Max" "Maximilian" "Mustermann"
##   ..- attr(*, "names")= chr [1:3] "given_name" "middle_name" "family_name"
##  $ date_of_birth:List of 3
##   ..$ year : num 1927
##   ..$ month: int 9
##   ..$ day  : num 16

The same way, new elements can be added. If the element we assign an object to does not yet exist, it will be added to the original list object. Let us add a job element containing "Actor":
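
A sketch, again starting from the original person object as reconstructed from the earlier output:

```r
person <- list(
  name          = c(given = "Peter", family = "Falk"),
  date_of_birth = list(year = 1927, month = "September", day = 16)
)
# person$job does not exist yet, so the assignment adds a new element
person$job <- "Actor"
names(person)
```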

## List of 3
##  $ name         : Named chr [1:3] "Max" "Maximilian" "Mustermann"
##   ..- attr(*, "names")= chr [1:3] "given_name" "middle_name" "family_name"
##  $ date_of_birth:List of 3
##   ..$ year : num 1927
##   ..$ month: int 9
##   ..$ day  : num 16
##  $ job          : chr "Actor"

Delete elements: To delete an element, we simply have to replace it with a NULL object. An example using a very simple list:
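
A sketch reconstructed from the output shown here (`n_before` is a hypothetical helper name):

```r
x <- list(a = "first", b = "second", c = NULL)   # element c contains NULL
n_before <- length(x)
x$a <- NULL   # assigning NULL removes element a
x
```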

## $a
## [1] "first"
## 
## $b
## [1] "second"
## 
## $c
## NULL

## $b
## [1] "second"
## 
## $c
## NULL

As you can see, a list element can contain NULL (see element c), but when NULL is assigned to an existing element (x$a <- NULL), R removes the element completely instead of storing NULL in it.

Exercise 9.2 Practicing replacement: As in the previous exercise we will use the following list to work with. The list contains information about Petra and Frank.

We need to update this list and add or change some of the elements.

  • Petra moves from Birmingham to Vienna. Update her location.
  • Frank just got a newborn baby called "Malena". Add her name to the kids vector.
  • Add a third person called ‘Regina’, located in ‘Sidney’. She has one child called ‘Lea’ and works as a ‘Teacher’.

Solution. Petra moves to Vienna: You can use any of the subsetting methods we have just seen. In this solution we will stick to the $ operator. All we have to do is to access Petra's location and assign a new value.
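
A sketch of this step (demo is reconstructed from the str() output of the previous exercise):

```r
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
demo$Petra$location <- "Vienna"   # overwrite the existing value
demo$Petra$location
```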

Frank got a third child: Here, we could do the same as for Petra and simply assign a new vector with all three kids to Frank (demo$Frank$kids <- c("Peter", "Paul", "Malena")). However, this is not super nice (hard-coded).

Instead we use c() to combine the vector containing the first two kids with the newborn, and store the resulting vector (a combination of subsetting and replacement) as follows:
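
A sketch of this step (demo is reconstructed from the str() output of the previous exercise):

```r
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
# Append the new name to the existing kids vector and store the result
demo$Frank$kids <- c(demo$Frank$kids, "Malena")
demo$Frank$kids
```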

Adding Regina: Regina is not yet in our list, however, we can assign new elements the same way we replace elements. If the element exists it will be overwritten. If it does not exist, it will be added. In this case we want to add a new list to the existing object demo called $Regina:
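
A sketch of this step on its own. It starts from the original demo object, so the other two updates visible in the output shown here are not included.

```r
demo <- list(
  Petra = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  Frank = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
# demo$Regina does not exist yet, so a new element is added
demo$Regina <- list(location = "Sidney", kids = "Lea", job = "Teacher")
str(demo$Regina)
```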

## List of 3
##  $ Petra :List of 3
##   ..$ location: chr "Vienna"
##   ..$ kids    : NULL
##   ..$ job     : chr "Programmer"
##  $ Frank :List of 2
##   ..$ location: chr "Kufstein"
##   ..$ kids    : chr [1:3] "Peter" "Paul" "Malena"
##  $ Regina:List of 3
##   ..$ location: chr "Sidney"
##   ..$ kids    : chr "Lea"
##   ..$ job     : chr "Teacher"

9.7 Combining lists

Multiple lists can be combined using either c() or list().

  • c(<list 1>, <list 2>): Creates a new list by combining all elements from the two lists. The length of the result is the sum of the lengths of <list 1> and <list 2>.
  • list(<list 1>, <list 2>): Creates a new list of length 2, where the first element contains <list 1>, the second <list 2>.

Example: Using two lists list1 and list2 (both of length 2).
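
The code for this example is not shown. A sketch reconstructed from the output: the values in list1 and list2 and the names res1 and res2 are taken from the surrounding text and output, while the exact calls are an assumption.

```r
list1 <- list(3, 4)
list2 <- list(100, 200)
c("length of list 1" = length(list1), "length of list 2" = length(list2))

res1 <- c(list1, list2)   # one flat list with all four elements
length(res1)
str(res1)

res2 <- list(list_one = list1, list_two = list2)   # list of two lists
length(res2)
str(res2)
```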

## length of list 1 length of list 2 
##                2                2
## [1] 4
## List of 4
##  $ : num 3
##  $ : num 4
##  $ : num 100
##  $ : num 200
## [1] 2
## List of 2
##  $ list_one:List of 2
##   ..$ : num 3
##   ..$ : num 4
##  $ list_two:List of 2
##   ..$ : num 100
##   ..$ : num 200

The latter results in a recursive list where each element from the list res2 itself contains a list with two elements. As shown, we can also name the elements (similar to cbind()/rbind() when creating matrices; Matrices: Combining objects). Note that the naming of the list-elements has a different effect when using c() (try c(list_one = list1, list_two = list2)).
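
To see this effect, a small sketch can be used. list1 and list2 are assumed to hold the values from the output above; `res3` is a hypothetical name.

```r
list1 <- list(3, 4)
list2 <- list(100, 200)
# With c(), the given names are used as prefixes for the single elements
res3 <- c(list_one = list1, list_two = list2)
names(res3)
length(res3)
```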

9.8 Summary

Just as a brief summary to recap the new content:

  • Creating lists: Using the function list().
  • Name attribute: Lists can be named or unnamed.
  • Heterogeneity: Lists allow us to store objects of different types.
  • Replacement: Subsetting can be used to replace existing elements, add new elements, or delete elements (assigning NULL).
  • Recursive lists: Lists can be recursive (lists containing lists containing lists …).

An overview of the different subsetting methods we have learned for different objects (vectors, matrices, and lists).

Object     Subset                By index          By name                     Logical
Vectors    Element               x[1]              x["name"]                   possible
Matrices   Element               x[1, 1] or x[1]   x["Row 1", "Col A"]         possible
           Row                   x[1, ]            x["Row 1", ]                possible
           Column                x[, 1]            x[, "Col A"]                possible
Lists      List                  x[1]              x["name"]                   possible
           Element               x[[1]]            x[["name"]] or x$name       not possible
           Element (recursive)   x[[c(1, 2)]]      x[[c("name1", "name2")]]    not possible

Most subsetting methods also work with vectors (vectors of indices or names). You will see that we can re-use most of this when working with data frames, our next topic.