Chapter 9 Lists
In terms of ‘data types’ (objects containing data) we have so far only worked with atomic vectors and matrices, which are both homogeneous objects; they can only contain data of one specific type.
In this chapter we will learn about lists which, like data frames, allow us to store heterogeneous data, e.g., integers and characters in the same object. This chapter does not cover all details of lists, but we will learn all the important aspects we need in the next chapter on data frames, as data frames are based on lists (similar to matrices being based on vectors).
Quick reminder: Frequently used data types and how they can be distinguished by their dimensionality and whether they are homogeneous (all elements of the same type) vs. heterogeneous (elements can be of different types).
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1 | Atomic vectors | Lists |
2 | Matrix | Data frame |
\(\ge 1\) | Array | |
9.1 List introduction
Like (atomic) vectors, lists are sequences of elements. However, in contrast to (atomic) vectors, lists allow for heterogeneity. Technically, a list is a generic vector, but it is usually just called a ‘list’. We will use the same terms in this book (atomic vectors: vectors; generic vectors: lists).
Basis: Lists serve as the basis for most complex objects in R.
- Data frames: Lists of variables of the same length, often (but not necessarily) atomic vectors.
- Fitted regression models: Lists of different elements such as the model parameters, covariance matrix, residuals, but also more technical information such as the regression terms or certain matrix decompositions.
Difference: The difference between vectors and lists.
- Vector: All elements must have the same basic type.
- List: Different elements can have different types, including vectors, matrices, lists or more complex objects.
As lists are ‘generic vectors’, empty lists can also be created using the vector() function (compare Creating vectors).
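The command that produced the output shown next is not included here. A minimal sketch of a call that yields such an empty list, assuming a length of 2 and the hypothetical object name empty:

```r
# Create an empty list of length 2 by requesting mode "list"
# (instead of e.g. "numeric" or "character" for atomic vectors)
empty <- vector("list", length = 2)
empty
```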
## [[1]]
## NULL
##
## [[2]]
## NULL
Elements of (unnamed) lists are indexed by [[...]], while we had [...] for vectors. The empty list above has two elements (first [[1]] and second [[2]]) which both contain NULL.
9.2 Creating lists
We start right away with constructing a few simple lists for illustration.
While vectors can be created using c(), lists are most often constructed using the function list().
A vector: Vector of length 2.
## [1] 3 5
A list: List containing values of the same types/classes.
## [[1]]
## [1] 3
##
## [[2]]
## [1] 5
This list with two elements ([[1]], [[2]]) contains two vectors, each of length one, indicated by [1] (first vector element).
Another list: Store objects of different types/classes into a list.
## [[1]]
## [1] 3
##
## [[2]]
## [1] "five" "six"
##
## [[3]]
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
The last example shows a list of length 3 which contains a numeric vector of length \(1\) (first element; [[1]]), a character vector of length \(2\) (second element; [[2]]), and a matrix of dimension \(2 \times 3\) (third element; [[3]]).
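The calls behind the three outputs above are not shown. The following sketch reconstructs them from the printed results; the object names v, l1 and l2 as well as the use of matrix(1:6, nrow = 2) are assumptions.

```r
# Atomic vector of length 2
v <- c(3, 5)

# List with two elements, each a numeric vector of length 1
l1 <- list(3, 5)

# List mixing a numeric vector, a character vector, and a 2 x 3 matrix
l2 <- list(3, c("five", "six"), matrix(1:6, nrow = 2))
l2
```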
The two functions c() and list() are used in a very similar way, except that c(a, b), where a and b are vectors, coerces all values to one common type, whereas list(a, b) stores both objects unchanged (see Vectors: Coercion).
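To illustrate the difference, here is a small sketch with two made-up vectors a and b (not part of the original examples):

```r
a <- 1:2            # integer vector
b <- c("x", "y")    # character vector

res_c    <- c(a, b)     # coercion: all four values become character
res_list <- list(a, b)  # no coercion: one integer and one character element

class(res_c)
str(res_list)
```

With c() the integers end up as the strings "1" and "2", while the list keeps the integer vector untouched.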
9.3 Recursive structure
A more practical example: We have some information about Peter Falk, a famous actor who played “Lieutenant Columbo” in the long-running TV series Columbo from 1968 to 2003.
We would like to store all the information in one single object. As we have to deal with both characters (name; month of birth) and numeric values (year and day of birth), we need an object which allows for heterogeneity: a list.
(person <- list(name = c(given = "Peter", family = "Falk"),
                date_of_birth = list(year = 1927, month = "September", day = 16)))
## $name
## given family
## "Peter" "Falk"
##
## $date_of_birth
## $date_of_birth$year
## [1] 1927
##
## $date_of_birth$month
## [1] "September"
##
## $date_of_birth$day
## [1] 16
For named lists, the representation (print) changes again compared to unnamed lists. The elements are now indicated by, e.g., $name or $date_of_birth$year, which we can use to access specific elements. We will come back to this later in Subsetting lists.
Our new object person is a list which contains a named vector ($name) and a second element ($date_of_birth) which itself contains another named list. This is called a recursive structure: a list can contain lists (which can contain lists (which can …)). The figure below shows the structure of the object person with the two lists in green, and the different (integer/character) vectors in blue.
A nice way to get an overview of potentially complex (list-)objects is by using the function str() (structure) which we have already seen in the vectors chapter. str() returns a text representation similar to the image shown above.
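The output that follows corresponds to calling str() on the person object. As a self-contained sketch (the object is re-created here so the snippet runs on its own):

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

# Compact overview of the (recursive) structure
str(person)
```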
## List of 2
## $ name : Named chr [1:2] "Peter" "Falk"
## ..- attr(*, "names")= chr [1:2] "given" "family"
## $ date_of_birth:List of 3
## ..$ year : num 1927
## ..$ month: chr "September"
## ..$ day : num 16
How to read: The (first-level) list has two elements, one called name, one called date_of_birth. The name element itself contains a named character vector of length \(2\); the second element date_of_birth is again a list with \(3\) named elements (year, month, day), each containing an unnamed (plain) vector of length \(1\) (numeric, character, numeric). The indentation shows the recursive structure: the further to the right the $, the deeper the specific entry is in the list.
9.4 List attributes
Like all other objects, lists always have a specific type and length (object properties). Besides these mandatory properties the default attributes for lists are:
- Class: Objects of class "list".
- Names: Can have names (optional; just like vectors).
Let us investigate the person object from above:
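The four results shown next (type, length, class, names) can presumably be obtained as follows; the exact calls and their order are inferred from the output.

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

typeof(person)   # type of the object
length(person)   # number of (first-level) elements
class(person)    # class of the object
names(person)    # names of the (first-level) elements
```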
## [1] "list"
## [1] 2
## [1] "list"
## [1] "name" "date_of_birth"
The is.*() function family can be used to check the object.
c("is.list" = is.list(person), "is.vector" = is.vector(person),
"is.character" = is.character(person), "is.logical" = is.logical(person),
"is.numeric" = is.numeric(person), "is.integer" = is.integer(person))
##      is.list    is.vector is.character   is.logical   is.numeric   is.integer
##         TRUE         TRUE        FALSE        FALSE        FALSE        FALSE
- is.list(): Always returns TRUE for lists.
- is.vector(): As lists are generic vectors they also count as vectors, as long as they have no attributes other than (optional) names.
- A list is never numeric, integer, character, or logical, no matter what the list contains.
9.5 Subsetting lists
Subsetting on lists works slightly differently than on vectors and matrices. The basic concepts stay the same. However, due to the more complex structure of the object, additional operators for subsetting become available.
Operator | Return | Description |
---|---|---|
[i] | a list | Select sub-list containing one or more elements. The index vector i can be integer (possibly negative), character, or logical. |
[[i]] | content | Select the content of a single list element if i is a single (positive) integer or a single character. |
$name | content | Select the content of a single list element using the name of the element (without quotes). |
[[j]] | content | If j is a vector, nested/recursive subsetting takes place. |
Note that there is a distinct difference between single brackets ([...]) and double brackets ([[...]]). The former always returns a list which is a subset of the original object, while the latter returns the content of the selected element.
Using [i]: The method is similar to vector subsetting except that the result will not be the content of the elements specified, but a sub-list. Let us call person[1] and see what we get:
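A sketch of the calls that would give the three results shown next (the sub-list itself, its class, and its names); storing the result in a helper object sub is an assumption made for readability.

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

sub <- person[1]   # single brackets: returns a sub-list
sub
class(sub)         # still a list
names(sub)         # only the first element is left
```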
## $name
## given family
## "Peter" "Falk"
## [1] "list"
## [1] "name"
The result of person[1] is again a list, but it only contains the first entry of the original object person. In the same way we can use person["name"] or person[c(TRUE, FALSE)]. This also works with index vectors, e.g., extracting elements 2:1 (both, but in reverse order):
## List of 2
## $ date_of_birth:List of 3
## ..$ year : num 1927
## ..$ month: chr "September"
## ..$ day : num 16
## $ name : Named chr [1:2] "Peter" "Falk"
## ..- attr(*, "names")= chr [1:2] "given" "family"
A negative index (person[-1]) can be used to get all but the first element (does not work with characters). The result is again a sub-list (like for positive indices).
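The single-bracket variants mentioned above can be compared side by side. This is a sketch; the helper object names are made up.

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

by_name    <- person["name"]            # character index
by_logical <- person[c(TRUE, FALSE)]    # logical index
reversed   <- person[2:1]               # both elements, reverse order
without_1  <- person[-1]                # all but the first element

str(reversed)
names(without_1)
```

All four results are lists, no matter how many elements were selected.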
Using [[i]], single value: This subsetting type is most comparable to vector subsetting. Instead of a sub-list, we will get the content of the element, in this case a named character vector of length \(2\).
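The structure printed next presumably stems from a call like the following (an assumption, as the command itself is not shown):

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

# Double brackets: returns the content of the first element, not a sub-list
str(person[[1]])
```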
## Named chr [1:2] "Peter" "Falk"
## - attr(*, "names")= chr [1:2] "given" "family"
The same can be achieved using person[["name"]] on named lists. Note that negative indices should not be used in combination with double brackets (e.g., x[[-1]] results in an error for lists with more than two elements).
Using the $ operator: The dollar operator $ is the one most commonly used when working with named lists. Instead of calling person[["name"]] we can also call person$name, as long as the name does not contain blanks or special characters. Note that there is no blank before/after the $ operator (unlike in the output of str()).
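A sketch of the two calls that match the output shown next (content of the element and its class):

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

person$name          # content of the element called 'name'
class(person$name)   # a character vector, not a list
```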
## given family
## "Peter" "Falk"
## [1] "character"
Multiple $ operators can also be combined. If we are interested in the month of birth (month) which is stored within date_of_birth we can use:
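As a self-contained sketch of the chained access described here:

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

# First access date_of_birth, then month inside of it
person$date_of_birth$month
```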
## [1] "September"
How to read:
- Right to left: Return month from date_of_birth of the object person.
- Left to right: Inside object person access the element date_of_birth, inside date_of_birth access element month.
Using [[j]] with vectors: Take care, something unexpected happens. As an example, let us call person[[c(1, 2)]]. We could think this returns us person[[1]] and person[[2]], but that’s not the case. Instead, nested or recursive subsetting is performed. The two indices (c(1, 2)) are used as indices for different depths of the recursive list.
## [1] "Falk"
What happened: The first element of the vector (here 1) is used for the top-level list, whose first entry is the vector name. The second element of the vector (2) is then used to extract the second element of whatever name contains. It is the same as:
## [1] "Falk"
Note: We will not use this often, but keep it in mind if you run into unexpected results when subsetting lists or data frames. The same happens if you use a character vector, e.g., person[[c("name", "family")]] or person[[c("date_of_birth", "month")]].
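The nested subsetting described above can be checked step by step. In this sketch the helper names nested and stepwise are made up.

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

nested   <- person[[c(1, 2)]]   # index 1 on the top level, index 2 one level deeper
stepwise <- person[[1]][[2]]    # the same, written as two separate steps
identical(nested, stepwise)

# Works the same way with names on each level
person[[c("date_of_birth", "month")]]
```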
Exercise 9.1 Practicing subsetting on lists: The following object demo is a list with information about two persons, Frank and Petra (simply copy&paste it into your R session).
demo <- list(
"Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
"Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
str(demo)
## List of 2
## $ Petra:List of 3
## ..$ location: chr "Birmingham"
## ..$ kids : NULL
## ..$ job : chr "Programmer"
## $ Frank:List of 2
## ..$ location: chr "Kufstein"
## ..$ kids : chr [1:2] "Peter" "Paul"
- How do we get Frank's location?
- Try demo["Frank"]$location (will return NULL). Why doesn’t this work?
- How many kids does Frank have (use code to answer)?
- Our friend Petra moves from Birmingham to Vienna. Change her location (inside demo) to Vienna.
Solution. Frank's location: demo is a list with two elements, one called "Frank". Thus, we can access this list element using demo$Frank. This returns the content of this list element which itself is, again, a list. In there, we have the location we are looking for. Thus, one option to get Frank's location is to use:
## [1] "Kufstein"
Alternatively, we could use brackets and subsetting by name. Warning: we need double brackets to access the content.
## [1] "Kufstein"
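The two variants described above could look as follows; the commands are reconstructed from the description and the printed results.

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

demo$Frank$location              # using the $ operator twice
demo[["Frank"]][["location"]]    # using double brackets and names
```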
Try demo["Frank"]$location: This will not work. The reason is that we only use single brackets!
## NULL
demo["Frank"] returns a sub-list of demo which now only contains "Frank" (no longer "Petra"). However, we do not get the content or information for Frank. Let us see:
## $Frank
## $Frank$location
## [1] "Kufstein"
##
## $Frank$kids
## [1] "Peter" "Paul"
## $Frank
## $Frank$location
## [1] "Kufstein"
##
## $Frank$kids
## [1] "Peter" "Paul"
The last line looks weird, but it only ever extracts "Frank" from itself and never accesses the content of the list element "Frank". Thus, when we try to access location (which does not exist on this level) we get NULL in return.
## [1] "Frank"
## NULL
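The behaviour discussed in this part of the solution can be reproduced with a few lines. This is a sketch; the helper object sub is an assumption.

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

sub <- demo["Frank"]                     # sub-list of length 1
identical(sub, demo["Frank"]["Frank"])   # repeating [ ] changes nothing
names(sub)                               # only "Frank" exists on this level
sub$location                             # NULL: no element 'location' here
sub[["Frank"]]$location                  # double brackets reach the content
```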
How many kids does Frank have? To answer this question, we need to find out how long the kids vector is. Again, we can access this specific element in different ways, the easiest one being:
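A sketch of the calls matching the two results shown next:

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

demo$Frank$kids           # names of Frank's kids
length(demo$Frank$kids)   # number of kids
```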
## [1] "Peter" "Paul"
## [1] 2
Petra moves to Vienna: You can use any subsetting technique and assign a new value. I will stick to the $ operator and do the following:
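A sketch of the assignment that leads to the structure printed next:

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

# Overwrite the existing element 'location' inside 'Petra'
demo$Petra$location <- "Vienna"
str(demo)
```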
## List of 2
## $ Petra:List of 3
## ..$ location: chr "Vienna"
## ..$ kids : NULL
## ..$ job : chr "Programmer"
## $ Frank:List of 2
## ..$ location: chr "Kufstein"
## ..$ kids : chr [1:2] "Peter" "Paul"
9.6 Replacing/deleting elements
Replacement functions are available for all subsetting methods above. They can be used to overwrite elements in an existing list or to add new elements to a list.
As an example, let us replace the element $name in the person object. We use subsetting with the $ operator and assign (store) a new object. As person$name exists, it will be replaced.
person$name <- c(given_name = "Max", middle_name = "Maximilian", family_name = "Mustermann")
str(person)
## List of 2
## $ name : Named chr [1:3] "Max" "Maximilian" "Mustermann"
## ..- attr(*, "names")= chr [1:3] "given_name" "middle_name" "family_name"
## $ date_of_birth:List of 3
## ..$ year : num 1927
## ..$ month: chr "September"
## ..$ day : num 16
Or replace the month in the date of birth with an integer 9L instead of "September":
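A minimal sketch of this nested replacement. For brevity the object is re-created with its original name element, so only the date_of_birth part is inspected here.

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

# Replace the character month by the integer 9L
person$date_of_birth$month <- 9L
str(person$date_of_birth)
```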
## List of 2
## $ name : Named chr [1:3] "Max" "Maximilian" "Mustermann"
## ..- attr(*, "names")= chr [1:3] "given_name" "middle_name" "family_name"
## $ date_of_birth:List of 3
## ..$ year : num 1927
## ..$ month: int 9
## ..$ day : num 16
The same way, new elements can be added. If the element we assign an object to does not yet exist, it will be added to the original list object. Let us add a job element containing "Actor":
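As a sketch, again starting from a freshly created person object so that the snippet runs on its own:

```r
person <- list(name = c(given = "Peter", family = "Falk"),
               date_of_birth = list(year = 1927, month = "September", day = 16))

# 'job' does not exist yet, so the assignment appends a new element
person$job <- "Actor"
names(person)
```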
## List of 3
## $ name : Named chr [1:3] "Max" "Maximilian" "Mustermann"
## ..- attr(*, "names")= chr [1:3] "given_name" "middle_name" "family_name"
## $ date_of_birth:List of 3
## ..$ year : num 1927
## ..$ month: int 9
## ..$ day : num 16
## $ job : chr "Actor"
Delete elements: To delete an element, we simply have to replace it with a NULL object. An example using a very simple list:
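The two printouts that follow (before and after the deletion) can be reproduced with a sketch like this; the list is reconstructed from the output.

```r
# A list can contain NULL if it is created that way
x <- list(a = "first", b = "second", c = NULL)
x

# Assigning NULL to an existing element removes it from the list
x$a <- NULL
x
```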
## $a
## [1] "first"
##
## $b
## [1] "second"
##
## $c
## NULL
## $b
## [1] "second"
##
## $c
## NULL
As you can see, a list element can contain NULL (see element c), but if NULL is assigned to an existing element (x$a <- NULL), R removes the element completely instead of storing NULL in it.
Exercise 9.2 Practicing replacement: As in the previous exercise we will use the following list to work with. The list contains information about Petra and Frank.
demo <- list(
"Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
"Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)
We need to update this list and add or change some of the elements.
- Petra moves from Birmingham to Vienna. Update her location.
- Frank just got a newborn baby called "Malena". Add her name to the kids vector.
- Add a third person called ‘Regina’, located in ‘Sydney’. She has one child called ‘Lea’ and works as a ‘Teacher’.
Solution. Petra moves to Vienna: You can use any of the subsetting methods we have just seen. In this solution we will stick to the $ operator. All we have to do is to access Petra's location, and assign a new value.
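One possible way to write this step (a sketch using the $ operator as described):

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

# Access Petra's location and assign the new value
demo$Petra$location <- "Vienna"
demo$Petra$location
```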
Frank got a third child: Here, we could do the same as for Petra and simply assign a new vector with all three kids to Frank (demo$Frank$kids <- c("Peter", "Paul", "Malena")). However, this is not very elegant (hard-coded). Instead we use c() to combine the vector containing the first two kids with the newborn, and store the resulting vector (a combination of subsetting and replacement) as follows:
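A sketch of this combination of subsetting and replacement:

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

# Right-hand side: existing kids plus the newborn; left-hand side: replacement
demo$Frank$kids <- c(demo$Frank$kids, "Malena")
demo$Frank$kids
```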
Adding Regina: Regina is not yet in our list, however, we can assign new elements the same way we replace elements. If the element exists it will be overwritten. If it does not exist, it will be added. In this case we want to add a new list to the existing object demo called $Regina:
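A sketch of how the new person could be added. The two earlier updates (Petra, Frank) are left out here so that the snippet stays short; the full output below includes them.

```r
demo <- list(
  "Petra" = list(location = "Birmingham", kids = NULL, job = "Programmer"),
  "Frank" = list(location = "Kufstein", kids = c("Peter", "Paul"))
)

# 'Regina' does not exist yet, so a new (list) element is appended
demo$Regina <- list(location = "Sydney", kids = "Lea", job = "Teacher")
str(demo$Regina)
```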
## List of 3
## $ Petra :List of 3
## ..$ location: chr "Vienna"
## ..$ kids : NULL
## ..$ job : chr "Programmer"
## $ Frank :List of 2
## ..$ location: chr "Kufstein"
## ..$ kids : chr [1:3] "Peter" "Paul" "Malena"
## $ Regina:List of 3
## ..$ location: chr "Sydney"
## ..$ kids : chr "Lea"
## ..$ job : chr "Teacher"
9.7 Combining lists
Multiple lists can be combined using either c() or list().
- c(<list 1>, <list 2>): Creates a new list by combining all elements from the two lists. The length of the result is the number of elements of <list 1> and <list 2> combined.
- list(<list 1>, <list 2>): Creates a new list of length 2, where the first element contains <list 1>, the second <list 2>.
Example: Using two lists list1 and list2 (both of length 2).
list1 <- list(3, 4)
list2 <- list(100, 200)
c("length of list 1" = length(list1), "length of list 2" = length(list2))
## length of list 1 length of list 2
## 2 2
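The remaining outputs of this example (lengths and structures of the two combined objects) can presumably be obtained as sketched here. The name res2 appears in the text below; res1 is an assumed counterpart.

```r
list1 <- list(3, 4)
list2 <- list(100, 200)

# c(): one flat list with all four elements
res1 <- c(list1, list2)
length(res1)
str(res1)

# list(): a list of two lists (here with names)
res2 <- list(list_one = list1, list_two = list2)
length(res2)
str(res2)
```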
## [1] 4
## List of 4
## $ : num 3
## $ : num 4
## $ : num 100
## $ : num 200
## [1] 2
## List of 2
## $ list_one:List of 2
## ..$ : num 3
## ..$ : num 4
## $ list_two:List of 2
## ..$ : num 100
## ..$ : num 200
The latter results in a recursive list where each element from the list res2 itself contains a list with two elements. As shown, we can also name the elements (similar to cbind()/rbind() when creating matrices; Matrices: Combining objects).
Note that the naming of the list-elements has a different effect when using c() (try c(list_one = list1, list_two = list2)).
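A short sketch of what happens in that case; the object name res3 is made up.

```r
list1 <- list(3, 4)
list2 <- list(100, 200)

# With c() the names are used as prefixes for the individual elements
res3 <- c(list_one = list1, list_two = list2)
names(res3)
# "list_one1" "list_one2" "list_two1" "list_two2"
```

The result is still a flat list of length 4; only the element names are derived from the argument names.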
9.8 Summary
Just as a brief summary to recap the new content:
- Creating lists: Using the function list().
- Name attribute: Lists can be named or unnamed.
- Heterogeneity: Lists allow us to store objects of different types.
- Replacement: Subsetting can be used to replace existing elements, add new elements, or delete elements (assigning NULL).
- Recursive lists: Lists can be recursive (lists containing lists containing lists …).
An overview of the different subsetting methods we have learned for different objects (vectors, matrices, and lists).
Object | Subset | By index | By name | Logical |
---|---|---|---|---|
Vectors | Element | x[1] | x["name"] | possible |
Matrices | Element | x[1, 1] or x[1] | x["Row 1", "Col A"] | possible |
Matrices | Row | x[1, ] | x["Row 1", ] | possible |
Matrices | Column | x[, 1] | x[, "Col A"] | possible |
Lists | List | x[1] | x["name"] | possible |
Lists | Element | x[[1]] | x[["name"]] or x$name | not possible |
Lists | Element (recursive) | x[[c(1, 2)]] | x[[c("name1", "name2")]] | not possible |
Most subsetting methods also work with vectors (vectors of indices or names). You will see that we can re-use most of this when working with data frames, our next topic.