Chapter 7 Conditional Execution

So far we have only written very simple functions and scripts which always did the very same thing and were therefore static.

To allow the code to be more dynamic, we need to add additional ‘control flow’ structures. Control flow is typically based on logical conditions and can make code more versatile and dynamic. Control structures are not functions, but they allow the code to make certain decisions (conditional execution) or to repeat something multiple times or as often as required (repetitive execution; next chapter).

You may all be familiar with flow charts, which are a way to graphically illustrate flow control. The following one shows both conditional execution (e.g., the first decision “Do you have homework?”) and repetitive execution (the loop at the bottom right).

Figure 7.1: Source: GraphJam.com (offline).

There are three ‘different’ conditional statements we will go through in this chapter.

If statements: single expression.

  • If you are hungry: Eat something.
  • If the alarm clock rings: Get up.

If-else statements: single expression with additional ‘else’.

  • If the traffic light is green: Walk. Else: Stop.
  • If the coffee cup is empty: Refill. Else: Drink.

Multiple if statements: multiple expressions.

  • If warm and dry outside: Wear a t-shirt. Else if warm and rainy: T-shirt and a jacket. Else if cold and dry: Sweater. Else: Stay home.

7.1 Logical expressions

The basis of all these decisions are logical expressions using relational and logical operators (as well as value matching). We have already seen the different operators available in R in the chapter Vectors: Mathematical operations. If you need a refresher on this topic, please go through the corresponding section again or check ?Comparison, ?Logic, and ?match. Else proceed (a great example of conditional execution! :)).

Logical expressions always return logical values, TRUE and/or FALSE. In case our logical expression (a condition) evaluates to TRUE, the corresponding part of our program/code should be executed, else the program should ignore this specific part and proceed.

Just as a brief recap:

  • Relational operators: <, >, <=, >=, ==, !=.
  • Logical operators: !, &, xor(), |, &&, ||.
  • Value matching: %in%, ! ... %in% (character operations).

Some examples:

  • Relational operators (x > y).
  • Logical operators (x & y).
  • Value matching ("Marc" %in% names).
  • all(), any(), all.equal().
  • Combinations of them (e.g., all(y < 0) | "Marc" %in% names).

Note: Logical expressions can also evaluate to NA (try NA > 3), which results in an error when used as the condition for conditional execution. This should be avoided when designing our code.

Short and long form

As shown above, there are & and | as well as && and || for logical operators. While they are related, they work slightly differently.

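  • Short form: & and | perform an element-wise comparison. They return a logical vector of the same length as our original objects.
  • Long form: && and || work on single logical values and are evaluated from left to right. The right-hand side is only evaluated if the left-hand side does not already determine the result. They always return one single value. In R versions before 4.3.0 only the first element of longer vectors was used; since R 4.3.0, vectors with more than one element result in an error.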

Some examples of logical expressions/comparison:
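A minimal sketch of the element-wise operators. The values of l1 and l2 are an assumption, chosen to be consistent with the outputs discussed in the remainder of this section; the three results shown next correspond to l1 & l2, l1 | l2, and xor(l1, l2), in this order.

```r
# Two logical vectors of length 3 (values assumed for illustration)
l1 <- c(TRUE, FALSE, TRUE)
l2 <- c(TRUE, TRUE,  FALSE)

l1 & l2       # element-wise logical and
l1 | l2       # element-wise logical or
xor(l1, l2)   # element-wise exclusive or
```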

## [1]  TRUE FALSE FALSE

The logical and (&) compares if the elements are TRUE in both vectors, element by element. Therefore, we get a TRUE for the first element (both l1[1] and l2[1] are TRUE) and FALSE for element two and three as there is always at least one FALSE in the two vectors.

## [1] TRUE TRUE TRUE

The logical or (|) works similarly but evaluates whether at least one of l1 or l2 contains a TRUE. ‘At least’ means it also evaluates to TRUE if both (all) are TRUE.

## [1] FALSE  TRUE  TRUE

Last but not least we have the exclusive or (xor()). It evaluates if we have a TRUE in either l1 or l2, but not both (in contrast to |).

On the other hand we have the long form with && and ||. Let us see how they work using the same two vectors l1 and l2.

## [1] TRUE
## [1] TRUE

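For both commands we get one single TRUE. The reason is as follows: In the R version used to produce this output (before 4.3.0), only the first elements are compared (l1[1], l2[1]); all remaining elements are ignored. l1[1] && l2[1] is TRUE as both elements are TRUE, and l1[1] || l2[1] is TRUE as at least one of them is TRUE. In addition, the long form evaluates from ‘left to right’ and stops as soon as the result is determined: && stops at the first FALSE, || stops at the first TRUE, and the right-hand side is then not evaluated at all. Note that since R 4.3.0 calling && or || on vectors with more than one element results in an error; use single elements (e.g., l1[1] && l2[1]) instead.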

This can sometimes be handy when writing code, but will not be used often in this book.
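A small sketch of the long form applied to single elements, including its short-circuit behaviour. The vectors are the same assumed values as before; stop() on the right-hand side only serves to show that this side is never evaluated.

```r
# Same assumed vectors as above
l1 <- c(TRUE, FALSE, TRUE)
l2 <- c(TRUE, TRUE,  FALSE)

# Long form on single elements: always one single logical value
res_and <- l1[1] && l2[1]   # TRUE && TRUE
res_or  <- l1[2] || l2[2]   # FALSE || TRUE

# Short-circuit: the result is already determined by the left-hand
# side, so the right-hand side (the stop() call) is never evaluated
sc_and <- FALSE && stop("never evaluated")
sc_or  <- TRUE  || stop("never evaluated")
```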

Functions

In addition to these operators, a series of useful functions exist which can be used in conditional execution. We have already seen some of them in the previous chapters.

  • all(): Are all elements TRUE?
  • any(): Is at least one element TRUE?
  • all.equal(): Are two objects (nearly) equal?

They all return one single logical element (either TRUE or FALSE) or a single missing value (NA) and no longer a vector. Thus, they are used frequently in combination with conditional execution. Let us use the same two vectors from above once again.
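A sketch of the corresponding calls, assuming the same l1 and l2 as above; the results are combined into one named vector to match the output that follows.

```r
# Same assumed vectors as above
l1 <- c(TRUE, FALSE, TRUE)
l2 <- c(TRUE, TRUE,  FALSE)

# Named logical vector with the four results
res <- c("all(l1)" = all(l1), "all(l2)" = all(l2),
         "any(l1)" = any(l1), "any(l2)" = any(l2))
res
```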

## all(l1) all(l2) any(l1) any(l2) 
##   FALSE   FALSE    TRUE    TRUE

As both vectors contain at least one FALSE, all() returns a logical FALSE for both vectors (not all elements are TRUE). As both also at least contain one TRUE, any() returns TRUE for both.

all.equal() is used for numeric values. As we have seen in the Vectors chapter sometimes == does not work due to the precision of the arithmetic operations (remember that 1.9 - 0.9 == 1.0 evaluates to FALSE). Instead, we used all.equal() to check for near equality.

The function also works on vectors. As a motivational example: We know that the square root of x is the inverse of x to the power of 2. Thus, if we take the square root of x and take this result to the power of 2, we should get the (nearly) same values again. Else there is something wrong with R.
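The checks discussed in this and the following paragraphs can be sketched as below. The values of x are an assumption for illustration; the four expressions are the near-equality check, the exact check, the element-wise test for differences, and the size of the differences.

```r
# Assumed example values
x <- c(1, 2, 3, 4)

all.equal(sqrt(x)^2, x)   # near equality (with a small tolerance)
all(sqrt(x)^2 == x)       # exact equality of all elements
sqrt(x)^2 != x            # which elements differ?
sqrt(x)^2 - x             # by how much do they differ?
```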

## [1] TRUE

In case we check for exact equality we would get a FALSE, as some elements differ ever so slightly (precision).

## [1] FALSE

The reason: not all elements are exactly the same. Let us try to find out which ones differ, and by how much they differ.

## [1] FALSE  TRUE  TRUE FALSE
## [1]  0.000000e+00  4.440892e-16 -4.440892e-16  0.000000e+00

The differences are basically zero; 4.44e-16 is \(4.44 \cdot 10^{-16}\), a very tiny difference which can be ignored in a wide range of applications (not always).

isTRUE() and isFALSE()

Two more functions which are very useful: isTRUE() and isFALSE().

  • Check if an object is a single TRUE or FALSE.
  • Always yields logical TRUE or FALSE (NA is not an option), returns one single logical value.

Some examples:
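A sketch that yields a named vector like the one shown next; the names are taken from the output, the isFALSE() calls are added for completeness.

```r
# isTRUE() is only TRUE for one single logical TRUE
res <- c("Checking TRUE"  = isTRUE(TRUE),
         "Checking FALSE" = isTRUE(FALSE),
         "Checking NA"    = isTRUE(NA))
res

# isFALSE() is the counterpart for one single logical FALSE
isFALSE(FALSE)
isFALSE(NA)
```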

##  Checking TRUE Checking FALSE    Checking NA 
##           TRUE          FALSE          FALSE

This can be very convenient in some situations where we rely on logical values, such as for conditional execution. The problem is that all() and any() may return NA as soon as a missing value is involved, and all.equal() may return something other than TRUE or FALSE. In this case our script may run into unexpected problems.

However, we can use all() in combination with isTRUE() or isFALSE() to be sure we do have a logical TRUE or FALSE. Imagine the following vector x with one missing value. We need to know if all values are larger than 0. In this case, all() returns NA.
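A minimal sketch of this situation, with assumed values for x (two positive numbers and one missing value); it also includes the isTRUE() call discussed right after.

```r
# Assumed example vector with one missing value
x <- c(10, 20, NA)

x > 0                # element-wise comparison, NA stays NA
all(x > 0)           # NA: we cannot know if *all* are larger than 0
isTRUE(all(x > 0))   # FALSE: all() did not return a single TRUE
```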

## [1] TRUE TRUE   NA
## [1] NA

We could, in addition, use isTRUE() which tells us that all() did not return TRUE (but something else, no matter what).

## [1] FALSE

A second example where isTRUE() is used: Above we checked if all.equal(sqrt(x)^2, x) is TRUE. What happens if they are not all nearly equal? To demonstrate this, we replace sqrt(x)^2 by sqrt(x) which – obviously – should not be the same as x.

## [1] "Mean relative difference: 2.929575"

The problem: We might have expected that this returns FALSE; instead, all.equal() returns a character string with the mean relative difference between the two vectors. This would also cause an error when used for flow control. However, if we use isTRUE() we get the result as needed.

## [1] FALSE
## [1] TRUE

If the return of all.equal() is a character string or anything different from TRUE, the function isTRUE() will return a logical FALSE.
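The behaviour described above can be sketched as follows; the values of x are again an assumption for illustration.

```r
# Assumed example vector with one missing value
x <- c(10, 20, NA)

# Not (nearly) equal: all.equal() returns a character string, not FALSE
res <- all.equal(sqrt(x), x)
res

# Wrapped in isTRUE() we always get one single TRUE or FALSE
isTRUE(all.equal(sqrt(x), x))
isTRUE(all.equal(sqrt(x)^2, x))
```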

7.2 If statements

We can make use of the logical expression above to control the flow of the execution of our scripts/program by using it as our condition. Let us start with the most basic version: a single if statement.

Basic usage:

  • Structure: if (<condition>) { <action> }
  • The condition has to be a single logical TRUE or FALSE.
  • If <condition> is evaluated to TRUE, the <action> is executed.

For example we want R to print a character string, that tells us that x is smaller than 10 if (and only if) this is true. Can you identify the condition and the action?
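A minimal sketch of such an if statement; the value assigned to x is an assumption, any single number below 10 behaves the same way.

```r
x <- 8   # assumed value

# Condition: x < 10; action: print the character string
if (x < 10) {
    print("x is smaller than 10")
}
```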

## [1] "x is smaller than 10"

The condition is always within round brackets (x < 10), the action is everything between the curly brackets ({ ... }). In case x is smaller than 10, the character string will be printed (as in this example).

Well, if x is smaller than 10 it cannot be larger or equal to 10, right? To check that, we could adjust our if condition and execute the following:
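A sketch of the adjusted statement, with the same assumed value for x; the message text is hypothetical.

```r
x <- 8   # assumed value, as before

# The condition is FALSE, so the action is skipped entirely
if (x >= 10) {
    print("x is larger or equal to 10")
}
```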

Nothing is printed in this case as our condition (logical expression) evaluates to FALSE in this example and the action is not executed at all.

Alternative declarations

Like functions (see Functions: Alternative declarations), if-statements can be written in different ways, with or without curly brackets. As for functions, the brackets can only be omitted if the action consists of one single command; else they are required. All the following examples are equivalent:

Version 1: The preferred one.

Version 2: One-liner.

Version 3: Without brackets.
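The three versions could look as follows (a sketch using the same assumed x as above); each of them prints the same character string.

```r
x <- 8   # assumed value

# Version 1: multiple lines, with curly brackets (preferred)
if (x < 10) {
    print("x is smaller than 10")
}

# Version 2: one-liner, with curly brackets
if (x < 10) { print("x is smaller than 10") }

# Version 3: without brackets (only possible for one single command)
if (x < 10) print("x is smaller than 10")
```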

7.3 If-else statements

The next extension of if-statements are if-else statements. In contrast to if-statements they have an additional else clause which is executed whenever the (if-)condition is evaluated to FALSE.

Basic usage:

  • Structure: if (<condition>) { <action 1> } else { <action 2> }.
  • If <condition> is evaluated to TRUE, <action 1> is executed. Else <action 2> is executed.

Let us take the same example as above where we check if a certain number is smaller than 10.
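A minimal sketch of the if-else version; the value of x and the text in the else-block are assumptions.

```r
x <- 8   # assumed value

if (x < 10) {
    print("x is smaller than 10")
} else {
    print("x is larger or equal to 10")
}
```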

## [1] "x is smaller than 10"

Alternative declarations

Again, if-else statements can be written in different ways. The three versions below are equivalent and all do the very same thing.

Version 1: The preferred one.

## [1] "x is smaller than 10"

Version 2: One-liner.

## [1] "x is smaller than 10"

Version 3: Without brackets.

## [1] "x is smaller than 10"

These one-line forms are sometimes useful to make the code a bit more compact. However, they are harder to read, thus we recommend to use the first version (multiple lines, with curly brackets).
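For reference, the three versions discussed above can be sketched like this (assumed x and else-text as before). Note that in a script the else has to follow on the same line as the end of the if-action.

```r
x <- 8   # assumed value

# Version 1: multiple lines, with curly brackets (preferred)
if (x < 10) {
    print("x is smaller than 10")
} else {
    print("x is larger or equal to 10")
}

# Version 2: one-liner, with curly brackets
if (x < 10) { print("x is smaller than 10") } else { print("x is larger or equal to 10") }

# Version 3: without brackets
if (x < 10) print("x is smaller than 10") else print("x is larger or equal to 10")
```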

7.4 Nested conditions

If-else statements can also be nested. Nested means that one of the actions itself contains another if-else statement. Important: the two if-else statements are independent.

An example:
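A sketch of two nested if-else statements; x is assumed to be 10 so that the innermost else-block is reached, and the first two messages are hypothetical.

```r
x <- 10   # assumed value

# 'Outer' if-else statement
if (x < 10) {
    print("x is smaller than 10")
} else {
    # 'Inner' if-else statement, only evaluated if x < 10 is FALSE
    if (x > 10) {
        print("x is larger than 10")
    } else {
        print("x is exactly 10")
    }
}
```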

## [1] "x is exactly 10"

The procedure is the same as for a single if-else statement. The outermost statement is evaluated first. In case we end up in the else-block of the ‘Outer’ if-else statement, we need to evaluate the second (‘Inner’) if-else condition.

  • Outer if-else statement: Is x < 10? FALSE: execute action in the else-block of the outer if-else statement.
  • Inner if-else statement: Is x > 10? FALSE: execute action in the else-block of the inner if-else statement (print "x is exactly 10").

7.5 Multiple if-else statements

Instead of nested (independent) if-conditions we can once again extend the concept by adding multiple if-else conditions in one statement. The difference to nested conditions is that this is one single large statement, not several smaller independent ones.

Basic usage:

  • Structure: if (<condition 1>) { <action 1> } else if (<condition 2>) { <action 2> } else { <action 3> }.
  • If <condition 1> evaluates to TRUE, <action 1> is executed.
  • Else <condition 2> is evaluated. If TRUE, <action 2> is executed.
  • Else, <action 3> is executed (if both, <condition 1> and <condition 2>, evaluate to FALSE).

Required parts:

  • Not limited to only 2 conditions.
  • Always needs one (and no more than one) if.
  • Can have 1 or more else ifs.
  • Can have no or 1 else-block (optional).

We can achieve the same result as above (Nested conditions) by writing the following statement.
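A sketch of one single statement with if, else if, and else; x and the first two messages are assumptions, as in the nested example.

```r
x <- 10   # assumed value

if (x < 10) {
    print("x is smaller than 10")
} else if (x > 10) {
    print("x is larger than 10")
} else {
    print("x is exactly 10")
}
```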

This achieves the same result as the following statement which has no else-block.
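The variant without an else-block could look like this (same assumptions as before); the last case is now an explicit condition.

```r
x <- 10   # assumed value

if (x < 10) {
    print("x is smaller than 10")
} else if (x > 10) {
    print("x is larger than 10")
} else if (x == 10) {
    print("x is exactly 10")
}
```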

Else or no else: This strongly depends on the task. One advantage of an else-block is that it captures all cases which are not considered by one of the conditions above. Thus, the else-block is something like the “fallback case”. In some other scenarios you only want to execute something if a strict condition is TRUE or do nothing. In such cases an else-block is not necessary.

Exercise 7.1 We have a variable x with one single numeric value and two different if-statements with the following conditions:

  • Version 1: if (x < 10), else if (x > 10), and else.
  • Version 2: if (x < 10), else if (x > 10), else if (x == 10) (no else).

Two questions to think about:

  • What if x is set to NA (x <- NA)? Will we end up in the else-block of statement ‘Version 1’ and get "x is exactly 10" (which is wrong)?
  • Would it be better to use ‘Version 2’ without an else-block?

Solution. The answer is no to both questions.

One could think that the NA ends up in the else-block, but that is not true. As we have learned above, conditions must always evaluate to a logical TRUE or FALSE. NA < 10 results in an NA and R will throw an error when trying to evaluate the first condition (if (x < 10)). Thus, we will not unexpectedly end up in the else-block.

To answer the second question: There is no benefit of the second version. We check if x < 10 and x > 10. The only option left is x == 10, thus, in this case both are fail-safe and do the very same.

In some situations it is not the case that there is only one option left, and you need to think about whether you want to have an else-block which captures everything else (or whatever you forgot), or if you want to add another explicit if-clause for specific cases.

Exercise 7.2 Below you can find a code chunk with a series of conditions. Try to read the code and think about what is going on without actually executing the code!
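The chunk is along the following lines. This version is a reconstruction based on the possible answers and the solution steps below, so the exact layout is an assumption.

```r
x <- 4

# First statement: may change the value of x
if (x < 10) x <- 100

# Second statement: evaluated with the current value of x
if (x > 10) {
    print("x is larger than 10.")
} else {
    print("x is smaller or equal to 10.")
}
```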

Possible answers

  1. Does not work, an error occurs.
  2. "x is larger than 10." will be printed.
  3. "x is smaller or equal to 10." will be printed.
  4. Nothing will be printed.

Solution. The correct answer is (2): "x is larger than 10." is printed.

  1. Initializing x <- 4.
  2. x < 10 is TRUE, wherefore x will be re-declared and set to x <- 100.
  3. x is now 100 and thus x > 10 is TRUE: "x is larger than 10." is printed.

7.6 Return values

If-statements are not functions, but they still have a return value. By default, this return value is not visible, but we can make use of it. For the last time we use the same example/statement, except that we do not print the character string, but return it and store the result (return value) in desc.
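A sketch of how the return value can be stored; x is assumed to be larger than 10 here, so that the second condition is the one that evaluates to TRUE.

```r
x <- 20   # assumed value (larger than 10)

# The value of the last expression in the executed block is
# returned by the statement and assigned to desc
desc <- if (x < 10) {
    "x is smaller than 10"
} else if (x > 10) {
    "x is larger than 10"
} else {
    "x is exactly 10"
}
print(desc)
```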

## [1] "x is larger than 10"

To break it down: The second condition evaluates to TRUE. If we remove everything related to the if-else statement which is (i) unused or (ii) only used for the statement itself, we basically end up with this:

Note: This only works for checks where the condition evaluates to a single TRUE or FALSE. We cannot use these if and if-else statements element-wise on a vector. To do so, we need to use the vectorized if.

7.7 Vectorized if

There is a special function which allows us to perform an if-else statement element by element for each element of a vector (or matrix).

Function: ifelse() for conditional element selection.

  • Arguments: ifelse(test, yes, no), where all arguments can be vectors of the same length (recycled if necessary). Works with matrices as well.
  • Return: Vector which contains yes elements if test is TRUE, else no elements.
  • Note: All elements of yes and no are always evaluated.

Practical example: we would like to find out if a numeric value is odd or even. This can be done using the modulo operator (see Vectors: Mathematical operations).

If the numeric value is divisible by 2 with remainder 0, it is an even number, else odd. Two examples: 4 %% 2 returns 0 as \(2 \cdot 2 = 4\), remainder \(0\), thus \(4\) must be an even number. 5 %% 2 returns 1 as \(2 \cdot 2 = 4\), remainder \(1\), thus \(5\) must be odd.
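Applied to a whole vector, this could look as follows; x <- 1:6 is an assumption which matches the output shown next.

```r
x <- 1:6       # assumed example vector
x

# Modulo 2 is 0 for even numbers, 1 for odd numbers
x %% 2 == 0
```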

## [1] 1 2 3 4 5 6
## [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

x %% 2 == 0 is our test-condition. We now want to return "even" if this is TRUE for a specific element in x, and "odd" if not. This can be done as follows:
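A minimal sketch, assuming x <- 1:6 as above; "even" and "odd" are character vectors of length 1 and are recycled.

```r
x <- 1:6   # assumed example vector

# test = x %% 2 == 0, yes = "even", no = "odd"
res <- ifelse(x %% 2 == 0, "even", "odd")
res
```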

## [1] "odd"  "even" "odd"  "even" "odd"  "even"

Another example: We will again test if a number is odd or even. If even, return x, else return -x. In this case the two arguments ‘no’/‘yes’ to ifelse() are vectors. Thus, all odd numbers should now be negative odd numbers, all even numbers should stay positive.
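The same idea with vectors for yes and no (a sketch, again assuming x <- 1:6):

```r
x <- 1:6   # assumed example vector

# Even elements are taken from x, odd elements from -x
res <- ifelse(x %% 2 == 0, x, -x)
res
```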

## [1] -1  2 -3  4 -5  6

Here all three arguments are vectors; x %% 2 == 0 is a logical vector of length 6, x and -x are two numeric vectors of the same length. If the test-condition evaluates to TRUE for a specific element the corresponding value from x is returned, else from -x.

Matrices: ifelse() also works with matrices. In case the input for the condition/test is a matrix, a matrix of the same size will be returned. The vectorized if works on the underlying vector, but adds the matrix attributes again at the end.
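A sketch with a small matrix; the construction via matrix() and dimnames is an assumption which reproduces the layout of the output below.

```r
# 3 x 4 matrix with the values 1 to 12 (filled column by column)
x <- matrix(1:12, nrow = 3,
            dimnames = list(paste("Row", 1:3),
                            paste("Col", LETTERS[1:4])))
x

# Dimension and dimension names are kept in the result
res <- ifelse(x %% 2 == 0, x, -x)
res
```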

##       Col A Col B Col C Col D
## Row 1     1     4     7    10
## Row 2     2     5     8    11
## Row 3     3     6     9    12
##       Col A Col B Col C Col D
## Row 1    -1     4    -7    10
## Row 2     2    -5     8   -11
## Row 3    -3     6    -9    12

Exercise 7.3 Exercise A: Start with a vector y <- 1:10. If the element in y is odd, add 1. If even, leave it as it is. The result should look as follows:

##  [1]  2  2  4  4  6  6  8  8 10 10

Exercise B: We will use some random numbers. For reproducibility we set a seed first:

y now contains 10 numeric values. Use ifelse() to replace all negative values with "neg" and all others with "pos". If your seed is set correctly the result should be:

##  [1] "neg" "neg" "pos" "pos" "pos" "pos" "pos" "neg" "neg" "neg"

Exercise C: Working with two matrices; this requires a logical and or or for the condition (test).

##      [,1] [,2]
## [1,]   10   -3
## [2,]    0   15
##      [,1] [,2]
## [1,]    3   -4
## [2,]   -1   17

Use ifelse(), return NA when both, the element in mat1 and in mat2, are negative. Else return 0.

##      [,1] [,2]
## [1,]    0   NA
## [2,]    0    0

Solution. Solution for exercise A

If y %% 2 == 0 (even) return the elements from y as they are, if not (odd) use y + 1.

##  [1]  2  2  4  4  6  6  8  8 10 10

We could of course also modify our test and ask for odd numbers (instead of even numbers). In this case we would also have to exchange the values for yes and no to get the correct result:
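Both variants, the test for even numbers from the first solution and the test for odd numbers described here, can be sketched as:

```r
y <- 1:10

# Test for even numbers: keep y if even, else use y + 1
res_even <- ifelse(y %% 2 == 0, y, y + 1)
res_even

# Test for odd numbers: yes and no are exchanged
res_odd <- ifelse(y %% 2 == 1, y + 1, y)
res_odd
```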

##  [1]  2  2  4  4  6  6  8  8 10 10

Solution for exercise B

If y < 0 replace the element with "neg", else with "pos".

##  [1] "neg" "neg" "pos" "pos" "pos" "pos" "pos" "neg" "neg" "neg"

As "neg" and "pos" are character vectors of length 1, while y is of length 10, they will simply be recycled. This does the very same as the following line of code where we replicate "neg" and "pos" 10 times.
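A sketch of both calls. The values of y below are hypothetical stand-ins with the same sign pattern as the expected result; they are not the seeded random numbers from the exercise.

```r
# Hypothetical values (NOT the random numbers from the exercise)
y <- c(-0.6, -1.3, 0.4, 1.1, 0.2, 2.0, 0.7, -0.1, -2.4, -0.9)

# "neg" and "pos" are recycled to the length of the test
res1 <- ifelse(y < 0, "neg", "pos")

# Explicit version: replicate both to length 10 first
res2 <- ifelse(y < 0, rep("neg", 10), rep("pos", 10))

identical(res1, res2)
```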

##  [1] "neg" "neg" "pos" "pos" "pos" "pos" "pos" "neg" "neg" "neg"

Solution for exercise C

The condition is mat1 < 0 & mat2 < 0, which is TRUE when the corresponding elements in both matrices are below zero. If TRUE, return an NA, else 0.
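A sketch of this solution; the two matrices are rebuilt from the values printed in the exercise.

```r
# Matrices as printed in the exercise (filled column by column)
mat1 <- matrix(c(10, 0, -3, 15), nrow = 2)
mat2 <- matrix(c(3, -1, -4, 17), nrow = 2)

# NA where both elements are negative, else 0
res <- ifelse(mat1 < 0 & mat2 < 0, NA, 0)
res
```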

##      [,1] [,2]
## [1,]    0   NA
## [2,]    0    0

7.8 Sanity checks

A typical application of single if-statements is to check the inputs of a function (or of a script) before proceeding to the computations. Such checks are often used in combination with stop() or warning().

  • stop(): Shows an error message and immediately stops execution.
  • warning(): Issues a warning, but execution continues.

When used inside functions, this is called a sanity check. Sanity checks should be at the very beginning of the instructions of a function and check if the arguments are sane, or if the function should throw an error because the inputs are wrong.

Let us combine functions and if-statements to write a small example. We would like to have a function which calculates the square root of one single numeric value, a mathematical operation which is invalid for negative numbers.

Long form

Our function has one input argument x. Before we start the calculation, we check if the input argument x is valid. We will check for the three conditions below – if one is violated, we will stop execution and throw an error. Else we return the square root of argument x.

  • x must be numeric.
  • x must be of length 1.
  • x must be non-negative (>= 0).

The function we are looking for looks as follows:
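A sketch of such a function. The error messages are taken from the output shown below; the helper show_error() is hypothetical and only used here to display the messages without stopping the script.

```r
custom_sqrt <- function(x) {
    # Sanity checks: stop as soon as one condition is violated
    if (!is.numeric(x)) stop("Argument 'x' must be numeric!")
    if (length(x) != 1) stop("Argument 'x' must be of length 1!")
    if (x < 0)          stop("Argument 'x' must be positive (>= 0)!")
    # All checks passed: do the calculation
    return(sqrt(x))
}

# Valid inputs
custom_sqrt(9)
custom_sqrt(25)

# Hypothetical helper: returns the error message instead of stopping
show_error <- function(expr) tryCatch(expr, error = function(e) conditionMessage(e))

show_error(custom_sqrt("17"))
show_error(custom_sqrt(vector("numeric", 0)))
show_error(custom_sqrt(c(1, 2, 3)))
show_error(custom_sqrt(-3))
```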

If we call the function with a valid argument, the function should return the desired result. Else one of the above error messages should show up – and the execution is immediately stopped.

## [1] 3
## [1] 5
## Error in custom_sqrt("17"): Argument 'x' must be numeric!
## Error in custom_sqrt(vector("numeric", 0)): Argument 'x' must be of length 1!
## Error in custom_sqrt(c(1, 2, 3)): Argument 'x' must be of length 1!
## Error in custom_sqrt(-3): Argument 'x' must be positive (>= 0)!

The sqrt() function implemented in base R does something similar, except that it also works for vectors with length \(> 1\) and only warns us if we apply it to negative values (returns NaN; see Missing values). However, when the input argument is a character we will get an error similar to our custom function.

## [1] 1.000000 1.414214 1.732051
## Warning in sqrt(c(-3, -1, 1, 3)): NaNs produced
## [1]      NaN      NaN 1.000000 1.732051
## Error in sqrt("foo"): non-numeric argument to mathematical function

Short form

Instead of using three different checks in custom_sqrt() and three different error messages we could also combine everything in one single check using a logical | (or ||). Let us re-declare the function:
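One possible short form is sketched below. It uses || so that the later checks are skipped as soon as an earlier one fails; this choice is an assumption, with | the condition could have a length other than one for vector inputs.

```r
custom_sqrt <- function(x) {
    # One combined sanity check, evaluated from left to right
    if (!is.numeric(x) || length(x) != 1 || x < 0) stop("wrong input")
    return(sqrt(x))
}

custom_sqrt(9)

# All invalid inputs now yield the very same error message
msg <- tryCatch(custom_sqrt(c(1, 2, 3)), error = function(e) conditionMessage(e))
msg
```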

  • Advantage: The function requires less typing and looks simpler.
  • Disadvantage: When an error is thrown, you will always get the same error message ("wrong input"), but you don’t really get any information what went wrong.

Some error messages are very easy to interpret and you immediately know what went wrong, while others look more sarcastic than helpful (see below). Having precise error messages can sometimes save hours of debugging and searching for the actual problem. Thus, rather write multiple separate checks than one large one with a super general error message.

Figure 7.2: An example from the [Microsoft Windows documentation](https://docs.microsoft.com/en-us/windows/win32/uxguide/mess-error) of how error messages should not look (but how we all know them).

Additional functions

There are a few additional functions which might be of interest in combination with sanity checks. You will not need them to solve the exercises in this book, but these functions are very handy to simplify sanity checks.

Command          Description
stopifnot()      Throws an error if not evaluated to TRUE.
inherits()       Check if an object contains a specific class.
match.arg()      Check if an input is allowed.
file.exists()    Check if a file exists.
dir.exists()     Check if a directory exists.

The examples below show some minimal examples how these commands work.

Stop if not: stopifnot() is a short version of if (!...) stop("error message") and throws an automatically generated error message if the logical expression evaluates to FALSE (stop if not TRUE; see ?stopifnot).
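A minimal sketch; the function name is taken from the error output below, the function body (returning x unchanged) is an assumption.

```r
test_stopifnot <- function(x) {
    # Throws an automatically generated error if x is not numeric
    stopifnot(is.numeric(x))
    return(x)
}

test_stopifnot(3)

# Catch the error to inspect the automatically generated message
msg <- tryCatch(test_stopifnot("character"), error = function(e) conditionMessage(e))
msg
```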

## Error in test_stopifnot("character"): is.numeric(x) is not TRUE

This could also be written as follows (with a custom error message).
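The equivalent version with if and stop() might look like this (same assumed function body as before):

```r
test_stopifnot <- function(x) {
    # Same check as stopifnot(is.numeric(x)), with a custom message
    if (!is.numeric(x)) stop("x must be numeric")
    return(x)
}

msg <- tryCatch(test_stopifnot("character"), error = function(e) conditionMessage(e))
msg
```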

## Error in test_stopifnot("character"): x must be numeric

Inherits: Argument x in the next function must either be a matrix or a character vector. inherits() checks if the return of class() of an object contains a specific class name.
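A sketch of such a function; what it returns (here the class of x) is an assumption for illustration.

```r
test_inherits <- function(x) {
    # TRUE if class(x) contains "matrix" or "character"
    stopifnot(inherits(x, c("matrix", "character")))
    return(class(x))
}

test_inherits("foo")
test_inherits(matrix(1:4, nrow = 2))

# An integer is neither a matrix nor a character vector
msg <- tryCatch(test_inherits(2L), error = function(e) conditionMessage(e))
msg
```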

## Error in test_inherits(2L): inherits(x, c("matrix", "character")) is not TRUE

In this case, 2L is of class "integer" (check class(2L)) wherefore the test fails and an error is thrown. The same could be achieved differently, e.g.:
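One possible alternative uses is.matrix() and is.character() together with stop(); this is only one of several ways to write the check, the message is taken from the output below.

```r
test_inherits <- function(x) {
    # Stop if x is neither a matrix nor a character vector
    if (!is.matrix(x) && !is.character(x)) {
        stop("x must be a matrix or a character vector")
    }
    return(class(x))
}

msg <- tryCatch(test_inherits(2L), error = function(e) conditionMessage(e))
msg
```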

## Error in test_inherits(2L): x must be a matrix or a character vector

Argument matching: Another smart way to only allow for special inputs is match.arg(). An argument can be defined with a series of allowed values. We can then check if the one provided by the user is actually among them.
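A minimal sketch; the function name test_matcharg() is hypothetical, the allowed values are taken from the error output below.

```r
# test_matcharg() is a hypothetical name for this illustration
test_matcharg <- function(x) {
    # Returns the matched value, or throws an error if not allowed
    x <- match.arg(x, c("male", "female"))
    return(x)
}

test_matcharg("female")

# A value which is not allowed results in an error
res <- try(test_matcharg("other"), silent = TRUE)
inherits(res, "try-error")
```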

## [1] "female"
## Error in match.arg(x, c("male", "female")): 'arg' should be one of "male", "female"

Check if file exists: When you write a function which reads from a file or operates on a directory, file.exists() and dir.exists() can be used to check if the file/directory really exists.
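A sketch using stopifnot(); the function body is an assumption, and a temporary file is created only to have one file that surely exists.

```r
test_fileexists <- function(file) {
    # Stop if the file cannot be found
    stopifnot(file.exists(file))
    # ... reading the file would happen here (omitted in this sketch)
    invisible(TRUE)
}

# Existing file: no error
tmp <- tempfile(fileext = ".txt")
writeLines("some content", tmp)
test_fileexists(tmp)

# Non-existing file: error
missing_file <- file.path(tempdir(), "my_dataset.rda")
msg <- tryCatch(test_fileexists(missing_file), error = function(e) conditionMessage(e))
msg
```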

## Error in test_fileexists("my_dataset.rda"): file.exists(file) is not TRUE

The same works for dir.exists() to check if a directory exists. Checking a directory and using if-conditions, this could look similar to the next function:
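A sketch with an if-condition and stop(); the error message is taken from the output below, everything else is an assumption.

```r
test_direxists <- function(dir) {
    # Stop with a custom message if the directory does not exist
    if (!dir.exists(dir)) {
        stop("Cannot find directory, does not exist.")
    }
    invisible(TRUE)
}

# The session's temporary directory always exists
test_direxists(tempdir())

# A sub-directory which has not been created
missing_dir <- file.path(tempdir(), "Downloads_missing")
msg <- tryCatch(test_direxists(missing_dir), error = function(e) conditionMessage(e))
msg
```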

## Error in test_direxists("Downloads"): Cannot find directory, does not exist.