How to filter several variables with the same condition in R
9 July, 2021
Here is an imaginary dataset of books, with binary variables describing different possible genres:
```r
library(dplyr)

set.seed(1)

books <- data.frame(
  id = 1:10,
  is_fantasy = rbinom(10, 1, .5),
  is_scifi = rbinom(10, 1, .5),
  is_classic = rbinom(10, 1, .5),
  is_adventure = rbinom(10, 1, .5)
)

books
```
Say we want to keep only the books that are classic fantasy adventure novels, i.e. the rows where `is_fantasy`, `is_classic` and `is_adventure` all equal 1. We could do this:
```r
books %>%
  filter(
    is_fantasy == 1,
    is_classic == 1,
    is_adventure == 1
  )
```
And this is totally fine, except that we're repeating ourselves a little bit: we're writing the filter condition three times. That's no big deal here, but it becomes more annoying (and error-prone) when you're filtering on a large number of variables, with more complicated conditions, or both. You could find yourself copying and pasting the same condition many times, which isn't ideal.
The `across()` function from dplyr is, in my view, one of the most useful functions in R. I use it most often in combination with `mutate()` and `filter()` to apply a function across several variables. I encourage you to check out its vignette (`vignette("colwise")`) to get an idea of how powerful it is in dplyr workflows; but I promised that this post would be brief, so here's how you can use it to filter several variables with the same logical condition:
```r
# Keep rows where all three genre columns equal 1
books %>%
  filter(across(c(is_fantasy, is_classic, is_adventure), ~ . == 1))
```
This gives the same result as the first version, just a little bit cleaner and more extensible: inside `filter()`, the condition is checked for each selected column and the results are combined with AND, exactly like the comma-separated conditions above. Some notes:

- The first argument to `across()` is a tidy-select specification of the columns to use; here, a vector of bare column names.
- The second argument is where you specify the function to apply to each of those columns. There are a few different ways to do this, but I most often use the one shown above: a tilde creates a purrr-style anonymous function, inside which a period (`.`) stands for the values of the current column. So, for the `is_classic` variable, `~ . == 1` is equivalent to `is_classic == 1`.
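Both arguments can be written in several ways. The sketch below rebuilds the post's imaginary `books` data and, assuming a reasonably recent dplyr, applies `across()` inside `mutate()` so the converted TRUE/FALSE columns can be compared directly; the `.cols` and `.fns` arguments behave the same way inside `filter()`. `genre_cols` is a hypothetical character vector of column names, passed through the tidyselect helper `all_of()`, which is handy when the list of columns is long or built programmatically. `~ .`, `~ .x` and a regular `function(x)` are three spellings of the same function (R 4.1 and later also accept the shorthand `\(x) x == 1`).

```r
library(dplyr)

# Rebuild the imaginary books data from the post
set.seed(1)
books <- data.frame(
  id = 1:10,
  is_fantasy = rbinom(10, 1, .5),
  is_scifi = rbinom(10, 1, .5),
  is_classic = rbinom(10, 1, .5),
  is_adventure = rbinom(10, 1, .5)
)

# .cols: hypothetical character vector of column names, selected with all_of()
genre_cols <- c("is_fantasy", "is_classic", "is_adventure")

# .fns: three ways of writing "is this value equal to 1?"
with_dot <- books %>% mutate(across(all_of(genre_cols), ~ . == 1))
with_dot_x <- books %>% mutate(across(all_of(genre_cols), ~ .x == 1))
with_function <- books %>% mutate(across(all_of(genre_cols), function(x) x == 1))

# .cols: bare names and a column range select the same three columns
with_names <- books %>%
  mutate(across(c(is_fantasy, is_classic, is_adventure), ~ .x == 1))
with_range <- books %>%
  mutate(across(c(is_fantasy, is_classic:is_adventure), ~ .x == 1))

# All versions should be identical; id and is_scifi are left untouched
identical(with_dot, with_dot_x)
identical(with_dot, with_function)
identical(with_dot, with_names)
identical(with_dot, with_range)
```

Note that a range such as `is_classic:is_adventure` depends on column order: with the data defined this way it covers just those two columns, whereas `is_fantasy:is_adventure` would also pick up `is_scifi`.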
Here is the same code with the arguments named explicitly and a pedantic level of indentation, to be super clear about how it all fits together:
```r
books %>%
  filter(
    across(
      .cols = c(is_fantasy, is_classic, is_adventure),
      .fns = ~ . == 1
    )
  )
```
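A note for readers on newer versions of dplyr: `if_all()` and `if_any()` (added in dplyr 1.0.4) take the same `.cols` and `.fns` arguments as `across()` but return a single TRUE/FALSE per row, and from dplyr 1.0.8 onwards calling `across()` inside `filter()` raises a deprecation warning that points to them. The sketch below assumes dplyr 1.0.4 or later and rebuilds the post's imaginary `books` data. It checks `if_all()` against the long-hand filter from the start of the post, and uses `if_any()` for the "at least one of these genres" case.

```r
library(dplyr)

# Rebuild the imaginary books data from the post
set.seed(1)
books <- data.frame(
  id = 1:10,
  is_fantasy = rbinom(10, 1, .5),
  is_scifi = rbinom(10, 1, .5),
  is_classic = rbinom(10, 1, .5),
  is_adventure = rbinom(10, 1, .5)
)

# if_all(): keep books where every listed column equals 1 (AND)
all_three <- books %>%
  filter(
    if_all(
      .cols = c(is_fantasy, is_classic, is_adventure),
      .fns = ~ .x == 1
    )
  )

# if_any(): keep books where at least one listed column equals 1 (OR)
any_of_three <- books %>%
  filter(
    if_any(
      .cols = c(is_fantasy, is_classic, is_adventure),
      .fns = ~ .x == 1
    )
  )

# Long-hand equivalents, for comparison
all_three_long <- books %>%
  filter(is_fantasy == 1, is_classic == 1, is_adventure == 1)
any_of_three_long <- books %>%
  filter(is_fantasy == 1 | is_classic == 1 | is_adventure == 1)

identical(all_three, all_three_long)
identical(any_of_three, any_of_three_long)
```

Because `if_all()` collapses the per-column results into one logical vector itself, it states the AND explicitly rather than relying on how `filter()` combines a data frame of conditions, and `if_any()` covers the OR case that `across()` alone cannot express in `filter()`.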
That's it! I'll probably write some more about dplyr in the near future.