How to filter several variables with the same condition in R
9 July, 2021
Here is an imaginary dataset of books, with binary variables describing different possible genres:
```r
library(dplyr)

set.seed(1)
books <- data.frame(
  id = 1:10,
  is_fantasy = rbinom(10, 1, .5),
  is_scifi = rbinom(10, 1, .5),
  is_classic = rbinom(10, 1, .5),
  is_adventure = rbinom(10, 1, .5)
)
books
```
Say we want to filter on books that are classic, fantasy adventure novels. We could do this:
```r
books %>%
  filter(
    is_fantasy == 1,
    is_classic == 1,
    is_adventure == 1
  )
```
And this is totally fine, except that we're repeating ourselves a little bit: we're writing the filter condition three times. This is no big deal here, but it becomes more annoying (and error prone) when you're filtering on a large number of variables, or with more complicated filtering conditions. You could find yourself copying and pasting the same condition over and over, which isn't ideal.
The `across()` function from dplyr is, in my view, one of the most useful functions in R. I find myself using it most frequently in combination with `filter()`, to apply the same function to several variables. I encourage you to check out the vignette for it to get an idea of how powerful it is in dplyr workflows; but I promised that this post would be brief, so here's how you can use it to filter several variables with the same logical condition:
```r
books %>%
  filter(across(c(is_fantasy, is_classic, is_adventure), ~ . == 1))
```
This is the same filtering logic as above, just a little bit cleaner and more extensible. Some notes:
- The first argument to `across()` (`.cols`) is a tidy-select specification of the columns to use; here, a vector of bare column names.
- The second argument (`.fns`) is where you specify the function to apply to each of the columns selected by the first argument. There are a few different ways to do this, but I most frequently use the method shown above: after a tilde, you can refer to the values of each variable with a period. So, for the `is_classic` variable, `~ . == 1` is equivalent to `is_classic == 1`.
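Because `.cols` accepts any tidy-select specification, the column list doesn't have to be typed out by hand. Here is a minimal sketch, assuming you keep the genres of interest in a character vector (the name `genres` is hypothetical, just for illustration): `all_of()` selects the columns named in that vector, and the function can equally be written with `.x` instead of `.`, or as an ordinary anonymous function.

```r
library(dplyr)

set.seed(1)
books <- data.frame(
  id = 1:10,
  is_fantasy = rbinom(10, 1, .5),
  is_scifi = rbinom(10, 1, .5),
  is_classic = rbinom(10, 1, .5),
  is_adventure = rbinom(10, 1, .5)
)

# Hypothetical: the columns to filter on, stored as a character vector
# so the list can be built or changed programmatically
genres <- c("is_fantasy", "is_classic", "is_adventure")

# all_of() turns the character vector into a tidy-select column selection
by_names <- books %>% filter(across(all_of(genres), ~ . == 1))

# The same condition written with `.x` instead of `.`
by_dot_x <- books %>% filter(across(all_of(genres), ~ .x == 1))

# ...and as a regular anonymous function
by_fun <- books %>% filter(across(all_of(genres), function(x) x == 1))
```

All three should return the same rows as the hand-written filter at the top of the post. If you wanted every genre column instead, a selection helper such as `starts_with("is_")` would pick them all up without listing any names.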
Let's write this out once more with the arguments named explicitly and a pedantic level of indentation, to be super clear about how it all fits together:
```r
books %>%
  filter(
    across(
      .cols = c(is_fantasy, is_classic, is_adventure),
      .fns = ~ . == 1
    )
  )
```
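dplyr also provides `if_all()` and `if_any()`, which are designed for exactly this job inside `filter()`. More recent dplyr releases deprecate using `across()` inside `filter()` in favour of these two, so on a current version the code above may print a deprecation warning. A sketch of the same filter with `if_all()`, plus an `if_any()` variant that keeps books matching *at least one* of the genres (that second filter is an extra illustration, not part of the original example):

```r
library(dplyr)

set.seed(1)
books <- data.frame(
  id = 1:10,
  is_fantasy = rbinom(10, 1, .5),
  is_scifi = rbinom(10, 1, .5),
  is_classic = rbinom(10, 1, .5),
  is_adventure = rbinom(10, 1, .5)
)

# if_all(): keep rows where the condition holds for every selected column
# (the same AND logic as the across() version)
all_three <- books %>%
  filter(if_all(c(is_fantasy, is_classic, is_adventure), ~ . == 1))

# if_any(): keep rows where the condition holds for at least one column
any_of_three <- books %>%
  filter(if_any(c(is_fantasy, is_classic, is_adventure), ~ . == 1))
```

The `.cols` and `.fns` arguments work the same way as in `across()`, so everything in the notes above carries over unchanged.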
That's it—I'll probably write some more about dplyr in the near future.