How to filter several variables with the same condition in R


9 July, 2021

Here is an imaginary dataset of books, with binary variables describing different possible genres:



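library(dplyr)  # provides %>% and filter()

# Each genre flag is a random 0/1 draw, so the rows kept below will vary
# from run to run unless you set a seed first, e.g. set.seed(1)
books <- data.frame(
    id = 1:10,
    is_fantasy = rbinom(10, 1, .5),
    is_scifi = rbinom(10, 1, .5),
    is_classic = rbinom(10, 1, .5),
    is_adventure = rbinom(10, 1, .5)
)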


Say we want to filter on books that are classic, fantasy adventure novels. We could do this:

books %>%
    filter(
        is_fantasy == 1,
        is_classic == 1,
        is_adventure == 1
    )

And this is totally fine, except that we're repeating ourselves a little: the filter condition is written three times. That's no big deal here, but it gets more annoying (and error-prone) when you're filtering a large number of variables, or using a more complicated condition. You could find yourself copying and pasting the same condition many times, which isn't ideal.

The across function from dplyr is, in my view, one of the most useful functions in R. I find myself using it most frequently in combination with mutate and filter, to apply a function across several variables. I encourage you to check out the vignette for it to get an idea of how powerful it is when used in dplyr workflows; but I promised that this post would be brief, so here's how you can use it to filter several variables with the same logical condition:

books %>%
    filter(across(c(is_fantasy, is_classic, is_adventure), ~ . == 1))

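This is the same filtering logic as above, just a little cleaner and more extensible. The first argument to across() picks the columns (any tidyselect syntax works here, not just c()), and the second is the function applied to each of them. Here that function is written as a purrr-style formula, ~ . == 1, where . stands for the current column. Inside filter(), a row is kept only if the condition is TRUE for every selected column.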
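
To make the "more extensible" part concrete, here's a small sketch on a hypothetical, fixed version of the books data (fixed so the result is predictable, unlike the rbinom() draws above). Because across() accepts tidyselect syntax, the hand-written column list can be swapped for a helper such as starts_with():

```r
library(dplyr)

# Hypothetical books data with fixed 0/1 genre flags (not the random data above)
books <- data.frame(
    id = 1:4,
    is_fantasy   = c(1, 1, 0, 1),
    is_scifi     = c(0, 1, 0, 1),
    is_classic   = c(1, 1, 1, 0),
    is_adventure = c(1, 1, 0, 1)
)

# Same condition as before, but the columns come from a tidyselect helper.
# starts_with("is_") matches every genre flag, is_scifi included, so this
# is a stricter filter than the three-column version above.
all_genres <- books %>%
    filter(across(starts_with("is_"), ~ . == 1))
```

Only book 2 is flagged with all four genres, so it's the single row kept. Adding a fifth is_* column later wouldn't require touching the filter at all.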

Let's write this out with the arguments explicitly named and a pedantic level of indentation, just to be super clear about how it all fits together:

books %>%
    filter(
        across(
            .cols = c(is_fantasy, is_classic, is_adventure),
            .fns = ~ . == 1
        )
    )
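
One thing worth knowing if you're on a newer version of dplyr: from dplyr 1.0.8 onward, using across() inside filter() is deprecated and raises a deprecation warning, with if_all() and if_any() (available since dplyr 1.0.4) suggested instead. The sketch below, again on a small hypothetical data frame, shows if_all() doing the same job as the across() call above, plus if_any() for the "at least one of these" case:

```r
library(dplyr)

# Hypothetical books data with fixed 0/1 genre flags
books <- data.frame(
    id = 1:4,
    is_fantasy   = c(1, 1, 0, 1),
    is_scifi     = c(0, 1, 0, 1),
    is_classic   = c(1, 1, 1, 0),
    is_adventure = c(1, 1, 0, 1)
)

# if_all(): keep rows where the condition holds for EVERY selected column
# (the same logic as filter(across(...)))
classic_fantasy_adventure <- books %>%
    filter(if_all(c(is_fantasy, is_classic, is_adventure), ~ . == 1))

# if_any(): keep rows where the condition holds for AT LEAST ONE column
fantasy_or_scifi <- books %>%
    filter(if_any(c(is_fantasy, is_scifi), ~ . == 1))
```

if_all() combines the per-column results with AND and if_any() with OR, so the first call keeps books 1 and 2, and the second keeps books 1, 2 and 4.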

That's it—I'll probably write some more about dplyr in the near future.

