The magrittr (to be pronounced with a sophisticated French accent) package has two aims: decrease development time and improve readability and maintainability of code. Or even shortr: make your code smokin’ (puff puff)!
To achieve its humble aims, magrittr (remember the accent) provides a new “pipe”-like operator, %>%, with which you may pipe a value forward into an expression or function call; something along the lines of x %>% f, rather than f(x). This is not an unknown feature elsewhere; a prime example is the |> operator used extensively in F# (to say the least), and indeed this, along with Unix pipes, served as a motivation for developing the magrittr package.
This vignette describes the main features of magrittr and demonstrates some features which have been added since the initial release.
At first encounter, you may wonder whether an operator such as %>% can really be all that beneficial; but as you may notice, it semantically changes your code in a way that makes it more intuitive to both read and write.
Consider the following example, in which the mtcars dataset shipped with R is munged a little:
library(magrittr)
car_data <-
  mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  print
#> cyl mpg disp hp drat wt qsec vs am gear carb kpl
#> 1 4 25.90 108.05 111.00 3.94 2.15 17.75 1.00 1.00 4.50 2.00 11.010090
#> 2 6 19.74 183.31 122.29 3.59 3.12 17.98 0.57 0.43 3.86 3.43 8.391474
#> 3 8 15.10 353.10 209.21 3.23 4.00 16.77 0.00 0.14 3.29 3.50 6.419010
We start with a value, here mtcars (a data.frame). From there, we extract a subset, aggregate the information based on the number of cylinders, and then transform the dataset by adding a variable for kilometers per liter as a supplement to miles per gallon. Finally we print the result before assigning it. Note how the code is arranged in the logical order of how you think about the task: data -> subset -> aggregate -> transform, which is also the order in which the code will execute. It’s like a recipe: easy to read, easy to follow!
A horrific alternative would be to write:
car_data <-
  transform(aggregate(. ~ cyl,
                      data = subset(mtcars, hp > 100),
                      FUN = function(x) round(mean(x), 2)),
            kpl = mpg*0.4251)
There is a lot more clutter with parentheses, and the mental task of deciphering the code is more challenging, particularly if you did not write it yourself.
Note also how “building” a function on the fly for use in aggregate is very simple in magrittr: rather than an actual value as the left-hand side in the pipeline, just use the placeholder. This is also very useful in R’s *apply family of functions.
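As a small sketch of the *apply use, assuming only the magrittr aliases add and multiply_by (the name add_then_double is made up for illustration), a pipeline that starts with the placeholder can be stored and passed to sapply:

```r
library(magrittr)

# A pipeline that starts with the placeholder defines a unary function.
add_then_double <- . %>% add(1) %>% multiply_by(2)

# The resulting function can be handed straight to sapply.
doubled <- sapply(1:3, add_then_double)
print(doubled)
```

Each element is incremented by one and then doubled, so no explicit function(x) wrapper is needed.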
Granted, you may make the second example better, perhaps throw in a few temporary variables (which is often avoided to some degree when using magrittr), but one often sees cluttered lines like the ones presented.
And here is another selling point: suppose I want to quickly add another step somewhere in the process. This is very easy to do in the pipeline version, but a little more challenging in the “standard” example.
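The combined example shows a few neat features of the pipe (which it is not):
1. By default, the left-hand side (LHS) is piped in as the first argument of the function on the right-hand side (RHS). This is the case in the subset and transform expressions.
2. %>% may be used in a nested fashion, e.g. it may appear in expressions within arguments. This is illustrated in the mpg to kpl conversion.
3. When the LHS is needed at a position other than the first, the dot, '.', can be used as placeholder. This is shown in the aggregate expression.
4. The dot in e.g. a formula is not confused with a placeholder, which is also utilized in the aggregate expression.
5. Whenever only one argument, the LHS, is needed, the empty parentheses can be omitted. This is used in the call to print (which also returns its argument). Here, LHS %>% print(), or even LHS %>% print(.) would also work.
6. A pipeline with a dot (.) as the LHS will create a unary function. This is used to define the aggregator function.
One feature which was not demonstrated above is piping into anonymous functions, or lambdas. This is possible using standard function definitions, e.g.: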
However, magrittr also allows a short-hand notation:
car_data %>%
{
  if (nrow(.) > 0)
    rbind(head(., 1), tail(., 1))
  else .
}
#> cyl mpg disp hp drat wt qsec vs am gear carb kpl
#> 1 4 25.9 108.05 111.00 3.94 2.15 17.75 1 1.00 4.50 2.0 11.01009
#> 3 8 15.1 353.10 209.21 3.23 4.00 16.77 0 0.14 3.29 3.5 6.41901
Since all right-hand sides are really “body expressions” of unary functions, this is only the natural extension of the simple right-hand side expressions. Of course, longer and more complex functions can be made using this approach.
In the first example, the anonymous function is enclosed in parentheses. Whenever you want to use a function- or call-generating statement as right-hand side, parentheses are used to evaluate the right-hand side before piping takes place.
Another, less useful example is:
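One way to illustrate this, as a sketch rather than a recommended pattern, is a parenthesized right-hand side that builds a call with substitute; the variable name total is only for illustration:

```r
library(magrittr)

# The parenthesized RHS is evaluated first and yields a call to sum;
# the LHS is then piped into that call.
total <- 1:10 %>% (substitute(f(), list(f = sum)))
print(total)
```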
magrittr also provides three related pipe operators. These are not as common as %>% but they become useful in special cases.
The “tee” pipe, %T>%, works like %>%, except it returns the left-hand side value, and not the result of the right-hand side operation. This is useful when a step in a pipeline is used for its side-effect (printing, plotting, logging, etc.). As an example (where the actual plot is omitted here):
rnorm(200) %>%
  matrix(ncol = 2) %T>%
  plot %>% # plot usually does not return anything.
  colSums
#> [1] 2.2237320 0.6873128
The “exposition” pipe, %$%, exposes the names within the left-hand side object to the right-hand side expression. Essentially, it is a short-hand for using the with function (and the same left-hand side objects are accepted). This operator is handy when functions do not themselves have a data argument, as for example lm and aggregate do. Here are a few examples as illustration:
iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)

data.frame(z = rnorm(100)) %$%
  ts.plot(z)
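The correspondence with the with function can be checked directly. The following sketch (the names piped and wrapped are illustrative) computes the same correlation both ways:

```r
library(magrittr)

# The exposition pipe and with() evaluate the same expression
# against the columns of the data frame.
piped   <- iris %$% cor(Sepal.Length, Sepal.Width)
wrapped <- with(iris, cor(Sepal.Length, Sepal.Width))
print(all.equal(piped, wrapped))
```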
Finally, the “assignment” pipe %<>% can be used as the first pipe in a chain. The effect will be that the result of the pipeline is assigned to the left-hand side object, rather than returning the result as usual. It is essentially shorthand notation for expressions like foo <- foo %>% bar %>% baz, which boils down to foo %<>% bar %>% baz. Another example is:
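A small sketch of in-place assignment on a data frame column, assuming a copy named flowers (a name chosen here so that the built-in iris data set is left untouched):

```r
library(magrittr)

# Work on a copy so the built-in iris data set is not modified.
flowers <- iris

# Equivalent to: flowers$Sepal.Length <- flowers$Sepal.Length %>% sqrt
flowers$Sepal.Length %<>% sqrt

print(head(flowers$Sepal.Length))
```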
The %<>% can be used whenever expr <- ... makes sense, e.g.
x %<>% foo %>% bar
x[1:10] %<>% foo %>% bar
x$baz %<>% foo %>% bar
In addition to the %>%-operator, magrittr provides some aliases for other operators which make operations such as addition or multiplication fit well into the magrittr-syntax.
As an example, consider:
rnorm(1000) %>%
  multiply_by(5) %>%
  add(5) %>%
  {
    cat("Mean:", mean(.),
        "Variance:", var(.), "\n")
    head(.)
  }
#> Mean: 4.899325 Variance: 25.29878
#> [1] -0.2399783 6.6772644 -2.5228049 11.9207701 7.2236172 9.3822512
which could be written in more compact form as:
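A sketch of the compact form, assuming backtick-quoted operators in place of the aliases; the seed is arbitrary and only makes the comparison between the two spellings repeatable:

```r
library(magrittr)

set.seed(1)  # arbitrary seed so both pipelines see the same draws
with_aliases <- rnorm(1000) %>% multiply_by(5) %>% add(5)

set.seed(1)
with_operators <- rnorm(1000) %>% `*`(5) %>% `+`(5)

# Both spellings perform the same arithmetic.
print(identical(with_aliases, with_operators))
```

The aliases read more naturally in a long pipeline, while the backtick form avoids having to remember the alias names.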
To see a list of the aliases, execute e.g. ?multiply_by.
The magrittr package is also available in a development version at the GitHub development page: github.com/tidyverse/magrittr.