Title: | Build 'data.table' Expressions with Data Manipulation Verbs |
---|---|
Description: | A specialization of 'dplyr' data manipulation verbs that parse and build expressions which are ultimately evaluated by 'data.table', letting it handle all optimizations. A set of additional verbs is also provided to facilitate some common operations on a subset of the data. |
Authors: | Alexis Sarda-Espinosa [cre, aut] |
Maintainer: | Alexis Sarda-Espinosa |
License: | MPL-2.0 |
Version: | 0.4.2 |
Built: | 2024-11-22 03:06:14 UTC |
Source: | https://github.com/asardaes/table.express |
A specialization of dplyr verbs, as well as a set of custom ones, that build expressions that can be used within a data.table's frame.
Note that since version 0.3.0, it is not possible to load table.express and dtplyr at the same time, since they define the same data.table methods for many dplyr generics.
Bearing in mind that data.tables are also data.frames, we have to consider that other packages may use dplyr internally without importing data.table. Since dplyr's verbs are generic, calls to them from such packages would dispatch to the data.table methods defined here and could fail. The functions in this package try to detect when this happens and delegate to the data.frame methods with a warning, which can be safely ignored if you know that the call originates from a package that is not meant to work with data.table. To avoid the warning, use options(table.express.warn.cedta = FALSE).
This software package was developed independently of any organization or institution that is or has been associated with the author.
Alexis Sarda-Espinosa
Useful links:
Report bugs at https://github.com/asardaes/table.express/issues
```r
library(table.express)
require("data.table")

data("mtcars")
DT <- as.data.table(mtcars)

# ====================================================================================
# Simple dplyr-like transformations

DT %>%
    group_by(cyl) %>%
    filter(vs == 0, am == 1) %>%
    transmute(mean_mpg = mean(mpg)) %>%
    arrange(-cyl)

# Equivalent to previous
DT %>%
    start_expr %>%
    transmute(mean_mpg = mean(mpg)) %>%
    where(vs == 0, am == 1) %>%
    group_by(cyl) %>%
    order_by(-cyl) %>%
    end_expr

# Modification by reference
DT %>%
    where(gear %% 2 != 0, carb %% 2 == 0) %>%
    mutate(wt_squared = wt ^ 2)

print(DT)

# Deletion by reference
DT %>%
    mutate(wt_squared = NULL) %>%
    print

# Support for tidyselect helpers
DT %>%
    select(ends_with("t"))

# ====================================================================================
# Helpers to transform a subset of data

# Like DT[, (whole) := lapply(.SD, as.integer), .SDcols = whole]
whole <- names(DT)[sapply(DT, function(x) { all(x %% 1 == 0) })]
DT %>%
    mutate_sd(as.integer, .SDcols = whole)

sapply(DT, class)

# Like DT[, lapply(.SD, fun), .SDcols = ...]
DT %>%
    transmute_sd((.COL - mean(.COL)) / sd(.COL),
                 .SDcols = setdiff(names(DT), whole))

# Filter several with the same condition
DT %>%
    filter_sd(.COL == 1, .SDcols = c("vs", "am"))

# Using secondary indices, i.e. DT[.(4, 5), on = .(cyl, gear)]
DT %>%
    filter_on(cyl = 4, gear = 5) # note we don't use ==

scale_undim <- function(...) {
    as.numeric(scale(...)) # remove dimensions
}

# Chaining
DT %>%
    start_expr %>%
    mutate_sd(as.integer, .SDcols = whole) %>%
    chain %>%
    filter_sd(.COL == 1, .SDcols = c("vs", "am"), .collapse = `|`) %>%
    transmute_sd(scale_undim, .SDcols = !is.integer(.COL)) %>%
    end_expr

# The previous is equivalent to
DT[, (whole) := lapply(.SD, as.integer), .SDcols = whole
   ][vs == 1 | am == 1,
     lapply(.SD, scale_undim),
     .SDcols = names(DT)[sapply(DT, Negate(is.integer))]]

# Alternative to keep all columns (*copying* non-scaled ones)
scale_non_integers <- function(x) {
    if (is.integer(x)) x else scale_undim(x)
}

DT %>%
    filter_sd(.COL == 1, .SDcols = c("vs", "am"), .collapse = `|`) %>%
    transmute_sd(everything(), scale_non_integers)

# Without copying non-scaled
DT %>%
    where(vs == 1 | am == 1) %>%
    mutate_sd(scale, .SDcols = names(DT)[sapply(DT, Negate(is.integer))])

print(DT)
```
Alias for order_by-table.express.
```r
## S3 method for class 'ExprBuilder'
arrange(.data, ...)

## S3 method for class 'data.table'
arrange(.data, ...)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | |
To see more examples, check the vignette, or the table.express-package entry.
Build a chain of similar objects/operations.
```r
chain(.data, ...)

## S3 method for class 'ExprBuilder'
chain(.data, ..., .parent_env = rlang::caller_env())
```
| Argument | Description |
|---|---|
| .data | Object to be chained. |
| ... | Arguments for the specific methods. |
| .parent_env | See |
The chaining for ExprBuilder is equivalent to calling end_expr() followed by start_expr(). The ellipsis (...) is passed to both functions.
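The equivalence above can be pictured with plain data.table chaining, DT[...][...], in which the second frame operates on the result of the first. The following sketch uses data.table only and mirrors the first two frames of the package-level chaining example; it illustrates the semantics and is not necessarily the exact expression that ExprBuilder builds.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Columns whose values are all whole numbers
whole <- names(DT)[sapply(DT, function(x) all(x %% 1 == 0))]

# First frame: convert those columns to integer by reference.
# Second frame: filter the result of the first frame.
res <- DT[, (whole) := lapply(.SD, as.integer), .SDcols = whole][vs == 1L | am == 1L]
```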
To see more examples, check the vignette, or the table.express-package entry.
Rows with distinct combinations of columns
```r
## S3 method for class 'ExprBuilder'
distinct(.data, ..., .keep = TRUE, .n = 1L, .parse = getOption("table.express.parse", FALSE))

## S3 method for class 'data.table'
distinct(.data, ...)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | Which columns to use to determine uniqueness. |
| .keep | See details below. |
| .n | Indices of rows to return for each unique combination of the chosen columns. See details. |
| .parse | Logical. Whether to apply |
If .keep = TRUE (the default), the columns not mentioned in ... are also kept. However, if a new column is created in one of the expressions therein, .keep can also be set to a character vector containing the names of all the columns that should be in the result in addition to the ones mentioned in .... See the examples.
The value of .n is only relevant when .keep is not FALSE. It is used to subset .SD in the built data.table expression. For example, we could get 2 rows per combination by setting .n to 1:2, or get the last row instead of the first by using .N. If more than one index is used and not enough rows are found, some rows will have NA. Note that, at least as of version 1.12.2 of data.table, only expressions with single indices are internally optimized.
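To make the role of .n concrete, the sketch below writes the three cases mentioned above (first row, rows 1:2, last row via .N) directly as data.table frames that subset .SD per (am, vs) group. It uses data.table only and is an approximation of the described behaviour, not the literal expression distinct() generates.

```r
library(data.table)

DT <- as.data.table(mtcars)

# .n = 1L: first row per unique (am, vs) combination, other columns kept
first_rows <- DT[, .SD[1L], by = .(am, vs)]

# .n = 1:2: two rows per combination; a group with a single row
# would get an extra row filled with NA
two_rows <- DT[, .SD[1:2], by = .(am, vs)]

# .n = .N: last row per combination
last_rows <- DT[, .SD[.N], by = .(am, vs)]
```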
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

# compare with .keep = TRUE
data.table::as.data.table(mtcars) %>%
    distinct(amvs = am + vs, .keep = names(mtcars))
```
Like ExprBuilder, but eager in some regards. This shouldn't be used directly.
Super class: table.express::ExprBuilder -> EagerExprBuilder
new()
Constructor.
EagerExprBuilder$new(DT, ...)
DT
...
Ignored.
chain()
Override to abort if chaining is attempted.
EagerExprBuilder$chain(...)
...
Ignored.
chain_if_set()
Override to abort if chaining is attempted.
EagerExprBuilder$chain_if_set(...)
...
Ignored.
clone()
The objects of this class are cloneable with this method.
EagerExprBuilder$clone(deep = FALSE)
deep
Whether to make a deep clone.
Finish the expression-building process and evaluate it.
```r
end_expr(.data, ...)

## S3 method for class 'ExprBuilder'
end_expr(.data, ..., .by_ref = TRUE, .parent_env)
```
| Argument | Description |
|---|---|
| .data | The expression. |
| ... | Arguments for the specific methods. |
| .by_ref | If |
| .parent_env | Optionally, the enclosing environment of the expression's evaluation environment. Defaults to the caller environment. |
The ExprBuilder method returns a data.table::data.table.
To see more examples, check the vignette, or the table.express-package entry.
Build an expression that will be used inside a data.table::data.table's frame. This shouldn't be used directly.
In general, a modified self with an extended expression.
appends
Extra expressions that go at the end.
expr
The final expression that can be evaluated with base::eval() or rlang::eval_bare().
new()
Constructor.
ExprBuilder$new( DT, dt_pronouns = list(), nested = list(), verbose = getOption("table.express.verbose", FALSE) )
DT
dt_pronouns, nested
Internal parameters for joins.
verbose
Print more information during the process of building expressions.
set_i()
Set the i clause expression(s), starting a new frame if the current one already has said expression set.
ExprBuilder$set_i(value, chain_if_needed)
value
A captured expression.
chain_if_needed
Whether chaining is allowed during this step.
set_j()
Like set_i but for the j clause.
ExprBuilder$set_j(value, chain_if_needed)
value
A captured expression.
chain_if_needed
Whether chaining is allowed during this step.
set_by()
Set the by clause expression.
ExprBuilder$set_by(value, chain_if_needed)
value
A captured expression.
chain_if_needed
Whether chaining is allowed during this step.
chain()
By default, start a new expression with the current one as its parent. If type = "pronoun", next_dt is used to start a new expression that joins the current one.
ExprBuilder$chain(type = "frame", next_dt, parent_env, to_eager = FALSE)
type
One of "frame", "pronoun".
next_dt
Next data table when chaining pronoun.
parent_env
Where to evaluate current expression when chaining pronoun.
to_eager
Whether or not to use an EagerExprBuilder in the new chain.
chain_if_set()
Chain if any clause values are already set.
ExprBuilder$chain_if_set(...)
...
Clause values.
seek_and_nestroy()
Helper for nest_expr.
ExprBuilder$seek_and_nestroy(.exprs)
.exprs
List of expressions.
eval()
Evaluate the final expression with parent_env as the enclosing environment. If by_ref = FALSE, data.table::copy() is called beforehand. The ellipsis' contents are assigned to the expression's evaluation environment.
ExprBuilder$eval(parent_env, by_ref, ...)
parent_env
Enclosing environment.
by_ref
Flag to control deep copies.
...
Additional variables for the evaluation environment.
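Because := modifies a data.table in place, copying first is what keeps the original untouched when by_ref = FALSE. A minimal data.table-only sketch of that difference (the table and column names are illustrative):

```r
library(data.table)

DT <- data.table(a = 1:3)

# Work on a deep copy so that := does not touch DT itself
DT2 <- copy(DT)
DT2[, b := a * 2L]
```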
tidy_select()
Evaluate a tidyselect call using the currently captured table.
ExprBuilder$tidy_select(select_expr)
select_expr
The selection expression.
print()
Prints the built expr.
ExprBuilder$print(...)
...
Ignored.
clone()
The objects of this class are cloneable with this method.
ExprBuilder$clone(deep = FALSE)
deep
Whether to make a deep clone.
Find rows with maxima/minima in given columns.
```r
max_by(.data, .col, ...)

## S3 method for class 'ExprBuilder'
max_by(.data, .col, ..., .some = FALSE, .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'data.table'
max_by(.data, .col, ..., .expr = FALSE)

min_by(.data, .col, ...)

## S3 method for class 'ExprBuilder'
min_by(.data, .col, ..., .some = FALSE, .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'data.table'
min_by(.data, .col, ..., .expr = FALSE)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| .col | A character vector indicating the columns that will be searched for extrema. |
| ... | Optionally, columns to group by, either as characters or symbols. |
| .some | If |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
| .expr | If the input is a |
These verbs implement the idiom shown here by leveraging nest_expr(). The whole nested expression is assigned to i in the data.table's frame. It is probably a good idea to use this on a frame that has no other frames preceding it in the current expression, given that nest_expr() uses the captured data.table, so consider using chain() when needed.
Several columns can be specified in .col, and depending on the value of .some, the rows with all or some extrema are returned, using & or | respectively. Depending on your data, using more than one column might not make sense, resulting in an empty data.table.
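The exact idiom referenced above is not reproduced here; as a hedged illustration, the sketch below shows a common data.table way to keep the rows holding a per-group maximum, using .I to collect row numbers and then subsetting with them in i. It is a plain data.table stand-in for max_by("mpg", "vs"), not the nested expression the package builds.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Row numbers (in DT) whose mpg equals the maximum within each vs group;
# ties would all be kept
idx <- DT[, .I[mpg == max(mpg)], by = vs]$V1

# Use those row numbers in i to get the full rows
best <- DT[idx]
```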
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    max_by("mpg", "vs")
```
Helper to filter specifying the on part of the data.table::data.table query.
```r
filter_on(.data, ...)

## S3 method for class 'ExprBuilder'
filter_on(.data, ..., which = FALSE, nomatch = getOption("datatable.nomatch"),
          mult = "all", .negate = FALSE, .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'data.table'
filter_on(.data, ..., .expr = FALSE)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | Key-value pairs, maybe with empty keys if the |
| which, nomatch, mult | |
| .negate | Whether to negate the expression and search only for rows that don't contain the given values. |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
| .expr | If the input is a |
The key-value pairs in '...' are processed as follows:

- The names are used as on in the data.table frame. If any name is empty, on is left missing.
- The values are packed in a list and used as i in the data.table frame.
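Following the two rules above, filter_on(cyl = 4, gear = 5) corresponds to the frame DT[.(4, 5), on = .(cyl, gear)], as noted in the package-level examples. The sketch below runs that frame with data.table only and compares it to an ordinary vector-scan filter.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Names become `on`, values are packed into a list used as `i`
by_join <- DT[.(4, 5), on = .(cyl, gear)]

# Equivalent vector scan, for comparison
by_scan <- DT[cyl == 4 & gear == 5]
```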
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    filter_on(cyl = 4, gear = 5)
```
Helper to filter rows with the same condition applied to a subset of the data.
```r
filter_sd(.data, .SDcols, .how = Negate(is.na), ...)

## S3 method for class 'ExprBuilder'
filter_sd(.data, .SDcols, .how = Negate(is.na), ..., which, .collapse = `&`,
          .parse = getOption("table.express.parse", FALSE),
          .chain = getOption("table.express.chain", TRUE), .caller_env_n = 1L)

## S3 method for class 'data.table'
filter_sd(.data, ..., .expr = FALSE)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| .SDcols | See data.table::data.table and the details here. |
| .how | The filtering function or predicate. |
| ... | Possibly more arguments for |
| which | Passed to data.table::data.table. |
| .collapse | See where-table.express. |
| .parse | Logical. Whether to apply |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
| .caller_env_n | Internal. Passed to |
| .expr | If the input is a |
This function adds/chains an i expression that will be evaluated by data.table::data.table, and it supports the .COL pronoun and lambdas as formulas. The .how condition is applied to all .SDcols.
Additionally, .SDcols supports:

- A predicate using the .COL pronoun that should return a single logical when .COL is replaced by a column of the data.
- A formula using . or .x instead of the aforementioned .COL.
The caveat is that the expression is evaluated eagerly, i.e. with the currently captured data.table. Consider using chain() to explicitly capture intermediate results as actual data.tables.
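The sketch below approximates filter_sd(c("vs", "am"), ~ .x == 1) with data.table only: the same predicate is applied to every selected column, and the resulting logical vectors are collapsed with & (the default .collapse) or with |. The helper variables are illustrative and this is not the package's generated expression.

```r
library(data.table)

DT <- as.data.table(mtcars)
cols <- c("vs", "am")

# Apply the same predicate to each selected column...
checks <- lapply(DT[, ..cols], function(x) x == 1)

# ...then collapse the logical vectors row-wise
keep_all <- Reduce(`&`, checks)  # like the default .collapse = `&`
keep_any <- Reduce(`|`, checks)  # like .collapse = `|`

both <- DT[keep_all]
either <- DT[keep_any]
```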
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    filter_sd(c("vs", "am"), ~ .x == 1)
```
Filter rows
```r
## S3 method for class 'ExprBuilder'
filter(.data, ..., .preserve)

## S3 method for class 'data.table'
filter(.data, ...)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | See where-table.express. |
| .preserve | Ignored. |
The ExprBuilder method is an alias for where-table.express.
The data.table::data.table method works eagerly like dplyr::filter().
To see more examples, check the vignette, or the table.express-package entry.
Add named expressions for the data.table::data.table frame.
```r
frame_append(.data, ..., .parse = getOption("table.express.parse", FALSE))
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | Expressions to add to the frame. |
| .parse | Logical. Whether to apply |
```r
data.table::data.table() %>%
    start_expr %>%
    frame_append(anything = "goes")
```
Grouping by columns of a data.table::data.table.
```r
## S3 method for class 'ExprBuilder'
group_by(.data, ..., .parse = getOption("table.express.parse", FALSE),
         .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'data.table'
group_by(.data, ...)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | Clause for grouping on columns. The |
| .parse | Logical. Whether to apply |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
Everything in ... will be wrapped in a call to list.
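For example, group_by(cyl, gear) contributes by = list(cyl, gear) to the frame. A data.table-only sketch of such a frame, with an illustrative summary in j:

```r
library(data.table)

DT <- as.data.table(mtcars)

# The grouping columns, wrapped in list(), go into `by`
res <- DT[, .(mean_mpg = mean(mpg)), by = list(cyl, gear)]
```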
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    start_expr %>%
    group_by(cyl, gear)
```
Two-table joins. Check the "Joining verbs" vignette for more information.
```r
## S3 method for class 'ExprBuilder'
anti_join(x, y, ...)

## S3 method for class 'data.table'
anti_join(x, ..., .expr = FALSE)

## S3 method for class 'ExprBuilder'
full_join(x, y, ..., sort = TRUE, allow = TRUE, .parent_env)

## S3 method for class 'data.table'
full_join(x, ...)

## S3 method for class 'ExprBuilder'
inner_join(x, y, ...)

## S3 method for class 'data.table'
inner_join(x, ..., .expr = FALSE)

## S3 method for class 'ExprBuilder'
left_join(x, y, ..., nomatch, mult, roll, rollends, .parent_env, .to_eager = FALSE)

## S3 method for class 'data.table'
left_join(x, y, ..., allow = FALSE, .expr = FALSE)

mutate_join(x, y, ...)

## S3 method for class 'ExprBuilder'
mutate_join(x, y, ..., .SDcols, mult, roll, rollends, allow = FALSE,
            .by_each = NULL, .parent_env)

## S3 method for class 'EagerExprBuilder'
mutate_join(x, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
mutate_join(x, y, ...)

## S3 method for class 'ExprBuilder'
right_join(x, y, ..., allow = FALSE, which, nomatch, mult, roll, rollends,
           .selecting, .framing)

## S3 method for class 'data.table'
right_join(x, y, ..., allow = FALSE, .expr = FALSE, .selecting, .framing)

## S3 method for class 'ExprBuilder'
semi_join(x, y, ..., allow = FALSE, .eager = FALSE)

## S3 method for class 'data.table'
semi_join(x, y, ..., allow = FALSE, .eager = FALSE)
```
| Argument | Description |
|---|---|
| x | An ExprBuilder instance. |
| y | A data.table::data.table or, for some verbs (see details), a call to nest_expr(). |
| ... | Expressions for the |
| .expr | If the input is a |
| sort | Passed to data.table::merge. |
| allow | Passed as |
| .parent_env | See |
| nomatch, mult, roll, rollends | |
| .to_eager | Internal, should be left as |
| .SDcols | For |
| .by_each | For |
| which | If |
| .selecting | One or more expressions, possibly contained in a call to |
| .framing | Similar to |
| .eager | For |
The following joins support nest_expr() in y:

- anti_join
- inner_join
- right_join
The full_join method is really a wrapper for data.table::merge that specifies all = TRUE. The expression in x gets evaluated, merged with y, and the result is captured in a new ExprBuilder. This is useful if you want to keep building expressions after the merge.
The ExprBuilder method for mutate_join implements the idiom described in this link. The columns specified in .SDcols are those that will be added to x from y. The specification can be done by:

- Using tidyselect::select_helpers.
- Passing a character vector. If the vector is named, the names are taken as the new column names for the values added to x.
- A list, using base::list() or .(), containing:
  - Column names, either as characters or symbols.
  - Named calls expressing how the column should be summarized/modified before adding it to x.
The last case mentioned above is useful when the join returns many rows from y for each row in x, so they can be summarized while joining. The value of by in the join depends on what is passed to .by_each:

- If NULL (the default), by is set to .EACHI if a call is detected in any of the expressions from the list in .SDcols.
- If TRUE, by is always set to .EACHI.
- If FALSE, by is never set to .EACHI.
See also: data.table::data.table, dplyr::join
```r
lhs <- data.table::data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
rhs <- data.table::data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

rhs %>% anti_join(lhs, x, v)

lhs %>% inner_join(rhs, x)

# creates new data.table
lhs %>% left_join(rhs, x)

# would modify lhs by reference
lhs %>%
    start_expr %>%
    mutate_join(rhs, x, .SDcols = c("foo", rhs.v = "v"))

# would modify rhs by reference, summarizing 'y' before adding it.
rhs %>%
    start_expr %>%
    mutate_join(lhs, x, .SDcols = .(y = mean(y)))

# creates new data.table
lhs %>% right_join(rhs, x)

# keep only columns from lhs
lhs %>% semi_join(rhs, x)
```
Group by setting key of the input.
```r
key_by(.data, ...)

## S3 method for class 'ExprBuilder'
key_by(.data, ..., .parse = getOption("table.express.parse", FALSE),
       .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'data.table'
key_by(.data, ...)
```
| Argument | Description |
|---|---|
| .data | Object to be grouped and subsequently keyed. |
| ... | Arguments for the specific methods. |
| .parse | Logical. Whether to apply |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
Everything in ... will be wrapped in a call to list. Its contents work like the clause for grouping on columns, i.e. they become the keyby argument inside the data.table::data.table frame.
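A data.table-only sketch of a keyby frame like the one key_by(cyl, gear) sets up; unlike by, keyby also sorts the result and sets its key (the summary in j is illustrative):

```r
library(data.table)

DT <- as.data.table(mtcars)

# Group like `by`, but sort the result by the grouping columns and key it
res <- DT[, .(n = .N), keyby = list(cyl, gear)]
```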
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    start_expr %>%
    key_by(cyl, gear)
```
Like mutate-table.express but possibly recycling calls.
```r
mutate_sd(.data, .SDcols, .how = identity, ...)

## S3 method for class 'ExprBuilder'
mutate_sd(.data, .SDcols, .how = identity, ..., .pairwise = TRUE, .prefix, .suffix,
          .parse = getOption("table.express.parse", FALSE),
          .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'EagerExprBuilder'
mutate_sd(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
mutate_sd(.data, ...)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| .SDcols | See data.table::data.table and the details here. |
| .how | The function(s) or function call(s) that will perform the transformation. If many, a list should be used, either with |
| ... | Possibly more arguments for all functions/calls in |
| .pairwise | If |
| .prefix, .suffix | Only relevant when |
| .parse | Logical. Whether to apply |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
| .parent_env | See |
This function works similarly to transmute_sd() but keeps all columns and can modify by reference, like mutate-table.express. It can serve like dplyr's scoped mutation variants depending on what's given to .SDcols.
Additionally, .SDcols supports:

- A predicate using the .COL pronoun that should return a single logical when .COL is replaced by a column of the data.
- A formula using . or .x instead of the aforementioned .COL.
The caveat is that the expression is evaluated eagerly, i.e. with the currently captured data.table. Consider using chain() to explicitly capture intermediate results as actual data.tables.
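The package-level examples describe mutate_sd(as.integer, .SDcols = whole) as being like DT[, (whole) := lapply(.SD, as.integer), .SDcols = whole]. Following that pattern, the sketch below is a data.table-only approximation of this entry's mutate_sd example, doubling two columns by reference while keeping all others.

```r
library(data.table)

DT <- as.data.table(mtcars)
cols <- c("mpg", "cyl")

# Transform only the chosen columns, in place; other columns are untouched
DT[, (cols) := lapply(.SD, function(x) x * 2), .SDcols = cols]
```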
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    start_expr %>%
    mutate_sd(c("mpg", "cyl"), ~ .x * 2)
```
Add or update columns of a data.table::data.table, possibly by reference using :=.
```r
## S3 method for class 'ExprBuilder'
mutate(.data, ..., .sequential = FALSE, .unquote_names = TRUE,
       .parse = getOption("table.express.parse", FALSE),
       .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'EagerExprBuilder'
mutate(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
mutate(.data, ...)
```
| Argument | Description |
|---|---|
| .data | An instance of ExprBuilder. |
| ... | Mutation clauses. |
| .sequential | If |
| .unquote_names | Passed to |
| .parse | Logical. Whether to apply |
| .chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
| .parent_env | See |
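The package-level example uses where() followed by mutate() to add a column by reference on a subset of rows. Assuming where() joins its conditions with &, that roughly corresponds to the data.table frame sketched below; rows outside the subset get NA in the new column.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Add wt_squared by reference, only for rows matching both conditions
DT[gear %% 2 != 0 & carb %% 2 == 0, wt_squared := wt^2]

# Logical index of the rows that were updated
selected <- with(DT, gear %% 2 != 0 & carb %% 2 == 0)
```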
To see more examples, check the vignette, or the table.express-package entry.
```r
data("mtcars")

data.table::as.data.table(mtcars) %>%
    start_expr %>%
    mutate(mpg_squared = mpg ^ 2)
```
Nest expressions as a functional chain
```r
nest_expr(..., .start = TRUE, .end = .start, .parse = getOption("table.express.parse", FALSE))
```
| Argument | Description |
|---|---|
| ... | Expressions that will be part of the functional chain. |
| .start | Whether to add a |
| .end | Whether to add an |
| .parse | Logical. Whether to apply |
All expressions in ... are "collapsed" with %>%, passing the ExprBuilder's captured data.table as the initial parameter. Names are silently dropped.
The chain is evaluated eagerly and saved in the ExprBuilder instance to be used during final expression evaluation.
To see more examples, check the vignette, or the table.express-package entry.
Clause for ordering rows.
```r
order_by(.data, ...)

## S3 method for class 'ExprBuilder'
order_by(.data, ..., .collapse, .parse = getOption("table.express.parse", FALSE),
         .chain = getOption("table.express.chain", TRUE))

## S3 method for class 'data.table'
order_by(.data, ...)
```
.data | The input data. |
... | Arguments for the specific methods. |
.collapse | Ignored. See details. |
.parse | Logical. Whether to apply |
.chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
The ExprBuilder method dispatches to where-table.express, but doesn't forward the .collapse argument.
To see more examples, check the vignette, or the table.express-package entry.
data("mtcars")
data.table::as.data.table(mtcars) %>%
  order_by(-cyl, gear)
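The ExprBuilder method dispatches to where-table.express, so ordering lives in the i clause. A plain data.table call comparable to order_by(-cyl, gear) is therefore a sketch like the one below. It is an approximation of the frame, not the exact expression table.express builds.

```r
library(data.table)
data("mtcars")
DT <- as.data.table(mtcars)

# Ordering in the i clause: descending cyl, then ascending gear within cyl.
sorted <- DT[order(-cyl, gear)]
head(sorted[, .(cyl, gear)])
```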
Select columns of a data.table::data.table.
## S3 method for class 'ExprBuilder'
select(
  .data,
  ...,
  .negate = FALSE,
  .parse = getOption("table.express.parse", FALSE),
  .chain = getOption("table.express.chain", TRUE)
)

## S3 method for class 'EagerExprBuilder'
select(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
select(.data, ...)
.data | An instance of ExprBuilder. |
... | Clause for selecting columns. For |
.negate | Whether to negate the selection semantics and keep only columns that do not match what's given in |
.parse | Logical. Whether to apply |
.chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
.parent_env | See |
The expressions in ... support tidyselect::select_helpers.
To see more examples, check the vignette, or the table.express-package entry.
data("mtcars")
data.table::as.data.table(mtcars) %>%
  select(mpg:cyl)
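For comparison, here is how the same selections look in plain data.table syntax. This is a sketch for intuition only. The first call selects a column range, comparable to select(mpg:cyl). The second keeps every column except the listed ones, which is comparable in spirit to .negate = TRUE. The exact j expression table.express emits is not shown here.

```r
library(data.table)
data("mtcars")
DT <- as.data.table(mtcars)

# Column range selection, comparable to select(mpg:cyl).
kept <- DT[, mpg:cyl]

# Keep everything except the listed columns (negated selection).
dropped <- DT[, !c("mpg", "cyl")]
```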
Start building an expression.
start_expr(.data, ...)

## S3 method for class 'data.table'
start_expr(.data, ..., .verbose = getOption("table.express.verbose", FALSE))
.data | Optionally, something to capture for the expression. |
... | Arguments for the specific methods. |
.verbose | Whether to print more information during the expression-building process. |
The data.table::data.table method returns an ExprBuilder instance.
To see more examples, check the vignette, or the table.express-package entry.
Compute summaries for columns, perhaps by group.
## S3 method for class 'ExprBuilder'
summarize(
  .data,
  ...,
  .assume_optimized = NULL,
  .parse = getOption("table.express.parse", FALSE),
  .chain = getOption("table.express.chain", TRUE)
)

## S3 method for class 'ExprBuilder'
summarise(
  .data,
  ...,
  .assume_optimized = NULL,
  .parse = getOption("table.express.parse", FALSE),
  .chain = getOption("table.express.chain", TRUE)
)

## S3 method for class 'EagerExprBuilder'
summarize(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'EagerExprBuilder'
summarise(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
summarize(.data, ...)

## S3 method for class 'data.table'
summarise(.data, ...)
.data | An instance of ExprBuilder. |
... | Clauses for transmuting columns. For |
.assume_optimized | An optional character vector with function names that you know |
.parse | Logical. Whether to apply |
.chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
.parent_env | See |
The built expression is similar to what transmute builds, but the function also checks that the results have length 1.
To see more examples, check the vignette, or the table.express-package entry.
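The length-1 check can be illustrated with a hypothetical helper, check_scalar, wrapped around a grouped data.table summary. This is a minimal sketch of the idea only. It is not how table.express performs the check, and it ignores .assume_optimized entirely.

The helper stops if any summary value has a length other than 1 within a group, and otherwise returns the named list unchanged.

```r
library(data.table)
data("mtcars")
DT <- as.data.table(mtcars)

# Hypothetical helper: reject any summary that is not a single value.
check_scalar <- function(res) {
  bad <- names(res)[lengths(res) != 1L]
  if (length(bad) > 0L) {
    stop("non-scalar summaries: ", paste(bad, collapse = ", "))
  }
  res
}

# One row per group; every summary has length 1, so the check passes.
summary_dt <- DT[, check_scalar(list(mean_mpg = mean(mpg), n = .N)), by = cyl]

# Returning the raw column (length > 1 per group) triggers the error.
failed <- tryCatch({
  DT[, check_scalar(list(all_mpg = mpg)), by = cyl]
  FALSE
}, error = function(e) TRUE)
```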
Like transmute-table.express, but for a single call and optionally specifying .SDcols.
transmute_sd(.data, .SDcols = everything(), .how = identity, ...)

## S3 method for class 'ExprBuilder'
transmute_sd(
  .data,
  .SDcols = everything(),
  .how = identity,
  ...,
  .parse = getOption("table.express.parse", FALSE),
  .chain = getOption("table.express.chain", TRUE)
)

## S3 method for class 'EagerExprBuilder'
transmute_sd(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
transmute_sd(.data, ...)
.data | An instance of ExprBuilder. |
.SDcols | See data.table::data.table and the details here. |
.how | The function(s) or function call(s) that will perform the transformation. If many, a list should be used, either with |
... | Possibly more arguments for all functions/calls in |
.parse | Logical. Whether to apply |
.chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
.parent_env | See |
Like transmute-table.express, this function never modifies the input by reference. It adds or chains a select expression that will be evaluated by data.table::data.table. That expression may use the helper function .transmute_matching, which is assigned to the final expression's evaluation environment when end_expr() is called (i.e., by the ExprBuilder's eval method).

This helper supports two pronouns that can be used in .how and .SDcols:

.COL: the actual values of the column.
.COLNAME: the name of the column currently being evaluated.

Lambdas specified as formulas are also supported. In those cases, .x is equivalent to .COL and .y to .COLNAME.

Unlike in a call such as DT[, (vars) := expr], .SDcols can be created dynamically. It can be an expression that evaluates to whatever would be used in place of vars, without eagerly using the captured data.table. See the examples here or in table.express-package.
data("mtcars")

data.table::as.data.table(mtcars) %>%
  transmute_sd(~ grepl("^d", .y), ~ .x * 2)

data.table::as.data.table(mtcars) %>%
  transmute_sd(~ is.numeric(.x), ~ .x * 2)
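To make the .COL/.COLNAME pronouns concrete, the sketch below uses a hypothetical helper, match_cols. The helper calls a predicate once per column with the column's values and its name, and returns the names that match. Those names are then passed to .SDcols of an ordinary data.table call.

This only mimics the semantics described above. It is not the package's .transmute_matching function. The first example, transmute_sd(~ grepl("^d", .y), ~ .x * 2), corresponds roughly to the following.

```r
library(data.table)
data("mtcars")
DT <- as.data.table(mtcars)

# Hypothetical helper: evaluate `pred` per column, passing the values
# (.COL) and the column name (.COLNAME), and return the matching names.
match_cols <- function(dt, pred) {
  keep <- mapply(function(.COL, .COLNAME) isTRUE(pred(.COL, .COLNAME)),
                 dt, names(dt))
  names(dt)[keep]
}

# Columns whose name starts with "d" (.COLNAME plays the role of .y).
cols <- match_cols(DT, function(.COL, .COLNAME) grepl("^d", .COLNAME))

# Apply the transformation (.COL plays the role of .x) to those columns only.
ans <- DT[, lapply(.SD, function(x) x * 2), .SDcols = cols]
```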
Compute and keep only new columns.
## S3 method for class 'ExprBuilder'
transmute(
  .data,
  ...,
  .enlist = TRUE,
  .sequential = FALSE,
  .parse = getOption("table.express.parse", FALSE),
  .chain = getOption("table.express.chain", TRUE)
)

## S3 method for class 'EagerExprBuilder'
transmute(.data, ..., .parent_env = rlang::caller_env())

## S3 method for class 'data.table'
transmute(.data, ...)
.data | An instance of ExprBuilder. |
... | Clauses for transmuting columns. For |
.enlist | See details. |
.sequential | If |
.parse | Logical. Whether to apply |
.chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
.parent_env | See |
Everything in ... is wrapped in a call to list by default. If only one expression is given, you can set .enlist to FALSE to skip the call to list.
To see more examples, check the vignette, or the table.express-package entry.
data("mtcars")
data.table::as.data.table(mtcars) %>%
  transmute(ans = mpg * 2)
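The effect of .enlist can be seen directly in data.table. When j is wrapped in list(), the result is a data.table. When a single expression is used as-is, data.table returns a plain vector.

The sketch below shows both forms. It is meant to suggest what the default and .enlist = FALSE expressions evaluate to, not to reproduce the package's output verbatim.

```r
library(data.table)
data("mtcars")
DT <- as.data.table(mtcars)

# Default (.enlist = TRUE): j is wrapped in list(), giving a data.table.
as_table <- DT[, list(ans = mpg * 2)]

# .enlist = FALSE with a single expression: j is used as-is,
# so data.table returns a plain numeric vector.
as_vector <- DT[, mpg * 2]
```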
Clause for subsetting rows.
where(.data, ...)

## S3 method for class 'ExprBuilder'
where(
  .data,
  ...,
  which,
  .collapse = `&`,
  .parse = getOption("table.express.parse", FALSE),
  .chain = getOption("table.express.chain", TRUE)
)

## S3 method for class 'data.table'
where(.data, ...)
.data | The input data. |
... | Arguments for the specific methods. |
which | Passed to data.table::data.table. |
.collapse | A boolean function which will be used to "concatenate" all conditions in |
.parse | Logical. Whether to apply |
.chain | Logical. Should a new frame be automatically chained to the expression if the clause being set already exists? |
For ExprBuilder, the expressions in ... can call nest_expr(), and are eagerly nested if they do.

The data.table::data.table method is lazy, so it expects another verb to follow afterwards.
To see more examples, check the vignette, or the table.express-package entry.
data("mtcars")

data.table::as.data.table(mtcars) %>%
  start_expr %>%
  where(vs == 0, am == 1)

data.table::as.data.table(mtcars) %>%
  where(vs == 0) %>%
  transmute(mpg = round(mpg))
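How .collapse combines several conditions can be sketched in base R plus data.table. A hypothetical helper, collapse_conditions, folds the conditions with a boolean operator; `&` is the documented default. The combined condition is then spliced into the i position of a data.table call with substitute() and evaluated.

This is an illustration of the idea under those assumptions, not the code path table.express uses.

```r
library(data.table)
data("mtcars")
DT <- as.data.table(mtcars)

# Hypothetical helper: fold conditions with a boolean operator ("&" or "|").
collapse_conditions <- function(conditions, op = "&") {
  Reduce(function(lhs, rhs) call(op, lhs, rhs), conditions)
}

cond <- collapse_conditions(list(quote(vs == 0), quote(am == 1)))

# Splice the combined condition into the i clause and let data.table evaluate it.
subset_expr <- substitute(DT[cond], list(cond = cond))
ans <- eval(subset_expr)
```

Passing op = "|" instead would keep rows matching either condition, analogous to supplying a different .collapse function.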