The package name, ast2ast, derives from the abbreviation AST, which stands for abstract syntax tree. Originally, I planned to convert the abstract syntax tree of R into a C++ syntax tree, as the literature recommends for transpilers. During development, however, a better approach emerged: an expression template library in C++ called ETR, written to mimic R. R code is therefore translated into ETR code, which is subsequently compiled. The original ETR library is available at https://github.com/Konrad1991/ETR. Note that the version integrated into ast2ast has since been substantially enhanced.
Below, a basic bubble sort function written in R is shown first,
followed by its ETR counterpart. The two code snippets are very
similar: the overall structure of the R code is unchanged, and the
translation consists mainly of replacing individual functions with
their ETR equivalents.
In the C++ code, all functions are located in the etr
namespace. Some functions have the same name in R and C++,
such as the length function. To avoid name conflicts,
these calls are qualified explicitly with the etr namespace,
resulting in expressions like etr::length. Other functions, for
example : and [, cannot be defined in C++ (at least not in the way
they are used in R). They are therefore replaced by functions with
new names, e.g. etr::colon and etr::subset.
Additionally, C++ requires explicit declaration of variable types.
In this example, all variables have the type etr::Vec<double>.
bubbleSort <- function(a) {
  size <- length(a)
  for (i in 1:size) {
    for (j in 1:(size - 1)) {
      if (a[j] > a[j + 1]) {
        temp <- a[j]
        a[j] <- a[j + 1]
        a[j + 1] <- temp
      }
    }
  }
  return(a)
}
// [[Rcpp::depends(ast2ast)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp2a)]]
#include "etr.hpp"
// [[Rcpp::export]]
SEXP bubbleSort(SEXP aSEXP) {
  etr::Vec<double> size;
  etr::Vec<double> temp;
  etr::Vec<double> a; a = aSEXP;
  size = etr::length(a);
  for (auto& i : etr::colon(etr::i2d(1), size)) {
    for (auto& j : etr::colon(etr::i2d(1), (size - etr::i2d(1)))) {
      if (etr::subset(a, j) > etr::subset(a, j + etr::i2d(1))) {
        temp = etr::subset(a, j);
        etr::subset(a, j) = etr::subset(a, j + etr::i2d(1));
        etr::subset(a, j + etr::i2d(1)) = temp;
      }
    }
  }
  return(etr::cpp2R(a));
}
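To make the renaming of : and [ more concrete, the following self-contained sketch mimics the translated bubble sort without the ETR headers. The namespace toy and its functions colon, subset and length are hypothetical stand-ins built on std::vector<double>. They are assumptions made for illustration and do not show how ETR implements etr::colon or etr::subset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy {  // hypothetical namespace, not the real etr namespace

// Stand-in for R's `from:to`: yields from, from + 1, ..., to.
// Returns an empty vector when from > to (R would count down instead).
inline std::vector<double> colon(double from, double to) {
  std::vector<double> out;
  for (double v = from; v <= to; v += 1.0) out.push_back(v);
  return out;
}

// Stand-in for R's `a[i]` with 1-based indexing. A reference is returned
// so the result can appear on either side of an assignment.
inline double& subset(std::vector<double>& a, double i) {
  assert(i >= 1.0 && i <= static_cast<double>(a.size()));
  return a[static_cast<std::size_t>(i) - 1];
}

// Stand-in for R's length(); qualified as toy::length to avoid name clashes.
inline double length(const std::vector<double>& a) {
  return static_cast<double>(a.size());
}

}  // namespace toy

// Same control flow as the R function; only the function names differ.
std::vector<double> bubble_sort(std::vector<double> a) {
  const double size = toy::length(a);
  double temp = 0.0;
  for (double i : toy::colon(1.0, size)) {
    (void)i;  // the outer index only counts the passes
    for (double j : toy::colon(1.0, size - 1.0)) {
      if (toy::subset(a, j) > toy::subset(a, j + 1.0)) {
        temp = toy::subset(a, j);
        toy::subset(a, j) = toy::subset(a, j + 1.0);
        toy::subset(a, j + 1.0) = temp;
      }
    }
  }
  return a;
}
```

Because subset returns a reference, the assignment toy::subset(a, j) = ... works like a[j] <- ... in R. This is one possible way to model the behaviour and is not necessarily the mechanism ETR uses.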
The XPtr interface creates an external pointer that can be used in other C++ programs.
// [[Rcpp::depends(ast2ast)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp2a)]]
#include "etr.hpp"
typedef etr::Vec<double> (*FP)(etr::Vec<double>& a, etr::Vec<double>& b);
// [[Rcpp::export]]
void call_xptr(Rcpp::XPtr<FP> ep) {
  FP f = *ep;
  etr::Vec<double> a;
  etr::Vec<double> b;
  etr::Vec<double> c;
  a = etr::coca(1, 2, 3);
  b = etr::coca(4, 5, 6);
  c = f(a, b);
  etr::print(c);
}
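The pattern in call_xptr can be tried without R or the ETR headers. The sketch below keeps the shape of the FP typedef but uses std::vector<double> as a stand-in for etr::Vec<double>. The functions add and call_through are hypothetical names introduced only for this illustration, and a plain pointer to the function pointer replaces Rcpp::XPtr<FP>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;      // stand-in for etr::Vec<double>
typedef Vec (*FP)(Vec& a, Vec& b);    // same shape as the FP typedef above

// Hypothetical translated function: element-wise sum of a and b.
Vec add(Vec& a, Vec& b) {
  assert(a.size() == b.size());
  Vec out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

// Receives a pointer to the function pointer, dereferences it in the same
// way as `FP f = *ep;` and calls the function with two local vectors.
Vec call_through(FP* ep) {
  assert(ep != nullptr);
  FP f = *ep;
  Vec a = {1.0, 2.0, 3.0};
  Vec b = {4.0, 5.0, 6.0};
  return f(a, b);
}
```

The caller only needs to know the signature behind FP. Which function is actually invoked is decided by whoever created the pointer.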
The core type in the expression template library R (ETR) is a class
called Vec. Assuming basic familiarity with classes and templates in
C++, this section describes the design of this class. The Vec class
has three template parameters, namely T, R, and Trait. The typename T
is a fundamental data type, while R is another class (described
below). The third parameter, Trait, allows the class to be identified
as a Vec at compile time. In most cases, T is a double, which
corresponds to the R type numeric. Occasionally, T is a bool, which
corresponds to the R type logical. The typename R is one of the
following classes: Buffer, BorrowSEXP, Borrow, Subset, UnaryOperation
or BinaryOperation. Each of these classes contributes distinct
functionality to the Vec class. It is recommended to use only the
classes Buffer, BorrowSEXP and Borrow directly. The Subset class is
produced by calls to the function subset, whereas UnaryOperation and
BinaryOperation are produced by calling functions like sin, cos, +
and -.
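The interplay of the three template parameters can be illustrated with a small, self-contained sketch. Everything in the namespace sketch is a simplified assumption written for this explanation: Buffer plays the role of the owning storage class, Plus plays the role of a BinaryOperation node, and VecTrait is a hypothetical tag type. The real ETR classes have more members and may differ in their details.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace; not the ETR implementation

struct VecTrait {};  // tag type that identifies a Vec at compile time

// Owns its memory; plays the role of Buffer.
template <typename T>
struct Buffer {
  std::vector<T> data;
  std::size_t size() const { return data.size(); }
  T operator[](std::size_t i) const { return data[i]; }
};

// Lazy element-wise sum; plays the role of BinaryOperation. The operands
// are held by reference, so a node must not outlive them.
template <typename T, typename L, typename Rt>
struct Plus {
  const L& l;
  const Rt& r;
  std::size_t size() const { return l.size(); }
  T operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <typename T, typename R = Buffer<T>, typename Trait = VecTrait>
struct Vec {
  using TypeTrait = Trait;
  R d;  // either storage (Buffer) or a lazy expression node (Plus)

  Vec() = default;
  explicit Vec(const R& node) : d(node) {}
  Vec(std::initializer_list<T> values) : d{std::vector<T>(values)} {}

  // Evaluates a lazy expression element by element into this buffer.
  template <typename R2>
  Vec& operator=(const Vec<T, R2, Trait>& other) {
    std::vector<T> tmp(other.size());
    for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = other[i];
    d.data = std::move(tmp);
    return *this;
  }

  std::size_t size() const { return d.size(); }
  T operator[](std::size_t i) const { return d[i]; }
};

// Builds an expression node instead of computing the sum immediately.
template <typename T, typename R1, typename R2>
Vec<T, Plus<T, R1, R2>> operator+(const Vec<T, R1>& a, const Vec<T, R2>& b) {
  assert(a.size() == b.size());
  return Vec<T, Plus<T, R1, R2>>(Plus<T, R1, R2>{a.d, b.d});
}

}  // namespace sketch
```

With these definitions, an assignment such as c = a + b + a builds a nested Plus node and evaluates it in a single loop, without allocating intermediate vectors. In this sketch, storing the unevaluated expression (for example with auto) would leave dangling references, which illustrates why the operation classes are better left to the library.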
The class Vec is thread safe in the sense that no memory is associated with functions or global variables. Moreover, almost no static methods are defined. The only exceptions are the methods used for comparison (==, <=, >=, <, > and !=), which accept two doubles (by copy) as arguments and return a bool. Thus, it is still possible to use instances of the class in parallel. However, the user has to ensure that only one thread modifies a given object at a time.
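The rule that only one thread may modify an object at a time can be sketched with standard C++ threads. The example below uses std::vector<double> in place of Vec. The struct Compare and the function clamp_below_parallel are hypothetical names chosen for illustration and are not part of ETR.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a static comparison method: both doubles are
// taken by copy and no shared state is touched.
struct Compare {
  static bool less(double a, double b) { return a < b; }
};

// Raises every value in objects[t] that is below floors[t] to floors[t].
// Thread t modifies objects[t] only, so no object ever has two writers.
void clamp_below_parallel(std::vector<std::vector<double>>& objects,
                          const std::vector<double>& floors) {
  assert(objects.size() == floors.size());
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < objects.size(); ++t) {
    workers.emplace_back([&objects, &floors, t] {
      for (double& v : objects[t]) {
        if (Compare::less(v, floors[t])) v = floors[t];
      }
    });
  }
  for (std::thread& w : workers) w.join();
}
```

Reading floors from several threads is safe because nothing writes to it while the workers run. If two threads had to modify the same object, access would need to be serialized, for example with a std::mutex.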
A sexp can be converted to an Rcpp::NumericVector, an Rcpp::NumericMatrix, an arma::vec or an arma::mat.