--- title: "Example 2: Join tables" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Example 2: Join tables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This post has referred to a vignette from `dplyr`, you can find it in . We'll try to display how to join data tables in this vignette. First, load the packages we need and get some data. ```{r setup} library(tidyfst) library(nycflights13) flights2 <- flights %>% select_dt(year,month,day, hour, origin, dest, tailnum, carrier) ``` Do a left join with a simple: ```{r} flights2 %>% left_join_dt(airlines) ``` ## Controlling how the tables are matched Join works the same as `dplyr`: ```{r} flights2 %>% left_join_dt(weather) flights2 %>% left_join_dt(planes, by = "tailnum") flights2 %>% left_join_dt(airports, c("dest" = "faa")) flights2 %>% left_join_dt(airports, c("origin" = "faa")) ``` ## Types of join ```{r} df1 <- data.table(x = c(1, 2), y = 2:1) df2 <- data.table(x = c(1, 3), a = 10, b = "a") df1 %>% inner_join_dt(df2) df1 %>% left_join_dt(df2) df1 %>% right_join_dt(df2) df1 %>% full_join_dt(df2) ``` If all you have is a data.frame or tibble, you have no need to change the format. Feed the data directly: ```{r} df1 <- data.frame(x = c(1, 1, 2), y = 1:3) df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a")) df1 %>% left_join_dt(df2) ``` The "_dt" suffix should remind you that this is backed up by `data.table` and will always return a data.table in the end. ## Filtering joins Filtering joins have also been supported in `tidyfst`. ```{r} flights %>% anti_join_dt(planes, by = "tailnum") %>% count_dt(tailnum, sort = TRUE) ``` Other examples (`semi_join_dt()` and `anti_join_dt()` never duplicate; they only ever remove observations.): ```{r} df1 <- data.frame(x = c(1, 1, 3, 4), y = 1:4) df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a")) # Four rows to start with: df1 %>% nrow() # And we get four rows after the join df1 %>% inner_join_dt(df2, by = "x") %>% nrow() # But only two rows actually match df1 %>% semi_join_dt(df2, by = "x") %>% nrow() ``` ## Set operations For set operations, wrap `data.table`'s function directly, but the functions will automatically turn any data.frame into data.table. Examples are listed as below: ```{r} x = iris[c(2,3,3,4),] x2 = iris[2:4,] y = iris[c(3:5),] intersect_dt(x, y) # intersect intersect_dt(x, y, all=TRUE) # intersect all setdiff_dt(x, y) # except setdiff_dt(x, y, all=TRUE) # except all union_dt(x, y) # union union_dt(x, y, all=TRUE) # union all setequal_dt(x, x2, all=FALSE) # setequal setequal_dt(x, x2) ``` For more details, just find the help from `data.table` using `?setops`.