Example 4: Nest

This vignette shows how to use nesting in tidyfst. It draws on the tidyr vignette at https://tidyr.tidyverse.org/articles/nest.html. First, we nest the “mtcars” data.frame by the “cyl” column.

library(tidyfst)

# nest by "cyl" column
mtcars_nested <- mtcars %>% 
  nest_dt(cyl) # the quoted form "cyl" works too

# inspect the output data.table
mtcars_nested
#>      cyl                 ndt
#>    <num>              <list>
#> 1:     6  <data.table[7x10]>
#> 2:     4 <data.table[11x10]>
#> 3:     8 <data.table[14x10]>
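
To make the shape of the nested table concrete, here is a minimal base R sketch of the same grouping idea using split(). It is only an analogy and not how nest_dt is implemented; the names `groups` and `group_dims` are introduced here for illustration.

```r
# Base R analogy for nesting: one sub-table per value of cyl.
# `groups` is a hypothetical name, not part of tidyfst.
groups <- split(mtcars[, names(mtcars) != "cyl"], mtcars$cyl)

# Each element holds the rows for one cyl value, without the cyl column itself.
# sapply(dim) gives a matrix: row 1 = number of rows, row 2 = number of columns.
group_dims <- sapply(groups, dim)
group_dims
```

Note that split() orders the groups by value (4, 6, 8), whereas the nested table above lists them in order of first appearance (6, 4, 8).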

Next, we fit a regression within each “cyl” group. Base R’s lapply does the job:

mtcars_nested2 <- mtcars_nested %>% 
  mutate_dt(model = lapply(ndt,function(df) lm(mpg ~ wt, data = df)))

mtcars_nested2
#>      cyl                 ndt    model
#>    <num>              <list>   <list>
#> 1:     6  <data.table[7x10]> <lm[12]>
#> 2:     4 <data.table[11x10]> <lm[12]>
#> 3:     8 <data.table[14x10]> <lm[12]>
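
Because each element of a list column is an ordinary R object, the usual accessors work on it. The sketch below rebuilds the three models with base R only (an assumption made so the snippet is self-contained) and pulls the coefficients out of each one; `models` and `coefs` are names introduced here, not part of tidyfst.

```r
# Fit mpg ~ wt separately for each cyl group
# (a base R stand-in for the nested table; `models` is a hypothetical name).
models <- lapply(split(mtcars, mtcars$cyl),
                 function(df) lm(mpg ~ wt, data = df))

# coef() returns a named numeric vector per model;
# sapply() binds them column-wise into a 2 x 3 matrix.
coefs <- sapply(models, coef)
coefs
```

The same pattern should apply to the “model” column itself, since it is a plain list of lm objects.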

We can see that the fitted models are stored in the “model” column. Next, we extract the fitted values from each model.

mtcars_nested3 <- mtcars_nested2 %>% 
  mutate_dt(model_predict = lapply(model, predict))
mtcars_nested3$model_predict
#> [[1]]
#>        1        2        3        4        5        6        7 
#> 21.12497 20.41604 19.47080 18.78968 18.84528 18.84528 20.70795 
#> 
#> [[2]]
#>        1        2        3        4        5        6        7        8 
#> 26.47010 21.55719 21.78307 27.14774 30.45125 29.20890 25.65128 28.64420 
#>        9       10       11 
#> 27.48656 31.02725 23.87247 
#> 
#> [[3]]
#>        1        2        3        4        5        6        7        8 
#> 16.32604 16.04103 14.94481 15.69024 15.58061 12.35773 11.97625 12.14945 
#>        9       10       11       12       13       14 
#> 16.15065 16.33700 15.44907 15.43811 16.91800 16.04103

We can see that “model_predict” is a list of numeric vectors. Let’s unnest the target column “model_predict”.

mtcars_nested3 %>% unnest_dt(model_predict)
#>       cyl model_predict
#>     <num>         <num>
#>  1:     6      21.12497
#>  2:     6      20.41604
#>  3:     6      19.47080
#>  4:     6      18.78968
#>  5:     6      18.84528
#>  6:     6      18.84528
#>  7:     6      20.70795
#>  8:     4      26.47010
#>  9:     4      21.55719
#> 10:     4      21.78307
#> 11:     4      27.14774
#> 12:     4      30.45125
#> 13:     4      29.20890
#> 14:     4      25.65128
#> 15:     4      28.64420
#> 16:     4      27.48656
#> 17:     4      31.02725
#> 18:     4      23.87247
#> 19:     8      16.32604
#> 20:     8      16.04103
#> 21:     8      14.94481
#> 22:     8      15.69024
#> 23:     8      15.58061
#> 24:     8      12.35773
#> 25:     8      11.97625
#> 26:     8      12.14945
#> 27:     8      16.15065
#> 28:     8      16.33700
#> 29:     8      15.44907
#> 30:     8      15.43811
#> 31:     8      16.91800
#> 32:     8      16.04103
#>       cyl model_predict

Unnesting automatically drops all other list columns. In our case, the “ndt” and “model” columns are both removed.
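
For intuition, the unnesting step can be sketched in base R: repeat each group key once per element of its vector, then flatten the list. This is a hedged illustration of the idea rather than the code tidyfst runs; `preds` and `flat` are hypothetical names.

```r
# Fitted values per cyl group, stored as a named list of numeric vectors
# (a base R stand-in for the "model_predict" list column).
preds <- lapply(split(mtcars, mtcars$cyl),
                function(df) unname(predict(lm(mpg ~ wt, data = df))))

# "Unnest": repeat each key by the length of its vector, then flatten.
flat <- data.frame(
  cyl = rep(as.numeric(names(preds)), times = lengths(preds)),
  model_predict = unlist(preds, use.names = FALSE)
)
nrow(flat)
```

Only the key and the flattened column survive in `flat`, which mirrors why the other list columns have no place in the unnested result.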