This vignette builds upon the sample data for São Miguel (introduced
in the previous vignette) to
demonstrate the use of exactextractr
with categorical land
cover data. A sample of the CORINE
2018 raster dataset is included in exactextractr
.
As in the previous vignette, the following packages are used:
First, we load the CORINE land cover data and the concelho boundaries.
clc <- raster(system.file('sao_miguel/clc2018_v2020_20u1.tif',
package = 'exactextractr'))
concelhos <- st_read(system.file('sao_miguel/concelhos.gpkg',
package = 'exactextractr'),
quiet = TRUE)
The land cover class descriptions are provided in a separate DBF
file. We read this in to a data frame, then use levels()
to
associate the class descriptions with the raster.
clc_classes <- foreign::read.dbf(system.file('sao_miguel/clc2018_v2020_20u1.tif.vat.dbf',
package = 'exactextractr'),
as.is = TRUE) %>%
dplyr::select(value = Value,
landcov = LABEL3)
levels(clc) <- list(data.frame(ID = clc_classes$value,
landcov = clc_classes$landcov))
This association provides us with a way to look up the description
for a given ID. Alternatively, we can relate the values using
merge
or a `dplyr
join.
One of the most basic questions we might ask is which land cover
classification is predominant in each concelho. We can do this
with the built-in mode
summary operation. The
minority
and variety
operations are also
applicable to categorical data and provide the least-common
classification and number of distinct classifications, respectively.
landcov_mode <- exact_extract(clc, concelhos, 'mode',
append_cols = 'name', progress = FALSE) %>%
inner_join(clc_classes, by=c(mode = 'value'))
name | landcov |
---|---|
Lagoa | Land principally occupied by agriculture, with significant areas of natural vegetation |
Nordeste | Coniferous forest |
Ponta Delgada | Pastures |
Povoação | Broad-leaved forest |
Ribeira Grande | Land principally occupied by agriculture, with significant areas of natural vegetation |
Vila Franca do Campo | Land principally occupied by agriculture, with significant areas of natural vegetation |
While mode
provides a handy way to see the most common
land cover category, we need to write a custom summary function if we
want to see the frequency of different land cover types in an area.
Summary functions are called once per feature from the input
sf
object. They can return either:
a scalar, in which case the return value of
exact_extract
will be a vector whose entries correspond
with the rows of the input sf
object, or
a data frame, in which case exact_extract
will
return a rowwise combination of the data frames for each feature. If the
data frame returned by the summary function will have than a single row,
it is useful for some identifying information to be included in the
returned data frame.
If we are going to perform typical data frame operations on the
raster values and coverage fractions, it can be more convenient for the
summary function to accept a single data frame argument, instead of
separate arguments for the cell values and coverage fractions. This
behavior can be enabled with the summarize_df
argument.
Using this method, we can calculate the fraction of each concelho that is covered by each land cover category:
landcov_fracs <- exact_extract(clc, concelhos, function(df) {
df %>%
mutate(frac_total = coverage_fraction / sum(coverage_fraction)) %>%
group_by(name, value) %>%
summarize(freq = sum(frac_total))
}, summarize_df = TRUE, include_cols = 'name', progress = FALSE)
Here we use the include_cols
argument here to include
the name
column from concelhos
in the data
frame passed to the summary function. (Although the value of
name
will be the same for all rows in df
, we
include name
in the group_by
expression so
that it is not removed by summarize
.) Other similar
arguments include include_xy
to get the cell center
coordinates, include_area
to get the cell area, and
include_cell
to get the cell index used by the
raster
package.
This provides us with a correspondence between each numeric land cover category and its frequency in each concelho:
head(landcov_fracs)
#> # A tibble: 6 × 3
#> # Groups: name [1]
#> name value freq
#> <chr> <dbl> <dbl>
#> 1 Lagoa 2 0.0630
#> 2 Lagoa 3 0.0136
#> 3 Lagoa 7 0.00659
#> 4 Lagoa 9 0.00637
#> 5 Lagoa 12 0.170
#> 6 Lagoa 18 0.154
Joining this table to clc_classes
, we can associate the
descriptions with the numeric types and view the three most common land
cover classes in each concelho:
landcov_fracs %>%
inner_join(clc_classes, by = 'value') %>%
group_by(name) %>%
arrange(desc(freq)) %>%
slice_head(n = 3) %>%
mutate(freq = sprintf('%0.1f%%', 100*freq)) %>%
knitr::kable()
name | value | freq | landcov |
---|---|---|---|
Lagoa | 21 | 26.8% | Land principally occupied by agriculture, with significant areas of natural vegetation |
Lagoa | 12 | 17.0% | Non-irrigated arable land |
Lagoa | 20 | 15.6% | Complex cultivation patterns |
Nordeste | 24 | 24.3% | Coniferous forest |
Nordeste | 18 | 22.5% | Pastures |
Nordeste | 21 | 14.2% | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ponta Delgada | 18 | 29.9% | Pastures |
Ponta Delgada | 21 | 26.5% | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ponta Delgada | 12 | 11.3% | Non-irrigated arable land |
Povoação | 23 | 24.2% | Broad-leaved forest |
Povoação | 18 | 20.1% | Pastures |
Povoação | 21 | 16.7% | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ribeira Grande | 21 | 25.1% | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ribeira Grande | 18 | 22.7% | Pastures |
Ribeira Grande | 12 | 16.4% | Non-irrigated arable land |
Vila Franca do Campo | 21 | 36.0% | Land principally occupied by agriculture, with significant areas of natural vegetation |
Vila Franca do Campo | 18 | 15.5% | Pastures |
Vila Franca do Campo | 27 | 15.0% | Moors and heathland |
Similarly, we can find the top land covers by area:
landcov_areas <- exact_extract(clc, concelhos, function(df) {
df %>%
group_by(name, value) %>%
summarize(area_km2 = sum(coverage_area) / 1e6)
}, summarize_df = TRUE, coverage_area = TRUE, include_cols = 'name', progress = FALSE)
name | area_km2 | landcov |
---|---|---|
Lagoa | 12.243526 | Land principally occupied by agriculture, with significant areas of natural vegetation |
Lagoa | 7.749522 | Non-irrigated arable land |
Lagoa | 7.129778 | Complex cultivation patterns |
Nordeste | 24.662020 | Coniferous forest |
Nordeste | 22.832392 | Pastures |
Nordeste | 14.404089 | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ponta Delgada | 69.843899 | Pastures |
Ponta Delgada | 61.752290 | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ponta Delgada | 26.315411 | Non-irrigated arable land |
Povoação | 25.798260 | Broad-leaved forest |
Povoação | 21.438728 | Pastures |
Povoação | 17.821336 | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ribeira Grande | 45.394998 | Land principally occupied by agriculture, with significant areas of natural vegetation |
Ribeira Grande | 40.902179 | Pastures |
Ribeira Grande | 29.625931 | Non-irrigated arable land |
Vila Franca do Campo | 28.120871 | Land principally occupied by agriculture, with significant areas of natural vegetation |
Vila Franca do Campo | 12.081576 | Pastures |
Vila Franca do Campo | 11.681901 | Moors and heathland |
One extension of the analysis above is to see which land covers are associated with human population in a given concelho. Is the population primary urban or rural?
As described in the previous vignette, the population density raster provides the most robust results in the presence of partially-covered pixels.
pop_density <- raster(system.file('sao_miguel/gpw_v411_2020_density_2020.tif',
package = 'exactextractr'))
We are able to perform this analysis because the CORINE sample
distributed with exactextractr
has been reprojected from
its native Lambert Equal Area projection into geographic coordinates
consistent with GPW. Otherwise, working with multiple rasters in
different projections requires transformation to a common grid using
tools such as raster::projectRaster
or the
gdalwarp
command-line utility.
Very little about the call to exact_extract
requires
changing to incorporate population. We set
weights = pop_density
and, since we are using the
summarize_df
option, a column called weight
will be added to the data frame passed to the summary function. We also
set coverage_area = TRUE
so that we can multiply the
density by the covered pixel area to get a population count.
landcov_pop_areas <- exact_extract(clc, concelhos, function(df) {
df %>%
group_by(name, value) %>%
summarize(pop = sum(coverage_area * weight / 1e6)) %>%
mutate(pop_frac = pop / sum(pop))
}, weights = pop_density, coverage_area = TRUE,
summarize_df = TRUE, include_cols = 'name')
Looking at the highest-population land cover type in each concelho, we can can see that the western/central concelhos of Lagoa, Ponta Delgada, Ribeira Grande, and Vila Franca do Campo have a more urban population than Nordeste or Povoação to the east.
name | landcov | pop | pop_frac |
---|---|---|---|
Lagoa | Discontinuous urban fabric | 6548 | 0.415 |
Nordeste | Complex cultivation patterns | 1280 | 0.282 |
Ponta Delgada | Discontinuous urban fabric | 27224 | 0.381 |
Povoação | Complex cultivation patterns | 1568 | 0.261 |
Ribeira Grande | Discontinuous urban fabric | 13497 | 0.374 |
Vila Franca do Campo | Discontinuous urban fabric | 4442 | 0.377 |