Summarize Multiple Regions with Optimized Tile Scheduling
Source: R/summarize.R
Efficiently summarizes embeddings for multiple regions by:
Building a region-to-tile mapping
Processing tiles in an order that minimizes redundant downloads
Computing streaming statistics for each region
Cleaning up tiles as soon as no regions need them
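The four steps above can be sketched in base R with toy data. This is a minimal illustration under stated assumptions, not the package's implementation: `region_tiles`, `tile_values`, and `fetch_tile()` are hypothetical stand-ins, every pixel of a tile is counted for every region that overlaps it (no polygon masking), and tiles are simply visited once each in name order.

```r
# Hypothetical toy input: the tiles each region overlaps.
region_tiles <- list(
  A = c("t1", "t2"),
  B = c("t2", "t3"),
  C = c("t3")
)

# Step 1: invert to a tile-to-region mapping.
tile_regions <- split(
  rep(names(region_tiles), lengths(region_tiles)),
  unlist(region_tiles, use.names = FALSE)
)

# Hypothetical stand-in for a tile download: 4 pixels x 3 embedding dimensions.
tile_values <- c(t1 = 1, t2 = 2, t3 = 3)
fetch_tile <- function(tile_id) {
  matrix(tile_values[[tile_id]], nrow = 4, ncol = 3)
}

# Streaming state: a running sum and a pixel count per region.
n_dims <- 3
sums <- lapply(region_tiles, function(x) numeric(n_dims))
counts <- vapply(region_tiles, function(x) 0, numeric(1))

# Steps 2-4: visit each tile once, update every region that needs it,
# then drop the tile from memory.
for (tile_id in names(tile_regions)) {
  pixels <- fetch_tile(tile_id)
  for (region in tile_regions[[tile_id]]) {
    sums[[region]] <- sums[[region]] + colSums(pixels)
    counts[[region]] <- counts[[region]] + nrow(pixels)
  }
  rm(pixels)
}

# Mean embedding per region from the running sums and counts.
summaries <- Map(function(s, n) s / n, sums, counts)
```

Keeping only a running sum and count per region means no tile has to stay in memory after the regions that overlap it have been updated.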
Usage
summarize_regions_streaming(
  gt,
  regions,
  year,
  region_ids = NULL,
  sample_rate = 1,
  mask_to_region = TRUE,
  seed = NULL,
  progress = TRUE
)
Arguments
- gt
GeoTessera object
- regions
List of sf objects, or a single sf/sfc with multiple features
- year
Integer year
- region_ids
Optional character vector of region identifiers. If NULL, uses row indices or names from the regions.
- sample_rate
Fraction of pixels to sample per tile, between 0 and 1. Default 1 (use every pixel).
- mask_to_region
If TRUE, only include pixels inside each region's polygon. Default TRUE.
- seed
Random seed for reproducible sampling
- progress
Show progress. Default TRUE.
Value
Named list with:
summaries: Named list of mean embeddings per region
pixel_counts: Named vector of pixel counts per region
metadata: Processing statistics
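As a sketch of how the returned list might be used downstream, the snippet below builds a mock `result` with the `summaries` and `pixel_counts` elements described above and stacks the per-region means into a matrix. The region names, the 4-element embeddings, and the pixel counts are made-up values for illustration only; the real embedding length and the contents of `metadata` are not shown here.

```r
# Mock object mimicking the documented return structure (values are invented).
result <- list(
  summaries = list(
    `Aba North` = c(0.1, 0.2, 0.3, 0.4),
    `Aba South` = c(0.5, 0.6, 0.7, 0.8)
  ),
  pixel_counts = c(`Aba North` = 1200, `Aba South` = 950)
)

# One row per region, one column per embedding dimension.
embedding_matrix <- do.call(rbind, result$summaries)

# Pixel-weighted mean embedding across all regions.
# Multiplying a matrix by a vector of length nrow() scales each row.
overall_mean <- colSums(embedding_matrix * result$pixel_counts) /
  sum(result$pixel_counts)
```

Weighting by `pixel_counts` keeps large regions from being under-represented when region means are combined.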
Details
When regions share tiles (e.g., adjacent administrative units), this is much more efficient than processing each region independently, because the tile schedule avoids redundant downloads of the shared tiles.
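A rough way to see the potential saving is to compare tile fetches with and without sharing. The mapping below is hypothetical, and the comparison assumes that independent processing fetches every tile a region overlaps, while shared processing fetches each distinct tile once.

```r
# Hypothetical region-to-tile mapping for three adjacent regions.
region_tiles <- list(
  A = c("t1", "t2"),
  B = c("t2", "t3"),
  C = c("t3")
)

# Independent processing: every region fetches all of its own tiles.
independent_downloads <- sum(lengths(region_tiles))

# Shared processing: each distinct tile is fetched once.
shared_downloads <- length(unique(unlist(region_tiles)))

downloads_avoided <- independent_downloads - shared_downloads
```

The more regions overlap the same tiles, the larger `downloads_avoided` becomes.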
Examples
if (FALSE) { # \dontrun{
  library(sf)
  gt <- geotessera()

  # Load LGAs for a state
  lgas <- st_read("nigeria_lgas.shp")
  state_lgas <- lgas[lgas$state == "Abia", ]

  # Summarize all LGAs efficiently
  result <- summarize_regions_streaming(
    gt = gt,
    regions = state_lgas,
    year = 2024,
    region_ids = state_lgas$adminName,
    sample_rate = 0.1
  )

  # Access results
  result$summaries[["Aba North"]]
} # }