Compute linearized leaf IDs for microdata — hier_create_ids

This function computes a linearized integer ID for each record of a microdata set, based on a set of hierarchies. The IDs use the same numbering as the `leaf_id` and `contributing_leaf_ids` columns generated by `hier_grid()`.

hier_create_ids(data, dims)

Arguments

data: a `data.table` containing the microdata.
dims: a named `list` of `sdc_hierarchy` objects. The names of the list elements must correspond to existing column names in `data`.

Value

An integer vector of leaf IDs, one per record in `data`, matching the `leaf_id` column of `hier_grid()`.
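To make the idea of a linearized ID concrete, the following base R sketch reproduces the column-major formula used in the Examples section (ID = i1 + (i2 - 1) * N1) for any number of dimensions. It is an illustration only, not the package's implementation: `linearize_leaves()` is a hypothetical helper, and the leaf order per dimension (`B`, `C`, `aa1` for the region) is an assumption chosen so that the result agrees with the IDs shown in the example output.

```r
# Assumed leaf order per dimension (illustrative; the real order is whatever
# hier_grid() uses for the given hierarchies).
leaves <- list(
  region = c("B", "C", "aa1"),
  sector = c("a", "b")
)

# Hypothetical helper: column-major linear index over the leaves of all dimensions.
linearize_leaves <- function(data, leaves) {
  n <- vapply(leaves, length, integer(1))      # N1, N2, ...
  stride <- cumprod(c(1, n[-length(n)]))       # 1, N1, N1 * N2, ...
  id <- rep(1, nrow(data))
  for (k in seq_along(leaves)) {
    # 1-based position of each record's code among the leaves (NA if not a leaf)
    i_k <- match(data[[names(leaves)[k]]], leaves[[k]])
    id <- id + (i_k - 1) * stride[k]
  }
  as.integer(id)
}

records <- data.frame(
  region = c("aa1", "aa1", "B", "C", "B"),
  sector = c("a", "b", "a", "b", "b")
)

ids <- linearize_leaves(records, leaves)
ids
# 3 6 1 5 4
```

Because the first dimension varies fastest, every combination of leaves maps to exactly one integer between 1 and N1 * N2. A code that is not a leaf in the assumed order yields `NA` in this sketch.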

Examples

# Setup Hierarchies
h1 <- hier_create("Total", nodes = LETTERS[1:3])
h1 <- hier_add(h1, root = "A", node = "a1")
h1 <- hier_add(h1, root = "a1", node = "aa1") # h1 terminals: aa1, B, C (N=3)

h2 <- hier_create("Total", letters[1:2])      # h2 terminals: a, b (N=2)

# Create the Grid
# With add_dups = FALSE, bogus parents 'A' and 'a1' are removed.
grid <- hier_grid(h1, h2, add_dups = FALSE, add_contributing_cells = TRUE)

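# The 'leaf_id' in `grid` is calculated in column-major order:
# ID = i1 + (i2 - 1) * N1
# where i1 and i2 are the 1-based leaf positions in h1 and h2,
# and N1 is the number of leaves in h1 (here 3).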

# Generate micro data
microdata <- data.table::data.table(
   region = c("aa1", "aa1", "B", "C", "B"),
   sector = c("a", "b", "a", "b", "b"),
   turnover = c(100, 200, 50, 300, 150)
)

# Map the strings in microdata to the same integer leaf_ids.
# We provide a named list where names 'region' and 'sector' match microdata.
microdata[, leaf_id := hier_create_ids(
   data = microdata,
   dims = list(region = h1, sector = h2)
)]
print(microdata) # `:=` modifies by reference and does not print on its own
#>    region sector turnover leaf_id
#>    <char> <char>    <num>   <int>
#> 1:    aa1      a      100       3
#> 2:    aa1      b      200       6
#> 3:      B      a       50       1
#> 4:      C      b      300       5
#> 5:      B      b      150       4

# Aggregation Example:
# To get 'Total Region' for 'Sector a' from the grid:
target_cell <- grid[v1 == "Total" & v2 == "a"]
ids <- target_cell$contributing_leaf_ids[[1]]

val_total_a <- sum(microdata[leaf_id %in% ids, turnover])
# Result: 150 (Records: aa1_a [100] + B_a [50])
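The same membership lookup extends to several aggregate cells at once. The sketch below uses base R only and is not taken from the package: `contributing` is a hypothetical stand-in for the `contributing_leaf_ids` column of the grid, with leaf IDs assumed from the formula ID = i1 + (i2 - 1) * 3, and the `leaf_id` values are copied from the output above.

```r
# Stand-in for the microdata after leaf IDs have been attached
# (leaf_id values copied from the example output above).
microdata <- data.frame(
  region   = c("aa1", "aa1", "B", "C", "B"),
  sector   = c("a", "b", "a", "b", "b"),
  turnover = c(100, 200, 50, 300, 150),
  leaf_id  = c(3L, 6L, 1L, 5L, 4L)
)

# Hypothetical stand-in for grid$contributing_leaf_ids: for each aggregate
# cell, the leaf IDs assumed to feed into it.
contributing <- list(
  "Total/a"     = 1:3,
  "Total/b"     = 4:6,
  "Total/Total" = 1:6
)

# Each record carries exactly one leaf ID, so a cell total is the sum over
# all records whose leaf ID is among the cell's contributing leaves.
cell_totals <- vapply(
  contributing,
  function(ids) sum(microdata$turnover[microdata$leaf_id %in% ids]),
  numeric(1)
)
cell_totals
#     Total/a     Total/b Total/Total
#         150         650         800
```

Leaves without any records (here leaf 2 in this assumed numbering) simply contribute nothing to the sum.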