Introduction

The sdcHierarchies package allows users to create, modify and export nested hierarchies, which are used, for example, to define tables in statistical disclosure control software such as sdcTable.

Usage

Before use, the package needs to be loaded:

library(sdcHierarchies)

Create and modify a hierarchy from scratch

hier_create() creates a hierarchy. Argument root specifies the name of the root node. Optionally, nodes can be added to the top level by listing their names in argument nodes. hier_display() shows the structure of the current tree, as shown below:

h <- hier_create(root = "Total", nodes = LETTERS[1:5])
hier_display(h)
## Total
## ├─A
## ├─B
## ├─C
## ├─D
## └─E

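Once such an object has been created, it can be modified with the following functions:

  • hier_add(): adds nodes to the hierarchy
  • hier_delete(): deletes nodes from the hierarchy
  • hier_rename(): renames nodes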

These functions can be applied as shown below:

# add nodes below the node specified in argument `root`
h <- hier_add(h, root = "A", nodes = c("a1", "a2"))
h <- hier_add(h, root = "B", nodes = c("b1", "b2"))
h <- hier_add(h, root = "b1", nodes = c("b1_a", "b1_b"))

# deleting one or more nodes from the hierarchy
h <- hier_delete(h, nodes = c("a1", "b2"))
h <- hier_delete(h, nodes = c("a2"))

# rename nodes
h <- hier_rename(h, nodes = c("C" = "X", "D" = "Y"))
hier_display(h)
## Total
## ├─A
## ├─B
## │ └─b1
## │   ├─b1_a
## │   └─b1_b
## ├─X
## ├─Y
## └─E

We note that the underlying data.tree package allows objects to be modified by reference, so an explicit assignment of the form h <- hier_add(h, ...) is not strictly required.

Information about nodes

Function hier_info() returns information about the nodes specified in argument nodes.

# about a specific node
info <- hier_info(h, nodes = c("b1", "E"))

info is a named list in which each element refers to one of the queried nodes. The results for node b1 can be extracted as shown below:

info$b1
## $name
## [1] "b1"
## 
## $is_rootnode
## [1] FALSE
## 
## $level
## [1] 3
## 
## $is_leaf
## [1] FALSE
## 
## $siblings
## character(0)
## 
## $contributing_codes
## [1] "b1_a" "b1_b"
## 
## $children
## [1] "b1_a" "b1_b"
## 
## $parent
## [1] "B"
## 
## $is_bogus
## [1] FALSE
## 
## $parent_bogus
## character(0)

Information about all nodes is returned if argument nodes is not specified.
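To make the returned fields more tangible, the following base-R sketch stores the hierarchy h from above as a hypothetical parent/child table and derives the level, children, siblings and leaf status of a node from it. The table layout and the helpers node_level(), node_children(), node_siblings() and is_leaf() are illustrative assumptions and not part of sdcHierarchies; the package may store and query hierarchies quite differently.

```r
# Hypothetical parent/child table mirroring the hierarchy h shown above
# (illustration only; not the internal format used by sdcHierarchies).
edges <- data.frame(
  child  = c("A", "B", "b1", "b1_a", "b1_b", "X", "Y", "E"),
  parent = c("Total", "Total", "B", "b1", "b1", "Total", "Total", "Total"),
  stringsAsFactors = FALSE
)

# Children: all rows whose parent is the given node.
node_children <- function(node, edges) edges$child[edges$parent == node]

# Leaf: a node without children.
is_leaf <- function(node, edges) length(node_children(node, edges)) == 0

# Siblings: the other children of the node's parent.
node_siblings <- function(node, edges) {
  p <- edges$parent[edges$child == node]
  setdiff(node_children(p, edges), node)
}

# Level: the root has level 1; each step down the tree adds 1.
node_level <- function(node, edges) {
  lev <- 1
  while (node %in% edges$child) {
    node <- edges$parent[edges$child == node]
    lev <- lev + 1
  }
  lev
}

node_level("b1", edges)     # 3
node_children("b1", edges)  # "b1_a" "b1_b"
node_siblings("b1", edges)  # character(0)
is_leaf("E", edges)         # TRUE
```

These values agree with the fields level, children, siblings and is_leaf reported for b1 and E by hier_info().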

Convert to other formats

Function hier_convert() takes a hierarchy and converts its network-based structure into a different format, while hier_export() performs the conversion and writes the result to a file on disk. The following formats are currently supported:

  • df: a “@;label”-based format that can be used in sdcTable
  • dt: the same as df, but the result is returned as a data.table
  • argus: also a “@;label”-based format that is used to create hrc-files suitable for \(\tau\)-argus
  • json: a json-encoded string
  • code: the code required to re-build the current hierarchy
  • sdc: a list that is suitable as input for sdcTable

# conversion to a "@;label"-based format
res_df <- hier_convert(h, as = "df")
print(res_df)
##   level  name
## 1     @ Total
## 2    @@     A
## 3    @@     B
## 4   @@@    b1
## 5  @@@@  b1_a
## 6  @@@@  b1_b
## 7    @@     X
## 8    @@     Y
## 9    @@     E
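The “@;label” format encodes the tree structure implicitly: the number of @ characters gives the depth of a node, and the parent of a node is the closest preceding row that is exactly one level higher. A minimal base-R sketch of this reading, assuming a well-formed input whose first row is the root and using the values of res_df typed in by hand, could look as follows (this is not how hier_import() is implemented):

```r
# Level column and names of res_df as printed above, typed in by hand.
lev  <- c("@", "@@", "@@", "@@@", "@@@@", "@@@@", "@@", "@@", "@@")
name <- c("Total", "A", "B", "b1", "b1_a", "b1_b", "X", "Y", "E")

# Depth of each node = number of "@" characters.
depth <- nchar(lev)

# Parent = closest preceding row whose depth is exactly one less.
# Assumes the first row is the root (depth 1) and the input is well-formed.
parent <- rep(NA_character_, length(name))
for (i in seq_along(name)[-1]) {
  cand <- which(depth[seq_len(i - 1)] == depth[i] - 1)
  parent[i] <- name[max(cand)]
}

data.frame(name, depth, parent)
```

For instance, b1 (depth 3) is assigned to B, the last depth-2 row above it, and X, Y and E fall back to Total.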

The code required to re-create this hierarchy can be obtained with:

code <- hier_convert(h, as = "code"); cat(code, sep = "\n")
## library(sdcHierarchies)
## tree <- hier_create(root = 'Total', nodes = c('A', 'B', 'X', 'Y', 'E'))
## tree <- hier_add(tree = tree, root = 'B', nodes = 'b1')
## tree <- hier_add(tree = tree, root = 'b1', nodes = c('b1_a', 'b1_b'))
## print(tree)

Using hier_export(), the result can be written to a file. This is useful, for example, to create hrc-files that serve as input for \(\tau\)-argus:

hier_export(h, as = "argus", path = file.path(tempdir(), "hierarchy.hrc"))

Create a hierarchy from data.frames, code or json

hier_import() returns a network-based hierarchy from a data.frame (in @;label format), a json string, code, or a \(\tau\)-argus compatible hrc-file. For example, creating a hierarchy from res_df, which was previously created using hier_convert(), is as simple as:

n_df <- hier_import(inp = res_df, from = "df")
hier_display(n_df)
## Total
## ├─A
## ├─B
## │ └─b1
## │   ├─b1_a
## │   └─b1_b
## ├─E
## ├─X
## └─Y

Similarly, hier_import(inp = "hierarchy.hrc", from = "argus") creates an sdc hierarchy object directly from an hrc-file.

Create/Compute hierarchies from a string

Often, the nested hierarchy information is encoded in a string. Function hier_compute() transforms such strings into hierarchy objects. Two cases can be distinguished: in the first case, all input codes have the same length, while in the second case, the lengths of the codes differ. Let’s assume that geo_m holds a geographic code in which digits 1-2 refer to the first level, digit 3 to the second level and digits 4-5 to the third level of the hierarchy.

geo_m <- c(
  "01051", "01053", "01054", "01055", "01056", "01057", "01058", "01059", "01060", "01061", "01062",
  "02000",
  "03151", "03152", "03153", "03154", "03155", "03156", "03157", "03158", "03251", "03252", "03254", "03255",
  "03256", "03257", "03351", "03352", "03353", "03354", "03355", "03356", "03357", "03358", "03359", "03360",
  "03361", "03451", "03452", "03453", "03454", "03455", "03456",
  "10155")

Function hier_compute() takes a character vector and creates a hierarchy from it. Argument method selects one of two ways to specify the encoded levels:

  • endpos: an integerish-vector must be specified in argument dim_spec holding the end-position at each level
  • len: an integerish-vector must be specified in argument dim_spec containing for each level how many digits are required
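The two specifications carry the same information: end positions are the cumulative sums of the level lengths, and the lengths are the differences between successive end positions. The base-R sketch below only illustrates this arithmetic for the geo_m layout (it does not call hier_compute()) and also shows how the prefix up to each end position yields the code of every level:

```r
# Level lengths of geo_m: 2 digits, then 1 digit, then 2 digits (method = "len").
len <- c(2, 1, 2)

# The equivalent end positions (method = "endpos") are the cumulative sums.
endpos <- cumsum(len)
endpos                        # 2 3 5

# Going back: lengths are the differences of successive end positions.
diff(c(0, endpos))            # 2 1 2

# The prefix up to each end position is the code on the respective level.
substring("03151", 1, endpos) # "03" "031" "03151"
```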

In case the overall total is not encoded in the input, argument root assigns a name to the overall total. Additionally, the desired output format can be set in argument as. In the example below, setting as = "df" returns the result as a data.frame in @;label format. The two methods of defining the level positions are interchangeable and lead to the same hierarchy, as shown below:

v1 <- hier_compute(
  inp = geo_m, 
  dim_spec = c(2, 3, 5), 
  root = "Tot", 
  method = "endpos", 
  as = "df"
)

v2 <- hier_compute(
  inp = geo_m, 
  dim_spec = c(2, 1, 2), 
  root = "Tot", 
  method = "len",
  as = "df"
)

identical(v1, v2)
## [1] TRUE
hier_display(hier_import(inp = v1, from = "df"))
## Tot
## ├─01
## │ └─010
## │   ├─01051
## │   ├─01053
## │   ├─01054
## │   ├─01055
## │   ├─01056
## │   ├─01057
## │   ├─01058
## │   ├─01059
## │   ├─01060
## │   ├─01061
## │   └─01062
## ├─02
## │ └─020
## │   └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │   ├─03451
## │   ├─03452
## │   ├─03453
## │   ├─03454
## │   ├─03455
## │   └─03456
## └─10
##   └─101
##     └─10155

If the total is encoded in the input, for example in the first three characters of each value, the hierarchy can be computed as follows:

geo_m_with_tot <- paste0("Tot", geo_m)
head(geo_m_with_tot)
## [1] "Tot01051" "Tot01053" "Tot01054" "Tot01055" "Tot01056" "Tot01057"
v3 <- hier_compute(
  inp = geo_m_with_tot, 
  dim_spec = c(3, 2, 1, 2), 
  method = "len"
); hier_display(v3)
## Tot
## ├─01
## │ └─010
## │   ├─01051
## │   ├─01053
## │   ├─01054
## │   ├─01055
## │   ├─01056
## │   ├─01057
## │   ├─01058
## │   ├─01059
## │   ├─01060
## │   ├─01061
## │   └─01062
## ├─02
## │ └─020
## │   └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │   ├─03451
## │   ├─03452
## │   ├─03453
## │   ├─03454
## │   ├─03455
## │   └─03456
## └─10
##   └─101
##     └─10155

The resulting hierarchy is the same as the one stored in v1 and v2.

hier_compute() can also handle input codes of different lengths, as shown in the next example.

# second example: codes of unequal length; overall total not included in input
yae_h <- c(
  "1.1.1.", "1.1.2.",
  "1.2.1.", "1.2.2.", "1.2.3.", "1.2.4.", "1.2.5.", "1.3.1.",
  "1.3.2.", "1.3.3.", "1.3.4.", "1.3.5.",
  "1.4.1.", "1.4.2.", "1.4.3.", "1.4.4.", "1.4.5.",
  "1.5.", "1.6.", "1.7.", "1.8.", "1.9.", "2.", "3.")
v1 <- hier_compute(
  inp = yae_h, 
  dim_spec = c(2,2,2), 
  root = "Tot", 
  method = "len"
); hier_display(v1)
## Tot
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ └─1.4.5.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
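One way to think about unequal code lengths is that each code hangs below its longest proper prefix ending at one of the end positions implied by dim_spec. The hypothetical helper parent_of() below sketches this rule in base R for a few of the yae_h codes; it assumes that every code length coincides with one of the end positions and is not the package's implementation:

```r
# A few of the yae_h codes; dim_spec = c(2, 2, 2) with method = "len".
codes  <- c("1.1.1.", "1.1.2.", "1.5.", "2.")
endpos <- cumsum(c(2, 2, 2))  # 2 4 6

# Hypothetical helper: the parent is the prefix ending at the largest end
# position that is still shorter than the code; top-level codes get the root.
parent_of <- function(code, endpos, root = "Tot") {
  shorter <- endpos[endpos < nchar(code)]
  if (length(shorter) == 0) return(root)
  substr(code, 1, max(shorter))
}

vapply(codes, parent_of, character(1), endpos = endpos)
# named vector: "1.1." "1.1." "1." "Tot"
```

This matches the tree above: 1.5. sits directly below 1., while 2. is attached to Tot.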

There is also another way to specify the input of hier_compute(): setting argument method = "list" creates a hierarchy from a named list. In such a list, the name of each list element is interpreted as the parent node of all codes contained in that element. An example is shown below:

yae_ll <- list()
yae_ll[["Total"]] <- c("1.", "2.", "3.")
yae_ll[["1."]] <- paste0("1.", 1:9, ".")
yae_ll[["1.1."]] <- paste0("1.1.", 1:2, ".")
yae_ll[["1.2."]] <- paste0("1.2.", 1:5, ".")
yae_ll[["1.3."]] <- paste0("1.3.", 1:5, ".")
yae_ll[["1.4."]] <- paste0("1.4.", 1:6, ".")
d <- hier_compute(inp = yae_ll, root = "Total", method = "list"); hier_display(d)
## Argument 'dim_spec' is ignored when constructing a hierarchy from a nested list.
## Total
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ ├─1.4.5.
## │ │ └─1.4.6.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
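Conceptually, such a named list is just a compact way of writing parent/child pairs. The following base-R sketch flattens a shortened, hand-typed version of yae_ll into such a table; the column names parent and child are illustrative choices:

```r
# A shortened, hand-typed version of yae_ll from above.
ll <- list(
  "Total" = c("1.", "2.", "3."),
  "1."    = paste0("1.", 1:2, "."),
  "1.1."  = paste0("1.1.", 1:2, ".")
)

# Each list name is the parent of every code stored in that element.
edges <- data.frame(
  parent = rep(names(ll), lengths(ll)),
  child  = unlist(ll, use.names = FALSE),
  stringsAsFactors = FALSE
)
edges
```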

Grids

Using hier_grid(), it is possible to compute all combinations of codes from several hierarchies. This is useful for building a complete table (e.g. for merging purposes). The functionality of hier_grid() is shown below. First, we need to specify some hierarchies.

h1 <- hier_create("Total", nodes = LETTERS[1:3])
h1 <- hier_add(h1, root = "A", nodes = "a1")
h1 <- hier_add(h1, root = "a1", nodes = "aa1")

h2 <- hier_create("Total", letters[1:5])
h2 <- hier_add(h2, root = "b", nodes = "b1")
h2 <- hier_add(h2, root = "d", nodes = "d1")

Note that we deliberately added some “bogus” codes to h1 and h2: codes a1 and aa1 in h1 as well as b1 and d1 in h2 are each the only child of their parent and are therefore identical to their respective parent categories. Applying hier_grid() is as simple as

hier_grid(h1, h2)
##        v1    v2
##  1: Total Total
##  2:     A Total
##  3:    a1 Total
##  4:   aa1 Total
##  5:     B Total
##  6:     C Total
##  7: Total     a
##  8:     A     a
##  9:    a1     a
## 10:   aa1     a
## 11:     B     a
## 12:     C     a
## 13: Total     b
## 14:     A     b
## 15:    a1     b
## 16:   aa1     b
## 17:     B     b
## 18:     C     b
## 19: Total    b1
## 20:     A    b1
## 21:    a1    b1
## 22:   aa1    b1
## 23:     B    b1
## 24:     C    b1
## 25: Total     c
## 26:     A     c
## 27:    a1     c
## 28:   aa1     c
## 29:     B     c
## 30:     C     c
## 31: Total     d
## 32:     A     d
## 33:    a1     d
## 34:   aa1     d
## 35:     B     d
## 36:     C     d
## 37: Total    d1
## 38:     A    d1
## 39:    a1    d1
## 40:   aa1    d1
## 41:     B    d1
## 42:     C    d1
## 43: Total     e
## 44:     A     e
## 45:    a1     e
## 46:   aa1     e
## 47:     B     e
## 48:     C     e
##        v1    v2

separating all target hierarchies with commas. hier_grid() then computes all combinations of codes from hierarchies h1 and h2. With the default options, the bogus codes are included in the output data.table. Setting argument add_dups = FALSE removes all rows containing such bogus codes. Setting argument add_levs = TRUE adds columns labeled levs_v{n} to the output; each of these columns contains the hierarchy level of the code given in column v{n} of the same row, as shown below.

hier_grid(h1, h2, add_dups = FALSE, add_levs = TRUE)
##        v1    v2 levs_v1 levs_v2
##  1: Total Total       1       1
##  2:     A Total       2       1
##  3:     B Total       2       1
##  4:     C Total       2       1
##  5: Total     a       1       2
##  6:     A     a       2       2
##  7:     B     a       2       2
##  8:     C     a       2       2
##  9: Total     b       1       2
## 10:     A     b       2       2
## 11:     B     b       2       2
## 12:     C     b       2       2
## 13: Total     c       1       2
## 14:     A     c       2       2
## 15:     B     c       2       2
## 16:     C     c       2       2
## 17: Total     d       1       2
## 18:     A     d       2       2
## 19:     B     d       2       2
## 20:     C     d       2       2
## 21: Total     e       1       2
## 22:     A     e       2       2
## 23:     B     e       2       2
## 24:     C     e       2       2
##        v1    v2 levs_v1 levs_v2
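Conceptually, the grid is the Cartesian product of the codes of all hierarchies. The base-R sketch below reproduces the row counts of both outputs using expand.grid(), with the codes, bogus codes and levels of h1 and h2 typed in by hand; it is an illustration under these assumptions, not a description of how hier_grid() works internally:

```r
# Codes of h1 and h2 in the order in which they appear above.
codes_h1 <- c("Total", "A", "a1", "aa1", "B", "C")
codes_h2 <- c("Total", "a", "b", "b1", "c", "d", "d1", "e")

# All combinations; the first column varies fastest, as in hier_grid(h1, h2).
combos <- expand.grid(v1 = codes_h1, v2 = codes_h2, stringsAsFactors = FALSE)
nrow(combos)       # 48 = 6 * 8

# Dropping rows with bogus codes mimics add_dups = FALSE.
bogus  <- c("a1", "aa1", "b1", "d1")
nodups <- combos[!(combos$v1 %in% bogus | combos$v2 %in% bogus), ]
nrow(nodups)       # 24 = 4 * 6

# Level lookups (root = 1, its children = 2) mimic add_levs = TRUE.
levels_h1 <- c(Total = 1, A = 2, B = 2, C = 2)
levels_h2 <- c(Total = 1, a = 2, b = 2, c = 2, d = 2, e = 2)
nodups$levs_v1 <- unname(levels_h1[nodups$v1])
nodups$levs_v2 <- unname(levels_h2[nodups$v2])
head(nodups)
```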

Interactively create or modify hierarchies

The package also contains a shiny-based interactive app that can be started using hier_app(). The app accepts as input either a character vector (that should be converted into a hierarchy) or an existing hierarchy. Given the hierarchy d previously generated using hier_compute(), it can be started as follows:

d <- hier_app(d)

If a character vector is passed to hier_app(), the interface allows the user to specify the arguments for hier_compute(). Once a hierarchy is created, the interface changes and the tree can be modified dynamically by dragging nodes around. Furthermore, nodes can be added, removed or renamed. The code required to construct the current hierarchy is displayed and can be saved to disk. There is also functionality to undo the last step and to export results either to the R session or to a file. This is especially helpful if one wants to create, for example, an hrc-file as input for \(\tau\)-argus. Please note that hier_app() returns the modified hierarchy rather than only saving results to disk. To continue working with it, assign the result to an object as shown in the code above.

Summary

If you have any suggestions or improvements, please feel free to file an issue at our issue tracker or to contribute to the package by filing a pull request against the master branch.