Table Properties
Introduction
In tplyr2, a table is defined by its specification.
The tplyr_spec() function captures the full configuration –
column variables, filters, treatment groups, population data, and layers
– as a pure description of what you want. No data processing happens
until you call tplyr_build(). This vignette covers the
spec-level parameters that control the overall structure of your
table.
Every tplyr2 workflow follows two steps: define a
spec with tplyr_spec(), then build the
table with tplyr_build(spec, data). Let’s look at an
example using the included tplyr_adsl dataset.
library(tplyr2)
library(knitr)  # provides kable()

spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 |
|---|---|---|---|
| F | 53 (61.6%) | 40 (47.6%) | 50 (59.5%) |
| M | 33 (38.4%) | 44 (52.4%) | 34 (40.5%) |
Note how the cols parameter defines the column structure
of the output. Each unique value of TRT01P becomes a result
column, and the column labels automatically include the group count as
(N=xx).
Column Variables
The cols parameter accepts a character vector of one or
more variable names that define the columns of your output table. The
most common case is a single treatment variable:
spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(target_var = "AGEGR1")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 |
|---|---|---|---|
| 65-80 | 42 (48.8%) | 55 (65.5%) | 47 (56.0%) |
| <65 | 14 (16.3%) | 11 (13.1%) | 8 ( 9.5%) |
| >80 | 30 (34.9%) | 18 (21.4%) | 29 (34.5%) |
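
To make the cell contents concrete, the sketch below recomputes the table above in base R from its counts, using each column's total as the denominator. The helper fmt_cell() is hypothetical (it is not a tplyr2 function), and the format string only approximates the package's default count format.

```r
# Counts copied from the AGEGR1 table above (rows: 65-80, <65, >80).
counts <- matrix(
  c(42, 55, 47,
    14, 11,  8,
    30, 18, 29),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("65-80", "<65", ">80"), c("res1", "res2", "res3"))
)

# Each column's denominator is the number of subjects in that column.
col_n <- colSums(counts)

# Hypothetical helper (not part of tplyr2): format a cell as "n (pct%)".
fmt_cell <- function(n, denom) {
  sprintf("%d (%4.1f%%)", as.integer(n), 100 * n / denom)
}

# Pair every count with the N of its own column, then format.
cells <- counts
cells[] <- fmt_cell(counts, col_n[col(counts)])
cells
```

The column totals come out as 86, 84, and 84, and the formatted cells match the percentages in the table.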
Multiple Column Variables
When you provide multiple variables, tplyr2 creates a cross of all combinations. This is useful when you need columns split by treatment and another variable.
spec <- tplyr_spec(
  cols = c("TRT01P", "SEX"),
  layers = tplyr_layers(
    group_count(target_var = "AGEGR1")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 | res4 | res5 | res6 |
|---|---|---|---|---|---|---|
| 65-80 | 22 (41.5%) | 20 (60.6%) | 28 (70.0%) | 27 (61.4%) | 28 (56.0%) | 19 (55.9%) |
| <65 | 9 (17.0%) | 5 (15.2%) | 5 (12.5%) | 6 (13.6%) | 5 (10.0%) | 3 ( 8.8%) |
| >80 | 22 (41.5%) | 8 (24.2%) | 7 (17.5%) | 11 (25.0%) | 17 (34.0%) | 12 (35.3%) |
Notice that the column labels use a " | " separator to
show the cross of treatment and sex, and each combination gets its own
N.
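
The idea of crossing column variables can be sketched in base R. This is a conceptual illustration on toy data, not tplyr2's implementation; the data values and the "Drug" arm are made up.

```r
# Toy data; values are illustrative, not taken from tplyr_adsl.
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "Placebo", "Drug", "Drug"),
  SEX    = c("F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# One output column per treatment-by-sex combination,
# labelled with the same " | " separator.
col_key <- paste(adsl$TRT01P, adsl$SEX, sep = " | ")

# Each combination has its own N, which serves as that column's denominator.
col_n <- table(col_key)
col_n
```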
Table-Level Filtering with where
The where parameter applies a filter to all data before
any layer processing begins. This is useful when records should be
excluded from the entire table.
spec <- tplyr_spec(
  cols = "TRT01P",
  where = SAFFL == "Y",
  layers = tplyr_layers(
    group_count(target_var = "AGEGR1", by = "Age Group"),
    group_desc(
      target_var = "AGE",
      by = "Age (Years)",
      settings = layer_settings(
        format_strings = list(
          "n" = f_str("xxx", "n"),
          "Mean (SD)" = f_str("xx.x (xx.xx)", "mean", "sd"),
          "Median" = f_str("xx.x", "median"),
          "Min, Max" = f_str("xx, xx", "min", "max")
        )
      )
    )
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", "rowlabel2", grep("^res", names(result), value = TRUE))])

| rowlabel1 | rowlabel2 | res1 | res2 | res3 |
|---|---|---|---|---|
| Age Group | 65-80 | 42 (48.8%) | 55 (65.5%) | 47 (56.0%) |
| Age Group | <65 | 14 (16.3%) | 11 (13.1%) | 8 ( 9.5%) |
| Age Group | >80 | 30 (34.9%) | 18 (21.4%) | 29 (34.5%) |
| Age (Years) | n | 86 | 84 | 84 |
| Age (Years) | Mean (SD) | 75.2 ( 8.59) | 74.4 ( 7.89) | 75.7 ( 8.29) |
| Age (Years) | Median | 76.0 | 76.0 | 77.5 |
| Age (Years) | Min, Max | 52, 89 | 56, 88 | 51, 88 |
Both the count and descriptive statistics layers are computed on the
safety population. Individual layers can also have their own
where filters, which are applied in addition to the
table-level filter.
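
How a table-level filter and a layer-level filter stack can be sketched in base R. This is a conceptual illustration on toy data, assuming the two filters combine as a logical AND as described above; it does not call tplyr2.

```r
# Toy data; values are illustrative, not taken from tplyr_adsl.
adsl <- data.frame(
  USUBJID = sprintf("S%02d", 1:6),
  SAFFL   = c("Y", "Y", "N", "Y", "Y", "N"),
  SEX     = c("F", "M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# Table-level filter: applied once, before any layer sees the data.
table_data <- subset(adsl, SAFFL == "Y")

# A layer-level filter further restricts only that layer's input,
# so this layer effectively sees SAFFL == "Y" & SEX == "F".
layer_data <- subset(table_data, SEX == "F")

nrow(table_data)
nrow(layer_data)
```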
Treatment Groups
Clinical tables often need columns beyond the individual treatment arms. tplyr2 provides total groups and custom groups for this purpose.
Total Groups
A total group creates a synthetic column that includes all subjects by duplicating every row with the column variable set to the total group label.
spec <- tplyr_spec(
  cols = "TRT01P",
  total_groups = list(
    total_group("TRT01P", label = "Total")
  ),
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 | res4 |
|---|---|---|---|---|
| F | 53 (61.6%) | 143 (56.3%) | 40 (47.6%) | 50 (59.5%) |
| M | 33 (38.4%) | 111 (43.7%) | 44 (52.4%) | 34 (40.5%) |
The “Total” column now appears alongside the individual treatment arms, with its N reflecting the sum of all subjects.
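
The row-duplication idea behind a total group can be sketched in base R. This is a conceptual illustration on toy data, not tplyr2's internal code.

```r
# Toy data; values are illustrative, not taken from tplyr_adsl.
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "Drug", "Drug", "Drug"),
  SEX    = c("F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Copy every row and overwrite the column variable with the total label.
total_rows <- transform(adsl, TRT01P = "Total")
with_total <- rbind(adsl, total_rows)

# Counting by column now yields a "Total" column alongside the arms.
counts <- table(with_total$SEX, with_total$TRT01P)
counts
```

Because the "Total" rows are ordinary rows, every downstream count or statistic treats the total like any other treatment level.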
Custom Groups
Custom groups combine specific treatment levels into a new group. For example, you might pool the two active dose groups together.
spec <- tplyr_spec(
  cols = "TRT01P",
  custom_groups = list(
    custom_group(
      "TRT01P",
      "Xanomeline" = c("Xanomeline High Dose", "Xanomeline Low Dose")
    )
  ),
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 | res4 |
|---|---|---|---|---|
| F | 53 (61.6%) | 90 (53.6%) | 40 (47.6%) | 50 (59.5%) |
| M | 33 (38.4%) | 78 (46.4%) | 44 (52.4%) | 34 (40.5%) |
The “Xanomeline” column includes all subjects from both dose groups, while the original dose-level columns are preserved.
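
Pooling can be pictured as copying only the rows from the chosen levels and relabelling them. The base R sketch below illustrates this on toy data; it is an assumption about the general approach, not tplyr2's internal code.

```r
# Toy data; values are illustrative, not taken from tplyr_adsl.
adsl <- data.frame(
  TRT01P = c("Placebo", "Xanomeline High Dose", "Xanomeline High Dose",
             "Xanomeline Low Dose", "Placebo"),
  SEX    = c("F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

pooled_levels <- c("Xanomeline High Dose", "Xanomeline Low Dose")

# Copy only the rows from the pooled levels and relabel them.
pooled_rows <- adsl[adsl$TRT01P %in% pooled_levels, ]
pooled_rows$TRT01P <- "Xanomeline"

# The original dose-level rows stay in place next to the pooled copies.
with_custom <- rbind(adsl, pooled_rows)
group_n <- table(with_custom$TRT01P)
group_n
```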
Combining Total and Custom Groups
You can use both together. Custom groups are applied first, and then total groups duplicate all rows (including the custom group rows). As a result, subjects in the pooled arms are counted twice in the “Total” column: in the output below it sums to 422 (233 + 189) rather than the 254 subjects in the dataset.
spec <- tplyr_spec(
  cols = "TRT01P",
  custom_groups = list(
    custom_group(
      "TRT01P",
      "Xanomeline" = c("Xanomeline High Dose", "Xanomeline Low Dose")
    )
  ),
  total_groups = list(
    total_group("TRT01P", label = "Total")
  ),
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 | res4 | res5 |
|---|---|---|---|---|---|
| F | 53 (61.6%) | 233 (55.2%) | 90 (53.6%) | 40 (47.6%) | 50 (59.5%) |
| M | 33 (38.4%) | 189 (44.8%) | 78 (46.4%) | 44 (52.4%) | 34 (40.5%) |
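
The order of operations explains the numbers in the "Total" column. The arithmetic below reproduces them from the per-arm counts shown earlier; the short arm names are shorthand for this sketch and assume the result columns follow alphabetical treatment order.

```r
# Female and male counts per arm, copied from the tables above.
arm_counts <- rbind(
  F = c(Placebo = 53, XanHigh = 40, XanLow = 50),
  M = c(Placebo = 33, XanHigh = 44, XanLow = 34)
)

# Step 1: the custom group pools the two dose arms.
xanomeline <- rowSums(arm_counts[, c("XanHigh", "XanLow")])

# Step 2: the total group duplicates every row, including the pooled ones,
# so pooled subjects contribute to the total twice.
total <- rowSums(arm_counts) + xanomeline

cbind(arm_counts, Xanomeline = xanomeline, Total = total)
```

The pooled column works out to 90 and 78, and the total to 233 and 189, matching the output above.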
Population Data
In many clinical analyses, denominators and header Ns should come
from a different dataset than the analysis data. The classic example is
an adverse event table: ADAE only contains subjects who
experienced events, but percentages should reflect the full safety
population from ADSL.
The pop_data() configuration specifies how the
population dataset maps to the spec. The actual data is provided at
build time.
spec <- tplyr_spec(
  cols = "TRTA",
  pop_data = pop_data(cols = c("TRTA" = "TRT01A")),
  layers = tplyr_layers(
    group_count(
      target_var = "AEBODSYS",
      settings = layer_settings(
        distinct_by = "USUBJID"
      )
    )
  )
)
result <- tplyr_build(spec, tplyr_adae, pop_data = tplyr_adsl)
kable(head(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))], 8))

| rowlabel1 | res1 | res2 | res3 |
|---|---|---|---|
| CARDIAC DISORDERS | 5 ( 5.8%) | 6 ( 7.1%) | 6 ( 7.1%) |
| CONGENITAL, FAMILIAL AND GENETIC DISORDERS | 0 ( 0.0%) | 1 ( 1.2%) | 0 ( 0.0%) |
| GASTROINTESTINAL DISORDERS | 6 ( 7.0%) | 6 ( 7.1%) | 3 ( 3.6%) |
| GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 11 (12.8%) | 21 (25.0%) | 21 (25.0%) |
| IMMUNE SYSTEM DISORDERS | 0 ( 0.0%) | 0 ( 0.0%) | 1 ( 1.2%) |
| INFECTIONS AND INFESTATIONS | 5 ( 5.8%) | 4 ( 4.8%) | 3 ( 3.6%) |
| INJURY, POISONING AND PROCEDURAL COMPLICATIONS | 2 ( 2.3%) | 2 ( 2.4%) | 2 ( 2.4%) |
| INVESTIGATIONS | 3 ( 3.5%) | 1 ( 1.2%) | 1 ( 1.2%) |
A few things to note:
- cols = "TRTA" matches the treatment variable in ADAE.
- pop_data(cols = c("TRTA" = "TRT01A")) maps TRT01A in the population data to TRTA in the analysis data (format: c("analysis_name" = "pop_name")).
- distinct_by = "USUBJID" counts each subject once per body system.
- Denominators and column Ns come from the full tplyr_adsl population.
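
The points above can be sketched in base R: numerators are distinct subjects in the analysis data, and denominators are subjects per arm in the population data. This is a conceptual illustration on toy data (the "Drug" arm and all values are made up), not tplyr2's implementation.

```r
# Toy population data (one row per subject).
adsl <- data.frame(
  USUBJID = sprintf("S%02d", 1:6),
  TRT01A  = c("Placebo", "Placebo", "Placebo", "Drug", "Drug", "Drug"),
  stringsAsFactors = FALSE
)
# Toy analysis data (one row per event; S01 has two events).
adae <- data.frame(
  USUBJID  = c("S01", "S01", "S04", "S05"),
  TRTA     = c("Placebo", "Placebo", "Drug", "Drug"),
  AEBODSYS = "CARDIAC DISORDERS",
  stringsAsFactors = FALSE
)

# Mapping in the same direction as the spec: analysis name = population name.
col_map <- c("TRTA" = "TRT01A")

# Denominators: subjects per arm in the population data.
denom <- table(adsl[[col_map[["TRTA"]]]])

# Numerators: each subject counted once per arm and body system.
distinct_subjects <- unique(adae[, c("USUBJID", "TRTA", "AEBODSYS")])
numer <- table(factor(distinct_subjects$TRTA, levels = names(denom)))

pct <- 100 * numer / denom
round(pct, 1)
```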
Extracting Header N
After building a table with population data, you can extract the
header N values using tplyr_header_n():
header_n <- tplyr_header_n(result)
kable(header_n)

| TRTA | .n |
|---|---|
| Placebo | 86 |
| Xanomeline High Dose | 84 |
| Xanomeline Low Dose | 84 |
This is useful when you need to programmatically construct column headers or integrate with other reporting tools.
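
As one possible use, the sketch below builds display labels from header N values. The data frame is rebuilt by hand from the output shown above so the example runs without tplyr2; the label format is an illustrative choice.

```r
# Header Ns as shown in the output above, rebuilt by hand for this sketch.
header_n <- data.frame(
  TRTA = c("Placebo", "Xanomeline High Dose", "Xanomeline Low Dose"),
  .n   = c(86L, 84L, 84L),
  stringsAsFactors = FALSE
)

# Build display labels such as "Placebo (N=86)".
col_labels <- sprintf("%s (N=%d)", header_n$TRTA, header_n$.n)
col_labels
```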
Population Data with Filters
The population data is not subject to the spec-level
where filter. It uses its own where clause,
specified in the pop_data() call:
spec <- tplyr_spec(
  cols = "TRTA",
  pop_data = pop_data(
    cols = c("TRTA" = "TRT01A"),
    where = SAFFL == "Y"
  ),
  layers = tplyr_layers(
    group_count(
      target_var = "AEBODSYS",
      settings = layer_settings(
        distinct_by = "USUBJID"
      )
    )
  )
)
result <- tplyr_build(spec, tplyr_adae, pop_data = tplyr_adsl)
kable(tplyr_header_n(result))

| TRTA | .n |
|---|---|
| Placebo | 86 |
| Xanomeline High Dose | 84 |
| Xanomeline Low Dose | 84 |
This separation is intentional. The table-level where
controls which records are summarized, while pop_data
where controls which subjects contribute to denominators.
In practice these often differ – you might filter AE records to
treatment-emergent events while basing denominators on the full safety
population.
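
The separation of the two filters can be illustrated in base R. This is a conceptual sketch on toy data; the TRTEMFL flag and the "Drug" arm are assumptions made for the example and do not describe the bundled datasets.

```r
# Toy population data; S05 is outside the safety population.
adsl <- data.frame(
  USUBJID = sprintf("S%02d", 1:5),
  TRT01A  = c("Placebo", "Placebo", "Drug", "Drug", "Drug"),
  SAFFL   = c("Y", "Y", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)
# Toy event data; one of S03's events is not treatment-emergent.
adae <- data.frame(
  USUBJID = c("S01", "S03", "S03", "S04"),
  TRTA    = c("Placebo", "Drug", "Drug", "Drug"),
  TRTEMFL = c("Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Table-level filter: which event records are summarized.
events <- subset(adae, TRTEMFL == "Y")

# Population filter: which subjects count toward denominators.
pop <- subset(adsl, SAFFL == "Y")

numer <- tapply(events$USUBJID, events$TRTA, function(x) length(unique(x)))
denom <- table(pop$TRT01A)
numer
denom
```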
Data Completion
When building count layers, tplyr2 automatically completes all
combinations of factor levels and cross-variables. If a treatment group
has zero subjects with a given characteristic, a 0 (0.0%)
row still appears rather than being dropped.
spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(target_var = "RACE")
  )
)
result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])

| rowlabel1 | res1 | res2 | res3 |
|---|---|---|---|
| AMERICAN INDIAN OR ALASKA NATIVE | 0 ( 0.0%) | 1 ( 1.2%) | 0 ( 0.0%) |
| BLACK OR AFRICAN AMERICAN | 8 ( 9.3%) | 9 (10.7%) | 6 ( 7.1%) |
| WHITE | 78 (90.7%) | 74 (88.1%) | 78 (92.9%) |
Every race category appears for every treatment group, even when the count is zero.
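
Base R's table() behaves similarly and gives a feel for what completion means. This is a conceptual sketch on toy data (the "ASIAN" level is added only for illustration), not tplyr2's implementation.

```r
# Toy data; values are illustrative, not taken from tplyr_adsl.
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "Drug", "Drug"),
  RACE   = c("WHITE", "BLACK", "WHITE", "WHITE"),
  stringsAsFactors = FALSE
)

# Cross-tabulating fills every RACE-by-treatment cell, so combinations
# that never occur show up as 0 instead of being dropped.
counts <- table(adsl$RACE, adsl$TRT01P)
counts

# Declaring factor levels also keeps categories that are absent from the data.
adsl$RACE <- factor(adsl$RACE, levels = c("ASIAN", "BLACK", "WHITE"))
counts_with_levels <- table(adsl$RACE, adsl$TRT01P)
counts_with_levels
```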
Limiting Completion with limit_data_by
Sometimes completing all combinations is too aggressive. The
limit_data_by parameter in layer_settings()
restricts the completion grid to combinations that actually exist in the
data. This is essential for AE tables where preferred terms should only
appear under their actual body system:
spec <- tplyr_spec(
  cols = "TRTA",
  pop_data = pop_data(cols = c("TRTA" = "TRT01A")),
  layers = tplyr_layers(
    group_count(
      target_var = "AEDECOD",
      by = "AEBODSYS",
      settings = layer_settings(
        distinct_by = "USUBJID",
        limit_data_by = c("AEBODSYS", "AEDECOD")
      )
    )
  )
)
result <- tplyr_build(spec, tplyr_adae, pop_data = tplyr_adsl)
kable(head(result[, c("rowlabel1", "rowlabel2", grep("^res", names(result), value = TRUE))], 10))

| rowlabel1 | rowlabel2 | res1 | res2 | res3 |
|---|---|---|---|---|
| CARDIAC DISORDERS | ATRIAL FIBRILLATION | 0 ( 0.0%) | 0 ( 0.0%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | ATRIAL FLUTTER | 0 ( 0.0%) | 1 ( 1.2%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | ATRIAL HYPERTROPHY | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | BUNDLE BRANCH BLOCK RIGHT | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | CARDIAC FAILURE CONGESTIVE | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | MYOCARDIAL INFARCTION | 0 ( 0.0%) | 1 ( 1.2%) | 2 ( 2.4%) |
| CARDIAC DISORDERS | SINUS BRADYCARDIA | 0 ( 0.0%) | 3 ( 3.6%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | SUPRAVENTRICULAR EXTRASYSTOLES | 1 ( 1.2%) | 0 ( 0.0%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | SUPRAVENTRICULAR TACHYCARDIA | 0 ( 0.0%) | 0 ( 0.0%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | TACHYCARDIA | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
With limit_data_by = c("AEBODSYS", "AEDECOD"), tplyr2
only creates rows for body system/preferred term combinations that exist
in the data, while still filling in zeros for treatment groups with no
events for a given combination.
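
The difference between full and limited completion can be sketched with grids in base R. This is a conceptual illustration on toy data (the terms and the "Drug" arm are made up), not tplyr2's implementation.

```r
# Toy data; values are illustrative, not taken from tplyr_adae.
adae <- data.frame(
  TRTA     = c("Placebo", "Drug", "Drug"),
  AEBODSYS = c("CARDIAC", "CARDIAC", "SKIN"),
  AEDECOD  = c("TACHYCARDIA", "ATRIAL FLUTTER", "RASH"),
  stringsAsFactors = FALSE
)
arms <- c("Drug", "Placebo")

# Full completion: every body system x every term x every arm,
# including nonsensical pairs such as SKIN / TACHYCARDIA.
full_grid <- expand.grid(
  AEBODSYS = unique(adae$AEBODSYS),
  AEDECOD  = unique(adae$AEDECOD),
  TRTA     = arms,
  stringsAsFactors = FALSE
)

# Limited completion: only observed body system / term pairs, crossed with arms.
# merge() with no shared columns returns the Cartesian product.
observed <- unique(adae[, c("AEBODSYS", "AEDECOD")])
limited_grid <- merge(observed, data.frame(TRTA = arms, stringsAsFactors = FALSE))

nrow(full_grid)     # 2 body systems * 3 terms * 2 arms
nrow(limited_grid)  # 3 observed pairs * 2 arms
```

Rows in the limited grid that have no matching events would then be filled with zero counts, which is why zeros still appear for arms without a given event.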
Where to Go From Here
This vignette covered the table-level properties that control the overall structure of your tplyr2 output. For details on specific layer types and additional features, see:
- Count layers: group_count() for frequency tables, including nested counts, distinct subject counts, and missing value handling
- Descriptive statistics layers: group_desc() for summary statistics with format strings, auto-precision, and custom summary functions
- Shift layers: group_shift() for baseline-by-post-baseline cross-tabulations
- Ordering: How tplyr2 sorts rows and controls output order
- Options: Package-level options via tplyr2_options()