
Introduction

In tplyr2, a table is defined by its specification. The tplyr_spec() function captures the full configuration – column variables, filters, treatment groups, population data, and layers – as a pure description of what you want. No data processing happens until you call tplyr_build(). This vignette covers the spec-level parameters that control the overall structure of your table.

Every tplyr2 workflow follows two steps: define a spec with tplyr_spec(), then build the table with tplyr_build(spec, data). Let’s look at an example using the included tplyr_adsl dataset.

library(tplyr2)
library(knitr)  # provides kable(), used to print the results below

spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
| rowlabel1 | res1       | res2       | res3       |
|:----------|:-----------|:-----------|:-----------|
| F         | 53 (61.6%) | 40 (47.6%) | 50 (59.5%) |
| M         | 33 (38.4%) | 44 (52.4%) | 34 (40.5%) |

Note how the cols parameter defines the column structure of the output. Each unique value of TRT01P becomes a result column, and the column labels automatically include the group count as (N=xx).

Column Variables

The cols parameter accepts a character vector of one or more variable names that define the columns of your output table. The most common case is a single treatment variable:

spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(target_var = "AGEGR1")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
| rowlabel1 | res1       | res2       | res3       |
|:----------|:-----------|:-----------|:-----------|
| 65-80     | 42 (48.8%) | 55 (65.5%) | 47 (56.0%) |
| <65       | 14 (16.3%) | 11 (13.1%) | 8 ( 9.5%)  |
| >80       | 30 (34.9%) | 18 (21.4%) | 29 (34.5%) |

Multiple Column Variables

When you provide multiple variables, tplyr2 creates a cross of all combinations. This is useful when you need columns split by treatment and another variable.

spec <- tplyr_spec(
  cols = c("TRT01P", "SEX"),
  layers = tplyr_layers(
    group_count(target_var = "AGEGR1")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
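| rowlabel1 | res1       | res2       | res3       | res4       | res5       | res6       |
|:----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
| 65-80     | 22 (41.5%) | 20 (60.6%) | 28 (70.0%) | 27 (61.4%) | 28 (56.0%) | 19 (55.9%) |
| <65       | 9 (17.0%)  | 5 (15.2%)  | 5 (12.5%)  | 6 (13.6%)  | 5 (10.0%)  | 3 ( 8.8%)  |
| >80       | 22 (41.5%) | 8 (24.2%)  | 7 (17.5%)  | 11 (25.0%) | 17 (34.0%) | 12 (35.3%) |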

Notice that the column labels use a " | " separator to show the cross of treatment and sex, and each combination gets its own N.
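The crossing can be pictured with a few lines of base R. This is only a sketch on made-up toy data, not tplyr2's implementation; the use of interaction() and the resulting column order are assumptions chosen to mimic the behaviour described above.

```r
# Toy data for illustration only (not tplyr_adsl).
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "Active", "Active", "Active"),
  SEX    = c("F", "M", "F", "F", "M")
)

# Build one column key per treatment-by-sex combination, joined with " | ".
# lex.order = TRUE keeps the first variable (treatment) as the outer grouping.
col_key <- interaction(adsl$TRT01P, adsl$SEX, sep = " | ", lex.order = TRUE)

# Each combination gets its own N.
n_by_col <- table(col_key)
print(n_by_col)
```

Two treatment values crossed with two sexes give four column keys here, in the same way that three arms crossed with two sexes give the six result columns above.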

Table-Level Filtering with where

The where parameter applies a filter to all data before any layer processing begins. This is useful when records should be excluded from the entire table.

spec <- tplyr_spec(
  cols = "TRT01P",
  where = SAFFL == "Y",
  layers = tplyr_layers(
    group_count(target_var = "AGEGR1", by = "Age Group"),
    group_desc(
      target_var = "AGE",
      by = "Age (Years)",
      settings = layer_settings(
        format_strings = list(
          "n"         = f_str("xxx", "n"),
          "Mean (SD)" = f_str("xx.x (xx.xx)", "mean", "sd"),
          "Median"    = f_str("xx.x", "median"),
          "Min, Max"  = f_str("xx, xx", "min", "max")
        )
      )
    )
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", "rowlabel2", grep("^res", names(result), value = TRUE))])
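| rowlabel1   | rowlabel2 | res1         | res2         | res3         |
|:------------|:----------|:-------------|:-------------|:-------------|
| Age Group   | 65-80     | 42 (48.8%)   | 55 (65.5%)   | 47 (56.0%)   |
| Age Group   | <65       | 14 (16.3%)   | 11 (13.1%)   | 8 ( 9.5%)    |
| Age Group   | >80       | 30 (34.9%)   | 18 (21.4%)   | 29 (34.5%)   |
| Age (Years) | n         | 86           | 84           | 84           |
| Age (Years) | Mean (SD) | 75.2 ( 8.59) | 74.4 ( 7.89) | 75.7 ( 8.29) |
| Age (Years) | Median    | 76.0         | 76.0         | 77.5         |
| Age (Years) | Min, Max  | 52, 89       | 56, 88       | 51, 88       |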

Both the count and descriptive statistics layers are computed on the safety population. Individual layers can also have their own where filters, which are applied in addition to the table-level filter.
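To make "in addition to" concrete, the sketch below combines two filters in base R on toy data. It is a conceptual illustration only: the layer-level condition (AGE >= 65) is hypothetical, and nothing here is taken from tplyr2's internals.

```r
# Toy data for illustration only.
adsl <- data.frame(
  SAFFL = c("Y", "Y", "N", "Y"),
  AGE   = c(70, 60, 80, 85)
)

# Filters held as unevaluated expressions until they are applied to data.
table_where <- quote(SAFFL == "Y")  # table-level filter
layer_where <- quote(AGE >= 65)     # hypothetical layer-level filter

# A layer only sees rows that pass both filters.
keep <- eval(table_where, adsl) & eval(layer_where, adsl)
layer_data <- adsl[keep, ]
print(layer_data)
```

A layer without its own filter would see every row that passes the table-level condition; a layer with one sees the intersection.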

Treatment Groups

Clinical tables often need columns beyond the individual treatment arms. tplyr2 provides total groups and custom groups for this purpose.

Total Groups

A total group adds a synthetic column that covers all subjects. tplyr2 builds it by duplicating every row and setting the column variable on the copies to the total group label.

spec <- tplyr_spec(
  cols = "TRT01P",
  total_groups = list(
    total_group("TRT01P", label = "Total")
  ),
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
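| rowlabel1 | res1       | res2        | res3       | res4       |
|:----------|:-----------|:------------|:-----------|:-----------|
| F         | 53 (61.6%) | 143 (56.3%) | 40 (47.6%) | 50 (59.5%) |
| M         | 33 (38.4%) | 111 (43.7%) | 44 (52.4%) | 34 (40.5%) |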

The “Total” column (res2 in this output) now appears alongside the individual treatment arms, with its N reflecting the sum of all subjects: 143 + 111 = 254.
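The row-duplication idea behind a total group can be sketched in base R. The toy data and variable names are illustrative, and this is not tplyr2's actual code.

```r
# Toy data for illustration only.
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "High", "Low"),
  SEX    = c("F", "M", "F", "M")
)

# Copy every row and relabel the column variable on the copies.
total_rows <- transform(adsl, TRT01P = "Total")
with_total <- rbind(adsl, total_rows)

# Counting by column now yields a "Total" column for free.
counts <- table(with_total$SEX, with_total$TRT01P)
print(counts)
```

Because the copies carry the label "Total", any downstream counting logic treats the total like one more treatment arm.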

Custom Groups

Custom groups combine specific treatment levels into a new group. For example, you might pool the two active dose groups together.

spec <- tplyr_spec(
  cols = "TRT01P",
  custom_groups = list(
    custom_group(
      "TRT01P",
      "Xanomeline" = c("Xanomeline High Dose", "Xanomeline Low Dose")
    )
  ),
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
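| rowlabel1 | res1       | res2       | res3       | res4       |
|:----------|:-----------|:-----------|:-----------|:-----------|
| F         | 53 (61.6%) | 90 (53.6%) | 40 (47.6%) | 50 (59.5%) |
| M         | 33 (38.4%) | 78 (46.4%) | 44 (52.4%) | 34 (40.5%) |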

The “Xanomeline” column (res2 in this output) includes all subjects from both dose groups (for F, 90 = 40 + 50), while the original dose-level columns are preserved.

Combining Total and Custom Groups

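You can use both together. Custom groups are applied first, and then total groups duplicate all rows, including the custom group rows. As a result, the “Total” column counts subjects in a custom group twice: once through their original arm and once through the pooled group. In the output below, the total for F is 233, which is the 143 from the total-only table plus the 90 from the “Xanomeline” group, and the percentages use a denominator of 422 (254 + 168).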

spec <- tplyr_spec(
  cols = "TRT01P",
  custom_groups = list(
    custom_group(
      "TRT01P",
      "Xanomeline" = c("Xanomeline High Dose", "Xanomeline Low Dose")
    )
  ),
  total_groups = list(
    total_group("TRT01P", label = "Total")
  ),
  layers = tplyr_layers(
    group_count(target_var = "SEX")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
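| rowlabel1 | res1       | res2        | res3       | res4       | res5       |
|:----------|:-----------|:------------|:-----------|:-----------|:-----------|
| F         | 53 (61.6%) | 233 (55.2%) | 90 (53.6%) | 40 (47.6%) | 50 (59.5%) |
| M         | 33 (38.4%) | 189 (44.8%) | 78 (46.4%) | 44 (52.4%) | 34 (40.5%) |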
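The order of operations explains the inflated total. The base R sketch below applies a custom group and then a total group to three toy subjects; it mirrors the description above but is not tplyr2's implementation.

```r
# Toy data for illustration only: three subjects, one per arm.
adsl <- data.frame(
  TRT01P = c("Placebo", "Xanomeline High Dose", "Xanomeline Low Dose"),
  SEX    = c("F", "F", "M")
)

# Step 1: custom group. Copy the pooled arms under a new label.
pooled <- adsl[adsl$TRT01P %in% c("Xanomeline High Dose", "Xanomeline Low Dose"), ]
pooled$TRT01P <- "Xanomeline"
with_custom <- rbind(adsl, pooled)

# Step 2: total group. Copy every row, including the pooled copies.
total_rows <- transform(with_custom, TRT01P = "Total")
with_both <- rbind(with_custom, total_rows)

n_by_col <- table(with_both$TRT01P)
print(n_by_col)
```

With three subjects the "Total" column reports five, because the two Xanomeline subjects enter it twice. If that is not the total you want, check the result before using both features in one spec.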

Population Data

In many clinical analyses, denominators and header Ns should come from a different dataset than the analysis data. The classic example is an adverse event table: ADAE only contains subjects who experienced events, but percentages should reflect the full safety population from ADSL.

The pop_data() configuration specifies how the population dataset maps to the spec. The actual data is provided at build time.

spec <- tplyr_spec(
  cols = "TRTA",
  pop_data = pop_data(cols = c("TRTA" = "TRT01A")),
  layers = tplyr_layers(
    group_count(
      target_var = "AEBODSYS",
      settings = layer_settings(
        distinct_by = "USUBJID"
      )
    )
  )
)

result <- tplyr_build(spec, tplyr_adae, pop_data = tplyr_adsl)
kable(head(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))], 8))
| rowlabel1                                            | res1       | res2       | res3       |
|:-----------------------------------------------------|:-----------|:-----------|:-----------|
| CARDIAC DISORDERS                                    | 5 ( 5.8%)  | 6 ( 7.1%)  | 6 ( 7.1%)  |
| CONGENITAL, FAMILIAL AND GENETIC DISORDERS           | 0 ( 0.0%)  | 1 ( 1.2%)  | 0 ( 0.0%)  |
| GASTROINTESTINAL DISORDERS                           | 6 ( 7.0%)  | 6 ( 7.1%)  | 3 ( 3.6%)  |
| GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 11 (12.8%) | 21 (25.0%) | 21 (25.0%) |
| IMMUNE SYSTEM DISORDERS                              | 0 ( 0.0%)  | 0 ( 0.0%)  | 1 ( 1.2%)  |
| INFECTIONS AND INFESTATIONS                          | 5 ( 5.8%)  | 4 ( 4.8%)  | 3 ( 3.6%)  |
| INJURY, POISONING AND PROCEDURAL COMPLICATIONS       | 2 ( 2.3%)  | 2 ( 2.4%)  | 2 ( 2.4%)  |
| INVESTIGATIONS                                       | 3 ( 3.5%)  | 1 ( 1.2%)  | 1 ( 1.2%)  |

A few things to note:

  • cols = "TRTA" matches the treatment variable in ADAE.
  • pop_data(cols = c("TRTA" = "TRT01A")) maps TRT01A in the population data to TRTA in the analysis data (format: c("analysis_name" = "pop_name")).
  • distinct_by = "USUBJID" counts each subject once per body system.
  • Denominators and column Ns come from the full tplyr_adsl population.
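The arithmetic behind these points can be sketched in base R: numerators are distinct subjects in the event data, denominators are subject counts in the population data, and the two are matched by treatment name. The toy data below is made up, and the code is an illustration rather than tplyr2's method.

```r
# Toy population data: four subjects per arm.
adsl <- data.frame(
  USUBJID = paste0("S", 1:8),
  TRT01A  = rep(c("Active", "Placebo"), each = 4)
)

# Toy event data: S1 has two events in the same body system.
adae <- data.frame(
  USUBJID  = c("S1", "S1", "S2", "S5"),
  TRTA     = c("Active", "Active", "Active", "Placebo"),
  AEBODSYS = "CARDIAC DISORDERS"
)

# Denominators come from the population data (TRT01A plays the role of TRTA).
header_n <- lengths(split(adsl$USUBJID, adsl$TRT01A))

# Numerators count each subject once per body system.
distinct_ae <- unique(adae[, c("USUBJID", "TRTA", "AEBODSYS")])
n_subj <- lengths(split(distinct_ae$USUBJID, distinct_ae$TRTA))

pct <- 100 * n_subj / header_n[names(n_subj)]
print(pct)
```

Using the event data alone for denominators would overstate the percentages, since subjects without events never appear in it.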

Extracting Header N

After building a table with population data, you can extract the header N values using tplyr_header_n():

header_n <- tplyr_header_n(result)
kable(header_n)
| TRTA                 | .n |
|:---------------------|---:|
| Placebo              | 86 |
| Xanomeline High Dose | 84 |
| Xanomeline Low Dose  | 84 |

This is useful when you need to programmatically construct column headers or integrate with other reporting tools.
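As one possible use, the sketch below turns header N values into column header strings. The data frame is a hand-typed copy of the output printed above, on the assumption that tplyr_header_n() returns a data frame with those two columns; the "(N=xx)" layout is likewise an illustrative choice.

```r
# Hand-typed copy of the header N table shown above (an assumption about
# the shape of the object returned by tplyr_header_n()).
header_n <- data.frame(
  TRTA = c("Placebo", "Xanomeline High Dose", "Xanomeline Low Dose"),
  .n   = c(86L, 84L, 84L)
)

# Build one header string per treatment arm.
col_labels <- sprintf("%s (N=%d)", header_n$TRTA, header_n$.n)
print(col_labels)
```

The resulting character vector can be passed to whatever reporting tool renders the final table.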

Population Data with Filters

The population data is not subject to the spec-level where filter. It uses its own where clause, specified in the pop_data() call:

spec <- tplyr_spec(
  cols = "TRTA",
  pop_data = pop_data(
    cols = c("TRTA" = "TRT01A"),
    where = SAFFL == "Y"
  ),
  layers = tplyr_layers(
    group_count(
      target_var = "AEBODSYS",
      settings = layer_settings(
        distinct_by = "USUBJID"
      )
    )
  )
)

result <- tplyr_build(spec, tplyr_adae, pop_data = tplyr_adsl)
kable(tplyr_header_n(result))
| TRTA                 | .n |
|:---------------------|---:|
| Placebo              | 86 |
| Xanomeline High Dose | 84 |
| Xanomeline Low Dose  | 84 |

This separation is intentional. The table-level where controls which records are summarized, while the pop_data() where controls which subjects contribute to denominators. In practice these often differ – you might filter AE records to treatment-emergent events while basing denominators on the full safety population. In this example the header Ns match the unfiltered build (86, 84, 84), so the SAFFL == "Y" condition evidently excludes no subjects from tplyr_adsl.
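The two filters act on different datasets, which a small base R sketch makes visible. The toy data is made up, TRTEMFL is used here as a hypothetical treatment-emergent flag, and the code illustrates the idea rather than tplyr2's implementation.

```r
# Toy population data: S4 is not in the safety population.
adsl <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4"),
  TRT01A  = "Placebo",
  SAFFL   = c("Y", "Y", "Y", "N")
)

# Toy event data: S2 has one non-emergent and one emergent event.
adae <- data.frame(
  USUBJID = c("S1", "S2", "S2"),
  TRTA    = "Placebo",
  TRTEMFL = c("Y", "N", "Y")  # hypothetical treatment-emergent flag
)

# Record filter: decides which events are summarized.
ae_records <- adae[adae$TRTEMFL == "Y", ]
numerator <- length(unique(ae_records$USUBJID))

# Population filter: decides who counts toward the denominator.
pop <- adsl[adsl$SAFFL == "Y", ]
denominator <- nrow(pop)

pct <- round(100 * numerator / denominator, 1)
print(pct)
```

Neither filter touches the other dataset: dropping non-emergent events does not shrink the denominator, and excluding S4 from the population does not remove any event records.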

Data Completion

When building count layers, tplyr2 automatically completes all combinations of factor levels and cross-variables. If a treatment group has no subjects with a given characteristic, the cell still appears as 0 ( 0.0%) rather than the row being dropped.

spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(target_var = "RACE")
  )
)

result <- tplyr_build(spec, tplyr_adsl)
kable(result[, c("rowlabel1", grep("^res", names(result), value = TRUE))])
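| rowlabel1                        | res1       | res2       | res3       |
|:---------------------------------|:-----------|:-----------|:-----------|
| AMERICAN INDIAN OR ALASKA NATIVE | 0 ( 0.0%)  | 1 ( 1.2%)  | 0 ( 0.0%)  |
| BLACK OR AFRICAN AMERICAN        | 8 ( 9.3%)  | 9 (10.7%)  | 6 ( 7.1%)  |
| WHITE                            | 78 (90.7%) | 74 (88.1%) | 78 (92.9%) |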

Every race category appears for every treatment group, even when the count is zero.
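Completion is easy to reproduce in base R, where table() crosses every observed value of one variable with every observed value of the other. The sketch uses toy data and is meant to show the idea only, not how tplyr2 performs completion.

```r
# Toy data for illustration only.
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "Active"),
  RACE   = c("WHITE", "WHITE", "BLACK OR AFRICAN AMERICAN")
)

# The cross-tabulation contains every RACE x TRT01P cell, including
# combinations that never occur in the data.
counts <- as.data.frame(table(RACE = adsl$RACE, TRT01P = adsl$TRT01P))
print(counts)
```

Two of the four cells hold a zero, but all four are present, which is what keeps every row visible for every treatment column.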

Limiting Completion with limit_data_by

Sometimes completing all combinations is too aggressive. The limit_data_by parameter in layer_settings() restricts the completion grid to combinations that actually exist in the data. This is essential for AE tables where preferred terms should only appear under their actual body system:

spec <- tplyr_spec(
  cols = "TRTA",
  pop_data = pop_data(cols = c("TRTA" = "TRT01A")),
  layers = tplyr_layers(
    group_count(
      target_var = "AEDECOD",
      by = "AEBODSYS",
      settings = layer_settings(
        distinct_by = "USUBJID",
        limit_data_by = c("AEBODSYS", "AEDECOD")
      )
    )
  )
)

result <- tplyr_build(spec, tplyr_adae, pop_data = tplyr_adsl)
kable(head(result[, c("rowlabel1", "rowlabel2", grep("^res", names(result), value = TRUE))], 10))
| rowlabel1         | rowlabel2                      | res1      | res2      | res3      |
|:------------------|:-------------------------------|:----------|:----------|:----------|
| CARDIAC DISORDERS | ATRIAL FIBRILLATION            | 0 ( 0.0%) | 0 ( 0.0%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | ATRIAL FLUTTER                 | 0 ( 0.0%) | 1 ( 1.2%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | ATRIAL HYPERTROPHY             | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | BUNDLE BRANCH BLOCK RIGHT      | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | CARDIAC FAILURE CONGESTIVE     | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |
| CARDIAC DISORDERS | MYOCARDIAL INFARCTION          | 0 ( 0.0%) | 1 ( 1.2%) | 2 ( 2.4%) |
| CARDIAC DISORDERS | SINUS BRADYCARDIA              | 0 ( 0.0%) | 3 ( 3.6%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | SUPRAVENTRICULAR EXTRASYSTOLES | 1 ( 1.2%) | 0 ( 0.0%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | SUPRAVENTRICULAR TACHYCARDIA   | 0 ( 0.0%) | 0 ( 0.0%) | 1 ( 1.2%) |
| CARDIAC DISORDERS | TACHYCARDIA                    | 1 ( 1.2%) | 0 ( 0.0%) | 0 ( 0.0%) |

With limit_data_by = c("AEBODSYS", "AEDECOD"), tplyr2 only creates rows for body system/preferred term combinations that exist in the data, while still filling in zeros for treatment groups with no events for a given combination.
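The difference between a full and a limited completion grid can be sketched in base R. The toy data and the merge-based approach are assumptions for illustration and do not describe tplyr2's internals.

```r
# Toy event data for illustration only.
adae <- data.frame(
  TRTA     = c("Placebo", "Active", "Active"),
  AEBODSYS = c("CARDIAC DISORDERS", "CARDIAC DISORDERS",
               "GASTROINTESTINAL DISORDERS"),
  AEDECOD  = c("TACHYCARDIA", "ATRIAL FLUTTER", "NAUSEA")
)

# Full completion: every body system x every term x every arm.
full_grid <- expand.grid(
  AEBODSYS = unique(adae$AEBODSYS),
  AEDECOD  = unique(adae$AEDECOD),
  TRTA     = unique(adae$TRTA)
)

# Limited completion: only observed body system/term pairs, crossed with arms.
pairs <- unique(adae[, c("AEBODSYS", "AEDECOD")])
arms  <- data.frame(TRTA = unique(adae$TRTA))
grid  <- merge(pairs, arms, by = NULL)  # by = NULL gives the Cartesian product

# Attach observed counts and fill the gaps with zeros.
adae$n <- 1L
obs <- aggregate(n ~ AEBODSYS + AEDECOD + TRTA, data = adae, FUN = sum)
limited <- merge(grid, obs, all.x = TRUE)
limited$n[is.na(limited$n)] <- 0L
print(limited)
```

The full grid has 12 rows and would list NAUSEA under CARDIAC DISORDERS; the limited grid has 6 rows, each a real body system/term pair, with zeros only where an arm had no events.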

Where to Go From Here

This vignette covered the table-level properties that control the overall structure of your tplyr2 output. For details on specific layer types and additional features, see:

  • Count layers: group_count() for frequency tables, including nested counts, distinct subject counts, and missing value handling
  • Descriptive statistics layers: group_desc() for summary statistics with format strings, auto-precision, and custom summary functions
  • Shift layers: group_shift() for baseline-by-post-baseline cross-tabulations
  • Ordering: How tplyr2 sorts rows and controls output order
  • Options: Package-level options via tplyr2_options()