Frequency Tables

On the surface, counting sounds pretty simple, right? You just want to know how many occurrences of something there are. Well - unfortunately, it’s not that easy. And in clinical reports, there’s quite a bit of nuance that goes into the different types of frequency tables that need to be created. Fortunately, we’ve added a good bit of flexibility into both group_count() and group_shift() to help you get what you need when creating these reports, whether you’re creating a demographics table, an adverse events table, or a table of shifts in lab results.

A Simple Example

Let’s start with a basic example. This table demonstrates the distribution of subject disposition across treatment groups. Additionally, we’re sorting by descending total occurrences using the “Total” group.

t <- tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
  add_total_group() %>%
  add_treat_grps(Treated = c("Xanomeline Low Dose", "Xanomeline High Dose")) %>%
  add_layer(
    group_count(DCDECOD) %>%
      set_order_count_method("bycount") %>%
      set_ordering_cols(Total)
  ) %>%
  build() %>%
  arrange(desc(ord_layer_1)) %>%
  select(starts_with("row"), var1_Placebo, `var1_Xanomeline Low Dose`,
         `var1_Xanomeline High Dose`, var1_Treated, var1_Total)

kable(t)
row_label1 var1_Placebo var1_Xanomeline Low Dose var1_Xanomeline High Dose var1_Treated var1_Total
COMPLETED 58 ( 67.4%) 25 ( 29.8%) 27 ( 32.1%) 52 ( 31.0%) 110 ( 43.3%)
ADVERSE EVENT 8 ( 9.3%) 44 ( 52.4%) 40 ( 47.6%) 84 ( 50.0%) 92 ( 36.2%)
WITHDRAWAL BY SUBJECT 9 ( 10.5%) 10 ( 11.9%) 8 ( 9.5%) 18 ( 10.7%) 27 ( 10.6%)
STUDY TERMINATED BY SPONSOR 2 ( 2.3%) 2 ( 2.4%) 3 ( 3.6%) 5 ( 3.0%) 7 ( 2.8%)
PROTOCOL VIOLATION 2 ( 2.3%) 1 ( 1.2%) 3 ( 3.6%) 4 ( 2.4%) 6 ( 2.4%)
LACK OF EFFICACY 3 ( 3.5%) 0 ( 0.0%) 1 ( 1.2%) 1 ( 0.6%) 4 ( 1.6%)
DEATH 2 ( 2.3%) 1 ( 1.2%) 0 ( 0.0%) 1 ( 0.6%) 3 ( 1.2%)
PHYSICIAN DECISION 1 ( 1.2%) 0 ( 0.0%) 2 ( 2.4%) 2 ( 1.2%) 3 ( 1.2%)
LOST TO FOLLOW-UP 1 ( 1.2%) 1 ( 1.2%) 0 ( 0.0%) 1 ( 0.6%) 2 ( 0.8%)

Percentages

Counting is one thing - but a count by itself doesn’t give much context. Often frequencies in clinical reports are paired with a percentage, which allows you to understand the proportion of events/subjects to some total - but that total can prove to be complicated, which is why we’ve included a few functions to aid in giving you the necessary denominator.

  • set_denoms_by() allows users to calculate denominators using any variable within the by, cols, treatment, or target_var. This defaults to all variables passed as treatment groups and column variables, which causes the column percentages to sum to 100%. This is particularly useful for shift tables, which use a combination of row and column variables.
  • set_denom_ignore() defines values of the target variable to be ignored in the calculation of the denominator. In many tables percentages are based on non-missing counts. You can pass values to this function to have them excluded from the calculation of the denominator. For example, if your data include NAs, you can use “NA” in set_denom_ignore() and those will be excluded from the denominator.

Examples with percentages can be seen below.
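To make the default denominator concrete, here is a small base R sketch. It is not how ‘Tplyr’ computes things internally; it simply copies the Placebo and Xanomeline Low Dose counts from the first table in this article and divides each count by its column total, which is what grouping the denominator by treatment group amounts to.

```r
# Disposition counts copied from the Placebo and Xanomeline Low Dose columns
# of the first table in this article (matrix() fills column by column).
counts <- matrix(
  c(58, 8, 9, 2, 2, 3, 2, 1, 1,
    25, 44, 10, 2, 1, 0, 1, 0, 1),
  ncol = 2,
  dimnames = list(
    c("COMPLETED", "ADVERSE EVENT", "WITHDRAWAL BY SUBJECT",
      "STUDY TERMINATED BY SPONSOR", "PROTOCOL VIOLATION", "LACK OF EFFICACY",
      "DEATH", "PHYSICIAN DECISION", "LOST TO FOLLOW-UP"),
    c("Placebo", "Xanomeline Low Dose")
  )
)

# Column percentages: each count divided by the total of its own column.
col_pct <- 100 * prop.table(counts, margin = 2)

round(col_pct, 1)
colSums(col_pct)  # each treatment column sums to 100
```

Because the denominator is the column total, every treatment column sums to 100%, which matches the percentages displayed in that first table.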

Distinct Versus Event Counts

Just as important as denominator flexibility is the determination of whether you should be using distinct or non-distinct counts - or some combination of both. Adverse event tables are a perfect example. Often, you’re concerned about how many subjects had a particular adverse event instead of just the number of occurrences of that adverse event. Similarly, the number of occurrences of an event isn’t necessarily relevant when compared to the total number of adverse events that occurred. For this reason, what you likely want to look at is instead the number of subjects who experienced an event compared to the total number of subjects in that treatment group.

‘Tplyr’ allows you to focus on these distinct counts and distinct percents within some grouping variable, like subject. Additionally, you can mix and match distinct counts with non-distinct counts in the same row. The set_distinct_by() function sets the variables used to calculate the distinct occurrences of some value using the specified by variables.

t <- tplyr_table(adae, TRTA) %>%
  add_layer(
    group_count(AEDECOD) %>%
      set_distinct_by(USUBJID) %>%
      set_format_strings(f_str("xxx (xx.xx%) [xxx]", distinct, distinct_pct, n))
  ) %>%
  build() %>%
  head()

kable(t)
row_label1 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose ord_layer_index ord_layer_1
ACTINIC KERATOSIS 0 ( 0.00%) [ 0] 1 ( 2.38%) [ 1] 0 ( 0.00%) [ 0] 1 1
ALOPECIA 1 ( 4.76%) [ 1] 0 ( 0.00%) [ 0] 0 ( 0.00%) [ 0] 1 2
BLISTER 0 ( 0.00%) [ 0] 1 ( 2.38%) [ 2] 5 (11.90%) [ 8] 1 3
COLD SWEAT 1 ( 4.76%) [ 3] 0 ( 0.00%) [ 0] 0 ( 0.00%) [ 0] 1 4
DERMATITIS ATOPIC 1 ( 4.76%) [ 1] 0 ( 0.00%) [ 0] 0 ( 0.00%) [ 0] 1 5
DERMATITIS CONTACT 0 ( 0.00%) [ 0] 0 ( 0.00%) [ 0] 1 ( 2.38%) [ 2] 1 6

You may have seen tables before like the one above. This display shows the number of subjects who experienced an adverse event, the percentage of subjects within the given treatment group who experienced that event, and then the total number of occurrences of that event. Using set_distinct_by() triggered the derivation of distinct and distinct_pct in addition to the n and pct created within group_count(). The display of the values is then controlled by the f_str() call in set_format_strings().
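If the difference between n, distinct, and distinct_pct still feels abstract, the following base R sketch may help. The records are hypothetical, and the denominator is assumed to be the number of distinct subjects in the data; this illustrates the idea rather than ‘Tplyr’ itself.

```r
# Hypothetical adverse event records for a single treatment group.
ae <- data.frame(
  USUBJID = c("01", "01", "02", "03", "03", "03"),
  AEDECOD = c("BLISTER", "BLISTER", "BLISTER", "ALOPECIA", "BLISTER", "BLISTER"),
  stringsAsFactors = FALSE
)

# Assumption: the denominator is the number of distinct subjects in the data.
n_subjects <- length(unique(ae$USUBJID))

# Group subject IDs by preferred term.
by_term <- split(ae$USUBJID, ae$AEDECOD)

# n: every record counts. distinct: each subject counts once per term.
n <- vapply(by_term, length, integer(1))
distinct <- vapply(by_term, function(ids) length(unique(ids)), integer(1))
distinct_pct <- round(100 * distinct / n_subjects, 2)

data.frame(distinct, distinct_pct, n)
```

Here BLISTER occurs five times but in only three subjects, so the event count and the subject count tell different stories about the same term.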

Missing Counts

Missing counts are another tricky area for frequency tables. These values raise a number of questions. For example, do you want to format the missing counts the same way as the event counts? Do you want to present missing counts with percentages? Do missing counts belong in the denominator?

The set_missing_count() function can take a new f_str() object to set the display of missing values. If not specified, the associated count layer’s format will be used. Using the ... parameter, you are able to specify the row label desired for missing values and the values that you determine to be considered ‘missing’. For example, you may have NAs in the target variable, and values like “Not Collected” that you also wish to consider “missing”. set_missing_count() allows you to group those together. In fact, you’re able to establish as many different “missing” groups as you want - even though that scenario is fairly unlikely.

In the example below 50 random values are removed and NA is specified as the missing string. This also introduces another function - set_denom_ignore(). When you have missing counts, you may wish to exclude them from the totals being summarized. You can exclude them from the denominator by using set_denom_ignore(). Simply pass the missing group label established from set_missing_count() into set_denom_ignore(), and all of the missing values you grouped will be excluded from the denominator. set_denom_ignore() can also consume any other value you wish to ignore from the target variable as well, if desired.

adae2 <- adae
adae2[sample(nrow(adae2), 50), "AESEV"] <- NA

t <- tplyr_table(adae2, TRTA) %>%
  add_layer(
    group_count(AESEV) %>%
      set_format_strings(f_str("xxx (xx.xx%)", n, pct)) %>%
      set_missing_count(f_str("xxx", n), Missing = NA, sort_value=Inf)  %>%
      set_denom_ignore("Missing")
  ) %>%
  build() %>% 
  arrange(ord_layer_1)

t %>% 
  kable()
row_label1 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose ord_layer_index ord_layer_1
MILD 27 (67.50%) 72 (75.79%) 43 (47.25%) 1 1
MODERATE 13 (32.50%) 23 (24.21%) 43 (47.25%) 1 2
SEVERE 0 ( 0.00%) 0 ( 0.00%) 5 ( 5.49%) 1 3
Missing 7 16 27 1 Inf

We did one more thing worth explaining in the example above - we gave the missing count its own sort value. If you leave this field null, it will simply be the maximum value in the order layer plus 1, to put the Missing counts at the bottom during an ascending sort. But tables can be sorted a lot of different ways, as you’ll see in the sort vignette. So instead of trying to come up with novel ways for you to control where the missing row goes - we decided to just let you specify your own value.
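The effect of excluding the missing group from the denominator can be checked by hand. The base R sketch below copies the Placebo counts from the table above (your counts will differ, since the missing values are sampled at random) and compares the two possible denominators. It also shows why a sort value of Inf keeps the Missing row last in an ascending sort.

```r
# Placebo counts copied from the table above.
placebo <- c(MILD = 27, MODERATE = 13, SEVERE = 0, Missing = 7)
non_missing <- placebo[names(placebo) != "Missing"]

# Denominator without the missing group: what the table above displays.
pct_excl <- round(100 * non_missing / sum(non_missing), 2)

# Denominator with the missing group included, for comparison.
pct_incl <- round(100 * non_missing / sum(placebo), 2)

pct_excl
pct_incl

# A sort value of Inf places "Missing" after every finite sort value.
ord <- c(Missing = Inf, SEVERE = 3, MILD = 1, MODERATE = 2)
names(sort(ord))
```

With the missing group excluded, MILD is 27 of 40 (67.50%); with it included, MILD would be 27 of 47 (57.45%).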

Controlling the Denominator Filter

While set_denom_ignore() can be useful for ignoring certain values, ‘Tplyr’ also offers you the ability to specifically control the filter used within the denominator. This is provided through the function set_denom_where(). The default for set_denom_where() is the layer level where parameter, if one was supplied. set_denom_where() allows you to replace this layer level filter with a custom filter of your choosing. This is done on top of any filtering specified in the tplyr_table() where parameter - which means that the set_denom_where() filter is applied in addition to any table level filtering.

Take the example shown below. The first layer has no layer level filtering applied, so the table level where is the only filter applied. The second layer has a layer level filter applied, so the denominators will be based on that layer level filter. Notice how in this case, the percentages in the second layer add up to 100%. This is because the denominator only includes values used in that layer.

The third layer has a layer level filter applied, but additionally uses set_denom_where(). The set_denom_where() in this example is actually removing the layer level filter for the denominators. This is because in R, when you filter using TRUE, the filter returns all records. So by using TRUE in set_denom_where(), the layer level filter is removed. This causes the denominator to include all values and not just those selected for that layer - so for this layer, the percentages will not add up to 100%. In this example, this allows the percentages from Layer 3 to sum to the total percentage of “DISCONTINUED” from Layer 1.

adsl2 <- adsl %>% 
  mutate(DISCONTEXT = if_else(DISCONFL == 'Y', 'DISCONTINUED', 'COMPLETED'))

t <- tplyr_table(adsl2, TRT01P, where = SAFFL == 'Y') %>%
  add_layer(
    group_count(DISCONTEXT)
  ) %>%
  add_layer(
    group_count(DCSREAS, where = DISCONFL == 'Y')
  ) %>%
  add_layer(
    group_count(DCSREAS, where = DISCONFL == 'Y') %>% 
    set_denom_where(TRUE)
  ) %>%
  build() %>%
  arrange(ord_layer_index, ord_layer_1) 

t %>% 
  kable()
row_label1 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose ord_layer_index ord_layer_1
COMPLETED 58 ( 67.4%) 27 ( 32.1%) 25 ( 29.8%) 1 1
DISCONTINUED 28 ( 32.6%) 57 ( 67.9%) 59 ( 70.2%) 1 2
Adverse Event 8 ( 28.6%) 40 ( 70.2%) 44 ( 74.6%) 2 2
Death 2 ( 7.1%) 0 ( 0.0%) 1 ( 1.7%) 2 3
I/E Not Met 1 ( 3.6%) 2 ( 3.5%) 0 ( 0.0%) 2 4
Lack of Efficacy 3 ( 10.7%) 1 ( 1.8%) 0 ( 0.0%) 2 5
Lost to Follow-up 1 ( 3.6%) 0 ( 0.0%) 1 ( 1.7%) 2 6
Physician Decision 1 ( 3.6%) 2 ( 3.5%) 0 ( 0.0%) 2 7
Protocol Violation 1 ( 3.6%) 1 ( 1.8%) 1 ( 1.7%) 2 8
Sponsor Decision 2 ( 7.1%) 3 ( 5.3%) 2 ( 3.4%) 2 9
Withdrew Consent 9 ( 32.1%) 8 ( 14.0%) 10 ( 16.9%) 2 10
Adverse Event 8 ( 9.3%) 40 ( 47.6%) 44 ( 52.4%) 3 2
Death 2 ( 2.3%) 0 ( 0.0%) 1 ( 1.2%) 3 3
I/E Not Met 1 ( 1.2%) 2 ( 2.4%) 0 ( 0.0%) 3 4
Lack of Efficacy 3 ( 3.5%) 1 ( 1.2%) 0 ( 0.0%) 3 5
Lost to Follow-up 1 ( 1.2%) 0 ( 0.0%) 1 ( 1.2%) 3 6
Physician Decision 1 ( 1.2%) 2 ( 2.4%) 0 ( 0.0%) 3 7
Protocol Violation 1 ( 1.2%) 1 ( 1.2%) 1 ( 1.2%) 3 8
Sponsor Decision 2 ( 2.3%) 3 ( 3.6%) 2 ( 2.4%) 3 9
Withdrew Consent 9 ( 10.5%) 8 ( 9.5%) 10 ( 11.9%) 3 10
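The two denominators can be reproduced by hand from the Placebo column of the table above. This base R sketch is only arithmetic on the displayed counts, not a call into ‘Tplyr’.

```r
# Placebo counts copied from the table above.
completed <- 58
reasons <- c(
  "Adverse Event" = 8, "Death" = 2, "I/E Not Met" = 1,
  "Lack of Efficacy" = 3, "Lost to Follow-up" = 1, "Physician Decision" = 1,
  "Protocol Violation" = 1, "Sponsor Decision" = 2, "Withdrew Consent" = 9
)

# Layer 2: the denominator only includes records passing the layer filter.
denom_layer <- sum(reasons)
pct_layer2 <- round(100 * reasons / denom_layer, 1)

# Layer 3: set_denom_where(TRUE) widens the denominator to all records
# that pass the table level filter.
denom_all <- completed + sum(reasons)
pct_layer3 <- round(100 * reasons / denom_all, 1)

pct_layer2["Adverse Event"]  # 8 of 28
pct_layer3["Adverse Event"]  # 8 of 86

# The Layer 3 counts together make up the "DISCONTINUED" share from Layer 1.
round(100 * sum(reasons) / denom_all, 1)
```

Note that adding up the rounded Layer 3 percentages can differ from the Layer 1 value by a rounding error, so the comparison is best made on the counts.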

Adding a ‘Total’ Row

In addition to missing counts, some summaries require the addition of a ‘Total’ row. ‘Tplyr’ has the helper function add_total_row() to ease this process for you. Like everything else that goes into ‘Tplyr’, this too has a significant bit of nuance to it.

Much of this functionality is similar to set_missing_count(). You’re able to specify a different format for the total, but if not specified, the associated count layer’s format will be used. You’re able to set your own sort value to specify where you want the total row to sit.

More nuance comes in two places:

  • By default, add_total_row() will ignore missing values, but you can have it count missing values using the count_missings parameter. ‘Tplyr’ will warn you when using add_total_row() with set_denom_ignore() if you’re requesting to count missing values and including a percentage, because the percentage will exceed 100%.
  • add_total_row() will throw a warning when a by variable is used, because it becomes ambiguous what total should be calculated. You can rectify this by using set_denoms_by(), which allows the user to control exactly how denominators are calculated. Typically in a count layer, the column variables (i.e. treat_var and any cols) are used, and this is the default. The totals presented by add_total_row() will always align with denominators specified in set_denoms_by().

In the example below, we summarize age groups by sex. The denominators are determined by treatment group and sex, and the total row shows us what denominator is used. The ‘Missing’ row tells us the number of missing values, but because count_missings is set to TRUE, the missing counts are included in the denominator. This probably isn’t how you would choose to display things, but here we’re trying to show the flexibility built into ‘Tplyr’.

adsl2 <- adsl
adsl2[sample(nrow(adsl2), 50), "AGEGR1"] <- NA

tplyr_table(adsl2, TRT01P) %>% 
  add_layer(
    group_count(AGEGR1, by=SEX) %>% 
      set_denoms_by(TRT01P, SEX) %>%  # This gives a Total row for each group
      add_total_row(f_str("xxx", n), count_missings=TRUE, sort_value=-Inf) %>% 
      set_total_row_label("All Age Groups") %>% 
      set_missing_count(f_str("xxx", n), Missing = NA, sort_value=Inf)
  ) %>% 
  build() %>% 
  arrange(ord_layer_1, ord_layer_2) %>% 
  kable()
row_label1 row_label2 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose ord_layer_index ord_layer_1 ord_layer_2
F All Age Groups 53 40 50 1 1 -Inf
F <65 9 ( 17.0%) 2 ( 5.0%) 3 ( 6.0%) 1 1 1
F >80 18 ( 34.0%) 5 ( 12.5%) 16 ( 32.0%) 1 1 2
F 65-80 22 ( 41.5%) 23 ( 57.5%) 20 ( 40.0%) 1 1 3
F Missing 4 10 11 1 1 Inf
M All Age Groups 33 44 34 1 2 -Inf
M <65 5 ( 15.2%) 4 ( 9.1%) 3 ( 8.8%) 1 2 1
M >80 5 ( 15.2%) 11 ( 25.0%) 9 ( 26.5%) 1 2 2
M 65-80 15 ( 45.5%) 20 ( 45.5%) 14 ( 41.2%) 1 2 3
M Missing 8 9 8 1 2 Inf

The default text for the Total row is “Total”, but we provide set_total_row_label() to allow you to customize the text used in your display.
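To see how the total row relates to the denominator when count_missings is TRUE, here is a short base R check using the female Placebo counts from the table above (your counts will differ, since the missing values are sampled at random). It is plain arithmetic on the displayed values, not ‘Tplyr’ code.

```r
# Female Placebo counts copied from the table above.
f_placebo <- c("<65" = 9, ">80" = 18, "65-80" = 22, Missing = 4)

# With count_missings = TRUE, the total row includes the missing counts.
total <- sum(f_placebo)

# The age group percentages use that same total as their denominator.
age_groups <- f_placebo[names(f_placebo) != "Missing"]
pct <- round(100 * age_groups / total, 1)

total
pct
sum(age_groups) / total  # below 1, because missing values sit in the denominator
```

This is why the displayed age group percentages within a sex and treatment group do not add up to 100% in that example.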

Nested Count Summaries

Certain summary tables present counts within groups. One example could be in a disposition table where a disposition reason of “Other” summarizes what those other reasons were. A very common example is an Adverse Event table that displays counts for body systems, and then the events within those body systems. This is again a nuanced situation - there are two variables being summarized: the body system counts, and the adverse event counts.

One way to approach this would be creating two summaries. One summarizing the body system, and another summarizing the preferred terms by body system, and then merging the two together. But we don’t want you to have to do that. Instead, we handle this complexity for you. This is done in group_count() by submitting two target variables with dplyr::vars(). The first variable should be your grouping variable that you want summarized, which we refer to as the “Outside” variable, and the second should have the narrower scope, which we call the “Inside” variable.

The example below demonstrates how to do a nested summary. Look at the first row - here row_label1 and row_label2 are both “SKIN AND SUBCUTANEOUS TISSUE DISORDERS”. This line is the summary for AEBODSYS. In the rows below that, row_label1 continues on with the value “SKIN AND SUBCUTANEOUS TISSUE DISORDERS”, but row_label2 changes. These are the summaries for AEDECOD.

tplyr_table(adae, TRTA) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD))
  ) %>%
  build() %>%
  head() %>% 
  kable()
row_label1 row_label2 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose ord_layer_index ord_layer_1 ord_layer_2
SKIN AND SUBCUTANEOUS TISSUE DISORDERS SKIN AND SUBCUTANEOUS TISSUE DISORDERS 47 (100.0%) 111 (100.0%) 118 (100.0%) 1 1 Inf
SKIN AND SUBCUTANEOUS TISSUE DISORDERS ACTINIC KERATOSIS 0 ( 0.0%) 1 ( 0.9%) 0 ( 0.0%) 1 1 1
SKIN AND SUBCUTANEOUS TISSUE DISORDERS ALOPECIA 1 ( 2.1%) 0 ( 0.0%) 0 ( 0.0%) 1 1 2
SKIN AND SUBCUTANEOUS TISSUE DISORDERS BLISTER 0 ( 0.0%) 2 ( 1.8%) 8 ( 6.8%) 1 1 3
SKIN AND SUBCUTANEOUS TISSUE DISORDERS COLD SWEAT 3 ( 6.4%) 0 ( 0.0%) 0 ( 0.0%) 1 1 4
SKIN AND SUBCUTANEOUS TISSUE DISORDERS DERMATITIS ATOPIC 1 ( 2.1%) 0 ( 0.0%) 0 ( 0.0%) 1 1 5

This accomplishes what we needed, but it’s not exactly the presentation you might hope for. We have a solution for this as well.

tplyr_table(adae, TRTA) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>% 
      set_nest_count(TRUE) %>% 
      set_indentation("--->")
  ) %>%
  build() %>%
  head() %>% 
  kable()
row_label1 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose ord_layer_index ord_layer_1 ord_layer_2
SKIN AND SUBCUTANEOUS TISSUE DISORDERS 47 (100.0%) 111 (100.0%) 118 (100.0%) 1 1 Inf
--->ACTINIC KERATOSIS 0 ( 0.0%) 1 ( 0.9%) 0 ( 0.0%) 1 1 1
--->ALOPECIA 1 ( 2.1%) 0 ( 0.0%) 0 ( 0.0%) 1 1 2
--->BLISTER 0 ( 0.0%) 2 ( 1.8%) 8 ( 6.8%) 1 1 3
--->COLD SWEAT 3 ( 6.4%) 0 ( 0.0%) 0 ( 0.0%) 1 1 4
--->DERMATITIS ATOPIC 1 ( 2.1%) 0 ( 0.0%) 0 ( 0.0%) 1 1 5

Using set_nest_count() triggers ‘Tplyr’ to drop row_label1, and indent all of the AEDECOD values within row_label2. The columns are renamed appropriately as well. The default indentation used will be 3 spaces, but as you can see here - you can set the indentation however you like. This lets you use tab strings for different language-specific output types, stick with spaces, indent wider or narrower - whatever you wish. All of the existing order variables remain, so this has no impact on your ability to sort the table.
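For comparison, this is roughly what the "two summaries, then merge" approach described earlier looks like when done by hand in base R. The data, the column names row_label and inner_term, and the sorting scheme are all hypothetical; this is a sketch of the idea, not of how ‘Tplyr’ builds or orders its nested rows.

```r
# Hypothetical adverse event records.
ae <- data.frame(
  AEBODSYS = c("SKIN", "SKIN", "SKIN", "CARDIAC"),
  AEDECOD  = c("BLISTER", "ALOPECIA", "BLISTER", "PALPITATIONS"),
  stringsAsFactors = FALSE
)
ae$n <- 1
indent <- "   "  # three spaces, matching the default indentation described above

# "Outside" summary: one row per body system.
outer <- aggregate(n ~ AEBODSYS, data = ae, FUN = sum)
outer$row_label  <- outer$AEBODSYS
outer$inner_term <- ""  # an empty string sorts ahead of every preferred term

# "Inside" summary: one row per preferred term within body system.
inner <- aggregate(n ~ AEBODSYS + AEDECOD, data = ae, FUN = sum)
inner$row_label  <- paste0(indent, inner$AEDECOD)
inner$inner_term <- inner$AEDECOD

# Stack both summaries and sort so each body system precedes its own terms.
cols <- c("AEBODSYS", "inner_term", "row_label", "n")
nested <- rbind(outer[cols], inner[cols])
nested <- nested[order(nested$AEBODSYS, nested$inner_term), c("row_label", "n")]

print(nested, row.names = FALSE)
```

Even in this toy case there are two aggregations, a stacking step, and a sort key to manage, which is the bookkeeping that passing two target variables to group_count() saves you.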

Shift Tables

Shift tables are a special kind of frequency table - but what they count are changes in state. This is most common when looking at laboratory ranges, where you may be interested in a subject’s status at baseline versus their status at some designated evaluation point. Shift tables allow you to see the distribution of how subjects move between normal ranges, and if the population is improving or worsening as the study progresses.

While shift tables are very similar to a normal frequency table, there’s more nuance here, and thus we decided to create group_shift(). This function is largely an abstraction of a count layer, and in fact re-uses a good deal of the same underlying code. But we handle some of the complexity for you to make the interface easy to use and the behavior similar to that of the group_count() and group_desc() APIs.

One thing to note - the group_shift() API is intended to be used on shift tables where one group is presented in rows and the other group in columns. Occasionally, shift tables will have a row based approach that shows “Low to High”, “Normal to High”, etc. For those situations, group_count() will do just fine.

Let’s look at an example.

tplyr_table(adlb, TRTA, where=PARAMCD == "CK") %>%
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, VISIT))
  ) %>%
  build() %>%
  head(20) %>%
  kable()
row_label1 row_label2 row_label3 var1_Placebo_H var1_Placebo_N var1_Xanomeline High Dose_H var1_Xanomeline High Dose_N var1_Xanomeline Low Dose_H var1_Xanomeline Low Dose_N ord_layer_index ord_layer_1 ord_layer_2 ord_layer_3
Creatine Kinase (U/L) WEEK 12 H 0 0 1 0 0 0 1 35 1 1
Creatine Kinase (U/L) WEEK 12 N 1 11 1 6 1 6 1 35 1 3
Creatine Kinase (U/L) WEEK 24 H 0 0 0 0 0 0 1 35 2 1
Creatine Kinase (U/L) WEEK 24 N 2 10 0 2 0 2 1 35 2 3
Creatine Kinase (U/L) WEEK 8 H 0 0 0 0 0 0 1 35 3 1
Creatine Kinase (U/L) WEEK 8 N 1 6 1 9 0 6 1 35 3 3

For the most part, this is getting us where we want to go - but there’s still something left to be desired. It doesn’t look like there are any ‘L’ values for BNRIND in the dataset, so we are not getting any rows containing ‘L’. Let’s see if we can fix that by dummying in the possible values.

adlb$ANRIND <- factor(adlb$ANRIND, levels=c("L", "N", "H"))
adlb$BNRIND <- factor(adlb$BNRIND, levels=c("L", "N", "H"))
tplyr_table(adlb, TRTA, where=PARAMCD == "CK") %>%
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, VISIT))
  ) %>%
  build() %>%
  head(20) %>%
  kable()
row_label1 row_label2 row_label3 var1_Placebo_L var1_Placebo_N var1_Placebo_H var1_Xanomeline High Dose_L var1_Xanomeline High Dose_N var1_Xanomeline High Dose_H var1_Xanomeline Low Dose_L var1_Xanomeline Low Dose_N var1_Xanomeline Low Dose_H ord_layer_index ord_layer_1 ord_layer_2 ord_layer_3
Creatine Kinase (U/L) WEEK 12 L 0 0 0 0 0 0 0 0 0 1 35 1 1
Creatine Kinase (U/L) WEEK 12 N 0 11 1 0 6 1 0 6 1 1 35 1 2
Creatine Kinase (U/L) WEEK 12 H 0 0 0 0 0 1 0 0 0 1 35 1 3
Creatine Kinase (U/L) WEEK 24 L 0 0 0 0 0 0 0 0 0 1 35 2 1
Creatine Kinase (U/L) WEEK 24 N 0 10 2 0 2 0 0 2 0 1 35 2 2
Creatine Kinase (U/L) WEEK 24 H 0 0 0 0 0 0 0 0 0 1 35 2 3
Creatine Kinase (U/L) WEEK 8 L 0 0 0 0 0 0 0 0 0 1 35 3 1
Creatine Kinase (U/L) WEEK 8 N 0 6 1 0 9 1 0 6 0 1 35 3 2
Creatine Kinase (U/L) WEEK 8 H 0 0 0 0 0 0 0 0 0 1 35 3 3

There we go. This is another situation where using factors in R lets us dummy values within the dataset. Furthermore, since factor levels have a defined order, this automatically corrected the sort order of the row labels too. Check out vignette("sort") for more information on sorting.
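The same behavior is easy to see in base R on its own. In this small sketch with hypothetical values, a character vector only reports the values it contains, while a factor reports every declared level, in level order, even when the count is zero.

```r
# Hypothetical baseline range indicators with no "L" values present.
bnrind <- c("N", "N", "H", "N")

# As a character vector, only observed values are counted (sorted alphabetically).
table(bnrind)

# As a factor, every declared level is counted, in the order of the levels.
bnrind_f <- factor(bnrind, levels = c("L", "N", "H"))
counts <- table(bnrind_f)
counts
```

The factor version yields a zero count for "L" and presents the levels as L, N, H rather than alphabetically.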

A major part of the shift API is the control of the denominators used in the calculation of the percentages. In frequency tables, a lot of the time you want the columns within a by group to sum to 100%. In shift tables, most percentages are relative to the “box” that is formed from the “from” and “to” groups of the shift for each treatment group. To support this, ‘Tplyr’ lets you specify the grouping used to calculate the denominators.

Just like the count layers, the set_denoms_by() function accepts any variable name from the treatment variable, cols argument, by variables, and the target variables.

tplyr_table(adlb, TRTA, where=PARAMCD == "CK") %>%
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, AVISIT)) %>%
      set_format_strings(f_str("xx (xxx.x%)", n, pct)) %>%
      # This is the default, the 3x3 box formed by the target variables
      set_denoms_by(TRTA, PARAM, AVISIT) 
  ) %>%
  build() %>%
  kable()
row_label1 row_label2 row_label3 var1_Placebo_L var1_Placebo_N var1_Placebo_H var1_Xanomeline High Dose_L var1_Xanomeline High Dose_N var1_Xanomeline High Dose_H var1_Xanomeline Low Dose_L var1_Xanomeline Low Dose_N var1_Xanomeline Low Dose_H ord_layer_index ord_layer_1 ord_layer_2 ord_layer_3
Creatine Kinase (U/L) Week 12 L 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 12 1
Creatine Kinase (U/L) Week 12 N 0 ( 0.0%) 11 ( 91.7%) 1 ( 8.3%) 0 ( 0.0%) 6 ( 75.0%) 1 ( 12.5%) 0 ( 0.0%) 6 ( 85.7%) 1 ( 14.3%) 1 35 12 2
Creatine Kinase (U/L) Week 12 H 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 ( 12.5%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 12 3
Creatine Kinase (U/L) Week 24 L 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 24 1
Creatine Kinase (U/L) Week 24 N 0 ( 0.0%) 10 ( 83.3%) 2 ( 16.7%) 0 ( 0.0%) 2 (100.0%) 0 ( 0.0%) 0 ( 0.0%) 2 (100.0%) 0 ( 0.0%) 1 35 24 2
Creatine Kinase (U/L) Week 24 H 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 24 3
Creatine Kinase (U/L) Week 8 L 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 8 1
Creatine Kinase (U/L) Week 8 N 0 ( 0.0%) 6 ( 85.7%) 1 ( 14.3%) 0 ( 0.0%) 9 ( 90.0%) 1 ( 10.0%) 0 ( 0.0%) 6 (100.0%) 0 ( 0.0%) 1 35 8 2
Creatine Kinase (U/L) Week 8 H 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 8 3

In the example above, the denominators were based on the by and treatment variables, TRTA, PARAM and AVISIT. This creates a 3 x 3 box, where the denominator is the total of all records within the FROM and TO shift variables, within each parameter, visit, and treatment. This is the default, and this is how ‘Tplyr’ will create the denominators if set_denoms_by() isn’t specified.

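In the next example, the percentage denominators are calculated row-wise. Because TRTA is not part of the denominator grouping, the percentages in each row sum to 100% across all treatment groups within a parameter and visit.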

tplyr_table(adlb, TRTA, where=PARAMCD == "CK") %>%
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, AVISIT)) %>%
      set_format_strings(f_str("xx (xxx.x%)", n, pct)) %>%
      set_denoms_by(BNRIND, PARAM, AVISIT) # Each row is formed by BNRIND within PARAM and AVISIT
  ) %>%
  build() %>%
  arrange(ord_layer_1, ord_layer_2, ord_layer_3) %>% 
  head() %>% 
  kable()
row_label1 row_label2 row_label3 var1_Placebo_L var1_Placebo_N var1_Placebo_H var1_Xanomeline High Dose_L var1_Xanomeline High Dose_N var1_Xanomeline High Dose_H var1_Xanomeline Low Dose_L var1_Xanomeline Low Dose_N var1_Xanomeline Low Dose_H ord_layer_index ord_layer_1 ord_layer_2 ord_layer_3
Creatine Kinase (U/L) Week 8 L 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 8 1
Creatine Kinase (U/L) Week 8 N 0 ( 0.0%) 6 ( 26.1%) 1 ( 4.3%) 0 ( 0.0%) 9 ( 39.1%) 1 ( 4.3%) 0 ( 0.0%) 6 ( 26.1%) 0 ( 0.0%) 1 35 8 2
Creatine Kinase (U/L) Week 8 H 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 8 3
Creatine Kinase (U/L) Week 12 L 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 12 1
Creatine Kinase (U/L) Week 12 N 0 ( 0.0%) 11 ( 42.3%) 1 ( 3.8%) 0 ( 0.0%) 6 ( 23.1%) 1 ( 3.8%) 0 ( 0.0%) 6 ( 23.1%) 1 ( 3.8%) 1 35 12 2
Creatine Kinase (U/L) Week 12 H 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 (100.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 35 12 3
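Both denominators shown so far can be reproduced by hand from the Week 12 counts in the tables above. This base R sketch keeps only the non-zero cells and uses ave() to sum counts within a grouping. It covers a single parameter and visit, so PARAM and AVISIT are implicit, and it illustrates the grouping idea rather than ‘Tplyr’ internals.

```r
# Non-zero Week 12 cells copied from the tables above.
shift <- data.frame(
  TRTA = c("Placebo", "Placebo",
           "Xanomeline High Dose", "Xanomeline High Dose", "Xanomeline High Dose",
           "Xanomeline Low Dose", "Xanomeline Low Dose"),
  BNRIND = c("N", "N", "N", "N", "H", "N", "N"),
  ANRIND = c("N", "H", "N", "H", "H", "N", "H"),
  n = c(11, 1, 6, 1, 1, 6, 1),
  stringsAsFactors = FALSE
)

# Box denominator: total within each treatment group (the default).
denom_box <- ave(shift$n, shift$TRTA, FUN = sum)
shift$pct_box <- round(100 * shift$n / denom_box, 1)

# Row-wise denominator: total within each baseline value, across treatments.
denom_row <- ave(shift$n, shift$BNRIND, FUN = sum)
shift$pct_row <- round(100 * shift$n / denom_row, 1)

shift
```

The Placebo N to N cell is 11 of 12 (91.7%) under the box denominator and 11 of 26 (42.3%) under the row-wise denominator, matching the two tables.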

While not practical, in this last example the denominators are changed to be based on the entire column instead of the 3 x 3 box. By passing the column variables, TRTA and ANRIND, the layer will use the column totals as denominators when determining the percentages.

tplyr_table(adlb, TRTA, where = PARAMCD == "CK") %>%
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, AVISIT)) %>%
      set_format_strings(f_str("xx (xx.xx%)", n, pct)) %>%
      set_denoms_by(TRTA, ANRIND) # Use the column total as the denominator
  ) %>%
  build() %>%
  arrange(ord_layer_1, ord_layer_2, ord_layer_3) %>% 
  head() %>%
  kable()
row_label1 row_label2 row_label3 var1_Placebo_L var1_Placebo_N var1_Placebo_H var1_Xanomeline High Dose_L var1_Xanomeline High Dose_N var1_Xanomeline High Dose_H var1_Xanomeline Low Dose_L var1_Xanomeline Low Dose_N var1_Xanomeline Low Dose_H ord_layer_index ord_layer_1 ord_layer_2 ord_layer_3
Creatine Kinase (U/L) Week 8 L 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 1 35 8 1
Creatine Kinase (U/L) Week 8 N 0 ( 0.00%) 6 (22.22%) 1 (25.00%) 0 ( 0.00%) 9 (52.94%) 1 (33.33%) 0 ( 0.00%) 6 (42.86%) 0 ( 0.00%) 1 35 8 2
Creatine Kinase (U/L) Week 8 H 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 1 35 8 3
Creatine Kinase (U/L) Week 12 L 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 1 35 12 1
Creatine Kinase (U/L) Week 12 N 0 ( 0.00%) 11 (40.74%) 1 (25.00%) 0 ( 0.00%) 6 (35.29%) 1 (33.33%) 0 ( 0.00%) 6 (42.86%) 1 (100.00%) 1 35 12 2
Creatine Kinase (U/L) Week 12 H 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 1 (33.33%) 0 ( 0.00%) 0 ( 0.00%) 0 ( 0.00%) 1 35 12 3

Our hope is that this gives you the flexibility you need to structure your denominator however required.