On the surface, counting sounds pretty simple, right? You just want to know how many occurrences of something there are. Unfortunately, it’s not that easy. In clinical reports, quite a bit of nuance goes into the different types of frequency tables that need to be created. Fortunately, we’ve built a good deal of flexibility into group_count() to help you get what you need when creating these reports, whether you’re creating a demographics, adverse events, or lab results table.

A Simple Example

Let’s start with a basic example. This table demonstrates the distribution of subject disposition across treatment groups. Additionally, we’re sorting by descending total occurrences using the “Total” group.

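library(Tplyr)
library(dplyr)  # %>%, arrange(), desc(), select()
library(knitr)  # kable()

t <- tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
  add_total_group() %>%
  add_treat_grps(Treated = c("Xanomeline Low Dose", "Xanomeline High Dose")) %>%
  add_layer(
    group_count(DCDECOD) %>%
      # Derive the order variable from the counts in the "Total" column
      set_order_count_method("bycount") %>%
      set_ordering_cols(Total)
  ) %>%
  build() %>%
  # Sort by descending total count, then keep the label and result columns
  arrange(desc(ord_layer_1)) %>%
  select(starts_with("row"), var1_Placebo, `var1_Xanomeline Low Dose`,
         `var1_Xanomeline High Dose`, var1_Treated, var1_Total)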

kable(t)

| row_label1 | var1_Placebo | var1_Xanomeline Low Dose | var1_Xanomeline High Dose | var1_Treated | var1_Total |
|:---|:---|:---|:---|:---|:---|
| COMPLETED | 58 ( 67.4%) | 25 ( 29.8%) | 27 ( 32.1%) | 52 ( 31.0%) | 110 ( 43.3%) |
| ADVERSE EVENT | 8 ( 9.3%) | 44 ( 52.4%) | 40 ( 47.6%) | 84 ( 50.0%) | 92 ( 36.2%) |
| WITHDRAWAL BY SUBJECT | 9 ( 10.5%) | 10 ( 11.9%) | 8 ( 9.5%) | 18 ( 10.7%) | 27 ( 10.6%) |
| STUDY TERMINATED BY SPONSOR | 2 ( 2.3%) | 2 ( 2.4%) | 3 ( 3.6%) | 5 ( 3.0%) | 7 ( 2.8%) |
| PROTOCOL VIOLATION | 2 ( 2.3%) | 1 ( 1.2%) | 3 ( 3.6%) | 4 ( 2.4%) | 6 ( 2.4%) |
| LACK OF EFFICACY | 3 ( 3.5%) | 0 ( 0.0%) | 1 ( 1.2%) | 1 ( 0.6%) | 4 ( 1.6%) |
| DEATH | 2 ( 2.3%) | 1 ( 1.2%) | 0 ( 0.0%) | 1 ( 0.6%) | 3 ( 1.2%) |
| PHYSICIAN DECISION | 1 ( 1.2%) | 0 ( 0.0%) | 2 ( 2.4%) | 2 ( 1.2%) | 3 ( 1.2%) |
| LOST TO FOLLOW-UP | 1 ( 1.2%) | 1 ( 1.2%) | 0 ( 0.0%) | 1 ( 0.6%) | 2 ( 0.8%) |

Distinct Versus Event Counts

Another important consideration within count layers is whether you should be using distinct counts, non-distinct counts, or some combination of both. Adverse event tables are a perfect example. Often, you’re concerned with how many subjects had a particular adverse event rather than just the number of occurrences of that event. Similarly, the number of occurrences of an event isn’t necessarily meaningful when compared to the total number of adverse events that occurred. For this reason, what you likely want to look at instead is the number of subjects who experienced an event compared to the total number of subjects in that treatment group.

Tplyr allows you to focus on these distinct counts and distinct percents within some grouping variable, like subject. You can also mix and match distinct counts with non-distinct counts in the same row. The set_distinct_by() function sets the variables used to determine the distinct occurrences of a value.

t <- tplyr_table(tplyr_adae, TRTA) %>%
  add_layer(
    group_count(AEDECOD) %>%
      set_distinct_by(USUBJID) %>%
      set_format_strings(f_str("xxx (xx.xx%) [xxx]", distinct_n, distinct_pct, n))
  ) %>%
  build() %>%
  head()

kable(t)

| row_label1 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 |
|:---|:---|:---|:---|---:|---:|
| ACTINIC KERATOSIS | 0 ( 0.00%) [ 0] | 1 ( 2.38%) [ 1] | 0 ( 0.00%) [ 0] | 1 | 1 |
| ALOPECIA | 1 ( 4.76%) [ 1] | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 | 2 |
| BLISTER | 0 ( 0.00%) [ 0] | 1 ( 2.38%) [ 2] | 5 (11.90%) [ 8] | 1 | 3 |
| COLD SWEAT | 1 ( 4.76%) [ 3] | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 | 4 |
| DERMATITIS ATOPIC | 1 ( 4.76%) [ 1] | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 | 5 |
| DERMATITIS CONTACT | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 ( 2.38%) [ 2] | 1 | 6 |

You may have seen tables like the one above before. This display shows the number of subjects who experienced an adverse event, the percentage of subjects within the given treatment group who experienced that event, and then the total number of occurrences of that event. Using set_distinct_by() triggered the derivation of distinct_n and distinct_pct in addition to the n and pct created within group_count(). The display of the values is then controlled by the f_str() call in set_format_strings().
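To make the difference between the two kinds of counts concrete, here’s a minimal dplyr sketch on hypothetical toy data (not tplyr_adae). It is not how Tplyr computes these values internally; the `ae` data frame and the `n_subjects` denominator are assumptions made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: one row per adverse event record
ae <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  AEDECOD = c("BLISTER", "BLISTER", "ALOPECIA", "BLISTER")
)

# Assumed number of subjects in the treatment group (illustrative denominator)
n_subjects <- 4

counts <- ae %>%
  group_by(AEDECOD) %>%
  summarise(
    n = n(),                                   # event count: every record
    distinct_n = n_distinct(USUBJID),          # subject count: each subject once
    distinct_pct = 100 * distinct_n / n_subjects
  )

counts
```

In this toy data, BLISTER has three records but only two distinct subjects, so n and distinct_n differ for that row.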

An additional option for formatting the numbers above is ‘parenthesis hugging’. To trigger this, use a capital ‘X’ or a capital ‘A’ on the integer side of a number. For example:

t <- tplyr_table(tplyr_adae, TRTA) %>%
  add_layer(
    group_count(AEDECOD) %>%
      set_distinct_by(USUBJID) %>%
      set_format_strings(f_str("xxx (XXX.xx%) [A]", distinct_n, distinct_pct, n))
  ) %>%
  build() %>%
  head() %>% 
  select(row_label1, `var1_Xanomeline Low Dose`)

t
#> # A tibble: 6 × 2
#>   row_label1         `var1_Xanomeline Low Dose`
#>   <chr>              <chr>                     
#> 1 ACTINIC KERATOSIS  "  0   (0.00%)  [0]"      
#> 2 ALOPECIA           "  0   (0.00%)  [0]"      
#> 3 BLISTER            "  5  (11.90%)  [8]"      
#> 4 COLD SWEAT         "  0   (0.00%)  [0]"      
#> 5 DERMATITIS ATOPIC  "  0   (0.00%)  [0]"      
#> 6 DERMATITIS CONTACT "  1   (2.38%)  [2]"

As can be seen above, when using parenthesis hugging, the width of a specified format group is preserved, but the preceding character (or characters) to the left of the ‘X’ or ‘A’ is pulled to the right to ‘hug’ the specified number.
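If it helps to picture the alignment, the effect can be imitated in base R. The `hug()` helper below is hypothetical and is not part of Tplyr; it’s only a sketch of the idea that the opening parenthesis is attached to the number first and the padding is applied afterwards.

```r
# Hypothetical helper, not a Tplyr function: attach "(" to the number,
# then left-pad the combined string to the full width of the format group
hug <- function(x, int_width, digits) {
  num <- formatC(x, format = "f", digits = digits)   # e.g. "2.38"
  total_width <- 1 + int_width + 1 + digits          # "(" + integers + "." + decimals
  formatC(paste0("(", num), width = total_width)     # right-justify the whole string
}

hugged <- hug(c(0, 11.9, 2.38), int_width = 3, digits = 2)
hugged
#> [1] "  (0.00" " (11.90" "  (2.38"
```

Each result is 7 characters wide, matching the width of "(XXX.xx", with the padding placed to the left of the parenthesis rather than between the parenthesis and the number.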

Nested Count Summaries

Certain summary tables present counts within groups. One example could be a disposition table where a disposition reason of “Other” summarizes what those other reasons were. A very common example is an Adverse Event table that displays counts for body systems, and then the events within those body systems. This is again a nuanced situation: there are two variables being summarized, the body system counts and the adverse event counts.

One way to approach this would be to create two summaries, one summarizing the body system and another summarizing the preferred terms by body system, and then merge the two together. But we don’t want you to have to do that. Instead, we handle this complexity for you. This is done in group_count() by submitting two target variables with dplyr::vars(). The first variable should be the grouping variable that you want summarized, which we refer to as the “Outside” variable, and the second should have the narrower scope, which we call the “Inside” variable.

The example below demonstrates how to do a nested summary. Look at the first row: here row_label1 and row_label2 are both “SKIN AND SUBCUTANEOUS TISSUE DISORDERS”. This line is the summary for AEBODSYS. In the rows below that, row_label1 continues on with the value “SKIN AND SUBCUTANEOUS TISSUE DISORDERS”, but row_label2 changes. These are the summaries for AEDECOD.

tplyr_table(tplyr_adae, TRTA) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD))
  ) %>%
  build() %>%
  head() %>% 
  kable()

| row_label1 | row_label2 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 |
|:---|:---|:---|:---|:---|---:|---:|---:|
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 47 (100.0%) | 111 (100.0%) | 118 (100.0%) | 1 | 1 | Inf |
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | ACTINIC KERATOSIS | 0 ( 0.0%) | 1 ( 0.9%) | 0 ( 0.0%) | 1 | 1 | 1 |
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | ALOPECIA | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 2 |
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | BLISTER | 0 ( 0.0%) | 2 ( 1.8%) | 8 ( 6.8%) | 1 | 1 | 3 |
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | COLD SWEAT | 3 ( 6.4%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 4 |
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | DERMATITIS ATOPIC | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 5 |

This accomplishes what we needed, but it’s not exactly the presentation you might hope for. We have a solution for this as well.

tplyr_table(tplyr_adae, TRTA) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>% 
      set_nest_count(TRUE) %>% 
      set_indentation("--->")
  ) %>%
  build() %>%
  head() %>% 
  kable()

| row_label1 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 |
|:---|:---|:---|:---|---:|---:|---:|
| SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 47 (100.0%) | 111 (100.0%) | 118 (100.0%) | 1 | 1 | Inf |
| --->ACTINIC KERATOSIS | 0 ( 0.0%) | 1 ( 0.9%) | 0 ( 0.0%) | 1 | 1 | 1 |
| --->ALOPECIA | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 2 |
| --->BLISTER | 0 ( 0.0%) | 2 ( 1.8%) | 8 ( 6.8%) | 1 | 1 | 3 |
| --->COLD SWEAT | 3 ( 6.4%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 4 |
| --->DERMATITIS ATOPIC | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 5 |

Using set_nest_count() triggers Tplyr to drop row_label1 and indent all of the AEDECOD values within row_label2. The columns are renamed appropriately as well. The default indentation is 3 spaces, but as you can see here, you can set the indentation however you like. This lets you use tab strings for different language-specific output types, stick with spaces, or indent wider or narrower, whatever you wish. All of the existing order variables remain, so this has no impact on your ability to sort the table.
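For comparison, the manual two-summary approach mentioned earlier might look roughly like the dplyr sketch below. It uses hypothetical toy data and made-up names (`ae`, `by_system`, `by_term`, `ord_inner`), handles plain counts only (no treatment columns or percentages), and is not how Tplyr builds its nested layers.

```r
library(dplyr)

# Hypothetical toy data, not tplyr_adae
ae <- data.frame(
  AEBODSYS = c("SKIN", "SKIN", "SKIN", "CARDIAC"),
  AEDECOD  = c("BLISTER", "BLISTER", "ALOPECIA", "PALPITATIONS")
)

# Summary 1: counts for the outside variable
by_system <- ae %>%
  count(AEBODSYS) %>%
  mutate(row_label1 = AEBODSYS, ord_inner = 0)

# Summary 2: counts for the inside variable, indented with 3 spaces
by_term <- ae %>%
  count(AEBODSYS, AEDECOD) %>%
  mutate(row_label1 = paste0("   ", AEDECOD), ord_inner = 1)

# Stack the two summaries and sort so that each body system row
# comes directly before its indented preferred terms
nested <- bind_rows(by_system, by_term) %>%
  arrange(AEBODSYS, ord_inner, AEDECOD) %>%
  select(row_label1, n)

nested
```

Even in this stripped-down form you have to manage the labels, indentation, and ordering yourself, which is the bookkeeping the nested count layer takes care of.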

There’s a lot more to counting! So be sure to check out our vignettes on sorting, shift tables, and denominators.