Tplyr Table Properties

Most of the work in creating a Tplyr table happens at the layer level, but a few overarching properties are worth discussing. One goal of Tplyr is to let you eliminate redundant code wherever possible, and handling some processing at the tplyr_table() level makes that possible. Furthermore, some settings simply need to be applied table-wide.

Table Parameters

The tplyr_table() function has 4 parameters:

target: The dataset upon which summaries will be performed
treat_var: The variable containing treatment group assignments
where: The overarching table subset criteria. Each layer uses this subset by default, and the table-level where is applied in addition to any layer-level subset criteria.
cols: Additional grouping variables, used alongside the by variables set at the layer level, whose values are transposed into columns in addition to treat_var.

Let’s look at an example:

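# Tplyr provides the table functions, dplyr the pipe and select(), knitr provides kable()
library(Tplyr)
library(dplyr)
library(knitr)

tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y", cols = SEX) %>%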
  add_layer(
    group_count(RACE, by = "Race")
  ) %>% 
  add_layer(
    group_desc(AGE, by = "Age (Years)")
  ) %>% 
  build() %>% 
  kable()

row_label1	row_label2	var1_Placebo_F	var1_Placebo_M	var1_Xanomeline High Dose_F	var1_Xanomeline High Dose_M	var1_Xanomeline Low Dose_F	var1_Xanomeline Low Dose_M	ord_layer_index	ord_layer_1	ord_layer_2
Race	AMERICAN INDIAN OR ALASKA NATIVE	0 ( 0.0%)	0 ( 0.0%)	0 ( 0.0%)	1 ( 2.3%)	0 ( 0.0%)	0 ( 0.0%)	1	1	1
Race	BLACK OR AFRICAN AMERICAN	5 ( 9.4%)	3 ( 9.1%)	6 ( 15.0%)	3 ( 6.8%)	6 ( 12.0%)	0 ( 0.0%)	1	1	2
Race	WHITE	48 ( 90.6%)	30 ( 90.9%)	34 ( 85.0%)	40 ( 90.9%)	44 ( 88.0%)	34 (100.0%)	1	1	3
Age (Years)	n	53	33	40	44	50	34	2	1	1
Age (Years)	Mean (SD)	76.4 ( 8.73)	73.4 ( 8.15)	74.7 ( 7.67)	74.1 ( 8.16)	75.7 ( 8.09)	75.6 ( 8.69)	2	1	2
Age (Years)	Median	78.0	74.0	76.0	77.0	77.5	77.5	2	1	3
Age (Years)	Q1, Q3	70.0, 84.0	69.0, 80.0	72.0, 79.0	69.0, 80.2	72.0, 81.0	68.2, 82.0	2	1	4
Age (Years)	Min, Max	59, 89	52, 85	56, 88	56, 86	54, 87	51, 88	2	1	5
Age (Years)	Missing	0	0	0	0	0	0	2	1	6

In the example above, the where parameter is passed forward into both the RACE and AGE layers. Furthermore, note how the cols parameter works. By default, the layer results are transposed by the values of treat_var. The cols argument adds another variable to transpose by, and the values of that variable are appended as a suffix to the column name. Just as with by, you can supply multiple cols variables using dplyr::vars(). Use this with caution, though: depending on the number of distinct values in the dataset, the table can get quite wide.
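To see the naming convention directly, it can help to list the result columns of a built table. The following is a minimal sketch, assuming Tplyr and dplyr are installed; it reuses the count layer from the example above, and the object names tab and result_cols are illustrative only.

```r
library(Tplyr)
library(dplyr)

# Build the table with SEX as an additional column variable
tab <- tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y", cols = SEX) %>%
  add_layer(
    group_count(RACE, by = "Race")
  ) %>%
  build()

# Keep only the result columns; each name combines the treatment value
# with the SEX value as a suffix, e.g. var1_Placebo_F
result_cols <- grep("^var1_", names(tab), value = TRUE)
print(result_cols)
```

With three treatment groups and two values of SEX, this should list the same six result columns that appear in the header of the output above.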

Note: Treatment groups and additional column variables presented in the final output are always taken from the pre-filtered population data. This means that if a filter completely excludes a treatment group or a group within a column variable, columns will still be created for those groups and will be empty or zero-filled.
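The idea behind this note can be illustrated in base R, independent of Tplyr. This is only a sketch of the concept, not Tplyr's implementation, and the toy data frame and its values are hypothetical: the set of groups is fixed before filtering, so a group that the filter removes entirely still shows up with a count of zero.

```r
# Toy data: three arms, but the filter below removes every Placebo record
adsl <- data.frame(
  TRT   = c("Placebo", "Placebo", "Low", "Low", "High"),
  SAFFL = c("N", "N", "Y", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Fix the set of groups BEFORE filtering by storing them as factor levels
adsl$TRT <- factor(adsl$TRT, levels = c("Placebo", "Low", "High"))

# Apply the subset; the factor levels are kept even though Placebo rows are gone
filtered <- adsl[adsl$SAFFL == "Y", ]

# table() reports unused levels, so Placebo appears with a count of 0
counts <- table(filtered$TRT)
print(counts)
```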

tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y", cols = vars(SEX, RACE)) %>%
  add_layer(
    group_desc(AGE, by = "Age (Years)")
  ) %>% 
  build() %>% 
  kable()

row_label1	row_label2	var1_Placebo_F_BLACK OR AFRICAN AMERICAN	var1_Placebo_F_WHITE	var1_Placebo_M_BLACK OR AFRICAN AMERICAN	var1_Placebo_M_WHITE	var1_Xanomeline High Dose_F_BLACK OR AFRICAN AMERICAN	var1_Xanomeline High Dose_F_WHITE	var1_Xanomeline High Dose_M_AMERICAN INDIAN OR ALASKA NATIVE	var1_Xanomeline High Dose_M_BLACK OR AFRICAN AMERICAN	var1_Xanomeline High Dose_M_WHITE	var1_Xanomeline Low Dose_F_BLACK OR AFRICAN AMERICAN	var1_Xanomeline Low Dose_F_WHITE	var1_Xanomeline Low Dose_M_WHITE	ord_layer_index	ord_layer_1	ord_layer_2
Age (Years)	n	5	48	3	30	6	34	1	3	40	6	44	34	1	1	1
Age (Years)	Mean (SD)	75.2 ( 7.79)	76.5 ( 8.89)	64.7 ( 6.81)	74.2 ( 7.84)	72.2 ( 6.08)	75.1 ( 7.91)	61.0 ( )	79.3 ( 2.52)	74.0 ( 8.16)	72.5 (11.78)	76.1 ( 7.54)	75.6 ( 8.69)	1	1	2
Age (Years)	Median	80.0	78.0	67.0	74.5	73.5	76.0	61.0	79.0	76.0	75.0	78.0	77.5	1	1	3
Age (Years)	Q1, Q3	70.0, 81.0	70.5, 84.0	62.0, 68.5	70.0, 80.8	68.5, 76.2	72.0, 79.8	61.0, 61.0	78.0, 80.5	69.0, 80.2	63.5, 79.8	72.0, 81.0	68.2, 82.0	1	1	4
Age (Years)	Min, Max	64, 81	59, 89	57, 70	52, 85	63, 79	56, 88	61, 61	77, 82	56, 86	57, 87	54, 86	51, 88	1	1	5
Age (Years)	Missing	0	0	0	0	0	0	0	0	0	0	0	0	1	1	6

Additional Treatment Groups

Another important feature that works at the table level is the addition of treatment groups. By adding additional treatment groups, you’re able to do a number of things:

Add a ‘treated’ group to your data so you can analyze ‘treated’ vs. ‘placebo’ when you have multiple treated cohorts
Add a ‘total’ group to summarize the overall study population

We’ve added the function add_treat_grps() to do this work for you. With this function, you can create new treatment groups by combining existing treatment groups from values within treat_var. Additionally, we added an abstraction of add_treat_grps() named add_total_group() to simplify the process of creating a “Total” group.

tplyr_table(tplyr_adsl, TRT01P) %>%
  add_treat_grps('Treated' = c("Xanomeline High Dose", "Xanomeline Low Dose")) %>% 
  add_total_group() %>% 
  add_layer(
    group_desc(AGE, by = "Age (Years)")
  ) %>% 
  build() %>% 
  kable()

row_label1	row_label2	var1_Placebo	var1_Xanomeline High Dose	var1_Xanomeline Low Dose	var1_Treated	var1_Total	ord_layer_index	ord_layer_1	ord_layer_2
Age (Years)	n	86	84	84	168	254	1	1	1
Age (Years)	Mean (SD)	75.2 ( 8.59)	74.4 ( 7.89)	75.7 ( 8.29)	75.0 ( 8.09)	75.1 ( 8.25)	1	1	2
Age (Years)	Median	76.0	76.0	77.5	77.0	77.0	1	1	3
Age (Years)	Q1, Q3	69.2, 81.8	70.8, 80.0	71.0, 82.0	71.0, 81.0	70.0, 81.0	1	1	4
Age (Years)	Min, Max	52, 89	56, 88	51, 88	51, 88	51, 89	1	1	5
Age (Years)	Missing	0	0	0	0	0	1	1	6

Note how in the above example, there are two new columns added to the data - var1_Total and var1_Treated. The summaries for the individual cohorts are left unchanged.
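If you want to check which groups have been defined before building, the table object can be inspected. A minimal sketch, assuming Tplyr and dplyr are installed and using Tplyr's treat_grps() accessor; the object names tab and grps are illustrative only.

```r
library(Tplyr)
library(dplyr)

# Define the same additional groups as above, without building the table
tab <- tplyr_table(tplyr_adsl, TRT01P) %>%
  add_treat_grps("Treated" = c("Xanomeline High Dose", "Xanomeline Low Dose")) %>%
  add_total_group()

# treat_grps() returns the additional groups as a named list, where each
# element holds the treat_var values that are combined into that group
grps <- treat_grps(tab)
print(names(grps))
print(grps$Treated)
```

This is consistent with the n row in the output above: the Treated count of 168 is the sum of the two Xanomeline cohorts (84 + 84), and the Total count of 254 adds the 86 Placebo subjects.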

Population Data

A last and very important aspect of table level properties in Tplyr is the addition of a population dataset. In CDISC standards, datasets like adae only contain adverse events when they occur. This means that if a subject did not experience an adverse event, or did not experience an adverse event within the criteria that you’re subsetting for, they don’t appear in the dataset. When you’re looking at the proportion of subjects who experienced an adverse event compared to the total number of subjects in that cohort, adae itself leaves you no way to calculate that total, because subjects without events don’t exist in the data.
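The denominator problem can be shown with a small base R sketch. The two toy data frames below are hypothetical and only mimic the shape of adsl and adae; the point is that a denominator derived from the event data alone overstates the percentage.

```r
# Toy population data: one row per subject, three subjects per arm
adsl <- data.frame(
  USUBJID = c("01", "02", "03", "04", "05", "06"),
  TRT     = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Toy event data: only subjects with an event appear, some more than once
adae <- data.frame(
  USUBJID = c("01", "01", "02", "04"),
  TRT     = c("A", "A", "A", "B"),
  AEDECOD = c("RASH", "RASH", "PRURITUS", "RASH"),
  stringsAsFactors = FALSE
)

count_subjects <- function(ids) length(unique(ids))

# Denominator from the event data alone: subjects without events are missing
n_from_adae <- tapply(adae$USUBJID, adae$TRT, count_subjects)

# Denominator from the population data: every subject in the arm is counted
n_from_adsl <- tapply(adsl$USUBJID, adsl$TRT, count_subjects)

# Numerator: distinct subjects with at least one RASH, per arm
rash   <- adae[adae$AEDECOD == "RASH", ]
n_rash <- tapply(rash$USUBJID, rash$TRT, count_subjects)

pct_from_adae <- 100 * n_rash / n_from_adae  # A: 1 of 2, B: 1 of 1
pct_from_adsl <- 100 * n_rash / n_from_adsl  # A: 1 of 3, B: 1 of 3

print(round(pct_from_adae, 1))
print(round(pct_from_adsl, 1))
```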

Tplyr allows you to provide a separate population dataset to overcome this. You can also provide a separate where parameter for the population dataset and a population treatment variable named pop_treat_var, as variable names may differ between the datasets.

t <- tplyr_table(tplyr_adae, TRTA, where = AEREL != "NONE") %>% 
  set_pop_data(tplyr_adsl) %>% 
  set_pop_treat_var(TRT01A) %>% 
  set_pop_where(TRUE) %>% 
  add_layer(
    group_count(AEDECOD) %>% 
      set_distinct_by(USUBJID)
  )
  
t %>% 
  build() %>% 
  kable()

row_label1	var1_Placebo	var1_Xanomeline High Dose	var1_Xanomeline Low Dose	ord_layer_index	ord_layer_1
ALOPECIA	1 ( 1.2%)	0 ( 0.0%)	0 ( 0.0%)	1	2
BLISTER	0 ( 0.0%)	1 ( 1.2%)	5 ( 6.0%)	1	3
COLD SWEAT	1 ( 1.2%)	0 ( 0.0%)	0 ( 0.0%)	1	4
DERMATITIS CONTACT	0 ( 0.0%)	0 ( 0.0%)	1 ( 1.2%)	1	6
ERYTHEMA	9 ( 10.5%)	14 ( 16.7%)	13 ( 15.5%)	1	8
HYPERHIDROSIS	2 ( 2.3%)	8 ( 9.5%)	4 ( 4.8%)	1	9
PRURITUS	8 ( 9.3%)	26 ( 31.0%)	21 ( 25.0%)	1	10
PRURITUS GENERALISED	0 ( 0.0%)	1 ( 1.2%)	1 ( 1.2%)	1	11
RASH	4 ( 4.7%)	8 ( 9.5%)	13 ( 15.5%)	1	12
RASH ERYTHEMATOUS	0 ( 0.0%)	0 ( 0.0%)	2 ( 2.4%)	1	13
RASH MACULO-PAPULAR	0 ( 0.0%)	1 ( 1.2%)	0 ( 0.0%)	1	14
RASH PAPULAR	0 ( 0.0%)	1 ( 1.2%)	0 ( 0.0%)	1	15
RASH PRURITIC	0 ( 0.0%)	2 ( 2.4%)	1 ( 1.2%)	1	16
SKIN EXFOLIATION	0 ( 0.0%)	0 ( 0.0%)	1 ( 1.2%)	1	17
SKIN IRRITATION	2 ( 2.3%)	5 ( 6.0%)	6 ( 7.1%)	1	18
SKIN ODOUR ABNORMAL	0 ( 0.0%)	1 ( 1.2%)	0 ( 0.0%)	1	19
SKIN ULCER	1 ( 1.2%)	0 ( 0.0%)	0 ( 0.0%)	1	20
URTICARIA	0 ( 0.0%)	1 ( 1.2%)	1 ( 1.2%)	1	21

In the above example, AEREL doesn’t exist in adsl, so we used set_pop_where() to remove the filter criteria on the population data. Setting the population dataset where parameter to TRUE removes any filter applied to the population data. If set_pop_where() is not set for the population data, it defaults to the where parameter used in tplyr_table(). The same logic applies to the population treatment variable. TRTA does not exist in adsl either, so we used set_pop_treat_var() to change it to the appropriate variable in adsl.

Note the percentage values in the summary above. With the population data set, Tplyr uses the population counts as denominators when calculating the percentages for the distinct counts of subjects who experienced the summarized adverse events. Furthermore, with the population data provided, Tplyr is able to calculate your header N’s properly:

header_n(t) %>% 
  kable()

TRT01A	n
Placebo	86
Xanomeline High Dose	84
Xanomeline Low Dose	84
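As a sanity check, the header N values can be compared against a direct count of the population data. This is a sketch that assumes Tplyr is installed and that tplyr_adsl holds one row per subject; the object name pop_n is illustrative only.

```r
library(Tplyr)

# Count subjects per actual treatment arm directly from the population data.
# No filter is applied, mirroring set_pop_where(TRUE) in the example above.
pop_n <- table(tplyr_adsl$TRT01A)
print(pop_n)
```

Under those assumptions, the counts should agree with the header_n() output shown above.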

Note: set_distinct_by() is expected to be used whenever population data is supplied, because population data denominators only make sense with distinct counts. The entire point of population data is to use subject counts, so non-distinct counts could count multiple records per subject, and the resulting percentage would not be meaningful.
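A short base R sketch shows why. The event records and the population size below are hypothetical; they only illustrate what happens when event records rather than subjects are divided by a subject-level denominator.

```r
# Toy event data for one treatment arm: subject 01 has three records
adae <- data.frame(
  USUBJID = c("01", "01", "01", "02"),
  AEDECOD = "PRURITUS",
  stringsAsFactors = FALSE
)

# Hypothetical population size for this arm (number of subjects)
pop_n <- 3

n_events   <- nrow(adae)                    # records, repeats included
n_subjects <- length(unique(adae$USUBJID))  # distinct subjects

pct_events   <- 100 * n_events / pop_n    # exceeds 100%, not a proportion
pct_subjects <- 100 * n_subjects / pop_n  # proportion of subjects affected

print(round(c(events = pct_events, subjects = pct_subjects), 1))
```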

Data Completion

When creating summary tables, we often have to mock up the potential values of data, even if those values aren’t present in the data we’re summarizing. Tplyr makes a best effort to do this for you. Let’s consider the following dataset:

USUBJID	AVISIT	PECAT	PARAM	TRT01A	AVALC	AVAL	BASEC
101-001	Screening	A	Head	TRT A	Normal	1	Abnormal
101-001	Screening	A	Lungs	TRT A	Normal	2	Semi-Normal
101-001	Day -1	A	Lungs	TRT A	Normal	3	Normal
101-001	Day 5	A	Lungs	TRT A	Normal	4	Normal
101-002	Screening	A	Head	TRT B	Semi-Normal	5	Normal
101-002	Screening	A	Lungs	TRT B	Normal	6	Normal

Let’s say we want to create a count summary for this dataset, and report it by PARAM and AVISIT. Note that in the data, PARAM == "Head" is only collected at Screening, while "Lungs" is collected at Screening, Day -1, and Day 5.

tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_count(AVALC, by = vars(PARAM, AVISIT))
  ) %>% 
  build() %>% 
  select(-starts_with('ord')) %>% 
  head(18) %>% 
  kable()

row_label1	row_label2	row_label3	var1_TRT A	var1_TRT B
Head	Screening	Normal	2 ( 14.3%)	0 ( 0.0%)
Head	Screening	Semi-Normal	0 ( 0.0%)	1 ( 14.3%)
Head	Screening	Abnormal	0 ( 0.0%)	0 ( 0.0%)
Head	Day -1	Normal	0 ( 0.0%)	0 ( 0.0%)
Head	Day -1	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Head	Day -1	Abnormal	0 ( 0.0%)	0 ( 0.0%)
Head	Day 5	Normal	0 ( 0.0%)	0 ( 0.0%)
Head	Day 5	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Head	Day 5	Abnormal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Screening	Normal	2 ( 14.3%)	2 ( 28.6%)
Lungs	Screening	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Screening	Abnormal	2 ( 14.3%)	0 ( 0.0%)
Lungs	Day -1	Normal	4 ( 28.6%)	2 ( 28.6%)
Lungs	Day -1	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Day -1	Abnormal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Day 5	Normal	2 ( 14.3%)	2 ( 28.6%)
Lungs	Day 5	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Day 5	Abnormal	2 ( 14.3%)	0 ( 0.0%)

By default, given the by variables of PARAM and AVISIT, dummy rows are created for all of the potential visits and are zero-filled, meaning results of 0 records are presented for all treatment groups. However, that might not be what you wish to present. Perhaps Head was only intended to be collected at the Screening visit, so it’s unnecessary to present other visits. To address this, you can use the set_limit_data_by() function.

tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_count(AVALC, by = vars(PARAM, AVISIT)) %>% 
      set_limit_data_by(PARAM, AVISIT)
  ) %>% 
  build() %>% 
  select(-starts_with('ord')) %>% 
  head(12) %>% 
  kable()

row_label1	row_label2	row_label3	var1_TRT A	var1_TRT B
Head	Screening	Normal	2 ( 14.3%)	0 ( 0.0%)
Head	Screening	Semi-Normal	0 ( 0.0%)	1 ( 14.3%)
Head	Screening	Abnormal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Screening	Normal	2 ( 14.3%)	2 ( 28.6%)
Lungs	Screening	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Screening	Abnormal	2 ( 14.3%)	0 ( 0.0%)
Lungs	Day -1	Normal	4 ( 28.6%)	2 ( 28.6%)
Lungs	Day -1	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Day -1	Abnormal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Day 5	Normal	2 ( 14.3%)	2 ( 28.6%)
Lungs	Day 5	Semi-Normal	0 ( 0.0%)	0 ( 0.0%)
Lungs	Day 5	Abnormal	2 ( 14.3%)	0 ( 0.0%)

Here you can see that records for Head are now presented only for the Screening visit. For count and shift layers, you can go a step further and include the target variables as well:

tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_count(AVALC, by = vars(PARAM, AVISIT)) %>% 
      set_limit_data_by(PARAM, AVISIT, AVALC)
  ) %>% 
  build() %>% 
  select(-starts_with('ord')) %>% 
  kable()

row_label1	row_label2	row_label3	var1_TRT A	var1_TRT B
Head	Screening	Normal	2 ( 14.3%)	0 ( 0.0%)
Lungs	Screening	Normal	2 ( 14.3%)	2 ( 28.6%)
Lungs	Screening	Abnormal	2 ( 14.3%)	0 ( 0.0%)
Lungs	Day -1	Normal	4 ( 28.6%)	2 ( 28.6%)
Lungs	Day 5	Normal	2 ( 14.3%)	2 ( 28.6%)
Lungs	Day 5	Abnormal	2 ( 14.3%)	0 ( 0.0%)
Head	Screening	Semi-Normal	0 ( 0.0%)	1 ( 14.3%)

This effectively limits the output to the combinations of values present in the data itself.
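The three levels of completion can be sketched in base R. This is an illustration of the idea rather than Tplyr's implementation, and it uses only the six preview rows shown earlier, not the full tplyr_adpe dataset (so Abnormal does not appear here).

```r
# The six preview rows from the dataset shown above
adpe <- data.frame(
  PARAM  = c("Head", "Lungs", "Lungs", "Lungs", "Head", "Lungs"),
  AVISIT = c("Screening", "Screening", "Day -1", "Day 5", "Screening", "Screening"),
  AVALC  = c("Normal", "Normal", "Normal", "Normal", "Semi-Normal", "Normal"),
  stringsAsFactors = FALSE
)

# Default completion: every PARAM x AVISIT x AVALC combination gets a row
full_grid <- expand.grid(
  PARAM  = unique(adpe$PARAM),
  AVISIT = unique(adpe$AVISIT),
  AVALC  = unique(adpe$AVALC),
  stringsAsFactors = FALSE
)

# Limiting by PARAM and AVISIT: only observed PARAM/AVISIT pairs are kept,
# but each pair is still completed with every AVALC value.
# merge() with no shared columns returns the Cartesian product.
observed_by <- unique(adpe[, c("PARAM", "AVISIT")])
limited_by  <- merge(
  observed_by,
  data.frame(AVALC = unique(adpe$AVALC), stringsAsFactors = FALSE)
)

# Limiting by PARAM, AVISIT and AVALC: only observed combinations remain
observed_all <- unique(adpe[, c("PARAM", "AVISIT", "AVALC")])

print(c(
  full       = nrow(full_grid),    # 2 x 3 x 2 = 12
  limited_by = nrow(limited_by),   # 4 pairs x 2 values = 8
  observed   = nrow(observed_all)  # 5 distinct combinations
))
```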

Where to Go From Here

With the table level settings under control, you’re now ready to learn more about what Tplyr has to offer in each layer.

Learn more about descriptive statistics layers in vignette("desc")
Learn more about count and shift layers in vignette("count")
Learn more about shift layers in vignette("shift")
Learn more about calculating risk differences in vignette("riskdiff")
Learn more about sorting Tplyr tables in vignette("sort")
Learn more about using Tplyr options in vignette("options")
And finally, learn more about producing and outputting styled tables using Tplyr in vignette("styled-table")