Analysis Results Data • tplyr2

Introduction

Clinical summary tables are typically built for human consumption: aligned columns, formatted numbers, parenthesized percentages. But increasingly, regulatory and industry workflows require the underlying results in a machine-readable, standards-compliant format. The Analysis Results Data (ARD) model – part of the broader CDISC ecosystem – addresses this by defining a long-format structure where each row represents a single statistic for a single group combination.

tplyr2 supports this through two functions:

tplyr_to_ard() converts a built result into ARD long format.
tplyr_from_ard() reconstructs a formatted table from ARD data and a spec.

Together, these functions let you separate the computed values from their presentation, enabling workflows where results are archived, exchanged, or validated independently of formatting.

Converting to ARD

Any result from tplyr_build() can be converted to ARD format. The raw numeric data is already attached to every build result as an attribute; tplyr_to_ard() melts that data into one-row-per-statistic long format.

Here is a complete example using a demographics table with both count and descriptive statistics layers:

spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count("SEX"),
    group_desc(
      "AGE",
      settings = layer_settings(
        format_strings = list(
          "n"         = f_str("xxx", "n"),
          "Mean (SD)" = f_str("xx.x (xx.xx)", "mean", "sd"),
          "Median"    = f_str("xx.x", "median"),
          "Min, Max"  = f_str("xx, xx", "min", "max")
        )
      )
    )
  )
)

result <- tplyr_build(spec, tplyr_adsl)

The formatted output looks like a typical clinical table:

kable(result[, !grepl("^ord", names(result))])

rowlabel1	res1	res2	res3
F	53 (61.6%)	40 (47.6%)	50 (59.5%)
M	33 (38.4%)	44 (52.4%)	34 (40.5%)
n	86	84	84
Mean (SD)	75.2 ( 8.59)	74.4 ( 7.89)	75.7 ( 8.29)
Median	76.0	76.0	77.5
Min, Max	52, 89	56, 88	51, 88

Converting to ARD is a single function call:

ard <- tplyr_to_ard(result)
kable(head(ard, 15))

analysis_id	TRT01P	SEX	stat_name	stat_value
1	Placebo	F	n	53.00000
1	Placebo	M	n	33.00000
1	Xanomeline High Dose	F	n	40.00000
1	Xanomeline High Dose	M	n	44.00000
1	Xanomeline Low Dose	F	n	50.00000
1	Xanomeline Low Dose	M	n	34.00000
1	Placebo	F	pct	61.62791
1	Placebo	M	pct	38.37209
1	Xanomeline High Dose	F	pct	47.61905
1	Xanomeline High Dose	M	pct	52.38095
1	Xanomeline Low Dose	F	pct	59.52381
1	Xanomeline Low Dose	M	pct	40.47619
1	Placebo	F	total	86.00000
1	Placebo	M	total	86.00000
1	Xanomeline High Dose	F	total	84.00000

ARD Structure

The ARD output is a data frame in long format. Each row represents one statistic for one group combination. The columns are:

analysis_id: An integer identifying which layer produced the row. Layer 1 is the first layer in the spec, layer 2 is the second, and so on.
Grouping columns: The original data variables that define the groups. For a count layer with cols = "TRT01P" and target_var = "SEX", you will see TRT01P and SEX columns.
stat_name: The name of the statistic – for example, "n", "pct", "total", "mean", "sd", "median", "min", "max".
stat_value: The numeric value of the statistic.

Let us examine the count layer (analysis_id 1) and descriptive statistics layer (analysis_id 2) separately.

Count Layer ARD

The count layer produces statistics like n, pct, and total for each combination of treatment arm and target variable level:

count_ard <- ard[ard$analysis_id == 1, ]
kable(count_ard)

analysis_id	TRT01P	SEX	stat_name	stat_value
1	Placebo	F	n	53.00000
1	Placebo	M	n	33.00000
1	Xanomeline High Dose	F	n	40.00000
1	Xanomeline High Dose	M	n	44.00000
1	Xanomeline Low Dose	F	n	50.00000
1	Xanomeline Low Dose	M	n	34.00000
1	Placebo	F	pct	61.62791
1	Placebo	M	pct	38.37209
1	Xanomeline High Dose	F	pct	47.61905
1	Xanomeline High Dose	M	pct	52.38095
1	Xanomeline Low Dose	F	pct	59.52381
1	Xanomeline Low Dose	M	pct	40.47619
1	Placebo	F	total	86.00000
1	Placebo	M	total	86.00000
1	Xanomeline High Dose	F	total	84.00000
1	Xanomeline High Dose	M	total	84.00000
1	Xanomeline Low Dose	F	total	84.00000
1	Xanomeline Low Dose	M	total	84.00000

Each sex-by-treatment combination has three rows: the raw count (n), the percentage (pct), and the denominator (total).

Descriptive Statistics Layer ARD

The descriptive layer produces a richer set of statistics. Even if the format strings only reference n, mean, sd, median, min, and max, the numeric data snapshot captures all computed statistics:

desc_ard <- ard[ard$analysis_id == 2, ]
kable(head(desc_ard, 20))

	analysis_id	TRT01P	SEX	stat_name	stat_value
19	2	Placebo	NA	n	86.000000
20	2	Xanomeline High Dose	NA	n	84.000000
21	2	Xanomeline Low Dose	NA	n	84.000000
22	2	Placebo	NA	mean	75.209302
23	2	Xanomeline High Dose	NA	mean	74.380952
24	2	Xanomeline Low Dose	NA	mean	75.666667
25	2	Placebo	NA	sd	8.590167
26	2	Xanomeline High Dose	NA	sd	7.886094
27	2	Xanomeline Low Dose	NA	sd	8.286051
28	2	Placebo	NA	median	76.000000
29	2	Xanomeline High Dose	NA	median	76.000000
30	2	Xanomeline Low Dose	NA	median	77.500000
31	2	Placebo	NA	var	73.790971
32	2	Xanomeline High Dose	NA	var	62.190476
33	2	Xanomeline Low Dose	NA	var	68.658635
34	2	Placebo	NA	min	52.000000
35	2	Xanomeline High Dose	NA	min	56.000000
36	2	Xanomeline Low Dose	NA	min	51.000000
37	2	Placebo	NA	max	89.000000
38	2	Xanomeline High Dose	NA	max	88.000000

This is one of the key advantages of the ARD format: it preserves every computed value, not just those that appear in the formatted output.

Reconstructing from ARD

Given an ARD data frame and the original spec, tplyr_from_ard() applies the spec’s formatting rules to rebuild the display table. This completes the round-trip:

rebuilt <- tplyr_from_ard(ard, spec)
kable(rebuilt[, !grepl("^ord", names(rebuilt))])

rowlabel1	res1	res2	res3
F	53 (61.6%)	40 (47.6%)	50 (59.5%)
M	33 (38.4%)	44 (52.4%)	34 (40.5%)
Mean (SD)	75.2 ( 8.59)	74.4 ( 7.89)	75.7 ( 8.29)
Median	76.0	76.0	77.5
Min, Max	52, 89	56, 88	51, 88
n	86	84	84

The reconstructed table applies the same format strings, row labels, and column structure defined in the spec. This means the spec acts as a reusable formatting template: the same ARD data can be reformatted with different specs if needed, or the same spec can be applied to ARD data from different studies.

Verifying the Round-Trip

We can confirm that the formatted values match between the original build and the ARD reconstruction:

original_sorted <- result[order(result$rowlabel1), ]
rebuilt_sorted  <- rebuilt[order(rebuilt$rowlabel1), ]

all(trimws(original_sorted$res1) == trimws(rebuilt_sorted$res1))
#> [1] TRUE

The formatted cell values are identical, confirming that no information is lost in the conversion.

Use Cases

Standards-Compliant Data Exchange

When submitting results to a regulatory agency or sharing across organizations, the ARD format provides a self-describing, tool-agnostic representation of the computed values. The long format is straightforward to validate, compare across submissions, or load into any analysis environment.

Separating Computation from Presentation

In a production workflow, you might compute results once, archive the ARD, and then apply formatting later – or apply different formatting for different audiences. The spec defines the presentation; the ARD holds the numbers.

# Same ARD, different formatting
compact_spec <- tplyr_spec(
  cols = "TRT01P",
  layers = tplyr_layers(
    group_count(
      "SEX",
      settings = layer_settings(
        format_strings = list(n_counts = f_str("xx", "n"))
      )
    ),
    group_desc(
      "AGE",
      settings = layer_settings(
        format_strings = list(
          "n"    = f_str("xx", "n"),
          "Mean" = f_str("xx.x", "mean")
        )
      )
    )
  )
)

compact_result <- tplyr_from_ard(ard, compact_spec)
kable(compact_result[, !grepl("^ord", names(compact_result))])

rowlabel1	res1	res2	res3
F	53	40	50
M	33	44	34
Mean	75.2	74.4	75.7
n	86	84	84

The same underlying data now appears in a more compact layout, with counts displayed without percentages and only two descriptive statistics rows.

Archiving and Reproducibility

Because the ARD captures every computed statistic as a plain numeric value, it serves as a durable archive of the analysis results. Paired with a saved spec (see vignette("serialize")), the full table can be reproduced at any time without re-running the analysis against the source data.

Summary

The ARD workflow in tplyr2 is straightforward:

Build your table with tplyr_build() as usual.
Convert to long format with tplyr_to_ard() to get one row per statistic.
Reconstruct a formatted table with tplyr_from_ard() using any compatible spec.

This separation of computed results from formatted output supports standards-compliant data exchange, flexible re-formatting, and reproducible archiving of analysis results.