
Per the FDA Study Data Technical Conformance Guide (https://www.fda.gov/media/88173/download), section 3.3.2, dataset file sizes shouldn't exceed 5 GB. Datasets larger than this should be split based on a variable. For example, laboratory readings in ADLB can be split by LBCAT to separate hematology and chemistry data.

Usage

xportr_split(.df, split_by = NULL)

Arguments

.df

A data frame that follows a CDISC standard.

split_by

A quoted variable that will be passed to base::split().

Value

A data frame with an additional attribute added so xportr_write() knows how to split the data frame.

Details

This function tells xportr_write() to split the data frame based on the variable passed in split_by. When written, each file name will be prepended with a number for uniqueness. Per CDISC guidance, these files, and how they were split, should be noted in the Reviewer Guides.
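
To make the grouping concrete, the sketch below calls base::split() directly on a small, made-up ADLB data frame. It only illustrates how base::split() partitions rows by LBCAT; it is not xportr's internal implementation, and the object name parts is an assumption introduced for this example.

```r
# Illustrative data only; not taken from a real study.
adlb <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  LBCAT = c("HEMATOLOGY", "HEMATOLOGY", "CHEMISTRY")
)

# base::split() returns a named list with one data frame per LBCAT value.
# `parts` is a hypothetical name used only in this sketch.
parts <- split(adlb, adlb$LBCAT)

# One list element per category.
names(parts)

# Number of rows that fall into each category.
vapply(parts, nrow, integer(1))
```

Each element of the list holds the rows for one LBCAT value, which corresponds to the per-category grouping described above.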

Examples


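# Small example ADLB data set with two lab categories
adlb <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  LBCAT = c("HEMATOLOGY", "HEMATOLOGY", "CHEMISTRY")
)

# Mark the data frame so xportr_write() splits it by LBCAT
adlb <- xportr_split(adlb, "LBCAT")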