Getting Started with xportr
The demo will make use of a small ADSL
dataset available
with the xportr
package and has the following features:
- 306 observations
- 51 variables
- Data types other than character and numeric
- Missing labels on variables
- Missing label for data set
- Order of variables not following specification file
- Formats missing
To create a fully compliant v5 xpt ADSL
dataset, that
was developed using R, we will need to apply the 6 main functions within
the xportr
package:
Preparing your Specification Files
In order to make use of the functions within xportr
you will need to create an R data frame that contains your specification
file. You will most likely need to do some pre-processing of your spec
sheets after loading in the spec files for them to work appropriately
with the xportr
functions. Please see our example spec
sheets in
system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr")
to see how xportr
expects the specification sheets.
var_spec <- read_xlsx(
system.file(file.path("specs/", "ADaM_spec.xlsx"), package = "xportr"),
sheet = "Variables"
) %>%
rename(type = "Data Type") %>%
rename_with(tolower)
Below is a quick snapshot of the specification file pertaining to the
ADSL
data set, which we will make use of in the 6
xportr function calls below. Take note of the order,
label, type, length and format columns.
xportr_type()
NOTE: We make use of str()
to expose
the attributes (length, labels, formats, type) of the datasets. We have
suppressed these calls for the sake of brevity.
In order to be compliant with transport v5 specifications an
xpt
file can only have two data types: character and
numeric/dbl. Currently the ADSL
data set has chr, dbl,
time, factor and date.
tibble [306 × 51] (S3: tbl_df/tbl/data.frame) $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ... ..- attr(*, "label")= chr "Study Identifier" $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ... ..- attr(*, "label")= chr "Unique Subject Identifier" $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ... ..- attr(*, "label")= chr "Subject Identifier for the Study" $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Subject Reference Start Date/Time" $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ... ..- attr(*, "label")= chr "Subject Reference End Date/Time" $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Date/Time of First Study Treatment" $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ... ..- attr(*, "label")= chr "Date/Time of Last Study Treatment" $ RFICDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Informed Consent" $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ... ..- attr(*, "label")= chr "Date/Time of End of Participation" $ DTHDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Death" $ DTHFL : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Subject Death Flag" $ SITEID : chr [1:306] "701" "701" "701" "701" ... ..- attr(*, "label")= chr "Study Site Identifier" $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ... ..- attr(*, "label")= chr "Age" $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ... ..- attr(*, "label")= chr "Age Units" $ SEX : chr [1:306] "F" "M" "M" "M" ... ..- attr(*, "label")= chr "Sex" $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ... ..- attr(*, "label")= chr "Race" $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ... ..- attr(*, "label")= chr "Ethnicity" $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Planned Arm Code" $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Actual Arm Code" $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ... ..- attr(*, "label")= chr "Country" $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ... ..- attr(*, "label")= chr "Date/Time of Collection" $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ... ..- attr(*, "label")= chr "Study Day of Collection" $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" ... $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ... $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" ... $ TRTETMF : chr [1:306] "H" "H" "H" "H" ... $ TRTSDT : Date[1:306], format: "2014-01-02" ... $ TRTEDT : Date[1:306], format: "2014-07-02" ... $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ... $ SCRFDT : Date[1:306], format: NA ... $ EOSDT : Date[1:306], format: "2014-07-02" ... $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ... $ FRVDT : Date[1:306], format: NA ... $ RANDDT : Date[1:306], format: "2014-01-02" ... $ DTHDT : Date[1:306], format: NA ... $ DTHDTF : chr [1:306] NA NA NA NA ... $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LSTALVDT: Date[1:306], format: "2014-07-02" ... $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ... $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ... $ AGEGR1 : chr [1:306] "18-64" "18-64" ">64" ">64" ... $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ... $ LDDTHGR1: chr [1:306] NA NA NA NA ... $ DTH30FL : chr [1:306] NA NA NA NA ... $ DTHA30FL: chr [1:306] NA NA NA NA ... $ DTHB30FL: chr [1:306] NA NA NA NA ... - attr(*, "label")= chr "Demographics"
Using xportr_type()
and the supplied specification file,
we can coerce the variables in the ADSL
set to be
either numeric or character.
adsl_type <- xportr_type(adsl_xportr, var_spec, domain = "ADSL", verbose = "message")
Now all appropriate types have been applied to the dataset as seen below.
tibble [306 × 51] (S3: tbl_df/tbl/data.frame) $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ... ..- attr(*, "label")= chr "Study Identifier" $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ... ..- attr(*, "label")= chr "Unique Subject Identifier" $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ... ..- attr(*, "label")= chr "Subject Identifier for the Study" $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Subject Reference Start Date/Time" $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ... ..- attr(*, "label")= chr "Subject Reference End Date/Time" $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Date/Time of First Study Treatment" $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ... ..- attr(*, "label")= chr "Date/Time of Last Study Treatment" $ RFICDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Informed Consent" $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ... ..- attr(*, "label")= chr "Date/Time of End of Participation" $ DTHDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Death" $ DTHFL : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Subject Death Flag" $ SITEID : chr [1:306] "701" "701" "701" "701" ... ..- attr(*, "label")= chr "Study Site Identifier" $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ... ..- attr(*, "label")= chr "Age" $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ... ..- attr(*, "label")= chr "Age Units" $ SEX : chr [1:306] "F" "M" "M" "M" ... ..- attr(*, "label")= chr "Sex" $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ... ..- attr(*, "label")= chr "Race" $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ... ..- attr(*, "label")= chr "Ethnicity" $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Planned Arm Code" $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Actual Arm Code" $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ... ..- attr(*, "label")= chr "Country" $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ... ..- attr(*, "label")= chr "Date/Time of Collection" $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ... ..- attr(*, "label")= chr "Study Day of Collection" $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" ... $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ... $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" ... $ TRTETMF : chr [1:306] "H" "H" "H" "H" ... $ TRTSDT : Date[1:306], format: "2014-01-02" ... $ TRTEDT : Date[1:306], format: "2014-07-02" ... $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ... $ SCRFDT : Date[1:306], format: NA ... $ EOSDT : Date[1:306], format: "2014-07-02" ... $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ... $ FRVDT : Date[1:306], format: NA ... $ RANDDT : Date[1:306], format: "2014-01-02" ... $ DTHDT : Date[1:306], format: NA ... $ DTHDTF : chr [1:306] NA NA NA NA ... $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LSTALVDT: Date[1:306], format: "2014-07-02" ... $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ... $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ... $ AGEGR1 : chr [1:306] "18-64" "18-64" ">64" ">64" ... $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ... $ LDDTHGR1: chr [1:306] NA NA NA NA ... $ DTH30FL : chr [1:306] NA NA NA NA ... $ DTHA30FL: chr [1:306] NA NA NA NA ... $ DTHB30FL: chr [1:306] NA NA NA NA ... - attr(*, "label")= chr "Demographics" - attr(*, "_xportr.df_arg_")= chr "ADSL"
xportr_length()
Next we can apply the lengths from a variable level specification
file to the data frame. xportr_length()
will identify
variables that are missing from your specification file. The function
will also alert you to how many lengths have been applied successfully.
Before we apply the lengths lets verify that no lengths have been
applied to the original dataframe.
tibble [306 × 51] (S3: tbl_df/tbl/data.frame) $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ... ..- attr(*, "label")= chr "Study Identifier" $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ... ..- attr(*, "label")= chr "Unique Subject Identifier" $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ... ..- attr(*, "label")= chr "Subject Identifier for the Study" $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Subject Reference Start Date/Time" $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ... ..- attr(*, "label")= chr "Subject Reference End Date/Time" $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Date/Time of First Study Treatment" $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ... ..- attr(*, "label")= chr "Date/Time of Last Study Treatment" $ RFICDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Informed Consent" $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ... ..- attr(*, "label")= chr "Date/Time of End of Participation" $ DTHDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Death" $ DTHFL : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Subject Death Flag" $ SITEID : chr [1:306] "701" "701" "701" "701" ... ..- attr(*, "label")= chr "Study Site Identifier" $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ... ..- attr(*, "label")= chr "Age" $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ... ..- attr(*, "label")= chr "Age Units" $ SEX : chr [1:306] "F" "M" "M" "M" ... ..- attr(*, "label")= chr "Sex" $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ... ..- attr(*, "label")= chr "Race" $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ... ..- attr(*, "label")= chr "Ethnicity" $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Planned Arm Code" $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Actual Arm Code" $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ... ..- attr(*, "label")= chr "Country" $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ... ..- attr(*, "label")= chr "Date/Time of Collection" $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ... ..- attr(*, "label")= chr "Study Day of Collection" $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" ... $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ... $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" ... $ TRTETMF : chr [1:306] "H" "H" "H" "H" ... $ TRTSDT : Date[1:306], format: "2014-01-02" ... $ TRTEDT : Date[1:306], format: "2014-07-02" ... $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ... $ SCRFDT : Date[1:306], format: NA ... $ EOSDT : Date[1:306], format: "2014-07-02" ... $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ... $ FRVDT : Date[1:306], format: NA ... $ RANDDT : Date[1:306], format: "2014-01-02" ... $ DTHDT : Date[1:306], format: NA ... $ DTHDTF : chr [1:306] NA NA NA NA ... $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LSTALVDT: Date[1:306], format: "2014-07-02" ... $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ... $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ... $ AGEGR1 : chr [1:306] "18-64" "18-64" ">64" ">64" ... $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ... $ LDDTHGR1: chr [1:306] NA NA NA NA ... $ DTH30FL : chr [1:306] NA NA NA NA ... $ DTHA30FL: chr [1:306] NA NA NA NA ... $ DTHB30FL: chr [1:306] NA NA NA NA ... - attr(*, "label")= chr "Demographics"
No lengths have been applied to the variables as seen in the printout
- the lengths would be in the attr()
part of each
variables. Let’s now use xportr_length()
to apply our
lengths from the specification file.
adsl_length <- adsl_xportr %>% xportr_length(var_spec, domain = "ADSL", verbose = "message")
tibble [306 × 51] (S3: tbl_df/tbl/data.frame) $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ... ..- attr(*, "label")= chr "Study Identifier" ..- attr(*, "width")= num 21 $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ... ..- attr(*, "label")= chr "Unique Subject Identifier" ..- attr(*, "width")= num 30 $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ... ..- attr(*, "label")= chr "Subject Identifier for the Study" ..- attr(*, "width")= num 8 $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Subject Reference Start Date/Time" ..- attr(*, "width")= num 19 $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ... ..- attr(*, "label")= chr "Subject Reference End Date/Time" ..- attr(*, "width")= num 19 $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Date/Time of First Study Treatment" ..- attr(*, "width")= num 19 $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ... ..- attr(*, "label")= chr "Date/Time of Last Study Treatment" ..- attr(*, "width")= num 19 $ RFICDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Informed Consent" ..- attr(*, "width")= num 19 $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ... ..- attr(*, "label")= chr "Date/Time of End of Participation" ..- attr(*, "width")= num 19 $ DTHDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Death" ..- attr(*, "width")= num 19 $ DTHFL : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Subject Death Flag" ..- attr(*, "width")= num 2 $ SITEID : chr [1:306] "701" "701" "701" "701" ... ..- attr(*, "label")= chr "Study Site Identifier" ..- attr(*, "width")= num 5 $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ... ..- attr(*, "label")= chr "Age" ..- attr(*, "width")= num 8 $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ... ..- attr(*, "label")= chr "Age Units" ..- attr(*, "width")= num 10 $ SEX : chr [1:306] "F" "M" "M" "M" ... ..- attr(*, "label")= chr "Sex" ..- attr(*, "width")= num 1 $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ... ..- attr(*, "label")= chr "Race" ..- attr(*, "width")= num 60 $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ... ..- attr(*, "label")= chr "Ethnicity" ..- attr(*, "width")= num 100 $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Planned Arm Code" ..- attr(*, "width")= num 20 $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" ..- attr(*, "width")= num 200 $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Actual Arm Code" ..- attr(*, "width")= num 20 $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" ..- attr(*, "width")= num 200 $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ... ..- attr(*, "label")= chr "Country" ..- attr(*, "width")= num 3 $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ... ..- attr(*, "label")= chr "Date/Time of Collection" ..- attr(*, "width")= num 19 $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ... ..- attr(*, "label")= chr "Study Day of Collection" ..- attr(*, "width")= num 8 $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" ..- attr(*, "width")= num 40 $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" ..- attr(*, "width")= num 40 $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" ... $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ... ..- attr(*, "width")= num 2 $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" ... $ TRTETMF : chr [1:306] "H" "H" "H" "H" ... ..- attr(*, "width")= num 2 $ TRTSDT : Date[1:306], format: "2014-01-02" ... $ TRTEDT : Date[1:306], format: "2014-07-02" ... $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ... ..- attr(*, "width")= num 8 $ SCRFDT : Date[1:306], format: NA ... $ EOSDT : Date[1:306], format: "2014-07-02" ... $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ... ..- attr(*, "width")= num 200 $ FRVDT : Date[1:306], format: NA ... $ RANDDT : Date[1:306], format: "2014-01-02" ... $ DTHDT : Date[1:306], format: NA ... $ DTHDTF : chr [1:306] NA NA NA NA ... ..- attr(*, "width")= num 2 $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ... ..- attr(*, "width")= num 8 $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ... ..- attr(*, "width")= num 8 $ LSTALVDT: Date[1:306], format: "2014-07-02" ... $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ... ..- attr(*, "width")= num 2 $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ... ..- attr(*, "width")= num 200 $ AGEGR1 : chr [1:306] "18-64" "18-64" ">64" ">64" ... ..- attr(*, "width")= num 20 $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ... ..- attr(*, "width")= num 80 $ LDDTHGR1: chr [1:306] NA NA NA NA ... ..- attr(*, "width")= num 200 $ DTH30FL : chr [1:306] NA NA NA NA ... ..- attr(*, "width")= num 200 $ DTHA30FL: chr [1:306] NA NA NA NA ... ..- attr(*, "width")= num 200 $ DTHB30FL: chr [1:306] NA NA NA NA ... ..- attr(*, "width")= num 200 - attr(*, "label")= chr "Demographics" - attr(*, "_xportr.df_arg_")= chr "ADSL"
Note the additional attr(*, "width")=
after each
variable with the width. These have been directly applied from the
specification file that we loaded above!
xportr_order()
Please note that the order of the ADSL
variables, see
above, does not match the specification file order
column.
We can quickly remedy this with a call to xportr_order()
.
Note that the variable SITEID
has been moved as well as
many others to match the specification file order column. Variables not
in the spec are moved to the end of the data and a message is written to
the console.
adsl_order <- xportr_order(adsl_xportr, var_spec, domain = "ADSL", verbose = "message")
── All variables in specification file are in dataset ──
── 50 reordered in dataset ──
Variable reordered in `.df`: `SITEID`, `USUBJID`, `SUBJID`,
`COUNTRY`, `AGE`, `AGEU`, `AGEGR1`, `SEX`, `RACE`,
`RACEGR1`, `ETHNIC`, `RFSTDTC`, `RFENDTC`, `RFXSTDTC`,
`RFXENDTC`, `RFICDTC`, `RFPENDTC`, `DMDTC`, `DMDY`,
`SAFFL`, `ARM`, `ARMCD`, `ACTARM`, `ACTARMCD`, `TRT01P`,
`TRT01A`, `TRTSDTM`, `TRTSTMF`, `TRTEDTM`, `TRTETMF`,
`TRTSDT`, `TRTEDT`, `DTHFL`, `DTHDTC`, `DTHDT`, `DTHDTF`,
`DTHADY`, `REGION1`, `TRTDURD`, `LDDTHELD`, `LSTALVDT`,
`LDDTHGR1`, `DTH30FL`, `DTHA30FL`, `DTHB30FL`, `FRVDT`,
`RANDDT`, `SCRFDT`, `EOSDT`, and `EOSSTT`
xportr_format()
Now we apply formats to the dataset. These will typically be
DATE9.
, DATETIME20
or TIME5
, but
many others can be used. Notice that in the ADSL
dataset
there are 8 Date/Time variables and they are missing formats. Here we
just take a peak at a few TRT
variables, which have a
NULL
format.
# A tibble: 4 × 2 Variable Format1 TRTSDT
2 TRTEDT 3 TRTSDTM 4 TRTEDTM
Using our xportr_format()
we can apply our formats to
the dataset.
adsl_fmt <- adsl_xportr %>% xportr_format(var_spec, domain = "ADSL")
# A tibble: 4 × 2 Variable Format1 TRTSDT DATE9. 2 TRTEDT DATE9. 3 TRTSDTM DATETIME20. 4 TRTEDTM DATETIME20.
NOTE: You can use
attr(data$variable, which = "format")
to inspect formats
applied to a dataframe. The above output has these individual calls
bound together for easier viewing.
xportr_label()
Please observe that our ADSL
dataset is missing many
variable labels. Sometimes these labels can be lost while using R’s
function. However, a CDISC compliant data set needs to have each
variable with a label.
tibble [306 × 51] (S3: tbl_df/tbl/data.frame) $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ... $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ... $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ... $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ... $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ... $ RFICDTC : chr [1:306] NA NA NA NA ... $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ... $ DTHDTC : chr [1:306] NA NA NA NA ... $ DTHFL : chr [1:306] NA NA NA NA ... $ SITEID : chr [1:306] "701" "701" "701" "701" ... $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ... $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ... $ SEX : chr [1:306] "F" "M" "M" "M" ... $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ... $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ... $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ... $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ... $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ... $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" ... $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ... $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" ... $ TRTETMF : chr [1:306] "H" "H" "H" "H" ... $ TRTSDT : Date[1:306], format: "2014-01-02" ... $ TRTEDT : Date[1:306], format: "2014-07-02" ... $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ... $ SCRFDT : Date[1:306], format: NA ... $ EOSDT : Date[1:306], format: "2014-07-02" ... $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ... $ FRVDT : Date[1:306], format: NA ... $ RANDDT : Date[1:306], format: "2014-01-02" ... $ DTHDT : Date[1:306], format: NA ... $ DTHDTF : chr [1:306] NA NA NA NA ... $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ... $ LSTALVDT: Date[1:306], format: "2014-07-02" ... $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ... $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ... $ AGEGR1 : chr [1:306] "18-64" "18-64" ">64" ">64" ... $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ... $ LDDTHGR1: chr [1:306] NA NA NA NA ... $ DTH30FL : chr [1:306] NA NA NA NA ... $ DTHA30FL: chr [1:306] NA NA NA NA ... $ DTHB30FL: chr [1:306] NA NA NA NA ... - attr(*, "label")= chr "Demographics"
Using the xport_label
function we can take the
specifications file and label all the variables available.
xportr_label
will produce a warning message if you the
variable in the data set is not in the specification file.
adsl_lbl <- adsl_xportr %>% xportr_label(var_spec, domain = "ADSL", "message")
str(adsl_lbl)
tibble [306 × 51] (S3: tbl_df/tbl/data.frame) $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ... ..- attr(*, "label")= chr "Study Identifier" $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ... ..- attr(*, "label")= chr "Unique Subject Identifier" $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ... ..- attr(*, "label")= chr "Subject Identifier for the Study" $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Subject Reference Start Date/Time" $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ... ..- attr(*, "label")= chr "Subject Reference End Date/Time" $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ... ..- attr(*, "label")= chr "Date/Time of First Study Treatment" $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ... ..- attr(*, "label")= chr "Date/Time of Last Study Treatment" $ RFICDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date/Time of Informed Consent" $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ... ..- attr(*, "label")= chr "Date/Time of End of Participation" $ DTHDTC : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date / Time of Death" $ DTHFL : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Subject Death Flag" $ SITEID : chr [1:306] "701" "701" "701" "701" ... ..- attr(*, "label")= chr "Study Site Identifier" $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ... ..- attr(*, "label")= chr "Age" $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ... ..- attr(*, "label")= chr "Age Units" $ SEX : chr [1:306] "F" "M" "M" "M" ... ..- attr(*, "label")= chr "Sex" $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ... ..- attr(*, "label")= chr "Race" $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ... ..- attr(*, "label")= chr "Ethnicity" $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Planned Arm Code" $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Planned Arm" $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ... ..- attr(*, "label")= chr "Actual Arm Code" $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Description of Actual Arm" $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ... ..- attr(*, "label")= chr "Country" $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ... ..- attr(*, "label")= chr "Date/Time of Collection" $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ... ..- attr(*, "label")= chr "Study Day of Collection" $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Planned Treatment for Period 01" $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ... ..- attr(*, "label")= chr "Actual Treatment for Period 01" $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" ... $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ... ..- attr(*, "label")= chr "Time of First Exposure Imputation Flag" $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" ... $ TRTETMF : chr [1:306] "H" "H" "H" "H" ... ..- attr(*, "label")= chr "Time of Last Exposure Imputation Flag" $ TRTSDT : Date[1:306], format: "2014-01-02" ... $ TRTEDT : Date[1:306], format: "2014-07-02" ... $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ... ..- attr(*, "label")= chr "Total Duration of Trt (days)" $ SCRFDT : Date[1:306], format: NA ... $ EOSDT : Date[1:306], format: "2014-07-02" ... $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ... ..- attr(*, "label")= chr "End of Study Status" $ FRVDT : Date[1:306], format: NA ... $ RANDDT : Date[1:306], format: "2014-01-02" ... $ DTHDT : Date[1:306], format: NA ... $ DTHDTF : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Date of Death Imputation Flag" $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ... ..- attr(*, "label")= chr "Relative Day of Death" $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ... ..- attr(*, "label")= chr "Elapsed Days from Last Dose to Death" $ LSTALVDT: Date[1:306], format: "2014-07-02" ... $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ... ..- attr(*, "label")= chr "Safety Population Flag" $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ... ..- attr(*, "label")= chr "Pooled Race Group 1" $ AGEGR1 : chr [1:306] "18-64" "18-64" ">64" ">64" ... ..- attr(*, "label")= chr "Pooled Age Group 1" $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ... ..- attr(*, "label")= chr "Geographic Region 1" $ LDDTHGR1: chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Last Does to Death Group" $ DTH30FL : chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Under 30 Group" $ DTHA30FL: chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Over 30 Group" $ DTHB30FL: chr [1:306] NA NA NA NA ... ..- attr(*, "label")= chr "Over 30 plus 30 days Group" - attr(*, "label")= chr "Demographics" - attr(*, "_xportr.df_arg_")= chr "ADSL"
xportr_write()
Finally, we arrive at exporting the R data frame object as a
xpt
file with xportr_write()
. The
xpt
file will be written directly to your current working
directory. To make it more interesting, we have put together all six
functions with the magrittr pipe, %>%
. A user can now
apply types, length, variable labels, formats, data set label and write
out their final xpt file in one pipe! Appropriate warnings and messages
will be supplied to a user to the console for any potential issues
before sending off to standard clinical data set validator application
or data reviewers.
adsl_xportr %>%
xportr_type(var_spec, "ADSL", "message") %>%
xportr_length(var_spec, "ADSL", verbose = "message") %>%
xportr_label(var_spec, "ADSL", "message") %>%
xportr_order(var_spec, "ADSL", "message") %>%
xportr_format(var_spec, "ADSL") %>%
xportr_write("adsl.xpt")
── All variables in specification file are in dataset ──
── 50 reordered in dataset ──
Variable reordered in `.df`: `SITEID`, `USUBJID`, `SUBJID`,
`COUNTRY`, `AGE`, `AGEU`, `AGEGR1`, `SEX`, `RACE`,
`RACEGR1`, `ETHNIC`, `RFSTDTC`, `RFENDTC`, `RFXSTDTC`,
`RFXENDTC`, `RFICDTC`, `RFPENDTC`, `DMDTC`, `DMDY`,
`SAFFL`, `ARM`, `ARMCD`, `ACTARM`, `ACTARMCD`, `TRT01P`,
`TRT01A`, `TRTSDTM`, `TRTSTMF`, `TRTEDTM`, `TRTETMF`,
`TRTSDT`, `TRTEDT`, `DTHFL`, `DTHDTC`, `DTHDT`, `DTHDTF`,
`DTHADY`, `REGION1`, `TRTDURD`, `LDDTHELD`, `LSTALVDT`,
`LDDTHGR1`, `DTH30FL`, `DTHA30FL`, `DTHB30FL`, `FRVDT`,
`RANDDT`, `SCRFDT`, `EOSDT`, and `EOSSTT`
That’s it! We now have a xpt
file created in R with all
appropriate types, lengths, labels, ordering and formats from our
specification file. If you are interested in exploring more of the
custom warnings and error messages as well as more background on
xpt
generation be sure to check out the Deep Dive User Guide.
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue on xportr’s GitHub page.