Getting Started with xportr

The demo makes use of a small ADSL data set that is part of the {admiral} package. The script that generates this ADSL data set can be created with the command admiral::use_ad_template("adsl").

The ADSL has the following features:

  • 306 observations
  • 50 variables
  • Data types other than character and numeric
  • Missing labels on variables
  • Missing label for data set
  • Order of variables not following specification file
  • Formats missing

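To create a fully compliant v5 xpt file from an ADSL dataset that was developed in R, we will need to apply the 6 main functions within the xportr package: xportr_type(), xportr_length(), xportr_order(), xportr_format(), xportr_label() and xportr_write(). First we load the packages and the example data: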

# Loading packages
library(dplyr)
library(labelled)
library(xportr)
library(admiral)

# Loading in our example data
adsl <- admiral::admiral_adsl




Preparing your Specification Files


In order to make use of the functions within xportr you will need to create an R data frame that contains your specification file. You will most likely need to do some pre-processing of your spec sheets after loading them so that they work with the xportr functions. Please see our example spec sheet at system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr") to see the layout xportr expects.


# Read the variable-level sheet, then rename and lower-case the column names
var_spec <- readxl::read_xlsx(
  system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr"),
  sheet = "Variables"
) %>%
  dplyr::rename(type = "Data Type") %>%
  rlang::set_names(tolower)
  


The rows of the specification file that pertain to the ADSL data set are the ones we will make use of in the 6 xportr function calls below. Take note of the order, label, type, length and format columns.
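For readers without the spreadsheet at hand, the toy data frame below sketches the general shape of such a specification. It is an illustration only: toy_spec and every value in it are made up for this example (the labels merely echo ones that appear later in this walkthrough), and the dataset and variable column names are assumptions about the lower-cased headers produced by the pre-processing above.

```r
# Hypothetical, hand-written stand-in for a variable-level specification.
# None of these values are read from the real ADaM_admiral_spec.xlsx file.
toy_spec <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL"),
  variable = c("STUDYID", "AGE", "TRTSDT"),
  order    = c(1, 2, 3),
  label    = c("Study Identifier", "Age", "Date of First Exposure to Treatment"),
  type     = c("character", "numeric", "numeric"),
  length   = c(21, 8, 8),
  format   = c(NA, NA, "DATE9."),
  stringsAsFactors = FALSE
)

# Keep only the ADSL rows and the columns highlighted in the text.
adsl_rows <- toy_spec[
  toy_spec$dataset == "ADSL",
  c("order", "variable", "label", "type", "length", "format")
]
print(adsl_rows)
```

With the real var_spec, the same subsetting idea should apply, provided the column names match.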



xportr_type()


In order to be compliant with transport v5 specifications an xpt file can only have two data types: character and numeric/dbl. Currently the ADSL data set has chr, dbl, dttm (date-time), fct (factor) and date columns.
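As a quick way to see which columns fall outside those two types, the base R sketch below flags every column that is neither character nor numeric. It is not how xportr does it internally; the toy data frame and its values are made up for illustration.

```r
# Toy data frame with one column of each kind mentioned above (illustrative values).
toy <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE     = c(63, 64),
  TRTSDTM = as.POSIXct(c("2014-01-02 00:00:00", "2012-08-05 00:00:00"), tz = "UTC"),
  TRTSDT  = as.Date(c("2014-01-02", "2012-08-05")),
  AGEGR1  = factor(c("18-64", "18-64"), levels = c("<18", "18-64", ">=65")),
  stringsAsFactors = FALSE
)

# A column passes if it is plain character or plain numeric.
# is.numeric() returns FALSE for Date, POSIXct and factor columns.
is_v5_type <- vapply(toy, function(x) is.character(x) || is.numeric(x), logical(1))
needs_coercion <- names(toy)[!is_v5_type]
print(needs_coercion)
```

Here the date-time, date and factor columns are the ones reported as needing coercion.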

look_for(adsl, details = TRUE)
   pos variable label                      col_type missing unique_values
   1   STUDYID  Study Identifier           chr      0       1            
   2   USUBJID  Unique Subject Identifier  chr      0       306          
   3   SUBJID   Subject Identifier for th~ chr      0       306          
   4   RFSTDTC  Subject Reference Start D~ chr      52      206          
   5   RFENDTC  Subject Reference End Dat~ chr      52      212          
   6   RFXSTDTC Date/Time of First Study ~ chr      52      206          
   7   RFXENDTC Date/Time of Last Study T~ chr      54      212          
   8   RFICDTC  Date/Time of Informed Con~ chr      306     1            
   9   RFPENDTC Date/Time of End of Parti~ chr      0       296          
   10  DTHDTC   Date/Time of Death         chr      303     4            
   11  DTHFL    Subject Death Flag         chr      303     2            
   12  SITEID   Study Site Identifier      chr      0       17           
   13  AGE      Age                        dbl      0       37           
   14  AGEU     Age Units                  chr      0       1            
   15  SEX      Sex                        chr      0       2            
   16  RACE     Race                       chr      0       4            
   17  ETHNIC   Ethnicity                  chr      0       2            
   18  ARMCD    Planned Arm Code           chr      0       4            
   19  ARM      Description of Planned Arm chr      0       4            
   20  ACTARMCD Actual Arm Code            chr      0       4            
   21  ACTARM   Description of Actual Arm  chr      0       4            
   22  COUNTRY  Country                    chr      0       1            
   23  DMDTC    Date/Time of Collection    chr      0       237          
   24  DMDY     Study Day of Collection    dbl      52      27           
   25  TRT01P   Description of Planned Arm chr      0       4            
   26  TRT01A   Description of Actual Arm  chr      0       4            
   27  TRTSDTM  —                          dttm     52      206          
   28  TRTSTMF  —                          chr      52      2            
   29  TRTEDTM  —                          dttm     54      212          
   30  TRTETMF  —                          chr      54      2            
   31  TRTSDT   —                          date     52      206          
   32  TRTEDT   —                          date     54      212          
   33  TRTDURD  —                          dbl      54      117          
   34  SCRFDT   —                          date     254     49           
   35  EOSDT    —                          date     52      212          
   36  EOSSTT   —                          chr      52      3            
   37  FRVDT    —                          date     270     36           
   38  RANDDT   —                          date     52      206          
   39  DTHDT    —                          date     303     4            
   40  DTHADY   —                          dbl      303     4            
   41  LDDTHELD —                          dbl      303     4            
   42  LSTALVDT —                          date     52      213          
   43  AGEGR1   —                          fct      0       2            
                                                                         
                                                                         
   44  SAFFL    —                          chr      52      2            
   45  RACEGR1  —                          chr      0       2            
   46  REGION1  —                          chr      0       1            
   47  LDDTHGR1 —                          chr      303     2            
   48  DTH30FL  —                          chr      303     2            
   49  DTHA30FL —                          chr      306     1            
   50  DTHB30FL —                          chr      305     2            
   values                     na_values na_range
   range: CDISCPILOT01 - CDI~                   
   range: 01-701-1015 - 01-7~                   
   range: 1001 - 1448                           
   range: 2012-07-09 - 2014-~                   
   range: 2012-09-01 - 2015-~                   
   range: 2012-07-09 - 2014-~                   
   range: 2012-08-28 - 2015-~                   
   range:                                       
   range: 2012-08-13 - 2015-~                   
   range: 2013-01-14 - 2014-~                   
   range: Y - Y                                 
   range: 701 - 718                             
   range: 50 - 89                               
   range: YEARS - YEARS                         
   range: F - M                                 
   range: AMERICAN INDIAN OR~                   
   range: HISPANIC OR LATINO~                   
   range: Pbo - Xan_Lo                          
   range: Placebo - Xanomeli~                   
   range: Pbo - Xan_Lo                          
   range: Placebo - Xanomeli~                   
   range: USA - USA                             
   range: 2012-07-06 - 2014-~                   
   range: -37 - -2                              
   range: Placebo - Xanomeli~                   
   range: Placebo - Xanomeli~                   
   range: 2012-07-09 - 2014-~                   
   range: H - H                                 
   range: 2012-08-28 23:59:5~                   
   range: H - H                                 
   range: 2012-07-09 - 2014-~                   
   range: 2012-08-28 - 2015-~                   
   range: 1 - 212                               
   range: 2012-08-13 - 2014-~                   
   range: 2012-09-01 - 2015-~                   
   range: COMPLETED - DISCON~                   
   range: 2013-02-18 - 2014-~                   
   range: 2012-07-09 - 2014-~                   
   range: 2013-01-14 - 2014-~                   
   range: 12 - 175                              
   range: 0 - 2                                 
   range: 2012-09-01 - 2015-~                   
   <18                                          
   18-64                                        
   >=65                                         
   range: Y - Y                                 
   range: Non-white - White                     
   range: NA - NA                               
   range: <= 30 - <= 30                         
   range: Y - Y                                 
   range:                                       
   range: Y - Y


Using xportr_type() and the supplied specification file, we can coerce the variables in the ADSL data set to be either numeric or character.


adsl_type <- xportr_type(adsl, var_spec, domain = "ADSL", verbose = "message") 


Now all appropriate types have been applied to the dataset as seen below.

look_for(adsl_type, details = TRUE)
   pos variable label col_type missing unique_values
   1   STUDYID  —     dbl      306     1            
   2   USUBJID  —     dbl      306     1            
   3   SUBJID   —     dbl      0       306          
   4   RFSTDTC  —     dbl      306     1            
   5   RFENDTC  —     dbl      306     1            
   6   RFXSTDTC —     dbl      306     1            
   7   RFXENDTC —     dbl      306     1            
   8   RFICDTC  —     dbl      306     1            
   9   RFPENDTC —     dbl      306     1            
   10  DTHDTC   —     dbl      306     1            
   11  DTHFL    —     dbl      306     1            
   12  SITEID   —     dbl      0       17           
   13  AGE      —     dbl      0       37           
   14  AGEU     —     dbl      306     1            
   15  SEX      —     dbl      306     1            
   16  RACE     —     dbl      306     1            
   17  ETHNIC   —     dbl      306     1            
   18  ARMCD    —     dbl      306     1            
   19  ARM      —     dbl      306     1            
   20  ACTARMCD —     dbl      306     1            
   21  ACTARM   —     dbl      306     1            
   22  COUNTRY  —     dbl      306     1            
   23  DMDTC    —     dbl      306     1            
   24  DMDY     —     dbl      52      27           
   25  TRT01P   —     dbl      306     1            
   26  TRT01A   —     dbl      306     1            
   27  TRTSDTM  —     dbl      52      206          
   28  TRTSTMF  —     chr      52      2            
   29  TRTEDTM  —     dbl      54      212          
   30  TRTETMF  —     chr      54      2            
   31  TRTSDT   —     dbl      52      206          
   32  TRTEDT   —     dbl      54      212          
   33  TRTDURD  —     dbl      54      117          
   34  SCRFDT   —     dbl      254     49           
   35  EOSDT    —     dbl      52      212          
   36  EOSSTT   —     dbl      306     1            
   37  FRVDT    —     dbl      270     36           
   38  RANDDT   —     chr      52      206          
   39  DTHDT    —     dbl      303     4            
   40  DTHADY   —     dbl      303     4            
   41  LDDTHELD —     dbl      303     4            
   42  LSTALVDT —     dbl      52      213          
   43  AGEGR1   —     dbl      0       2            
   44  SAFFL    —     dbl      306     1            
   45  RACEGR1  —     dbl      306     1            
   46  REGION1  —     dbl      306     1            
   47  LDDTHGR1 —     dbl      306     1            
   48  DTH30FL  —     dbl      306     1            
   49  DTHA30FL —     dbl      306     1            
   50  DTHB30FL —     dbl      306     1            
   values                         na_values na_range
   range:                                           
   range:                                           
   range: 1001 - 1448                               
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range: 701 - 718                                 
   range: 50 - 89                                   
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range: -37 - -2                                  
   range:                                           
   range:                                           
   range: 1341792000 - 1409616000                   
   range: H - H                                     
   range: 1346198399 - 1425599999                   
   range: H - H                                     
   range: 15530 - 16315                             
   range: 15580 - 16499                             
   range: 1 - 212                                   
   range: 15565 - 16181                             
   range: 15584 - 16499                             
   range:                                           
   range: 15754 - 16389                             
   range: 2012-07-09 - 2014-09-02                   
   range: 15719 - 16375                             
   range: 12 - 175                                  
   range: 0 - 2                                     
   range: 15584 - 16499                             
   range: 2 - 3                                     
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:                                           
   range:
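The numeric ranges shown for the former date, date-time and factor columns follow from how R stores these classes. The base R sketch below (not xportr's own code) shows the three conversions: a Date becomes the number of days since 1970-01-01, a POSIXct becomes the number of seconds since 1970-01-01 00:00:00 UTC, and a factor becomes its integer level codes.

```r
# Date -> number of days since 1970-01-01
trtsdt_num <- as.numeric(as.Date("2012-07-09"))

# POSIXct -> number of seconds since 1970-01-01 00:00:00 UTC
trtsdtm_num <- as.numeric(as.POSIXct("2012-07-09 00:00:00", tz = "UTC"))

# factor -> integer level codes (position of each value within levels())
agegr1 <- factor(c("18-64", ">=65"), levels = c("<18", "18-64", ">=65"))
agegr1_num <- as.numeric(agegr1)

print(trtsdt_num)   # 15530
print(trtsdtm_num)  # 1341792000
print(agegr1_num)   # 2 3
```

These values line up with the lower ends of the TRTSDT and TRTSDTM ranges and with the 2 - 3 range reported for AGEGR1.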

xportr_length()


Next we can apply the lengths from a variable-level specification file to the data frame. xportr_length() will identify variables that are missing from your specification file. The function will also alert you to how many lengths have been applied successfully. Before we apply the lengths, let's verify that no lengths have been applied to the original data frame.


str(adsl)
  tibble [306 × 50] (S3: tbl_df/tbl/data.frame)
   $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
    ..- attr(*, "label")= chr "Study Identifier"
   $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
    ..- attr(*, "label")= chr "Unique Subject Identifier"
   $ SUBJID  : chr [1:306] "1015" "1023" "1028" "1033" ...
    ..- attr(*, "label")= chr "Subject Identifier for the Study"
   $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
   $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
    ..- attr(*, "label")= chr "Subject Reference End Date/Time"
   $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
   $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
    ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
   $ RFICDTC : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Informed Consent"
   $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
    ..- attr(*, "label")= chr "Date/Time of End of Participation"
   $ DTHDTC  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Death"
   $ DTHFL   : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Subject Death Flag"
   $ SITEID  : chr [1:306] "701" "701" "701" "701" ...
    ..- attr(*, "label")= chr "Study Site Identifier"
   $ AGE     : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
    ..- attr(*, "label")= chr "Age"
   $ AGEU    : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
    ..- attr(*, "label")= chr "Age Units"
   $ SEX     : chr [1:306] "F" "M" "M" "M" ...
    ..- attr(*, "label")= chr "Sex"
   $ RACE    : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
    ..- attr(*, "label")= chr "Race"
   $ ETHNIC  : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
    ..- attr(*, "label")= chr "Ethnicity"
   $ ARMCD   : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Planned Arm Code"
   $ ARM     : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
   $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Actual Arm Code"
   $ ACTARM  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
   $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
    ..- attr(*, "label")= chr "Country"
   $ DMDTC   : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
    ..- attr(*, "label")= chr "Date/Time of Collection"
   $ DMDY    : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
    ..- attr(*, "label")= chr "Study Day of Collection"
   $ TRT01P  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
   $ TRT01A  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
   $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ...
   $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
   $ TRTETMF : chr [1:306] "H" "H" "H" "H" ...
   $ TRTSDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDT  : Date[1:306], format: "2014-07-02" "2012-09-01" ...
   $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
   $ SCRFDT  : Date[1:306], format: NA NA ...
   $ EOSDT   : Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ EOSSTT  : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
   $ FRVDT   : Date[1:306], format: NA "2013-02-18" ...
   $ RANDDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ DTHDT   : Date[1:306], format: NA NA ...
   $ DTHADY  : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
   $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
   $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ AGEGR1  : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
   $ SAFFL   : chr [1:306] "Y" "Y" "Y" "Y" ...
   $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
   $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
   $ LDDTHGR1: chr [1:306] NA NA NA NA ...
   $ DTH30FL : chr [1:306] NA NA NA NA ...
   $ DTHA30FL: chr [1:306] NA NA NA NA ...
   $ DTHB30FL: chr [1:306] NA NA NA NA ...


No lengths have been applied to the variables, as seen in the printout: the lengths would appear in the attr part of each variable. Let's now use xportr_length() to apply our lengths from the specification file.

adsl_length <- adsl %>% xportr_length(var_spec, domain = "ADSL", "message")
  ── Variable lengths missing from metadata. ──
  ✔ 3 lengths resolved
  Variable(s) present in dataframe but doesn't exist in `metadata`.
  ✖ Problem with `TRTSTMF`, `TRTETMF`, and `RANDDT`


str(adsl_length)
  tibble [306 × 50] (S3: tbl_df/tbl/data.frame)
   $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
    ..- attr(*, "label")= chr "Study Identifier"
    ..- attr(*, "width")= num 21
   $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
    ..- attr(*, "label")= chr "Unique Subject Identifier"
    ..- attr(*, "width")= num 30
   $ SUBJID  : chr [1:306] "1015" "1023" "1028" "1033" ...
    ..- attr(*, "label")= chr "Subject Identifier for the Study"
    ..- attr(*, "width")= num 8
   $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
    ..- attr(*, "width")= num 19
   $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
    ..- attr(*, "label")= chr "Subject Reference End Date/Time"
    ..- attr(*, "width")= num 19
   $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
    ..- attr(*, "width")= num 19
   $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
    ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
    ..- attr(*, "width")= num 19
   $ RFICDTC : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Informed Consent"
    ..- attr(*, "width")= num 19
   $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
    ..- attr(*, "label")= chr "Date/Time of End of Participation"
    ..- attr(*, "width")= num 19
   $ DTHDTC  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Death"
    ..- attr(*, "width")= num 19
   $ DTHFL   : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Subject Death Flag"
    ..- attr(*, "width")= num 2
   $ SITEID  : chr [1:306] "701" "701" "701" "701" ...
    ..- attr(*, "label")= chr "Study Site Identifier"
    ..- attr(*, "width")= num 5
   $ AGE     : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
    ..- attr(*, "label")= chr "Age"
    ..- attr(*, "width")= num 8
   $ AGEU    : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
    ..- attr(*, "label")= chr "Age Units"
    ..- attr(*, "width")= num 10
   $ SEX     : chr [1:306] "F" "M" "M" "M" ...
    ..- attr(*, "label")= chr "Sex"
    ..- attr(*, "width")= num 1
   $ RACE    : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
    ..- attr(*, "label")= chr "Race"
    ..- attr(*, "width")= num 60
   $ ETHNIC  : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
    ..- attr(*, "label")= chr "Ethnicity"
    ..- attr(*, "width")= num 100
   $ ARMCD   : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Planned Arm Code"
    ..- attr(*, "width")= num 20
   $ ARM     : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
    ..- attr(*, "width")= num 200
   $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Actual Arm Code"
    ..- attr(*, "width")= num 20
   $ ACTARM  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
    ..- attr(*, "width")= num 200
   $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
    ..- attr(*, "label")= chr "Country"
    ..- attr(*, "width")= num 3
   $ DMDTC   : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
    ..- attr(*, "label")= chr "Date/Time of Collection"
    ..- attr(*, "width")= num 19
   $ DMDY    : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
    ..- attr(*, "label")= chr "Study Day of Collection"
    ..- attr(*, "width")= num 8
   $ TRT01P  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
    ..- attr(*, "width")= num 40
   $ TRT01A  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
    ..- attr(*, "width")= num 40
   $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ...
    ..- attr(*, "width")= num 200
   $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
   $ TRTETMF : chr [1:306] "H" "H" "H" "H" ...
    ..- attr(*, "width")= num 200
   $ TRTSDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDT  : Date[1:306], format: "2014-07-02" "2012-09-01" ...
   $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
    ..- attr(*, "width")= num 8
   $ SCRFDT  : Date[1:306], format: NA NA ...
   $ EOSDT   : Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ EOSSTT  : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
    ..- attr(*, "width")= num 200
   $ FRVDT   : Date[1:306], format: NA "2013-02-18" ...
   $ RANDDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ DTHDT   : Date[1:306], format: NA NA ...
   $ DTHADY  : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
    ..- attr(*, "width")= num 8
   $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
    ..- attr(*, "width")= num 8
   $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ AGEGR1  : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
    ..- attr(*, "width")= num 20
   $ SAFFL   : chr [1:306] "Y" "Y" "Y" "Y" ...
    ..- attr(*, "width")= num 2
   $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
    ..- attr(*, "width")= num 200
   $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
    ..- attr(*, "width")= num 80
   $ LDDTHGR1: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTH30FL : chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTHA30FL: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTHB30FL: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   - attr(*, "_xportr.df_arg_")= chr "ADSL"

Note the additional attr(*, "width")= entry under the variables, which records the width. These have been applied directly from the specification file that we loaded above!
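To make the mechanism concrete, here is a minimal base R sketch of attaching a width attribute from a specification. It is only an illustration of the idea, not xportr's implementation; toy, toy_spec and their values are made up for this example.

```r
# Toy data and a toy specification (hypothetical values).
toy <- data.frame(
  STUDYID = c("CDISCPILOT01", "CDISCPILOT01"),
  AGE     = c(63, 64),
  stringsAsFactors = FALSE
)
toy_spec <- data.frame(
  variable = c("STUDYID", "AGE"),
  length   = c(21, 8),
  stringsAsFactors = FALSE
)

# For every variable listed in the spec and present in the data,
# store the specified length in the column's "width" attribute.
for (i in seq_len(nrow(toy_spec))) {
  v <- toy_spec$variable[i]
  if (v %in% names(toy)) {
    attr(toy[[v]], "width") <- toy_spec$length[i]
  }
}

str(toy)
```

Calling str(toy) afterwards shows the same kind of attr(*, "width")= entries as in the printout above.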

xportr_order()

Please note that the order of the ADSL variables (see above) does not match the order column of the specification file. We can quickly remedy this with a call to xportr_order(). Note that the variable SITEID, along with many others, is moved to match the order column of the specification file.

adsl_order <- xportr_order(adsl, var_spec, domain = "ADSL", verbose = "message")
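The reordering idea can be sketched in base R as follows. This is an illustration under stated assumptions rather than xportr's code: the toy objects are made up, and placing variables that are absent from the specification at the end is a choice made for this sketch.

```r
# Toy data whose column order differs from the spec (hypothetical values).
toy <- data.frame(
  AGE     = c(63, 64),
  EXTRA   = c("a", "b"),                           # not in the spec
  USUBJID = c("01-701-1015", "01-701-1023"),
  STUDYID = c("CDISCPILOT01", "CDISCPILOT01"),
  stringsAsFactors = FALSE
)
toy_spec <- data.frame(
  variable = c("USUBJID", "STUDYID", "AGE"),
  order    = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Variable names sorted by the spec's order column.
spec_vars <- toy_spec$variable[order(toy_spec$order)]

# Spec variables first (in spec order), then anything the spec does not mention.
in_spec     <- intersect(spec_vars, names(toy))
not_in_spec <- setdiff(names(toy), spec_vars)
toy_ordered <- toy[, c(in_spec, not_in_spec), drop = FALSE]

print(names(toy_ordered))
```

The resulting column order is STUDYID, USUBJID, AGE, EXTRA.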

xportr_format()

Now we apply formats to the dataset. These will typically be DATE9., DATETIME20. or TIME5., but many others can be used. Notice that 8 Date/Time variables are missing a format in our ADSL dataset. Here we just take a peek at a few TRT variables, which have a NULL format.

attr(adsl$TRTSDT, "format.sas")
  NULL
attr(adsl$TRTEDT, "format.sas")
  NULL
attr(adsl$TRTSDTM, "format.sas")
  NULL
attr(adsl$TRTEDTM, "format.sas")
  NULL

Using xportr_format() we apply the formats from the specification file.

adsl_fmt <- adsl %>% xportr_format(var_spec, domain = "ADSL", "message")
attr(adsl_fmt$TRTSDT, "format.sas")
  [1] "DATE9."
attr(adsl_fmt$TRTEDT, "format.sas")
  [1] "DATE9."
attr(adsl_fmt$TRTSDTM, "format.sas")
  [1] "DATETIME20."
attr(adsl_fmt$TRTEDTM, "format.sas")
  [1] "DATETIME20."
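The format is stored as a plain attribute named format.sas on each column. A minimal base R sketch of that idea, assuming a made-up toy_spec with a format column and skipping variables whose format is missing, could look like this (it is not xportr's implementation):

```r
# Toy data and toy specification (hypothetical values).
toy <- data.frame(
  AGE    = c(63, 64),
  TRTSDT = as.Date(c("2014-01-02", "2012-08-05"))
)
toy_spec <- data.frame(
  variable = c("AGE", "TRTSDT"),
  format   = c(NA, "DATE9."),
  stringsAsFactors = FALSE
)

# Attach the SAS format only where the spec actually provides one.
for (i in seq_len(nrow(toy_spec))) {
  v   <- toy_spec$variable[i]
  fmt <- toy_spec$format[i]
  if (v %in% names(toy) && !is.na(fmt)) {
    attr(toy[[v]], "format.sas") <- fmt
  }
}

print(attr(toy$TRTSDT, "format.sas"))
print(attr(toy$AGE, "format.sas"))
```

The first call returns the format string and the second returns NULL, mirroring the before and after checks shown above.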

xportr_label()


Please observe that our ADSL dataset is missing many variable labels. Sometimes these labels can be lost while using R's functions. However, a CDISC compliant data set needs to have a label on every variable.

look_for(adsl, details = FALSE)
   pos variable label                             
    1  STUDYID  Study Identifier                  
    2  USUBJID  Unique Subject Identifier         
    3  SUBJID   Subject Identifier for the Study  
    4  RFSTDTC  Subject Reference Start Date/Time 
    5  RFENDTC  Subject Reference End Date/Time   
    6  RFXSTDTC Date/Time of First Study Treatment
    7  RFXENDTC Date/Time of Last Study Treatment 
    8  RFICDTC  Date/Time of Informed Consent     
    9  RFPENDTC Date/Time of End of Participation 
   10  DTHDTC   Date/Time of Death                
   11  DTHFL    Subject Death Flag                
   12  SITEID   Study Site Identifier             
   13  AGE      Age                               
   14  AGEU     Age Units                         
   15  SEX      Sex                               
   16  RACE     Race                              
   17  ETHNIC   Ethnicity                         
   18  ARMCD    Planned Arm Code                  
   19  ARM      Description of Planned Arm        
   20  ACTARMCD Actual Arm Code                   
   21  ACTARM   Description of Actual Arm         
   22  COUNTRY  Country                           
   23  DMDTC    Date/Time of Collection           
   24  DMDY     Study Day of Collection           
   25  TRT01P   Description of Planned Arm        
   26  TRT01A   Description of Actual Arm         
   27  TRTSDTM  —                                 
   28  TRTSTMF  —                                 
   29  TRTEDTM  —                                 
   30  TRTETMF  —                                 
   31  TRTSDT   —                                 
   32  TRTEDT   —                                 
   33  TRTDURD  —                                 
   34  SCRFDT   —                                 
   35  EOSDT    —                                 
   36  EOSSTT   —                                 
   37  FRVDT    —                                 
   38  RANDDT   —                                 
   39  DTHDT    —                                 
   40  DTHADY   —                                 
   41  LDDTHELD —                                 
   42  LSTALVDT —                                 
   43  AGEGR1   —                                 
   44  SAFFL    —                                 
   45  RACEGR1  —                                 
   46  REGION1  —                                 
   47  LDDTHGR1 —                                 
   48  DTH30FL  —                                 
   49  DTHA30FL —                                 
   50  DTHB30FL —


Using the xportr_label() function we can take the specification file and label all the variables available. xportr_label() will produce a warning message if a variable in the data set is not in the specification file.


adsl_update <- adsl %>% xportr_label(var_spec, domain = "ADSL", "message")
  ── Variable labels missing from metadata. ──
  ✔ 3 labels skipped
  Variable(s) present in dataframe but doesn't exist in `metadata`.
  ✖ Problem with `TRTSTMF`, `TRTETMF`, and `RANDDT`
look_for(adsl_update, details = FALSE)
   pos variable label                                  
    1  STUDYID  Study Identifier                       
    2  USUBJID  Unique Subject Identifier              
    3  SUBJID   Subject Identifier for the Study       
    4  RFSTDTC  Subject Reference Start Date/Time      
    5  RFENDTC  Subject Reference End Date/Time        
    6  RFXSTDTC Date/Time of First Study Treatment     
    7  RFXENDTC Date/Time of Last Study Treatment      
    8  RFICDTC  Date/Time of Informed Consent          
    9  RFPENDTC Date/Time of End of Participation      
   10  DTHDTC   Date / Time of Death                   
   11  DTHFL    Subject Death Flag                     
   12  SITEID   Study Site Identifier                  
   13  AGE      Age                                    
   14  AGEU     Age Units                              
   15  SEX      Sex                                    
   16  RACE     Race                                   
   17  ETHNIC   Ethnicity                              
   18  ARMCD    Planned Arm Code                       
   19  ARM      Description of Planned Arm             
   20  ACTARMCD Actual Arm Code                        
   21  ACTARM   Description of Actual Arm              
   22  COUNTRY  Country                                
   23  DMDTC    Date/Time of Collection                
   24  DMDY     Study Day of Collection                
   25  TRT01P   Planned Treatment for Period 01        
   26  TRT01A   Actual Treatment for Period 01         
   27  TRTSDTM  Datetime of First Exposure to Treatment
   28  TRTSTMF                                         
   29  TRTEDTM  Datetime of Last Exposure to Treatment 
   30  TRTETMF                                         
   31  TRTSDT   Date of First Exposure to Treatment    
   32  TRTEDT   Date of Last Exposure to Treatment     
   33  TRTDURD  Total Duration of Trt  (days)          
   34  SCRFDT   Screen Failure Date                    
   35  EOSDT    End of Study Date                      
   36  EOSSTT   End of Study Status                    
   37  FRVDT    Final Retrievel Visit Date             
   38  RANDDT                                          
   39  DTHDT    Death Date                             
   40  DTHADY   Relative Day of Death                  
   41  LDDTHELD Elapsed Days from Last Dose to Death   
   42  LSTALVDT Date Last Known Alive                  
   43  AGEGR1   Pooled Age Group 1                     
   44  SAFFL    Safety Population Flag                 
   45  RACEGR1  Pooled Race Group 1                    
   46  REGION1  Geographic Region 1                    
   47  LDDTHGR1 Last Does to Death Group               
   48  DTH30FL  Under 30  Group                        
   49  DTHA30FL Over 30  Group                         
   50  DTHB30FL Over 30 plus 30 days Group
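Three variables still show an empty label in the listing above. A small base R helper can list such variables programmatically; the sketch below uses a made-up toy data frame and a hypothetical helper named get_label, and treats a missing label attribute the same as an empty string.

```r
# Toy data: one labelled column, one without a label (hypothetical values).
toy <- data.frame(
  STUDYID = c("CDISCPILOT01", "CDISCPILOT01"),
  TRTSTMF = c("H", "H"),
  stringsAsFactors = FALSE
)
attr(toy$STUDYID, "label") <- "Study Identifier"

# Hypothetical helper: return the label attribute, or "" when there is none.
get_label <- function(x) {
  lbl <- attr(x, "label")
  if (is.null(lbl)) "" else lbl
}

labels     <- vapply(toy, get_label, character(1))
unlabelled <- names(labels)[labels == ""]
print(unlabelled)
```

Running the same check on adsl_update should, under these assumptions, point at the variables that are absent from the specification file.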

xportr_write()


Finally, we arrive at exporting the R data frame object as an xpt file with the function xportr_write(). The xpt file will be written directly to your current working directory. To make it more interesting, we have put together all six functions with the magrittr pipe, %>%. A user can now apply types, lengths, variable labels, ordering, formats and the data set label, and write out the final xpt file, in one pipe! Appropriate warnings and messages will be printed to the console for any potential issues before the file is sent off to a standard clinical data set validator application or to data reviewers.

adsl %>%
  xportr_type(var_spec, "ADSL", "message") %>%
  xportr_length(var_spec, "ADSL", "message") %>%
  xportr_label(var_spec, "ADSL", "message") %>%
  xportr_order(var_spec, "ADSL", "message") %>% 
  xportr_format(var_spec, "ADSL", "message") %>% 
  xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")

That’s it! We now have an xpt file created in R with all appropriate types, lengths, labels, ordering and formats from our specification file.
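Before handing a file to a validator, it can also help to check a couple of the SAS v5 transport limits by hand. The base R sketch below assumes the commonly cited limits of 8 characters for variable names and 40 characters for variable labels; the toy data frame and the variable name TREATMENT_START are made up for illustration, and this is not a substitute for the checks xportr or a validator performs.

```r
# Toy data with one deliberately over-long variable name (hypothetical).
toy <- data.frame(
  STUDYID         = "CDISCPILOT01",
  TREATMENT_START = as.numeric(as.Date("2014-01-02")),
  stringsAsFactors = FALSE
)
attr(toy$STUDYID, "label")         <- "Study Identifier"
attr(toy$TREATMENT_START, "label") <- "Date of First Exposure to Treatment"

# Names longer than 8 characters.
name_too_long <- names(toy)[nchar(names(toy)) > 8]

# Labels longer than 40 characters (a missing label counts as "").
labels <- vapply(
  toy,
  function(x) {
    lbl <- attr(x, "label")
    if (is.null(lbl)) "" else lbl
  },
  character(1)
)
label_too_long <- names(labels)[nchar(labels) > 40]

print(name_too_long)
print(label_too_long)
```

In this toy case only the variable name is flagged; both labels fit within the assumed limit.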

As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or find any documentation unclear, please submit an issue on xportr’s GitHub page.