The demo uses a small ADSL data set that is part of the {admiral} package. The script that generates this ADSL data set can be created with the command admiral::use_ad_template("adsl").
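The ADSL has the following features, as the printouts later in this article show: 306 observations of 50 variables; data types other than character and numeric; missing variable labels; a variable order that differs from the specification; and missing formats on the date and date-time variables.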
To create a fully compliant v5 xpt ADSL data set that was developed using R, we need to apply the six main functions of the xportr package: xportr_type(), xportr_length(), xportr_order(), xportr_format(), xportr_label() and xportr_write().
# Loading packages
library(dplyr)
library(labelled)
library(xportr)
library(admiral)
# Loading in our example data
adsl <- admiral::admiral_adsl
NOTE: The script that generates this data set can be created with admiral::use_ad_template("adsl").
To use the functions within xportr you need an R data frame that contains your specification file. After loading the spec files you will most likely need to pre-process them so that they work with the xportr functions. See our example spec sheets at system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr") for the layout that xportr expects.
var_spec <- readxl::read_xlsx(
  system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr"),
  sheet = "Variables"
) %>%
  dplyr::rename(type = "Data Type") %>%
  rlang::set_names(tolower)
Below is a quick snapshot of the specification file for the ADSL data set, which we will use in the six xportr function calls below. Take note of the order, label, type, length and format columns.
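To make the expected shape concrete, here is a minimal base-R sketch of a hand-built variable-level specification. The column names mirror the lower-cased columns mentioned above plus dataset and variable identifier columns, which are assumptions of this sketch; the type values and the TRTSDT length and format are placeholders rather than values taken from the real spec sheet.

```r
# Hypothetical, hand-built variable-level spec (illustrative values only).
# Column names are assumptions that mirror the lower-cased spec columns.
var_spec_demo <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL"),
  variable = c("STUDYID", "AGE", "TRTSDT"),
  label    = c("Study Identifier", "Age", "Date of First Exposure to Treatment"),
  type     = c("character", "numeric", "numeric"),  # placeholder type values
  length   = c(21, 8, 8),
  order    = c(1, 2, 3),
  format   = c(NA, NA, "DATE9."),
  stringsAsFactors = FALSE
)

# Check that every column the later steps rely on is present
needed <- c("dataset", "variable", "label", "type", "length", "order", "format")
missing_cols <- setdiff(needed, names(var_spec_demo))
if (length(missing_cols) > 0) {
  stop("Spec is missing columns: ", paste(missing_cols, collapse = ", "))
}

print(var_spec_demo)
```

A quick check like this after pre-processing a real spec sheet can catch a renamed or dropped column before any xportr function is called.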
To comply with the transport v5 specification, an xpt file can contain only two data types: character and numeric (dbl). Currently the ADSL data set has chr, dbl, date-time (dttm), factor (fct) and date columns.
look_for(adsl, details = TRUE)
pos variable label col_type missing unique_values
1 STUDYID Study Identifier chr 0 1
2 USUBJID Unique Subject Identifier chr 0 306
3 SUBJID Subject Identifier for th~ chr 0 306
4 RFSTDTC Subject Reference Start D~ chr 52 206
5 RFENDTC Subject Reference End Dat~ chr 52 212
6 RFXSTDTC Date/Time of First Study ~ chr 52 206
7 RFXENDTC Date/Time of Last Study T~ chr 54 212
8 RFICDTC Date/Time of Informed Con~ chr 306 1
9 RFPENDTC Date/Time of End of Parti~ chr 0 296
10 DTHDTC Date/Time of Death chr 303 4
11 DTHFL Subject Death Flag chr 303 2
12 SITEID Study Site Identifier chr 0 17
13 AGE Age dbl 0 37
14 AGEU Age Units chr 0 1
15 SEX Sex chr 0 2
16 RACE Race chr 0 4
17 ETHNIC Ethnicity chr 0 2
18 ARMCD Planned Arm Code chr 0 4
19 ARM Description of Planned Arm chr 0 4
20 ACTARMCD Actual Arm Code chr 0 4
21 ACTARM Description of Actual Arm chr 0 4
22 COUNTRY Country chr 0 1
23 DMDTC Date/Time of Collection chr 0 237
24 DMDY Study Day of Collection dbl 52 27
25 TRT01P Description of Planned Arm chr 0 4
26 TRT01A Description of Actual Arm chr 0 4
27 TRTSDTM — dttm 52 206
28 TRTSTMF — chr 52 2
29 TRTEDTM — dttm 54 212
30 TRTETMF — chr 54 2
31 TRTSDT — date 52 206
32 TRTEDT — date 54 212
33 TRTDURD — dbl 54 117
34 SCRFDT — date 254 49
35 EOSDT — date 52 212
36 EOSSTT — chr 52 3
37 FRVDT — date 270 36
38 RANDDT — date 52 206
39 DTHDT — date 303 4
40 DTHADY — dbl 303 4
41 LDDTHELD — dbl 303 4
42 LSTALVDT — date 52 213
43 AGEGR1 — fct 0 2
44 SAFFL — chr 52 2
45 RACEGR1 — chr 0 2
46 REGION1 — chr 0 1
47 LDDTHGR1 — chr 303 2
48 DTH30FL — chr 303 2
49 DTHA30FL — chr 306 1
50 DTHB30FL — chr 305 2
values na_values na_range
range: CDISCPILOT01 - CDI~
range: 01-701-1015 - 01-7~
range: 1001 - 1448
range: 2012-07-09 - 2014-~
range: 2012-09-01 - 2015-~
range: 2012-07-09 - 2014-~
range: 2012-08-28 - 2015-~
range:
range: 2012-08-13 - 2015-~
range: 2013-01-14 - 2014-~
range: Y - Y
range: 701 - 718
range: 50 - 89
range: YEARS - YEARS
range: F - M
range: AMERICAN INDIAN OR~
range: HISPANIC OR LATINO~
range: Pbo - Xan_Lo
range: Placebo - Xanomeli~
range: Pbo - Xan_Lo
range: Placebo - Xanomeli~
range: USA - USA
range: 2012-07-06 - 2014-~
range: -37 - -2
range: Placebo - Xanomeli~
range: Placebo - Xanomeli~
range: 2012-07-09 - 2014-~
range: H - H
range: 2012-08-28 23:59:5~
range: H - H
range: 2012-07-09 - 2014-~
range: 2012-08-28 - 2015-~
range: 1 - 212
range: 2012-08-13 - 2014-~
range: 2012-09-01 - 2015-~
range: COMPLETED - DISCON~
range: 2013-02-18 - 2014-~
range: 2012-07-09 - 2014-~
range: 2013-01-14 - 2014-~
range: 12 - 175
range: 0 - 2
range: 2012-09-01 - 2015-~
<18
18-64
>=65
range: Y - Y
range: Non-white - White
range: NA - NA
range: <= 30 - <= 30
range: Y - Y
range:
range: Y - Y
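The same question (which columns are neither character nor numeric?) can be answered without the labelled package. The sketch below uses base R on a toy data frame whose values are copied from the first two ADSL rows shown later in this article; the object names are hypothetical.

```r
# Toy stand-in for ADSL with one column of each problematic kind
demo <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE     = c(63, 64),
  TRTSDT  = as.Date(c("2014-01-02", "2012-08-05")),
  AGEGR1  = factor(c("18-64", "18-64"), levels = c("<18", "18-64", ">=65")),
  stringsAsFactors = FALSE
)

# First class of each column, e.g. "character", "numeric", "Date", "factor"
col_class <- vapply(demo, function(x) class(x)[1], character(1))

# Columns that would need coercion before a v5 export
allowed <- c("character", "numeric", "integer")
needs_coercion <- names(col_class)[!col_class %in% allowed]

print(needs_coercion)
```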
Using xportr_type() and the supplied specification file, we can coerce the variables in the ADSL data set to be either numeric or character.
adsl_type <- xportr_type(adsl, var_spec, domain = "ADSL", verbose = "message")
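The result of the coercion is shown below. Variables that are not in the specification (TRTSTMF, TRTETMF and RANDDT) keep their original type. Note that in this printout character variables such as STUDYID are reported as dbl with all 306 values missing, so check the type column of your specification against the values xportr_type() expects before relying on the coerced data.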
look_for(adsl_type, details = TRUE)
pos variable label col_type missing unique_values
1 STUDYID — dbl 306 1
2 USUBJID — dbl 306 1
3 SUBJID — dbl 0 306
4 RFSTDTC — dbl 306 1
5 RFENDTC — dbl 306 1
6 RFXSTDTC — dbl 306 1
7 RFXENDTC — dbl 306 1
8 RFICDTC — dbl 306 1
9 RFPENDTC — dbl 306 1
10 DTHDTC — dbl 306 1
11 DTHFL — dbl 306 1
12 SITEID — dbl 0 17
13 AGE — dbl 0 37
14 AGEU — dbl 306 1
15 SEX — dbl 306 1
16 RACE — dbl 306 1
17 ETHNIC — dbl 306 1
18 ARMCD — dbl 306 1
19 ARM — dbl 306 1
20 ACTARMCD — dbl 306 1
21 ACTARM — dbl 306 1
22 COUNTRY — dbl 306 1
23 DMDTC — dbl 306 1
24 DMDY — dbl 52 27
25 TRT01P — dbl 306 1
26 TRT01A — dbl 306 1
27 TRTSDTM — dbl 52 206
28 TRTSTMF — chr 52 2
29 TRTEDTM — dbl 54 212
30 TRTETMF — chr 54 2
31 TRTSDT — dbl 52 206
32 TRTEDT — dbl 54 212
33 TRTDURD — dbl 54 117
34 SCRFDT — dbl 254 49
35 EOSDT — dbl 52 212
36 EOSSTT — dbl 306 1
37 FRVDT — dbl 270 36
38 RANDDT — chr 52 206
39 DTHDT — dbl 303 4
40 DTHADY — dbl 303 4
41 LDDTHELD — dbl 303 4
42 LSTALVDT — dbl 52 213
43 AGEGR1 — dbl 0 2
44 SAFFL — dbl 306 1
45 RACEGR1 — dbl 306 1
46 REGION1 — dbl 306 1
47 LDDTHGR1 — dbl 306 1
48 DTH30FL — dbl 306 1
49 DTHA30FL — dbl 306 1
50 DTHB30FL — dbl 306 1
values na_values na_range
range:
range:
range: 1001 - 1448
range:
range:
range:
range:
range:
range:
range:
range:
range: 701 - 718
range: 50 - 89
range:
range:
range:
range:
range:
range:
range:
range:
range:
range:
range: -37 - -2
range:
range:
range: 1341792000 - 1409616000
range: H - H
range: 1346198399 - 1425599999
range: H - H
range: 15530 - 16315
range: 15580 - 16499
range: 1 - 212
range: 15565 - 16181
range: 15584 - 16499
range:
range: 15754 - 16389
range: 2012-07-09 - 2014-09-02
range: 15719 - 16375
range: 12 - 175
range: 0 - 2
range: 15584 - 16499
range: 2 - 3
range:
range:
range:
range:
range:
range:
range:
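The large numbers in the printout above for the date and date-time variables are the underlying R representations: days since 1970-01-01 for Date and seconds since 1970-01-01 00:00:00 UTC for POSIXct. The base-R sketch below shows this for the earliest TRTSDT value and then applies a simplified spec-driven coercion. The helper coerce_by_spec() and its "character"/"numeric" type values are hypothetical and only illustrate the idea; they are not xportr's implementation.

```r
# Earliest treatment start date from the ADSL printout
trtsdt  <- as.Date("2012-07-09")
trtsdtm <- as.POSIXct("2012-07-09 00:00:00", tz = "UTC")

as.numeric(trtsdt)   # days since 1970-01-01
as.numeric(trtsdtm)  # seconds since 1970-01-01 00:00:00 UTC

# Hypothetical helper: coerce each variable named in the spec to the
# type the spec asks for; variables not in the spec are left untouched.
coerce_by_spec <- function(df, spec) {
  for (i in seq_len(nrow(spec))) {
    v <- spec$variable[i]
    if (!v %in% names(df)) next
    if (spec$type[i] == "character") {
      df[[v]] <- as.character(df[[v]])
    } else {
      df[[v]] <- as.numeric(df[[v]])
    }
  }
  df
}

demo <- data.frame(
  SUBJID  = c("1015", "1023"),
  TRTSDT  = as.Date(c("2014-01-02", "2012-08-05")),
  TRTSTMF = c("H", "H"),
  stringsAsFactors = FALSE
)
spec <- data.frame(
  variable = c("SUBJID", "TRTSDT"),
  type     = c("character", "numeric"),
  stringsAsFactors = FALSE
)

demo_typed <- coerce_by_spec(demo, spec)
str(demo_typed)
```

TRTSTMF is absent from the toy spec and therefore keeps its type, which matches the behaviour seen for the three unspecified variables above.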
Next we can apply the lengths from a variable-level specification file to the data frame. xportr_length() will identify variables that are missing from your specification file and report how many lengths have been applied successfully. Before we apply the lengths, let’s verify that no lengths have been applied to the original data frame.
str(adsl)
tibble [306 × 50] (S3: tbl_df/tbl/data.frame)
$ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
..- attr(*, "label")= chr "Study Identifier"
$ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
..- attr(*, "label")= chr "Unique Subject Identifier"
$ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ...
..- attr(*, "label")= chr "Subject Identifier for the Study"
$ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
..- attr(*, "label")= chr "Subject Reference Start Date/Time"
$ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
..- attr(*, "label")= chr "Subject Reference End Date/Time"
$ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
..- attr(*, "label")= chr "Date/Time of First Study Treatment"
$ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
$ RFICDTC : chr [1:306] NA NA NA NA ...
..- attr(*, "label")= chr "Date/Time of Informed Consent"
$ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
..- attr(*, "label")= chr "Date/Time of End of Participation"
$ DTHDTC : chr [1:306] NA NA NA NA ...
..- attr(*, "label")= chr "Date/Time of Death"
$ DTHFL : chr [1:306] NA NA NA NA ...
..- attr(*, "label")= chr "Subject Death Flag"
$ SITEID : chr [1:306] "701" "701" "701" "701" ...
..- attr(*, "label")= chr "Study Site Identifier"
$ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
..- attr(*, "label")= chr "Age"
$ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
..- attr(*, "label")= chr "Age Units"
$ SEX : chr [1:306] "F" "M" "M" "M" ...
..- attr(*, "label")= chr "Sex"
$ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
..- attr(*, "label")= chr "Race"
$ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
..- attr(*, "label")= chr "Ethnicity"
$ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
..- attr(*, "label")= chr "Planned Arm Code"
$ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Planned Arm"
$ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
..- attr(*, "label")= chr "Actual Arm Code"
$ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Actual Arm"
$ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
..- attr(*, "label")= chr "Country"
$ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
..- attr(*, "label")= chr "Date/Time of Collection"
$ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
..- attr(*, "label")= chr "Study Day of Collection"
$ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Planned Arm"
$ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Actual Arm"
$ TRTSDTM : POSIXct[1:306], format: "2014-01-02" "2012-08-05" ...
$ TRTSTMF : chr [1:306] "H" "H" "H" "H" ...
$ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
$ TRTETMF : chr [1:306] "H" "H" "H" "H" ...
$ TRTSDT : Date[1:306], format: "2014-01-02" "2012-08-05" ...
$ TRTEDT : Date[1:306], format: "2014-07-02" "2012-09-01" ...
$ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
$ SCRFDT : Date[1:306], format: NA NA ...
$ EOSDT : Date[1:306], format: "2014-07-02" "2012-09-02" ...
$ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
$ FRVDT : Date[1:306], format: NA "2013-02-18" ...
$ RANDDT : Date[1:306], format: "2014-01-02" "2012-08-05" ...
$ DTHDT : Date[1:306], format: NA NA ...
$ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
$ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
$ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
$ AGEGR1 : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
$ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ...
$ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
$ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
$ LDDTHGR1: chr [1:306] NA NA NA NA ...
$ DTH30FL : chr [1:306] NA NA NA NA ...
$ DTHA30FL: chr [1:306] NA NA NA NA ...
$ DTHB30FL: chr [1:306] NA NA NA NA ...
No lengths have been applied to the variables, as the printout shows: a length would appear in the attr part of each variable. Let’s now use xportr_length() to apply the lengths from the specification file.
adsl_length <- adsl %>% xportr_length(var_spec, domain = "ADSL", "message")
── Variable lengths missing from metadata. ──
✔ 3 lengths resolved
Variable(s) present in dataframe but doesn't exist in `metadata`.
✖ Problem with `TRTSTMF`, `TRTETMF`, and `RANDDT`
str(adsl_length)
tibble [306 × 50] (S3: tbl_df/tbl/data.frame)
$ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
..- attr(*, "label")= chr "Study Identifier"
..- attr(*, "width")= num 21
$ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
..- attr(*, "label")= chr "Unique Subject Identifier"
..- attr(*, "width")= num 30
$ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ...
..- attr(*, "label")= chr "Subject Identifier for the Study"
..- attr(*, "width")= num 8
$ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
..- attr(*, "label")= chr "Subject Reference Start Date/Time"
..- attr(*, "width")= num 19
$ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
..- attr(*, "label")= chr "Subject Reference End Date/Time"
..- attr(*, "width")= num 19
$ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
..- attr(*, "label")= chr "Date/Time of First Study Treatment"
..- attr(*, "width")= num 19
$ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
..- attr(*, "width")= num 19
$ RFICDTC : chr [1:306] NA NA NA NA ...
..- attr(*, "label")= chr "Date/Time of Informed Consent"
..- attr(*, "width")= num 19
$ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
..- attr(*, "label")= chr "Date/Time of End of Participation"
..- attr(*, "width")= num 19
$ DTHDTC : chr [1:306] NA NA NA NA ...
..- attr(*, "label")= chr "Date/Time of Death"
..- attr(*, "width")= num 19
$ DTHFL : chr [1:306] NA NA NA NA ...
..- attr(*, "label")= chr "Subject Death Flag"
..- attr(*, "width")= num 2
$ SITEID : chr [1:306] "701" "701" "701" "701" ...
..- attr(*, "label")= chr "Study Site Identifier"
..- attr(*, "width")= num 5
$ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
..- attr(*, "label")= chr "Age"
..- attr(*, "width")= num 8
$ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
..- attr(*, "label")= chr "Age Units"
..- attr(*, "width")= num 10
$ SEX : chr [1:306] "F" "M" "M" "M" ...
..- attr(*, "label")= chr "Sex"
..- attr(*, "width")= num 1
$ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
..- attr(*, "label")= chr "Race"
..- attr(*, "width")= num 60
$ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
..- attr(*, "label")= chr "Ethnicity"
..- attr(*, "width")= num 100
$ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
..- attr(*, "label")= chr "Planned Arm Code"
..- attr(*, "width")= num 20
$ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Planned Arm"
..- attr(*, "width")= num 200
$ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
..- attr(*, "label")= chr "Actual Arm Code"
..- attr(*, "width")= num 20
$ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Actual Arm"
..- attr(*, "width")= num 200
$ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
..- attr(*, "label")= chr "Country"
..- attr(*, "width")= num 3
$ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
..- attr(*, "label")= chr "Date/Time of Collection"
..- attr(*, "width")= num 19
$ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
..- attr(*, "label")= chr "Study Day of Collection"
..- attr(*, "width")= num 8
$ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Planned Arm"
..- attr(*, "width")= num 40
$ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
..- attr(*, "label")= chr "Description of Actual Arm"
..- attr(*, "width")= num 40
$ TRTSDTM : POSIXct[1:306], format: "2014-01-02" "2012-08-05" ...
$ TRTSTMF : chr [1:306] "H" "H" "H" "H" ...
..- attr(*, "width")= num 200
$ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
$ TRTETMF : chr [1:306] "H" "H" "H" "H" ...
..- attr(*, "width")= num 200
$ TRTSDT : Date[1:306], format: "2014-01-02" "2012-08-05" ...
$ TRTEDT : Date[1:306], format: "2014-07-02" "2012-09-01" ...
$ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
..- attr(*, "width")= num 8
$ SCRFDT : Date[1:306], format: NA NA ...
$ EOSDT : Date[1:306], format: "2014-07-02" "2012-09-02" ...
$ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
..- attr(*, "width")= num 200
$ FRVDT : Date[1:306], format: NA "2013-02-18" ...
$ RANDDT : Date[1:306], format: "2014-01-02" "2012-08-05" ...
$ DTHDT : Date[1:306], format: NA NA ...
$ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "width")= num 8
$ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "width")= num 8
$ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
$ AGEGR1 : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
..- attr(*, "width")= num 20
$ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ...
..- attr(*, "width")= num 2
$ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
..- attr(*, "width")= num 200
$ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
..- attr(*, "width")= num 80
$ LDDTHGR1: chr [1:306] NA NA NA NA ...
..- attr(*, "width")= num 200
$ DTH30FL : chr [1:306] NA NA NA NA ...
..- attr(*, "width")= num 200
$ DTHA30FL: chr [1:306] NA NA NA NA ...
..- attr(*, "width")= num 200
$ DTHB30FL: chr [1:306] NA NA NA NA ...
..- attr(*, "width")= num 200
- attr(*, "_xportr.df_arg_")= chr "ADSL"
Note the additional attr(*, "width")= line after each variable, which holds its width. These have been applied directly from the specification file that we loaded above!
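The width is an ordinary R attribute, so it can be set and read back with attr(). The base-R sketch below stores spec lengths as a "width" attribute on a toy data frame and compares them with the longest observed value. The SEX and SITEID widths are taken from the printout above; the loop is a simplified illustration and not xportr's code.

```r
demo <- data.frame(
  SEX    = c("F", "M"),
  SITEID = c("701", "701"),
  stringsAsFactors = FALSE
)
spec <- data.frame(
  variable = c("SEX", "SITEID"),
  length   = c(1, 5),
  stringsAsFactors = FALSE
)

# Store each spec length as a "width" attribute on the matching column
for (i in seq_len(nrow(spec))) {
  attr(demo[[spec$variable[i]]], "width") <- spec$length[i]
}

# Read the widths back and compare with the longest value in the data
widths   <- vapply(demo, function(x) attr(x, "width"), numeric(1))
observed <- vapply(demo, function(x) max(nchar(x)), integer(1))

print(data.frame(width = widths, longest_value = observed))
```

Comparing the two columns is a cheap way to spot a spec length that would truncate the data.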
Note that the order of the ADSL variables shown above does not match the order column of the specification file. We can quickly remedy this with a call to xportr_order(). SITEID, along with many other variables, is moved to match the order column of the specification file.
adsl_order <- xportr_order(adsl, var_spec, domain = "ADSL", verbose = "message")
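Reordering by a spec column amounts to sorting the variable names by their order value and selecting the columns in that sequence. The base-R sketch below does this on a toy data frame. The order values are made up, and placing variables that are absent from the spec at the end is an assumption of this sketch.

```r
demo <- data.frame(
  AGE     = 63,
  SITEID  = "701",
  EXTRA   = "x",            # hypothetical variable that is not in the spec
  STUDYID = "CDISCPILOT01",
  stringsAsFactors = FALSE
)
spec <- data.frame(
  variable = c("STUDYID", "SITEID", "AGE"),
  order    = c(1, 2, 3),    # illustrative order values
  stringsAsFactors = FALSE
)

# Spec variables sorted by their order value, restricted to those in the data
in_spec <- spec$variable[order(spec$order)]
in_spec <- in_spec[in_spec %in% names(demo)]

# Anything not covered by the spec goes to the end (assumption of this sketch)
not_in_spec <- setdiff(names(demo), in_spec)

demo_ordered <- demo[, c(in_spec, not_in_spec), drop = FALSE]
print(names(demo_ordered))
```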
Now we apply formats to the data set. These will typically be DATE9., DATETIME20. or TIME5., but many others can be used. Notice that 8 Date/Time variables are missing a format in our ADSL data set. Here we take a peek at a few TRT variables, which have a NULL format.
attr(adsl$TRTSDT, "format.sas")
NULL
attr(adsl$TRTEDT, "format.sas")
NULL
attr(adsl$TRTSDTM, "format.sas")
NULL
attr(adsl$TRTEDTM, "format.sas")
NULL
Using xportr_format() we apply our formats.
adsl_fmt <- adsl %>% xportr_format(var_spec, domain = "ADSL", "message")
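The format is carried in the "format.sas" attribute inspected above, so the effect of this step can be checked with the same attr() calls. The base-R sketch below sets the attribute from a toy spec; the format strings are illustrative and the loop is not xportr's implementation.

```r
demo <- data.frame(
  TRTSDT  = as.Date("2014-01-02"),
  TRTSDTM = as.POSIXct("2014-01-02 00:00:00", tz = "UTC")
)
spec <- data.frame(
  variable = c("TRTSDT", "TRTSDTM"),
  format   = c("DATE9.", "DATETIME20."),  # illustrative format names
  stringsAsFactors = FALSE
)

# No format has been set yet, so this is NULL
before <- attr(demo$TRTSDT, "format.sas")

# Attach the spec format to each matching column
for (i in seq_len(nrow(spec))) {
  attr(demo[[spec$variable[i]]], "format.sas") <- spec$format[i]
}

# Collect the formats now attached to each column
formats <- vapply(demo, function(x) attr(x, "format.sas"), character(1))
print(formats)
```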
Note that our ADSL data set is missing many variable labels. Labels can sometimes be lost when data pass through R functions. However, a CDISC-compliant data set needs a label on every variable.
look_for(adsl, details = FALSE)
pos variable label
1 STUDYID Study Identifier
2 USUBJID Unique Subject Identifier
3 SUBJID Subject Identifier for the Study
4 RFSTDTC Subject Reference Start Date/Time
5 RFENDTC Subject Reference End Date/Time
6 RFXSTDTC Date/Time of First Study Treatment
7 RFXENDTC Date/Time of Last Study Treatment
8 RFICDTC Date/Time of Informed Consent
9 RFPENDTC Date/Time of End of Participation
10 DTHDTC Date/Time of Death
11 DTHFL Subject Death Flag
12 SITEID Study Site Identifier
13 AGE Age
14 AGEU Age Units
15 SEX Sex
16 RACE Race
17 ETHNIC Ethnicity
18 ARMCD Planned Arm Code
19 ARM Description of Planned Arm
20 ACTARMCD Actual Arm Code
21 ACTARM Description of Actual Arm
22 COUNTRY Country
23 DMDTC Date/Time of Collection
24 DMDY Study Day of Collection
25 TRT01P Description of Planned Arm
26 TRT01A Description of Actual Arm
27 TRTSDTM —
28 TRTSTMF —
29 TRTEDTM —
30 TRTETMF —
31 TRTSDT —
32 TRTEDT —
33 TRTDURD —
34 SCRFDT —
35 EOSDT —
36 EOSSTT —
37 FRVDT —
38 RANDDT —
39 DTHDT —
40 DTHADY —
41 LDDTHELD —
42 LSTALVDT —
43 AGEGR1 —
44 SAFFL —
45 RACEGR1 —
46 REGION1 —
47 LDDTHGR1 —
48 DTH30FL —
49 DTHA30FL —
50 DTHB30FL —
Using the xportr_label() function we can take the specification file and label all the variables available. xportr_label() will produce a warning message if a variable in the data set is not in the specification file.
adsl_update <- adsl %>% xportr_label(var_spec, domain = "ADSL", "message")
── Variable labels missing from metadata. ──
✔ 3 labels skipped
Variable(s) present in dataframe but doesn't exist in `metadata`.
✖ Problem with `TRTSTMF`, `TRTETMF`, and `RANDDT`
look_for(adsl_update, details = FALSE)
pos variable label
1 STUDYID Study Identifier
2 USUBJID Unique Subject Identifier
3 SUBJID Subject Identifier for the Study
4 RFSTDTC Subject Reference Start Date/Time
5 RFENDTC Subject Reference End Date/Time
6 RFXSTDTC Date/Time of First Study Treatment
7 RFXENDTC Date/Time of Last Study Treatment
8 RFICDTC Date/Time of Informed Consent
9 RFPENDTC Date/Time of End of Participation
10 DTHDTC Date / Time of Death
11 DTHFL Subject Death Flag
12 SITEID Study Site Identifier
13 AGE Age
14 AGEU Age Units
15 SEX Sex
16 RACE Race
17 ETHNIC Ethnicity
18 ARMCD Planned Arm Code
19 ARM Description of Planned Arm
20 ACTARMCD Actual Arm Code
21 ACTARM Description of Actual Arm
22 COUNTRY Country
23 DMDTC Date/Time of Collection
24 DMDY Study Day of Collection
25 TRT01P Planned Treatment for Period 01
26 TRT01A Actual Treatment for Period 01
27 TRTSDTM Datetime of First Exposure to Treatment
28 TRTSTMF
29 TRTEDTM Datetime of Last Exposure to Treatment
30 TRTETMF
31 TRTSDT Date of First Exposure to Treatment
32 TRTEDT Date of Last Exposure to Treatment
33 TRTDURD Total Duration of Trt (days)
34 SCRFDT Screen Failure Date
35 EOSDT End of Study Date
36 EOSSTT End of Study Status
37 FRVDT Final Retrievel Visit Date
38 RANDDT
39 DTHDT Death Date
40 DTHADY Relative Day of Death
41 LDDTHELD Elapsed Days from Last Dose to Death
42 LSTALVDT Date Last Known Alive
43 AGEGR1 Pooled Age Group 1
44 SAFFL Safety Population Flag
45 RACEGR1 Pooled Race Group 1
46 REGION1 Geographic Region 1
47 LDDTHGR1 Last Does to Death Group
48 DTH30FL Under 30 Group
49 DTHA30FL Over 30 Group
50 DTHB30FL Over 30 plus 30 days Group
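In the printout above, the three variables that are absent from the specification end up with an empty label. The base-R sketch below reproduces that pattern on a toy data frame and then lists the variables that still need a label. It is a simplified illustration with hypothetical object names, not xportr's code.

```r
demo <- data.frame(
  AGE     = c(63, 64),
  TRTSTMF = c("H", "H"),
  stringsAsFactors = FALSE
)
spec <- data.frame(
  variable = "AGE",
  label    = "Age",
  stringsAsFactors = FALSE
)

# Apply the spec label where one exists, otherwise an empty string
for (v in names(demo)) {
  hit <- match(v, spec$variable)
  attr(demo[[v]], "label") <- if (is.na(hit)) "" else spec$label[hit]
}

# Variables whose label is still empty and needs attention in the spec
labels     <- vapply(demo, function(x) attr(x, "label"), character(1))
unlabelled <- names(labels)[!nzchar(labels)]

print(unlabelled)
```

Running a check like this before writing the file gives a short list of variables to add to the specification.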
Finally, we export the R data frame object as an xpt file with xportr_write(). The xpt file will be written directly to your current working directory. To make it more interesting, we have put together all six functions with the magrittr pipe, %>%. A user can now apply types, lengths, variable labels, ordering, formats and a data set label, and write out the final xpt file, in one pipe! Appropriate warnings and messages are printed to the console for any potential issues before the file is sent to a standard clinical data set validator application or to data reviewers.
adsl %>%
  xportr_type(var_spec, "ADSL", "message") %>%
  xportr_length(var_spec, "ADSL", "message") %>%
  xportr_label(var_spec, "ADSL", "message") %>%
  xportr_order(var_spec, "ADSL", "message") %>%
  xportr_format(var_spec, "ADSL", "message") %>%
  xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")
That’s it! We now have an xpt file created in R with all appropriate types, lengths, labels, ordering and formats from our specification file.
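As a complement to the console messages, a few of the well-known v5 transport limits can be checked directly on the data frame: variable names of at most 8 characters, labels of at most 40 characters, and character values of at most 200 bytes. The base-R helper below is a hypothetical sketch and not part of xportr; for simplicity it counts characters rather than bytes.

```r
# Hypothetical pre-flight check for a few v5 transport limits
check_v5_limits <- function(df) {
  problems <- character(0)
  for (v in names(df)) {
    x   <- df[[v]]
    lab <- attr(x, "label")
    if (nchar(v) > 8) {
      problems <- c(problems, paste0(v, ": name longer than 8 characters"))
    }
    if (!is.null(lab) && nchar(lab) > 40) {
      problems <- c(problems, paste0(v, ": label longer than 40 characters"))
    }
    if (is.character(x) && any(nchar(x) > 200, na.rm = TRUE)) {
      problems <- c(problems, paste0(v, ": value longer than 200 characters"))
    }
  }
  problems
}

demo <- data.frame(
  USUBJID        = "01-701-1015",
  TREATMENTSTART = "2014-01-02",  # hypothetical, deliberately too long a name
  stringsAsFactors = FALSE
)
attr(demo$USUBJID, "label") <- "Unique Subject Identifier"

print(check_v5_limits(demo))
```

An empty result means none of these particular limits is violated; it does not replace a full validation of the exported file.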
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or find any documentation unclear, submit an issue on xportr’s GitHub page.