vignettes/tf_from_file.Rmd
tf_from_file.Rmd
We recommend that before reviewing this vignette, you review the basics of pharmaRTF and the more advanced features.
In this vignette, we will take you through some final features of pharmaRTF to help tie together how this package can centralize titles and footnotes within your organization, and show you the tools you need to embed this into your own internal process.
Being able to specify everything directly in a program is useful. For adhoc tables or to quickly put together an output, keeping everything central and in one spot is nice. But in a mature process, companies typically have a strategy to centralize the assignment of titles and footnotes. Keeping all of your titles and footnotes in one place has many advantages. You can global replace changes that impact numerous outputs. You can keep separate repetitive text instead of repeating it for each table. Updates take place in one file instead of having to open each individual impacted program. In short, centralizing this process is highly desirable within an organization.
pharmaRTF has the tools you need to make this process simple, and this comes from the function titles_and_footnotes_from_df()
. This function allows you to import your titles and footnotes from a data frame. That data frame can either come directly from your R session, or you can build a custom reader function to import the data from an external file. So let’s take a look.
Some of the syntax we use throughout this document uses the %>%
pipe operator. This is to display support for a tidyverse style of coding. You can find the %>%
operator in the magittr package.
The idea behind titles_and_footnotes_from_df()
is simply that there is a set of expectations of what the data looks like that flows into the function. What that data looks like within an external file is irrelevant, as long as it can be transformed to meet the needs of titles_and_footnotes_from_df()
on the way in. This is possible because of the reader
parameter. The reader
parameter is just some function that returns a data frame, and that data frame must be compliant with the required variables and types expected by titles_and_footnotes_from_df()
. Those variables, types, and requirements are as follows.
Variable | Column.Types | Required |
---|---|---|
type | character | Yes |
text1 | character | Yes |
text2 | character | No |
align | character | No |
bold | logical | No |
italic | logical | No |
font | character | No |
font_size | numeric | No |
index | numeric | No |
These variables and types should all look familiar because they’re exactly what’s required for the object hf_line
. In fact, titles_and_footnotes_from_df()
simply takes this data and applies the add_titles()
and add_footnotes()
function to the rtf_doc
object to attach titles and footnotes.
type
and text1
are the only required values and this is because titles_and_foonotes_from_df()
needs to know where to attach the text, and needs to have text to attach. For variables that aren’t required, they may be omitted from the data returned by the reader function and the default values from the hf_line
object will be used.
The entire concept behind titles_and_footnotes_from_df()
is allowing the freedom of creating your own process. By allowing the user to pass in a custom reader function to the reader
parameter, the source of given titles and footnotes can be anything the user wants. You could read from flat text, CSV, Excel, Access, any remote file that R can process and return a data frame meeting the above requirements is on the table.
With this in mind, let’s consider a simple example. Assume that I have the excel file titles.xlsx
. This file looks as follows:
table_number | index | type | text1 | text2 | align | bold | italic |
---|---|---|---|---|---|---|---|
14-2.01 | 1 | title | Protocol: CDISCPILOT01 | PAGE_FORMAT: Page %s of %s | split | TRUE | TRUE |
14-2.01 | 2 | title | Population: Intent-to-Treat | NA | left | TRUE | TRUE |
14-2.01 | 3 | title | Table 14-2.01 | NA | center | TRUE | TRUE |
14-2.01 | 4 | title | Summary of Demographic and Baseline Characteristics | NA | center | TRUE | TRUE |
14-2.01 | 1 | footnote | [1] P-values are results of ANOVA treatment group comparison for continuous variable and Pearson’s chisquare test for categorical variables. | NA | left | FALSE | TRUE |
14-2.01 | 2 | footnote | NOTE: Duration of disease is computed as months between date of enrollment and date of onset of the first definite symptoms of Alzheimer’s disease. | NA | left | FALSE | TRUE |
14-2.01 | 3 | footnote | FILE_PATH: Source: %s | DATE_FORMAT: %H:%M %A, %B %d, %Y | split | FALSE | TRUE |
14-4.01 | 1 | title | Protocol: CDISCPILOT01 | PAGE_FORMAT: Page %s of %s | split | TRUE | TRUE |
14-4.01 | 2 | title | Population: Safety | NA | left | TRUE | TRUE |
14-4.01 | 3 | title | Table 14-4.01 | NA | center | TRUE | TRUE |
14-4.01 | 4 | title | Summary of Planned Exposure to Study Drug, as of End of Study | NA | center | TRUE | TRUE |
14-4.01 | 1 | footnote | [1] Includes completers and early terminators. | NA | left | FALSE | TRUE |
14-4.01 | 2 | footnote | [2] End of Study refers to week 26/Early Termination. | NA | left | FALSE | TRUE |
14-4.01 | 3 | footnote | FILE_PATH: Source: %s | DATE_FORMAT: %H:%M %A, %B %d, %Y | split | FALSE | TRUE |
(For more information about the special text fields you see here, see our advanced usage vignette)
Here I have everything I need to generate my titles. I just need to make sure of two things:
So overall this will be a pretty simple function. Let’s create it.
example_custom_reader <- function(..., table_number=NULL) {
# If a column isn't populated then the type may be guessed wrong so force it
col_types <- c('text', 'numeric', 'text', 'text', 'text', 'text', 'logical', 'logical')
# pass through arguments from ...
df <- readxl::read_excel(..., col_types=col_types)
# Subset and return that dataframe
df[df$table_number==table_number, !names(df) == 'table_number']
}
Within a few steps I have everything I need:
readxl::read_excel()
to import the data from the excel filetable_number
column and just keep the variables that need to be sent to titles_and_footnotes_from_df()
Note that here I use the ...
parameter. This is because
from.file
parameter is passed forward into the reader function...
option on titles_and_foonotes_from_df()
is also passed forward, allowing you to pass whatever arguments necessary from the titles_and_footnotes_from_df()
call into your custom reader.With this function now available, let’s see what the function returns.
example_custom_reader('titles.xlsx', table_number = '14-2.01')
#> # A tibble: 7 × 7
#> index type text1 text2 align bold italic
#> <dbl> <chr> <chr> <chr> <chr> <lgl> <lgl>
#> 1 1 title Protocol: CDISCPILOT01 PAGE_FORMAT: … split TRUE TRUE
#> 2 2 title Population: Intent-to-Treat <NA> left TRUE TRUE
#> 3 3 title Table 14-2.01 <NA> cent… TRUE TRUE
#> 4 4 title Summary of Demographic and B… <NA> cent… TRUE TRUE
#> 5 1 footnote [1] P-values are results of … <NA> left FALSE TRUE
#> 6 2 footnote NOTE: Duration of disease is… <NA> left FALSE TRUE
#> 7 3 footnote FILE_PATH: Source: %s DATE_FORMAT: … split FALSE TRUE
Here can see the data frame that the function processes and returns. In this example, the data processing was very simple - but in a more advanced case this function could read from multiple sheets, merge in default titles and footnotes, read from different data sources - whatever suits your process.
Now with an understanding of how a custom reader function works, let’s tie it all together. Let’s assume that our table has been programmed, and that the huxtable table is ready in our session (see the huxtable tips vignette to for tips on how to prepare your table). The huxtable table is our session looks like this:
Completers at Week 24 | Safety Population [1] | ||||||
Placebo\line (N=60) | Xanomeline\line Low Dose\line (N=28) | Xanomeline\line High Dose\line (N=30) | Placebo\line (N=86) | Xanomeline\line Low Dose\line (N=84) | Xanomeline\line High Dose\line (N=84) | ||
Average daily dose (mg) | n | 60 | 28 | 30 | 86 | 84 | 84 |
Mean | 0.0 | 54.0 | 77.0 | 0.0 | 54.0 | 71.6 | |
SD | 0.00 | 0.00 | 0.58 | 0.00 | 0.00 | 8.11 | |
Median | 0.0 | 54.0 | 76.9 | 0.0 | 54.0 | 75.1 | |
Min | 0.0 | 54.0 | 76.1 | 0.0 | 54.0 | 54.0 | |
Max | 0.0 | 54.0 | 78.6 | 0.0 | 54.0 | 78.6 | |
Cumulative dose at end of study [2] | n | 60 | 28 | 30 | 86 | 84 | 84 |
Mean | 0.0 | 9918.6 | 14089.5 | 0.0 | 5347.3 | 7551.0 | |
SD | 0.00 | 603.84 | 481.01 | 0.00 | 3680.35 | 5531.04 | |
Median | 0.0 | 9936.0 | 14080.5 | 0.0 | 4455.0 | 5778.0 | |
Min | 0.0 | 7884.0 | 12960.0 | 0.0 | 108.0 | 54.0 | |
Max | 0.0 | 11448.0 | 15417.0 | 0.0 | 11448.0 | 15417.0 | |
Note: The data here comes from the CDISC Pilot, which you can read about here. Also, see our replication of the tables from this CDISC pilot in our Github repository.
Here’s how you would go about creating your rtf_doc
object. (For more information about the functions being used here, see our advanced usage vignette)
doc <- rtf_doc(ht, header_rows = 2) %>% titles_and_footnotes_from_df(
from.file='../data/titles.xlsx',
reader=example_custom_reader,
table_number='14-4.01') %>%
set_column_header_buffer(top=1) %>%
set_font_size(10)
write_rtf(doc, file="table16.rtf")
Within the titles_and_footnotes_from_df()
function you can see the parameters used:
from.file
is the location of our titles/footnotesreader
is the function we created within this vignettetable_number
is the number of the table we are creatingIn the function call, from.file
and table_number
are passed forward into the custom reader. From there, validation of the incoming data frame happens, any missing variables have their defaults applied, and then add_titles()
and add_footnotes()
are called on the titles and footnotes within the data respectively. You can see in the resulting table that the titles and footnotes have been applied
And there you have it. By using this process, the addition of titles and footnotes to your rtf_doc
greatly simplifies, and this code becomes very reusable from program to program, allowing you to focus on the nuance of your table creation rather than the redundant application of creating titles and footnotes. As mentioned, it’s necessary to be able to manually specify your titles and footnotes as needed - but automating the process of applying them is much more desirable when your studies reach a certain scale.
If you have not already reviewed them, be sure to check out our other vignettes
vignette("pharmaRTF")
vignette("huxtable_tips")
vignette("advanced_usage")