CTRU R Functions
This repository contains R functions to facilitate work at the Sheffield Clinical Trials Research Unit (CTRU), part of the School of Health and Related Research (ScHARR) at The University of Sheffield. The intention is to share code between colleagues so that common repetitive tasks become trivial and we do not spend time solving the same problems.
Readers may also find the following slides useful. They are written by the author of this package and contain a host of examples and links to additional resources on using R in a reproducible workflow...
- RepRoducibility Slides on using R to work in a reproducible manner.
- Reporting in the 21st Century slides for a short presentation to the Medical Statistics Group on working in a reproducible manner.
Installation and Usage
If all you want to do is use these functions then it's pretty straightforward to install them thanks to the devtools package. Install it from CRAN and then install this repository from GitHub...
install.packages('devtools')
devtools::install_github('ns-ctru/ctru')
## And of course load the library
library(ctru)
You can now use the functions read_prospect(), fields_prospect() and so forth.
Shiny Application(s)
The package now includes a Shiny application (i.e. interactive Web page) that allows the calculation of sample sizes using a number of different R packages. A helper function is included so that once you have installed and loaded the library (as described above) you can start the application using...
ctru_shiny()
Read more about the included Shiny applications below.
Collaborating
To collaborate in this work you will need to install Git on your computer and have a GitHub account. If you're not familiar with either of these you may find the tutorial Conversational Git a useful place to start. The GitHub help pages are also excellent.
Once you've got a GitHub account you need to fork the ns-ctru/ctru repository, clone your fork to your computer to work on it, make changes/additions, push them back to your fork and then make a pull request.
SSH Keys
I would advocate using SSH Keys with your GitHub account to make it easy to push updates without having to enter your password every single time.
Functions
read_prospect()
- Function to facilitate reading and labelling of data exported as plain text files from the CTRU 'bespoke' database Prospect.
- Uses the exported Lookups.csv to convert all factor variables to the correct encoding.
- Unfortunately it can't recreate the relational nature of the data that exists within the database from which it has been exported :-/.
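The lookup-based labelling described above can be sketched in base R. This is only an illustration, not the code read_prospect() uses: the column names (field, code, label) and the toy data are assumptions, and the real Lookups.csv exported from Prospect may be laid out differently.

```r
# Hypothetical lookup table: one row per (field, code) pair.
# The real Lookups.csv layout may differ.
lookups <- data.frame(
  field = c("sex", "sex", "smoker", "smoker"),
  code  = c(1, 2, 0, 1),
  label = c("Male", "Female", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical exported data holding numeric codes.
raw <- data.frame(id = 1:4, sex = c(1, 2, 2, 1), smoker = c(0, 1, 0, 0))

# Replace every coded column that appears in the lookup with a labelled factor.
label_from_lookups <- function(df, lookups) {
  for (f in intersect(names(df), unique(lookups$field))) {
    lk <- lookups[lookups$field == f, ]
    df[[f]] <- factor(df[[f]], levels = lk$code, labels = lk$label)
  }
  df
}

labelled <- label_from_lookups(raw, lookups)
str(labelled)
```

Columns that do not appear in the lookup (here id) are left untouched.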
ToDo
- Add functionality to download 'Fields' and 'Forms' tabs from DM Googlesheets using googlesheets.
- Have event_name converted to factor internally (may require inclusion of event_name in the Lookups.csv that is exported from Prospect).
recruitment()
- Function to summarise screening and recruitment(/enrolment) to studies.
- Produces tables and figures overall and by study site.
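As a rough idea of the kind of table involved, the sketch below counts screened and recruited participants by site and month and then overall. It is not the implementation of recruitment(); the data frame and its column names (site, screened_on, recruited) are made up for illustration.

```r
# Hypothetical screening log; column names are assumptions.
screening <- data.frame(
  site        = c("A", "A", "B", "B", "B", "C"),
  screened_on = as.Date(c("2016-01-05", "2016-02-11", "2016-01-20",
                          "2016-02-02", "2016-02-25", "2016-03-01")),
  recruited   = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Month of screening as "YYYY-MM", plus integer indicators to sum over.
screening$month       <- format(screening$screened_on, "%Y-%m")
screening$screened    <- 1L
screening$recruited_n <- as.integer(screening$recruited)

# Counts by site and month.
by_site <- aggregate(cbind(screened, recruited_n) ~ site + month,
                     data = screening, FUN = sum)

# Counts overall by month, with a running total of recruits.
overall <- aggregate(cbind(screened, recruited_n) ~ month,
                     data = screening, FUN = sum)
overall$cumulative_recruited <- cumsum(overall$recruited_n)

print(by_site)
print(overall)
```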
ToDo
- Generate plots by site (don't need to do tables, since they can be subsetted from the master)
- Possibly add option to summarise by treatment arm too.
table_summary()
- Function to summarise specified measurements (numerical/continuous and factor variables are handled) by the specified subset and time points.
- For numerical/continuous variables N/Mean/SD/Min/Max/Median/IQR are reported for the specified variables and grouping.
- For factor variables, numbers and proportions are reported for the specified variables.
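The two kinds of summary listed above can be approximated in base R as follows. This is a minimal sketch under assumed data, not the code behind table_summary(); the variable names group, score and response are hypothetical.

```r
# Hypothetical measurements: one continuous and one categorical variable.
measurements <- data.frame(
  group    = c("Control", "Control", "Control",
               "Treatment", "Treatment", "Treatment"),
  score    = c(10, 12, 14, 11, 15, 19),
  response = c("Yes", "No", "Yes", "Yes", "Yes", "No")
)

# N/Mean/SD/Min/Max/Median/IQR for a numeric vector.
summarise_numeric <- function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    iqr    = IQR(x, na.rm = TRUE))
}

# One row of statistics per group.
numeric_summary <- do.call(
  rbind,
  lapply(split(measurements$score, measurements$group), summarise_numeric)
)

# Counts and row-wise proportions for the categorical variable.
counts      <- table(measurements$group, measurements$response)
proportions <- prop.table(counts, margin = 1)

print(numeric_summary)
print(proportions)
```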
ToDo
- Full support for Non-Standard Evaluation when explicitly supplying grouping variables as an argument rather than via the ... (dots) argument.
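For readers unfamiliar with Non-Standard Evaluation, the sketch below shows one common base R pattern: capture the unquoted argument with substitute() and evaluate it inside the data frame with eval(). The function mean_by() is hypothetical and is not part of this package; it only illustrates passing a grouping variable as a named argument.

```r
# Hypothetical helper: mean of `value` within levels of `group`,
# where both are given as unquoted column names.
mean_by <- function(df, value, group) {
  value_expr <- substitute(value)  # capture the expression, not its value
  group_expr <- substitute(group)
  tapply(eval(value_expr, df), eval(group_expr, df), mean)
}

# Uses the built-in mtcars data set.
mpg_by_cyl <- mean_by(mtcars, mpg, cyl)
print(mpg_by_cyl)
```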
plot_summary()
- Function to plot specified measurements by specified subset.
- Produces histograms by specified treatment groups for continuous variables.
- Produces bar charts by specified treatment groups for factor variables.
- Pooled plots are produced and optionally individual plots for each variable can be produced.
ToDo
- For factor variables need to group responses into surveys and facet_grid() them with rows for surveys and columns for the specified groups.
- Extend factor summaries to be performed by specified events.
- Finish off plotting continuous variables by variable (rows) and event (columns).
idm_lsoa()
- Function to combine Lower Super Output Area (LSOA) level Index of Multiple Deprivation (IMD) statistics with an arbitrary user-specified data frame based on 2011 postcode. Provides the overall IMD score and each component as absolute numbers and deciles, as well as the ranking of all scores and components across England.
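Conceptually the combination is a pair of joins: postcode to LSOA, then LSOA to IMD statistics. The base R sketch below illustrates this with placeholder values; the postcodes, LSOA codes, scores and column names are all invented for illustration and say nothing about how idm_lsoa() stores or names its data.

```r
# Placeholder postcode-to-LSOA lookup (invented values).
postcode_lsoa <- data.frame(
  postcode  = c("AA1 1AA", "BB2 2BB"),
  lsoa_code = c("LSOA_A", "LSOA_B")
)

# Placeholder IMD statistics keyed on LSOA (invented values).
imd <- data.frame(
  lsoa_code = c("LSOA_A", "LSOA_B", "LSOA_C"),
  imd_score = c(6.2, 19.4, 28.7)
)

# User data frame; the third postcode has no match in the lookup.
participants <- data.frame(
  id       = 1:3,
  postcode = c("AA1 1AA", "BB2 2BB", "ZZ9 9ZZ")
)

# Left joins keep every participant, filling NA where nothing matches.
with_lsoa <- merge(participants, postcode_lsoa, by = "postcode", all.x = TRUE)
with_imd  <- merge(with_lsoa, imd, by = "lsoa_code", all.x = TRUE)
with_imd  <- with_imd[order(with_imd$id), ]

print(with_imd)
```

Using all.x = TRUE means unmatched postcodes are retained with missing IMD values rather than silently dropped.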
ToDo
- Add in 2010 data.
- Add in data on LSOAs in Wales.
eq5d_score()
- Function for calculating EQ5D-5L scores (see slides 40 and 41 for scoring). Could possibly have it summarise and plot scores by a user-specified variable (default being the event and the group).
ToDo
- Very much a work in progress; need to fully understand Non-Standard Evaluation to get the function working and fully flexible.
consort()
- Function for producing CONSORT flow-diagrams
ToDo
- Everything, most likely using the diagram package (further examples here).
- This may not be that straightforward to abstract in light of the way CTRU data is (un)structured, as there is no single file that defines who was seen at what stage; all numbers need extracting from the available data. Kind of the thing that databases are geared towards really.
regress_ctru()
ToDo
- Include options to set the reference level (via relevel()) for each factor variable in a model (something akin to the way texreg() handles things).
- Option (default) to exponentiate model coefficients and CIs when the model family is binomial.
- Include ability to bootstrap regression results, particularly important for mixed models where p-values are unreliable due to uncertainty in the degrees of freedom. Some leverage to do this via texreg() but stargazer() is a more flexible tabulating option.
- Include all results from ITT/PP models, coefficients and CIs, p-values as part of the returned list which can then be parsed for inclusion in text.
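Two of the items above (setting a reference level and exponentiating coefficients with their CIs) can be illustrated with base R alone. This is a sketch of the idea rather than anything regress_ctru() currently does; it uses the built-in infert data set, and the choice of model, reference level and Wald intervals (confint.default()) are assumptions made for the example.

```r
# Built-in data set; education is a factor with three levels.
dat <- infert

# Make "6-11yrs" the reference level for education.
dat$education <- relevel(dat$education, ref = "6-11yrs")

# Logistic regression (binomial family, default logit link).
fit <- glm(case ~ spontaneous + induced + education,
           data = dat, family = binomial)

# Wald confidence intervals on the log-odds scale, then exponentiate
# to obtain odds ratios with 95% CIs.
log_odds    <- cbind(OR = coef(fit), confint.default(fit))
odds_ratios <- exp(log_odds)

# Bundle results in a list so individual values can be pulled into text.
results <- list(model = fit, odds_ratios = odds_ratios)
print(round(results$odds_ratios, 2))
```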
Shiny Applications
Shiny applications are included in this package (currently n = 1). A helper function (ctru_shiny()) is included to start the different applications. It includes the option to specify the display.mode, which can be useful if you wish to look at the source code in the application (use the option display.mode = "showcase" if so).
Sample Size Calculations
A WebUI to a number of R packages which will calculate sample sizes and/or power for the specified parameters. To start it run...
ctru_shiny(example = 'sample_size')
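For a sense of the kind of calculation such an interface wraps, here is a sample size calculation using power.t.test() from the stats package that ships with R. Whether the Shiny application uses this particular function is an assumption; the effect size, SD, power and significance level below are arbitrary illustrative inputs.

```r
# Two-sample t-test: sample size per group needed to detect a
# difference of 0.5 SD with 90% power at a two-sided 5% level.
res <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.9,
                    type = "two.sample", alternative = "two.sided")

# Round up to whole participants per group.
n_per_group <- ceiling(res$n)
print(n_per_group)

# Check the power actually achieved with the rounded-up sample size.
achieved <- power.t.test(n = n_per_group, delta = 0.5, sd = 1,
                         sig.level = 0.05)$power
print(achieved)
```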
Links
A few links to other resources that people might find useful...
- RepRoducibility Slides written by the author of this package on using R to work in a reproducible manner.
- Reporting in the 21st Century slides for a short presentation to the Medical Statistics Group on working in a reproducible manner.