/ToxPlot-R-Package

Official Repo for the ToxPlot R package. Author: Jun Wang, Email: njekin@gmail.com

Primary LanguageR

Introduction

The toxplot (Toxicology Plot) package provides a convenient interface to batch process high-throughput toxicology bioassay screening data. It's designed specifically for screening study that features a primary inhibition (loss of signal) assay and a companion cytotoxicity assay. This package provides functions for data normalization, quality-control analysis, dose-response curve fitting, visualization, and a unqiue toxicity-adjusted potency ranking system.

This package was initially written to analyze NIS RAIU assay screening of ToxCast phase I chemical library. The results from this report have been published on the journal Enviornmental Science & Technology.

Citation link:[1]

Wang J., et al., 2018. High-Throughput Screening and Quantitative Chemical Ranking for Sodium Iodide Symporter (NIS) Inhibitors in ToxCast Phase I Chemical Library. Environmental Science & Technology. DOI: 10.1021/acs.est.7b06145

Refer to the analysis markdown for detailed usage of the package in this study.

Below you will find a brief demo usage of the package.

Installation

You can install the package in R from github using devtools.

devtools::install_github('njekin/ToxPlot-R-Package')
library(toxplot)

or install from CRAN.

install.packages("toxplot")
library(toxplot)

Format of input data

To allow the package process data correctly, it is essential to ensure the data input follows required format. Below are the essential columns to include in the input data. Since this package is designed to look at a primary inhibition assay and a parallel cell viability/cytotoxicity assay simultaneously, data from both assays should be put together in a single input file or dataframe.

Columns required:

  • assay: name of the assay.
  • apid: assay plate id. apid shoud be a unique id for each 96 well plate, can distinguish replicate, but doesn't distinguish primary and cytotox assay
  • pid: plate id. used to represent mother plate id, doesn't distinguish replicate, nor assay type.
  • spid: chemical sample name/ID
  • rowi: row position on 96 well plate
  • coli: column position on 96 well plate
  • rval: raw reading value for each well
  • repi: replicate id, (1 to 3)
  • conc: molar concentration (M, no the uM that ToxCast pipeline uses)
  • wllt: well type. define whether a well contains a control or a test sample

Welltype explained: t: test chemical/sample n: DMSO negative control pr: positive control of primary assay, for RAIU assay the chemical is NaClO4 nrc: negative chemical control (2,4-D, a chemical that is supposed to be negative in both raiu and celltiter-glo toxicity assay) pc: positive control for celltiter-glo cytotixicity assay (DCNQ)

Demo data included in the package

Load the demo dataset included in the package. Below is the head of the dataframe.

library(devtools)
library(tidyverse)
load_all()

knitr::kable(head(demo_mc), caption = "Head rows of demo data")
assay pid spid rowi coli conc wllt wllq rep rval apid
Cytotox Plate_11 DMSO 1 1 NA n 1 rep1 51931 Plate_11_rep1
Cytotox Plate_11 DMSO 2 12 NA n 1 rep1 48694 Plate_11_rep1
Cytotox Plate_11 DMSO 3 12 NA n 1 rep1 47870 Plate_11_rep1
Cytotox Plate_11 DMSO 4 12 NA n 1 rep1 47624 Plate_11_rep1
Cytotox Plate_11 DMSO 5 12 NA n 1 rep1 47383 Plate_11_rep1
Cytotox Plate_11 DMSO 6 12 NA n 1 rep1 46533 Plate_11_rep1

Data Analysis

Define basic assay info

Before analyzing the data, it is necessary to define the name of the primary and cytotoxicity assay. Note that the names defined here should exactly match what's provided in the column of the input dataframe.

  #define the names of the primary and toxicity assay.
assay_info <- list(
  prim_assay = "Primary",
  toxi_assay = "Cytotox"
)

Data Normalization

The normalize_per_plate function normalize raw readings as percent of the median/mean of negative control wells (DMSO in this case). The normalized values are included in nval_mean and nval_median column.

# normalization
demo_mc_norm <- normalize_per_plate(demo_mc, nctrl = "DMSO")
knitr::kable(head(demo_mc_norm))
assay pid spid rowi coli conc wllt wllq rep rval apid nval_mean nval_median
Cytotox Plate_11 DMSO 1 1 NA n 1 rep1 51931 Plate_11_rep1 107.66974 108.76285
Cytotox Plate_11 DMSO 2 12 NA n 1 rep1 48694 Plate_11_rep1 100.95839 101.98337
Cytotox Plate_11 DMSO 3 12 NA n 1 rep1 47870 Plate_11_rep1 99.24998 100.25761
Cytotox Plate_11 DMSO 4 12 NA n 1 rep1 47624 Plate_11_rep1 98.73994 99.74239
Cytotox Plate_11 DMSO 5 12 NA n 1 rep1 47383 Plate_11_rep1 98.24027 99.23765
Cytotox Plate_11 DMSO 6 12 NA n 1 rep1 46533 Plate_11_rep1 96.47794 97.45743

Quality Control Metrics

The qc_per_plate function calculate qc metrics for each assay plate, returns three tables each representing the statistics of negative controls, positive controls and QC measures including the CV of DMSO controls and Z' score.

Z' factor is calculated as follows: $$Z'=1-\frac{3\sigma_{positive\ control} + 3\sigma_{DMSO\ control}}{|\mu_{positive\ control} - \mu_{DMSO\ control}|}$$

# qc
qc <- qc_per_plate(demo_mc_norm, assay_info)
knitr::kable(qc$neg_ctrl_sum)
apid assay count_DMSO count_DMSO_NA mean_DMSO sd_DMSO CV_DMSO
Plate_11_rep1 Cytotox 8 0 101.01525 4.234387 4.191830
Plate_11_rep1 Primary 8 0 98.27582 7.857461 7.995315
Plate_11_rep2 Cytotox 8 0 101.42554 5.750167 5.669348
Plate_11_rep2 Primary 8 0 102.37073 9.500463 9.280449
Plate_11_rep3 Cytotox 8 0 99.74308 4.735941 4.748140
Plate_11_rep3 Primary 8 0 96.74751 8.379446 8.661149
knitr::kable(qc$pos_ctrl_sum)
apid assay sd_positive mean_positive
Plate_11_rep1 Primary 0.1785015 3.061242
Plate_11_rep2 Primary 0.2166091 3.571755
Plate_11_rep3 Primary 0.3947476 3.016518
Plate_11_rep1 Cytotox NA 4.102876
Plate_11_rep2 Cytotox NA 4.007689
Plate_11_rep3 Cytotox NA 4.476789
knitr::kable(qc$qc)
unique_id apid assay CV_DMSO Z_prime SSMD
Plate_11_rep1_Primary Plate_11_rep1 Primary 7.995315 0.7468046 12.11460
Plate_11_rep2_Primary Plate_11_rep2 Primary 9.280449 0.7049441 10.39668
Plate_11_rep3_Primary Plate_11_rep3 Primary 8.661149 0.7191689 11.17343
Plate_11_rep1_Cytotox Plate_11_rep1 Cytotox 4.191830 0.8689211 22.88699
Plate_11_rep2_Cytotox Plate_11_rep2 Cytotox 5.669348 0.8229226 16.94174
Plate_11_rep3_Cytotox Plate_11_rep3 Cytotox 4.748140 0.8508620 20.11560

Dose-response curve fitting

The fit_curve_tcpl function uses the hill model provided in U.S.EPA's ToxCast pipeline tcpl package[2] to fit dose-response curves. This function serves as an convenient interface to call the tcplFit function in the tcpl package, and returns a list object containing all data and modeling results. Compared to using the tcpl package, toxplot package doesn't require usage of mysql/sqlite database.

The Hill Model:

$$f(x) = \frac{tp}{1+10^{(ga-x)gw}}$$

Where x is the log concentration, tp is the top asymptote, ga is the AC50 (the log concentration where the modeled activity equals 50% of the top asymptote), and gw is the hill coefficient. The Hill model provided in the tcpl R package constrains the three parameters as following:

    1. 0 ≤ tp ≤ 1.2 times the maximum response value
    1. (minimum log concentration minus 2) ≤ ga ≤ (maximum log concentration plus 0.5)
    1. 0.3 ≤ gw ≤ 8
# curve fitting
demo_md <- fit_curve_tcpl(filter(demo_mc_norm, wllt == "t"), assay_info)
#> Processing 9 samples(spid)....
#> TP0001501G09 ||TP0001501G10 ||TP0001501G11 ||TP0001502A01 ||TP0001502B01 ||TP0001502B03 ||TP0001502B04 ||TP0001502B05 ||TP0001502B07 ||
#> Curve Fitting Completed!
#> Calculation time: 2.1 secs

Rank Chemicals

A toxicity-adjusted ranking score is calculated to rank chemical potency. For more information about the calculation of ranking score, please refer to this publication

# calculate ranking score
demo_rank <- rank_tcpl(demo_md)
knitr::kable(head(demo_rank))
index spid chnm casn taa med_diff AC50_toxi AC50_prim absEC80_toxi absEC50_toxi absEC80_prim absEC50_prim cyto_lim ranking_score
1 TP0001501G09 NA NA 6.0094767 11.056046 -4.457038 -4.318049 -4.208958 NA -4.967273 NA -4.208958 NA
2 TP0001501G10 NA NA 8.7902903 16.530767 -4.413282 -4.273303 -4.359345 NA -4.826529 NA -4.359345 NA
3 TP0001501G11 NA NA 12.5729649 15.735048 -4.350704 -4.462967 -4.549528 -4.04231 -4.901908 -4.352939 -4.549528 NA
4 TP0001502A01 NA NA 0.1157745 2.621527 -4.384463 -4.415429 -4.096896 NA -4.174754 NA -4.096896 NA
5 TP0001502B01 NA NA 4.6352583 9.095696 -4.358898 -4.302734 -4.361850 NA -4.569708 NA -4.361850 NA
6 TP0001502B03 NA NA 11.3269038 33.796854 NA -4.582086 NA NA -4.923170 NA NA NA

Visualize data and fitted curve

The plot_tcpl function uses ggplot2 to plot all the fitted curve with data in original direction. The funciton returns a list of ggplot2 objects.

# make plots
demo_plots <- plot_tcpl(demo_md, demo_rank, notation = FALSE)
# Visualize plot
demo_plots[[1]]
demo_plots[[2]]

It is also very convenient to have the interactive version of the plot with the plotly package.

library(plotly)
ggplotly(demo_plots[[2]])

Export plots to pdf

The save_plot_pdf function saves all plots into one pdf file.

save_plot_pdf(demo_plots, "allplots.pdf")

Reference

[1] https://pubs.acs.org/doi/10.1021/acs.est.7b06145

[2] https://cran.r-project.org/package=tcpl