siscreenr

high throughput screening analysis package

My very own package for analyzing our siRNA-based high-throughput screens. All functions were designed and written (and sometimes re-written) by myself.

The package is designed to be used in siRNA-based screening campaigns with microscopy readout by the ScanR imaging system (Olympus). It is assumed the screen is done in replicates. I attempted to build general tools but my own needs are reflected in the core design philosophy.

Installation:

Run the following to install the package:

if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("olobiolo/siscreenr")

Dependencies:

Reading and writing large files is done with package data.table.
Dates are handled with package lubridate.
Plotting is done with packages ggplot2 and lattice.
The GenBank database is queried with package reutils.
Finally, some S3 methods are created with package metamethods.

Literature and Inspirations

Hadley Wickham's Advanced R.
Patrick Burns's The R Inferno.
Version 2 was built using packages dplyr and tidyr, later incorporated into tidyverse.
Version 3 abandons tidyr and dplyr in favor of data.table. dplyr is only used in unit tests.

Disclaimer

This is a work in progress. There may well be bugs I missed. All feedback is welcome.

There is extensive documentation in the form of help pages.

Long form documentation (vignettes) is pending. This has to suffice for now.

Usage

The package is meant for interactive use and thus requires the User to have a handle on R.

Besides functions immediately involved in data analysis, there are some utilities, e.g. for updating the siRNA library annotation and building layout files from parts, in case the plate layout changes during the campaign.

The basic workflow:

This workflow was developed for screens in which a phenotype is quantified and silencing target genes can cause the phenotype to be enhanced or diminished.

Data building:

the screen log file is loaded and compared to the existing data files
data files are loaded and collated into a single data frame
a layout file is attached to denote well types
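
The three data building steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the file names, column names, well types and the log table are hypothetical, and the code does not use the package's own functions (which rely on data.table for file reading).

```r
# Hypothetical per-plate result files, written to a temporary directory
# so that the sketch is self-contained.
tmp <- tempfile("screen_")
dir.create(tmp)
for (p in c("plate1", "plate2")) {
  write.table(data.frame(well = c("A01", "A02", "A03"),
                         intensity = c(1, 2, 3)),
              file.path(tmp, paste0(p, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Hypothetical screen log: the plates that are expected to exist.
screen_log <- data.frame(plate = c("plate1", "plate2", "plate3"))

# Compare the log to the data files that are actually present.
files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
found <- sub("\\.txt$", "", basename(files))
missing <- setdiff(screen_log$plate, found)

# Load the data files and collate them into a single data frame.
plates <- lapply(files, function(f) {
  d <- read.delim(f, stringsAsFactors = FALSE)
  d$plate <- sub("\\.txt$", "", basename(f))
  d
})
screen <- do.call(rbind, plates)

# Attach a (hypothetical) layout that denotes well types.
layout <- data.frame(well = c("A01", "A02", "A03"),
                     well_type = c("nontargeting", "test", "positive"))
screen <- merge(screen, layout, by = "well", all.x = TRUE)
```

Here `missing` lists the plates that appear in the log but have no data file yet, and `screen` holds one row per well per plate with its well type.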

Data normalization:

data can be normalized plate-wise or globally
three methods of normalization are available: mean, median and medpolish
- the mean method subtracts the mean value of a measurement in a reference group from all data points
- the median method works the same as the mean method but subtracts the median to remove the influence of outliers
- the medpolish method runs Tukey's median polish on each plate to remove potential spatial effects; it is always applied plate-wise
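
A minimal sketch of the three methods on a single toy plate, using base R only. The plate values and the choice of the first column as the reference group are made up for illustration, and taking the residuals of `stats::medpolish` as the normalized values is an assumption about how the medpolish method is applied, not a description of the package's code.

```r
# Toy 3 x 4 plate (filled column by column); the first column holds the
# hypothetical reference wells.
plate <- matrix(c(10, 12, 11,
                  14, 15, 13,
                  20, 22, 21,
                  16, 18, 17), nrow = 3)
reference <- plate[, 1]

# Mean and median methods: subtract the reference location from all wells.
norm_mean   <- plate - mean(reference)
norm_median <- plate - median(reference)

# Median polish: the residuals are what is left after the overall, row and
# column effects have been removed (assumed here to be the normalized values).
norm_medpolish <- medpolish(plate, trace.iter = FALSE)$residuals
```

For global rather than plate-wise normalization, the same subtraction would use reference wells pooled from all plates.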

Conversion to z-scores:

normalized measurements are standardized by converting them to z-scores: zi = (xi - mean(x)) / sd(x)
robust z-scores are also available (median and median absolute deviation replace mean and standard deviation, respectively)
when calculating z-scores, the mean and sd estimation can be limited to a subset of observations; this allows for choosing the group to which sample wells are compared
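
The standard and robust z-scores, with the mean and sd estimated on a subset of wells, might look like this in base R. The data and the well type labels are hypothetical. Note that `stats::mad` multiplies the raw median absolute deviation by 1.4826 by default; whether the package uses the scaled or the raw value is not stated here, so treat that as an assumption.

```r
# Hypothetical normalized measurements and their well types.
x <- c(1.2, 0.8, 1.0, 1.1, 0.9, 3.5)
well_type <- c("nontargeting", "nontargeting", "nontargeting",
               "test", "test", "test")

# Location and spread are estimated on the reference subset only.
ref <- x[well_type == "nontargeting"]

# Standard z-score: (x - mean) / sd.
z <- (x - mean(ref)) / sd(ref)

# Robust z-score: median and MAD replace mean and sd.
# mad() applies the default scaling constant 1.4826; use constant = 1
# for the raw median absolute deviation.
z_robust <- (x - median(ref)) / mad(ref)
```

Changing the subset used for `ref` changes the group to which all other wells are compared.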

Hit scoring:

single wells are scored as positive or negative hits (higher or lower measurement value, respectively)
given a z-score threshold (typically 2-3), observations receive hit scores, depending on their z-score values:
- wells with z-scores equal to or higher than {threshold} receive a hit score of 1
- wells with z-scores equal to or lower than {-threshold} receive a hit score of -1
- wells with z-scores higher than {-threshold} and lower than {threshold} receive a hit score of 0
hit scores are summarized over replicates
wells that meet the stringency criterion are considered hits
the stringency criterion can be the number or the fraction of replicates that pass the z-score threshold

Example: In a screen with three replicates the z-score threshold is 2.4 and the stringency criterion is 2.

A well with z-scores of 2.2, 2.6 and 2.57 has hit scores of 0, 1 and 1, yielding a summarized hit score of 2: a hit.
A well with z-scores of -2.4, 2.8 and 3.1 has hit scores of -1, 1 and 1, which yields a summarized hit score of 1: no hit.
Finally, a well with z-scores of 2.23, 2.1 and 2.5 has hit scores of 0, 0 and 1, yielding a summarized hit score of 1: also not a hit.
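
The example above can be reproduced with a few lines of base R. The helper `hit_score` is a hypothetical name written for this sketch, not the package's function, and comparing the absolute summarized score to the stringency (so that negative hits are treated symmetrically) is an assumption.

```r
# Hypothetical helper: 1 at or above the threshold, -1 at or below the
# negative threshold, 0 otherwise.
hit_score <- function(z, threshold) {
  ifelse(z >= threshold, 1L, ifelse(z <= -threshold, -1L, 0L))
}

threshold  <- 2.4
stringency <- 2

# One row per well, one column per replicate.
z <- rbind(well_1 = c(2.2, 2.6, 2.57),
           well_2 = c(-2.4, 2.8, 3.1),
           well_3 = c(2.23, 2.1, 2.5))

scores     <- hit_score(z, threshold)  # ifelse() keeps the matrix shape
summarized <- rowSums(scores)          # summarize over replicates
is_hit     <- abs(summarized) >= stringency
```

With these inputs `summarized` is 2, 1 and 1, so only the first well is called a hit. A fractional stringency criterion would compare `abs(summarized) / ncol(z)` to the chosen fraction instead.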

Once hits are determined, well annotation is attached.
Some tools for data visualization are available:

scatter plot of z-score vs cell viability
hit distribution plots: number of hits per row, per column, and per plate (for quality control)

A report file can be generated at will; this is left to the User's discretion.

An alternative workflow:

A slightly altered workflow is implemented for screens in which a phenotype occurs in a known range, from a minimum in a negative control to a maximum in a positive control. Silencing of target genes is expressed within that range. This is commonly called Normalized Percent Inhibition/Activation, depending on whether the positive control inhibits or activates the phenotype, and is commonly used in chemical screenings.

Data building happens in the usual way.
Normalization is done by converting measurements into NPI/NPA. The sample wells will typically fall between 0 and 100%.
Hit scoring is done by setting a threshold on the NPI/NPA and applying the stringency criterion.
Data annotation proceeds normally.
A plotting tool for NPI is available. The hit distribution tool is also applicable.
Reporting is left to the User, as usual.
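
As an illustration of the normalization and scoring steps in this workflow, the sketch below expresses each measurement as its position within the range spanned by the control means, in percent. The data, the well type labels, the use of control means (rather than medians) and the threshold of 50% are all assumptions made for the example, not the package's defaults.

```r
# Hypothetical measurements from one replicate of one plate.
x <- c(100, 20, 60, 80, 35)
well_type <- c("negative", "positive", "test", "test", "test")

neg <- mean(x[well_type == "negative"])
pos <- mean(x[well_type == "positive"])

# Position of each well between the negative (0%) and positive (100%)
# control. The same expression covers inhibition and activation, since
# (x - neg) / (pos - neg) equals (neg - x) / (neg - pos).
npi <- (x - neg) / (pos - neg) * 100

# Score wells against a hypothetical threshold on the NPI.
threshold_npi <- 50
npi_hit <- as.integer(npi >= threshold_npi)
```

The resulting per-replicate scores can then be summed over replicates and compared to the stringency criterion, as in the basic workflow.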