siscreenr

High-throughput screening analysis package

My very own package for analyzing our siRNA-based high-throughput screens. All functions were designed and written (and sometimes rewritten) by me.

The package is designed for siRNA-based screening campaigns with a microscopy readout from the ScanR imaging system (Olympus). It assumes that the screen is performed in replicates. I attempted to build general tools, but my own needs are reflected in the core design philosophy.

Installation:

Run the following to install the package:

devtools::install_github("olobiolo/siscreenr")

You may need to install the devtools package beforehand:

install.packages("devtools")

Literature and Inspirations

Hadley Wickham's Advanced R (https://adv-r.hadley.nz/).
Patrick Burns's The R Inferno (https://www.burns-stat.com/pages/Tutor/R_inferno.pdf).
Hadley Wickham's dplyr and tidyr, later incorporated into tidyverse (https://www.tidyverse.org/).

The package is slowly but surely transitioning away from the tidyverse in favor of data.table (https://www.rdocumentation.org/packages/data.table/versions/1.12.8).

Disclaimer

This is a work in progress. There may well be bugs I missed. All feedback is welcome.

There is extensive documentation in the form of help pages.

Long-form documentation (vignettes) is pending. This has to suffice for now.

Usage

The package is meant for interactive use and thus requires the user to have a good handle on R.

Besides functions immediately involved in data analysis, there are some utilities, e.g. for updating the siRNA library annotation and building layout files from parts, in case the plate layout changes during the campaign.

The basic workflow:

This workflow was developed for screens in which a phenotype is quantified and silencing target genes can cause the phenotype to be enhanced or diminished.

Data building:

the screen log file is loaded and compared to the existing data files
data files are loaded and collated into a single data frame
a layout file is attached to denote well types
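The collating and layout steps can be pictured with a few lines of base R. This is a hypothetical sketch, not the siscreenr API: the helper names collate_plates() and attach_layout(), the column names plate, well, value and well_type, and the toy values are all assumptions, and the layout is assumed to be the same for every plate. Reading the ScanR output files and checking them against the screen log are left out.

```r
# Hypothetical helpers, not part of siscreenr.

# Stack per-plate data frames (same columns) into a single data frame
collate_plates <- function(plate_list) {
  do.call(rbind, plate_list)
}

# Attach the well type of each well position from a layout table;
# all.x = TRUE keeps wells that are missing from the layout (as NA)
attach_layout <- function(data, layout) {
  merge(data, layout, by = "well", all.x = TRUE, sort = FALSE)
}

# Toy data: two plates with three wells each
plate1 <- data.frame(plate = 1, well = c("A01", "A02", "A03"), value = c(10, 12, 30))
plate2 <- data.frame(plate = 2, well = c("A01", "A02", "A03"), value = c(11, 13, 28))
layout <- data.frame(well = c("A01", "A02", "A03"),
                     well_type = c("negative", "sample", "positive"))

screen <- attach_layout(collate_plates(list(plate1, plate2)), layout)
```

If the layout changes during the campaign, a plate column in the layout and merging by c("plate", "well") would be the natural extension.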

Data normalization:

data can be normalized plate-wise or globally
three methods of normalization are available: mean, median and medpolish
- the mean method subtracts the mean value of a measurement in a reference group from all data points
- the median method works the same as the mean method but subtracts the median to reduce the influence of outliers
- the medpolish method runs Tukey's median polish on each plate to remove potential spatial effects; it is always applied plate-wise
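The three methods can be sketched in base R as follows. This is an illustration under assumptions, not siscreenr's implementation: normalize_platewise() and medpolish_plate() are hypothetical names, the columns plate, well_type and value are assumed, and the reference group defaults to wells labelled "negative". The median polish part relies on stats::medpolish(), whose residuals are the plate values with the overall, row and column effects removed.

```r
# Hypothetical helpers, not part of siscreenr.

# Mean or median normalization, plate-wise: on each plate, subtract the
# centre of the reference group from every well on that plate
normalize_platewise <- function(data, method = c("mean", "median"),
                                reference = "negative") {
  method <- match.arg(method)
  centre <- switch(method, mean = mean, median = median)
  data$normalized <- NA_real_
  for (p in unique(data$plate)) {
    on_plate <- data$plate == p
    ref_values <- data$value[on_plate & data$well_type == reference]
    data$normalized[on_plate] <- data$value[on_plate] - centre(ref_values, na.rm = TRUE)
  }
  data
}

# Tukey's median polish on one plate; 'values' is a numeric matrix laid
# out as plate rows x plate columns
medpolish_plate <- function(values) {
  fit <- stats::medpolish(values, na.rm = TRUE, trace.iter = FALSE)
  fit$residuals
}

screen <- data.frame(
  plate = rep(1:2, each = 3),
  well_type = rep(c("negative", "sample", "sample"), 2),
  value = c(10, 12, 30, 11, 13, 28)
)
normalized <- normalize_platewise(screen, method = "median")$normalized
# normalized: 0, 2, 20, 0, 2, 17

# A plate with purely additive row and column effects polishes to zero
plate <- outer(c(0, 1, 2), c(0, 10, 20, 30), "+") + 5
residuals <- medpolish_plate(plate)
```

Global normalization would correspond to running the same subtraction once over all plates instead of per plate.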

Conversion to zscores:

normalized measurements are standardized by converting them to zscores: zi = (xi - mean(x)) / sd(x)
robust zscores are also available (median and median absolute deviation replace mean and standard deviation, respectively)
when calculating zscores, the estimation of the mean and sd can be limited to a subset of observations; this allows for choosing the group to which sample wells are compared
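Both flavours of zscore, and the restriction to a reference subset, fit in one small function. This is a hypothetical sketch, not siscreenr's own code: zscore() and its reference and robust arguments are illustrative names. stats::mad() multiplies by 1.4826 by default so that it estimates the standard deviation for normally distributed data; whether siscreenr uses that constant is an assumption here.

```r
# Hypothetical helper, not part of siscreenr.
# 'reference' is a logical vector selecting the wells used to estimate
# location and scale (by default all wells).
zscore <- function(x, reference = rep(TRUE, length(x)), robust = FALSE) {
  ref <- x[reference]
  if (robust) {
    # median and MAD (scaled by 1.4826 by default) replace mean and sd
    (x - median(ref, na.rm = TRUE)) / mad(ref, na.rm = TRUE)
  } else {
    (x - mean(ref, na.rm = TRUE)) / sd(ref, na.rm = TRUE)
  }
}

# The third well is scored against the first two only: (10 - 1) / sqrt(2)
z <- zscore(c(0, 2, 10), reference = c(TRUE, TRUE, FALSE))

# Robust variant: median 3, MAD 1.4826, so the last well gets 2 / 1.4826
z_robust <- zscore(c(1, 2, 3, 4, 5), robust = TRUE)
```

In a screen, reference could be something like screen$well_type == "sample", so that every well is compared to the bulk of the sample wells.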

Hit scoring:

single wells are scored as positive or negative hits (higher or lower measurement value, respectively)
given a zscore threshold (typically 2-3), observations are given hit scores, depending on their zscore values:
- wells with zscores equal to or higher than {threshold} receive a hit score of 1
- wells with zscores equal to or lower than {-threshold} receive a hit score of -1
- wells with zscores higher than {-threshold} and lower than {threshold} receive a hit score of 0
hit scores are summarized over replicates
wells that meet the stringency criterion are considered hits
the stringency criterion can be the number or the fraction of replicates that pass the zscore threshold

Example: In a screen with three replicates the zscore threshold is 2.4 and the stringency criterion is 2.

A well with zscores of 2.2, 2.6 and 2.57 has hit scores of 0, 1 and 1, yielding a summarized hit score of 2: a hit.
A well with zscores of -2.4, 2.8 and 3.1 has hit scores of -1, 1 and 1, which yields a summarized hit score of 1: no hit.
Finally, a well with zscores of 2.23, 2.1 and 2.5 has hit scores of 0, 0 and 1: also not a hit.
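The scoring rules and the example above can be reproduced in a few lines. This is a sketch with hypothetical helper names (hit_score(), summarize_hits()), not the package's API. It assumes that the absolute value of the summarized hit score is compared to the stringency criterion, so that negative hits (summed score of -2 or lower here) also count; only a count-type criterion is shown, and a fraction would first be converted to a number of replicates (2/3 of 3 replicates = 2).

```r
# Hypothetical helpers, not part of siscreenr.

# Score each replicate: 1 if z >= threshold, -1 if z <= -threshold, else 0
hit_score <- function(z, threshold) {
  ifelse(z >= threshold, 1L, ifelse(z <= -threshold, -1L, 0L))
}

# Sum hit scores over replicates (matrix columns) and apply the stringency
# criterion as a number of replicates (assumption: absolute summed score)
summarize_hits <- function(scores, stringency) {
  summed <- rowSums(scores)
  data.frame(summed = summed, hit = abs(summed) >= stringency)
}

# The three wells from the example: one row per well, one column per replicate
z <- rbind(well1 = c(2.2, 2.6, 2.57),
           well2 = c(-2.4, 2.8, 3.1),
           well3 = c(2.23, 2.1, 2.5))

scores <- hit_score(z, threshold = 2.4)
result <- summarize_hits(scores, stringency = 2)
# summed: 2, 1, 1; hit: TRUE, FALSE, FALSE
```

Summing signed scores is what makes the second well a non-hit: its negative replicate cancels one of the positive ones.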

Once hits are determined, well annotation is attached.
Some tools for data visualization are available:

scatter plot of zscore vs cell viability
hit distribution plots: number of hits per row, per column, and per plate (for quality control)
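The counts behind the hit distribution plots are simple tabulations. The sketch below is hypothetical and not the package's plotting code: the well naming scheme (a row letter followed by a two-digit column number, e.g. "C05"), the column names and the values are assumptions. A strong excess of hits in one row, column or plate may point to a technical artifact rather than biology.

```r
# Hypothetical example data, not produced by siscreenr
hits <- data.frame(
  plate = c(1, 1, 1, 2, 2),
  well  = c("A01", "A02", "B01", "C05", "C07"),
  hit   = c(TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Split the well name into plate row and plate column
hits$row    <- substr(hits$well, 1, 1)
hits$column <- as.integer(substr(hits$well, 2, 3))

# Count hits per plate, per row and per column
hit_wells       <- hits[hits$hit, ]
hits_per_plate  <- table(hit_wells$plate)
hits_per_row    <- table(hit_wells$row)
hits_per_column <- table(hit_wells$column)

# For a quick visual check, e.g. barplot(hits_per_row)
```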

A report file can be generated at will; this is left to the user's discretion.

An alternative workflow:

A slightly altered workflow is implemented for screens in which a phenotype occurs within a known range, from a minimum in a negative control to a maximum in a positive control. The effect of silencing a target gene is expressed as a position within that range. This approach, common in chemical screens, is called Normalized Percent Inhibition or Normalized Percent Activation (NPI/NPA), depending on whether the positive control inhibits or activates the phenotype.

Data building happens in the usual way.
Normalization is done by converting measurements into NPI/NPA. The sample wells will typically fall between 0 and 100%.
Hit scoring is done by setting a threshold on the NPI/NPA and applying the stringency criterion.
Data annotation proceeds normally.
A plotting tool for NPI is available. The hit distribution tool is also applicable.
Reporting is left to the user, as usual.
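The conversion to percent of the control range can be sketched as below. This is a hypothetical helper (percent_of_control()), not the package's function, and the control values, sample values and the 50% threshold are made up. It maps the negative control mean to 0 and the positive control mean to 100, which is algebraically the same as the usual NPI form 100 * (mean(neg) - x) / (mean(neg) - mean(pos)).

```r
# Hypothetical helper, not part of siscreenr.
# Negative control mean -> 0 %, positive control mean -> 100 %
percent_of_control <- function(x, neg, pos) {
  neg_mean <- mean(neg, na.rm = TRUE)
  pos_mean <- mean(pos, na.rm = TRUE)
  100 * (x - neg_mean) / (pos_mean - neg_mean)
}

neg <- c(10, 12)   # negative control wells, mean 11
pos <- c(50, 52)   # positive control wells, mean 51
samples <- c(11, 31, 51, 61)
npi <- percent_of_control(samples, neg, pos)
# npi: 0, 50, 100, 125

# Per-replicate hit call on an assumed NPI threshold of 50 %
is_hit <- npi >= 50
```

Values outside 0 to 100%, like the last well here, simply mean a stronger effect than the positive control (or an opposite effect, below 0).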
