Basic Illumina array pipeline

This is a simple pipeline for analysing Illumina array data, using the lumi [1] and limma [2] packages from Bioconductor [3]. To use it, edit the indicated variables at the top of the lumi.mk makefile, then run it with 'make -f lumi.mk'. A directory called 'pipeline' will be created to hold the outputs.

Inputs

This is not a bead-level analysis; it assumes you have the following:

Sample probe profile, control probe profile and samples table, all usually exported from BeadStudio.
An Illumina annotation file, e.g. "HumanHT-12_V4_0_R2_15002873_B.txt".
A tab-delimited experiment file describing your samples, with rows matching the columns of the input data.
A tab-delimited file defining contrasts.
A set of .gmt format gene set files for differential gene set analysis.

Experiment file

The experiment file is tab-delimited and has no column name for the sample IDs, so the header row has one fewer field than the data rows, like:

age	gender
Sample1	25	M
Sample2	30	F
Sample3	22	F
Sample4	12	M
Sample5	50	M
Sample6	70	F
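
The layout described above is easy to get subtly wrong, for example when a space is typed in place of a tab. The following is a minimal sketch of a sanity check, assuming a POSIX shell and awk are available; check_experiment is a hypothetical helper and is not part of lumi.mk.

```shell
#!/bin/sh
# Sketch: verify that the header row has exactly one fewer tab-separated
# field than every data row (the sample ID column is unnamed).
# check_experiment is a hypothetical helper, not part of lumi.mk.
check_experiment() {
    awk -F'\t' '
        NR == 1  { expected = NF + 1; next }   # header lacks the ID column
        $0 == "" { next }                      # ignore blank lines
        NF != expected {
            printf "line %d: expected %d fields, found %d\n", NR, expected, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Demonstration on a small temporary file
tmp=$(mktemp)
printf 'age\tgender\nSample1\t25\tM\nSample2\t30\tF\n' > "$tmp"
check_experiment "$tmp" && echo "experiment file looks consistent"
rm -f "$tmp"
```

The function exits non-zero and reports the offending line numbers when a row does not match, so it could be run before invoking make.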

Contrasts file

The contrasts file defines contrasts in terms of the variables found in the experiment, like:

variable	group1	group2
gender	F	M
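
Because each contrast refers to a variable and two of its levels, it can be worth confirming that these actually occur in the experiment file before running the analysis. The sketch below is one way to do that with awk; check_contrasts is a hypothetical helper, and it assumes both files follow the layouts shown above.

```shell
#!/bin/sh
# Sketch: check that every contrast names a column of the experiment file
# and that both groups occur as values in that column.
# check_contrasts is a hypothetical helper, not part of lumi.mk.
check_contrasts() {
    # $1: experiment file, $2: contrasts file
    awk -F'\t' '
        # First file: experiment table. Header name i sits in data column i+1.
        NR == FNR {
            if (FNR == 1) { for (i = 1; i <= NF; i++) col[$i] = i + 1; next }
            for (v in col) seen[v, $(col[v])] = 1
            next
        }
        # Second file: contrasts table. Skip its header and blank lines.
        FNR == 1 || $0 == "" { next }
        !($1 in col) { printf "unknown variable: %s\n", $1; bad = 1; next }
        {
            for (i = 2; i <= 3; i++)
                if (!(($1, $i) in seen)) {
                    printf "unknown level of %s: %s\n", $1, $i
                    bad = 1
                }
        }
        END { exit bad }
    ' "$1" "$2"
}

# Demonstration with temporary copies of the two example files
exp_tmp=$(mktemp); con_tmp=$(mktemp)
printf 'age\tgender\nSample1\t25\tM\nSample2\t30\tF\n' > "$exp_tmp"
printf 'variable\tgroup1\tgroup2\ngender\tF\tM\n' > "$con_tmp"
check_contrasts "$exp_tmp" "$con_tmp" && echo "contrasts match the experiment file"
rm -f "$exp_tmp" "$con_tmp"
```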

Analysis

lumi.mk is a makefile which can be used to run this pipeline. It is executed like:

make -f lumi.mk <TARGET>

Where <TARGET> represents a given target in the makefile.

You can see what the makefile will do before actually running it with:

make -n -f lumi.mk <TARGET>

Makefile targets are:

all

Run all of the following targets. This is the default.

split_anno

Split the Illumina annotation file into main and control probes.
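
To illustrate the idea, here is a rough sketch of such a split. It assumes the annotation file marks its sections with [Probes] and [Controls] lines, as Illumina text manifests generally do. It is not necessarily how lumi.mk implements this target, and the function name and output file names are hypothetical.

```shell
#!/bin/sh
# Sketch: write the [Probes] and [Controls] sections of an annotation file
# to separate files. split_anno_sketch and the output names are hypothetical.
split_anno_sketch() {
    # $1: annotation file, $2: output prefix
    awk -v prefix="$2" '
        { sub(/\r$/, "") }                               # tolerate CRLF endings
        /^\[Probes\]/   { out = prefix ".probes.txt";   next }
        /^\[Controls\]/ { out = prefix ".controls.txt"; next }
        /^\[/           { out = "";                     next }  # other sections
        out != ""       { print > out }
    ' "$1"
}

# Demonstration on a tiny made-up annotation file
demo_dir=$(mktemp -d)
printf '[Heading]\nDate\t2010\n[Probes]\nProbe_Id\tSymbol\nP1\tGENE1\n[Controls]\nProbe_Id\tType\nC1\tnegative\n' > "$demo_dir/anno.txt"
split_anno_sketch "$demo_dir/anno.txt" "$demo_dir/anno"
wc -l "$demo_dir/anno.probes.txt" "$demo_dir/anno.controls.txt"
rm -rf "$demo_dir"
```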

read_lumi

Run readIllumina.R to make a valid LumiBatch object from the inputs (this will be serialised to .RDS).

preprocess_lumi

Run lumiExpresso.R to call lumiExpresso(), which performs background correction, normalisation and variance stabilisation (check the LUMI parameters in the makefile to tweak options).

extract_matrices

Use extractMatrix.R to derive CSV-formatted matrices that can be used later for exploratory purposes.

run_limma

Run arrayLimma.R to look at the specified contrasts using limma and produce matrices of uncorrected and corrected p values.

run_roast

Employ limma's mroast() method to perform differential gene set analysis.
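
The gene sets for this step come from the .gmt files listed under Inputs. In that format each line holds a set name, a description, and then the member genes, all tab-separated. The sketch below summarises a .gmt file so malformed lines can be spotted before the analysis runs; gmt_summary is a hypothetical helper and the gene set content is made up for illustration.

```shell
#!/bin/sh
# Sketch: print each gene set name with its number of genes, and flag
# lines that lack the name, description and at least one gene.
# gmt_summary is a hypothetical helper, not part of lumi.mk.
gmt_summary() {
    awk -F'\t' '
        $0 == "" { next }
        NF < 3   { printf "line %d: fewer than 3 fields\n", NR; bad = 1; next }
                 { printf "%s\t%d\n", $1, NF - 2 }
        END      { exit bad }
    ' "$1"
}

# Demonstration on a tiny made-up gene set file
gmt_tmp=$(mktemp)
printf 'SET_A\texample set\tTP53\tBRCA1\nSET_B\tanother set\tMYC\n' > "$gmt_tmp"
gmt_summary "$gmt_tmp"
rm -f "$gmt_tmp"
```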

make_shiny_object

Using makeShiny.R, take the text-format outputs and make a data structure for use with shinyngs. This will be serialised to data.rds, and can then be loaded for visualisation:

library(shinyngs)

# Load the object serialised by make_shiny_object and launch the app
eselist <- readRDS('data.rds')
app <- prepareApp('illuminaarray', eselist)
shiny::shinyApp(ui = app$ui, server = app$server)

References

[1] Du, P., Kibbe, W.A. and Lin, S.M. (2008). “lumi: a pipeline for processing Illumina microarray.” Bioinformatics.

Du, P., Zhang, X., Huang, C.C., Jafari, N., Kibbe, W.A., Hou, L. and Lin, S.M. (2010). “Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.” BMC Bioinformatics.

Lin, S.M., Du, P. and Kibbe, W.A. (2008). “Model-based Variance-stabilizing Transformation for Illumina Microarray Data.” Nucleic Acids Res.

Du, P., Kibbe, W.A. and Lin, S.M. (2007). “nuID: A universal naming schema of oligonucleotides for Illumina, Affymetrix, and other microarrays.” Biology Direct.
[2] Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W and Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), pp. e47.
[3] Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T, Gottardo R, Hahne F, Hansen KD, Irizarry RA, Lawrence M, Love MI, MacDonald J, Obenchain V, Oleś AK, Pagès H, Reyes A, Shannon P, Smyth GK, Tenenbaum D, Waldron L and Morgan M (2015). “Orchestrating high-throughput genomic analysis with Bioconductor.” Nature Methods, 12(2), pp. 115–121. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html

pinin4fjords/illumina_array_pipeline