Reading List

A collection of papers and books that Storey Lab members should read and that others may find useful.

Books

Background in Statistics, Machine Learning, and Programming

I recommend reading the following books in the order they are listed. I don't have any books listed for C++, but I recommend learning the basics of this language.

Statistical Inference, Casella and Berger

Statistical Models: Theory and Practice, Freedman

Introductory Statistics with R, Dalgaard

Advanced R, Wickham

Nonparametric Regression and Generalized Linear Models: A roughness penalty approach, Green and Silverman

An Introduction to the Bootstrap, Efron and Tibshirani

All of Statistics, Wasserman

An Introduction to Statistical Learning: with Applications in R, James et al.

Learn Python

A First Course in Bayesian Statistical Methods, Hoff

Bayesian Data Analysis, Gelman et al.

Machine Learning: A Probabilistic Perspective, Murphy

Background in Genomics and Biology

As above, I recommend reading the following books in the order they are listed. Keep in mind that genomics books easily become outdated due to how rapidly the field evolves, so please follow up any topics of interest in the most current literature.

Essential Cell Biology, Alberts et al.

A Primer of Genome Science, Gibson and Muse

Genomes 3, Brown -- or Genomes X if X > 3 is available.

Bioinformatics and Functional Genomics, Pevsner

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Durbin et al.

Population Genetics: A Concise Guide, Gillespie

More Specialized or Advanced

Elements of Statistical Learning, Hastie, Tibshirani, and Friedman

Pattern Recognition and Machine Learning, Bishop

Introduction to Probability, Blitzstein and Hwang

An Introduction to Probability Theory and Its Applications, Feller (Volume 1 and Volume 2)

Probability and Martingales, Williams

Principal Component Analysis, Jolliffe

Latent Variable Models and Factor Analysis: A Unified Approach, Bartholomew et al.

Multivariate Analysis, Mardia, Kent, and Bibby

Statistical Learning with Sparsity: The Lasso and Generalizations, Hastie, Tibshirani, and Wainwright

The Science of Bradley Efron: Selected Papers, Morris and Tibshirani

Asymptotic Statistics, van der Vaart

Molecular Biology of the Cell, Alberts et al.

Learning from Data, Abu-Mostafa, Magdon-Ismail, and Lin

Graphical Models, Exponential Families, and Variational Inference, Wainwright and Jordan

Probabilistic Graphical Models: Principles and Techniques, Koller and Friedman

Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Imbens and Rubin

Causality: Models, Reasoning and Inference, Pearl

Causation, Prediction, and Search, Spirtes, Glymour, and Scheines

lme4: Mixed-effects modeling with R, Bates

Statistical Analysis with Missing Data, Little and Rubin

Simply Statistics

The Simply Statistics team have written a bunch of awesome books recently. I recommend checking out all of them.

The Elements of Data Analytic Style, Leek

The Art of Data Science, Peng and Matsui

Data Analysis for the Life Sciences, Irizarry and Love

Exploratory Data Analysis with R, Peng

R Programming for Data Science, Peng

Classic Articles

Statistics

Genomics

Machine Learning

Other

Storey Lab Articles

I suggest you read these papers before joining the lab. They are fundamental to the research that we do.

Efron B, Tibshirani R, Storey JD, and Tusher V. (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96: 1151–1160. [PDF]

Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479–498. [PDF]

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440–9445. [PDF]

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013–2035. [PDF]

Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187–205. [PDF]

Storey JD, Akey JM, and Kruglyak L. (2005) Multiple locus linkage analysis of genome-wide expression in yeast. PLoS Biology, 3: 1380–1390. [PDF]

Storey JD, Xiao W, Leek JT, Tompkins RG, and Davis RW. (2005) Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102: 12837–12842. [PDF]

Storey JD, Dai JY, and Leek JT. (2007) The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics, 8: 414–432. [PDF]

Akey JM, Biswas S, Leek JT, and Storey JD. (2007) On the design and analysis of expression studies in human populations. Nature Genetics, 39: 807–808. [PDF]

Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3: e161. [PDF]

Chen LS, Emmert-Streib F, and Storey JD. (2007) Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biology, 8: R219. [PDF]

Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105: 18718–18723. [PDF]

Mecham BH, Nelson P, and Storey JD. (2010) Supervised normalization of microarrays. Bioinformatics, 26: 1308–1315.

Woo S, Leek JT, and Storey JD. (2011) A computationally efficient modular optimal discovery procedure. Bioinformatics, 27: 509–515. [PDF]

Leek JT and Storey JD. (2011) The joint null criterion for multiple hypothesis tests. Statistical Applications in Genetics and Molecular Biology, 10: Art 28. [PDF]

Desai KH, Tan CS, Leek JT, Maier RV, Tompkins RG, and Storey JD. (2011) Within-patient gene expression dynamics explain subsequent inflammatory complications in critically injured patients: A longitudinal clinical genomics study. PLoS Medicine, 8(9): e1001093. [PDF]

Desai KH and Storey JD. (2012) Cross-dimensional inference of dependent high-dimensional data. Journal of the American Statistical Association, 107(497): 135–151. [PDF]

Hao W, Song M, and Storey JD. (2013) Probabilistic models of genetic variation in structured populations applied to global human studies. arXiv: 1312.2041. [PDF]

Marstrand TT and Storey JD. (2014) Identifying and mapping cell-type specific chromatin programming of gene expression. Proceedings of the National Academy of Sciences, 111(6): E645–E654. [PDF]

Chung NC and Storey JD. (2014) Statistical significance of variables driving systematic variation. Bioinformatics, doi: 10.1093/bioinformatics/btu674. [PDF]

Song M, Hao W, and Storey JD. (2015) Testing for genetic associations in arbitrarily structured populations. Nature Genetics, doi: 10.1038/ng.3244 (also: bioRxiv, doi: 10.1101/012682). [PDF]