A collection of papers and books that Storey Lab members should read and that others may find useful.
I recommend reading the following books in the order they are listed. I don't have any books listed for C++, but I recommend learning the basics of this language.
Statistical Inference, Casella and Berger
Statistical Models: Theory and Practice, Freedman
Introductory Statistics with R, Dalgaard
An Introduction to the Bootstrap, Efron and Tibshirani
An Introduction to Statistical Learning: with Applications in R, James et al.
A First Course in Bayesian Statistical Methods, Hoff
Bayesian Data Analysis, Gelman et al.
Machine Learning: A Probabilistic Perspective, Murphy
As above, I recommend reading the following books in the order they are listed. Keep in mind that genomics books easily become outdated due to how rapidly the field evolves, so please follow up any topics of interest in the most current literature.
Essential Cell Biology, Alberts et al.
A Primer of Genome Science, Gibson and Muse
Genomes 3, Brown -- or Genomes X if X > 3 is available.
Bioinformatics and Functional Genomics, Pevsner
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Durbin et al.
Population Genetics: A Concise Guide, Gillespie
The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman
Pattern Recognition and Machine Learning, Bishop
Introduction to Probability, Blitzstein and Hwang
An Introduction to Probability Theory and Its Applications, Volumes 1 and 2, Feller
Probability and Martingales, Williams
Principal Component Analysis, Jolliffe
Latent Variable Models and Factor Analysis: A Unified Approach, Bartholomew et al.
Multivariate Analysis, Mardia, Kent, and Bibby
The Science of Bradley Efron: Selected Papers, Morris and Tibshirani
Asymptotic Statistics, van der Vaart
Molecular Biology of the Cell, Alberts et al.
Learning from Data, Abu-Mostafa, Magdon-Ismail, and Lin
Graphical Models, Exponential Families, and Variational Inference, Wainwright and Jordan
Probabilistic Graphical Models: Principles and Techniques, Koller and Friedman
Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Imbens and Rubin
Causality: Models, Reasoning and Inference, Pearl
Causation, Prediction, and Search, Spirtes, Glymour, and Scheines
lme4: Mixed-effects modeling with R, Bates
Statistical Analysis with Missing Data, Little and Rubin
The Simply Statistics team have written a bunch of awesome books recently. I recommend checking out all of them.
The Elements of Data Analytic Style, Leek
The Art of Data Science, Peng and Matsui
Data Analysis for the Life Sciences, Irizarry and Love
Exploratory Data Analysis with R, Peng
R Programming for Data Science, Peng
I suggest you read these papers before joining the lab. They are fundamental to the research that we do.
Efron B, Tibshirani R, Storey JD, and Tusher V. (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96: 1151–1160. [PDF]
Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479–498. [PDF]
Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440–9445. [PDF]
Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013–2035. [PDF]
Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187–205. [PDF]
Storey JD, Akey JM, and Kruglyak L. (2005) Multiple locus linkage analysis of genome-wide expression in yeast. PLoS Biology, 3: 1380–1390. [PDF]
Storey JD, Xiao W, Leek JT, Tompkins RG, and Davis RW. (2005) Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102: 12837–12842. [PDF]
Storey JD, Dai JY, and Leek JT. (2007) The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics, 8: 414–432. [PDF]
Akey JM, Biswas S, Leek JT, and Storey JD. (2007) On the design and analysis of expression studies in human populations. Nature Genetics, 39: 807–808. [PDF]
Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3: e161. [PDF]
Chen LS, Emmert-Streib F, and Storey JD. (2007) Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biology, 8: R219. [PDF]
Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105: 18718–18723. [PDF]
Mecham BH, Nelson P, and Storey JD. (2010) Supervised normalization of microarrays. Bioinformatics, 26: 1308–1315.
Woo S, Leek JT, and Storey JD. (2011) A computationally efficient modular optimal discovery procedure. Bioinformatics, 27: 509–515. [PDF]
Leek JT and Storey JD. (2011) The joint null criterion for multiple hypothesis tests. Statistical Applications in Genetics and Molecular Biology, 10: Art 28. [PDF]
Desai KH, Tan CS, Leek JT, Maier RV, Tompkins RG, and Storey JD. (2011) Within-patient gene expression dynamics explain subsequent inflammatory complications in critically injured patients: A longitudinal clinical genomics study. PLoS Medicine, 8(9): e1001093. [PDF]
Desai KH and Storey JD. (2012) Cross-dimensional inference of dependent high-dimensional data. Journal of the American Statistical Association, 107(497): 135–151. [PDF]
Hao W, Song M, and Storey JD. (2013) Probabilistic models of genetic variation in structured populations applied to global human studies. arXiv: 1312.2041. [PDF]
Marstrand TT and Storey JD. (2014) Identifying and mapping cell-type specific chromatin programming of gene expression. Proceedings of the National Academy of Sciences, 111(6): E645–E654. [PDF]
Chung NC and Storey JD. (2014) Statistical significance of variables driving systematic variation. Bioinformatics, doi: 10.1093/bioinformatics/btu674. [PDF]
Song M, Hao W, and Storey JD. (2015) Testing for genetic associations in arbitrarily structured populations. Nature Genetics, doi: 10.1038/ng.3244 (also: bioRxiv, doi: 10.1101/012682). [PDF]