Publicly available datasets for bioinformatics and integrative -omics research (ongoing)
- Exome Aggregation Consortium dataset
- 1000 Genomes Project
- Personal Genomes Project
- HapMap Project
- NHLBI Exome Sequencing Project
- Geuvadis RNA sequencing project
  → Available in convenient RData format from the Leek lab
- GTEx
- ReCount
- Single Cell Analysis Program – Transcriptome Project
- Stanford Microarray Database
- iDASH data repository (has several datasets)
- i2b2 NLP datasets
- Medline
- Surveillance, Epidemiology, and End Results (SEER) for cancer
- MIMIC II
- Healthcare Cost and Utilization Project (HCUP)
- National Health and Nutrition Examination Survey (NHANES)
- National Ambulatory Medical Care Survey and National Hospital Ambulatory Medical Care Survey (NAMCS/NHAMCS)
- Community Tracking Study
- National Health Interview Survey