I maintain this list mostly as a personal braindump of interesting medical datasets, with a focus on medical imaging.
Rather than try to group / cluster datasets, I'm going to try to maintain a set of keywords for each.
See the commit log for additions over time.
Please feel free to contribute!
Disclaimer: please remember to solve real clinical problems ☺
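Because each entry carries a keyword set rather than a fixed category, finding datasets amounts to a subset test on those keywords. Below is a minimal Python sketch. It assumes entries have been transcribed by hand into a hypothetical `DatasetEntry` record. The class, the function name, and the three sample entries are illustrative and are not tooling that ships with this list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class DatasetEntry:
    """Hypothetical record for one list entry: a short description plus its keywords."""
    description: str
    keywords: Set[str] = field(default_factory=set)


def filter_by_keywords(entries: Iterable[DatasetEntry], required: Set[str]) -> List[DatasetEntry]:
    """Return entries whose keyword set contains every required keyword (case-insensitive)."""
    wanted = {k.lower() for k in required}
    # Subset test: every wanted keyword must appear among the entry's keywords.
    return [e for e in entries if wanted <= {k.lower() for k in e.keywords}]


# Three entries transcribed from this list (descriptions shortened).
entries = [
    DatasetEntry("224,316 chest radiographs of 65,240 patients", {"very-large", "X-ray", "labels"}),
    DatasetEntry("1,370 knee MRI exams with diagnosis", {"large", "MRI", "labels"}),
    DatasetEntry("Multi-coil cardiac MR k-space, roughly 250 volumes", {"medium", "MRI", "k-space"}),
]

for entry in filter_by_keywords(entries, {"mri"}):
    print(entry.description)
```

Case-insensitive matching treats `X-ray` and `x-ray` the same. Both spellings appear in the keyword lines below.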
224,316 chest radiographs of 65,240 patients, with labels from reports
Keywords: very-large, X-ray, labels
100,000 radiographs
Keywords: very-large, X-ray, labels
371,920 chest x-rays associated with 227,943 imaging studies
3/16/2019: Not yet linked with MIMIC ICU data. See news article
v2: free-text radiology reports
Need to request access
Keywords: very-large, X-ray, labels
160,000 images from 67,000 patients that were interpreted and reported by radiologists
labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy mapped to standard Unified Medical Language System (UMLS)
Keywords: very-large, X-ray, labels
1000+ chest X-rays with eye-gaze data, radiology reports, dictation, and segmentations, built on the MIMIC-CXR database
code to reproduce experiments
Keywords: medium, X-ray, labels
Several collections
Tons of images of various kinds, including CT, MR, pathology, and PT, with diagnoses
Keywords: very-large, CT, MR, labels
Part of Cancer Imaging Archive
50000+ patients with CT data, some pathology, limited availability
Keywords: very-large, CT, labels
32000+ CT scans with annotations, meta-data, semantic labels from radiological reports
Keywords: very-large, CT, labels
10,000+ labeled echocardiogram videos and human expert tracing
Keywords: very-large, ultrasound, labels
MRI for 8500 young (9-10yo) subjects (about 4100 for training)
Keywords: large, MRI
4,000 simulated sinogram/image pairs of 2D breast CTs
Keywords: large, CT, reconstruction
two large scale neuroimaging datasets on reading and language development
Over 3000 MRI, fMRI
article | more resources
Keywords: large, MRI
414 T1 MRIs from the OASIS dataset, processed using FreeSurfer and SAMSEG
Includes original images, along with processed volumes and resulting anatomical segmentation maps
Keywords: large, MRI, segmentations, labels, annotations, processed
1,370 knee MRI exams with diagnosis (healthy/ACL tear/meniscal tear)
Keywords: large, MRI, labels
k-space data
1,500 fully sampled knee MRIs, 10K clinical MRIs, and 6.5K brain MRIs.
Part of a challenge
Keywords: large, MRI, k-space
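Raw k-space is the spatial-frequency data the scanner measures. For a fully sampled, single-coil 2D slice, an inverse 2D Fourier transform recovers the image. Below is a minimal NumPy sketch. It assumes the k-space array is centered (DC term in the middle) and that the last two axes are the in-plane frequencies. Loading the data and its actual layout vary by dataset, so check the dataset's documentation. The function name `kspace_to_image` and the synthetic phantom are illustrative.

```python
import numpy as np


def kspace_to_image(kspace: np.ndarray) -> np.ndarray:
    """Reconstruct a magnitude image from fully sampled, centered 2D k-space (last two axes)."""
    axes = (-2, -1)
    # ifftshift moves the k-space center (DC) to index [0, 0], as numpy's FFT expects;
    # fftshift then moves the image origin back to the middle of the array.
    image = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(kspace, axes=axes), axes=axes, norm="ortho"),
        axes=axes,
    )
    return np.abs(image)


# Round trip on a synthetic rectangle phantom (a stand-in for real scanner data).
phantom = np.zeros((64, 64))
phantom[16:48, 24:40] = 1.0
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(phantom), norm="ortho"))
recon = kspace_to_image(kspace)
print(np.allclose(recon, phantom))  # True
```

Undersampled k-space (as in reconstruction challenges) gives aliased images with this naive inverse. That aliasing is the gap that learned or compressed-sensing reconstructions try to close.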
Open-Access Multi-Coil k-Space Dataset for Cardiovascular Magnetic Resonance Imaging
k-space data, roughly 250 volumes
Keywords: medium, MRI, k-space
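Multi-coil data has one k-space array per receiver coil. A common, simple way to get a single magnitude image is to reconstruct each coil separately and then combine the coil images by root-sum-of-squares (RSS). Below is a minimal NumPy sketch. It assumes centered k-space shaped `(coils, ny, nx)`; real files may order axes differently. The function `rss_combine` and the two synthetic coil sensitivity maps are hypothetical. The maps are built so that their squared magnitudes sum to 1, which means RSS should return the phantom exactly.

```python
import numpy as np


def rss_combine(multicoil_kspace: np.ndarray) -> np.ndarray:
    """Combine centered multi-coil k-space of shape (coils, ny, nx) into one magnitude image."""
    axes = (-2, -1)
    # Inverse FFT each coil independently (the coil axis 0 is left untouched).
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(multicoil_kspace, axes=axes), axes=axes, norm="ortho"),
        axes=axes,
    )
    # Root-sum-of-squares across coils.
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))


ny, nx = 64, 64
phantom = np.zeros((ny, nx))
phantom[16:48, 24:40] = 1.0

# Two made-up sensitivity maps varying left to right, with cos^2 + sin^2 = 1 at every pixel.
theta = np.linspace(0.0, np.pi / 2, nx)[None, :] * np.ones((ny, 1))
sens = np.stack([np.cos(theta), np.sin(theta)])  # shape (2, ny, nx)

coil_kspace = np.fft.fftshift(
    np.fft.fft2(np.fft.ifftshift(sens * phantom, axes=(-2, -1)), axes=(-2, -1), norm="ortho"),
    axes=(-2, -1),
)
rss = rss_combine(coil_kspace)
print(np.allclose(rss, phantom))  # True
```

RSS discards phase and is not an optimal combination. When coil sensitivity maps are estimated, a sensitivity-weighted combination is a common alternative.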
1704 MRI, 556 amyloid and tau CSF samples, blood markers, genetic info and longitudinal cognitive data on ~400 at risk individuals
Keywords: medium, MRI, genetics, labels
10 Medical image datasets with segmentations
2000+ CT & MR images of various organs from different sources
Keywords: medium, MRI, segmentations
Multiple Acquisitions for Standardization of Structural Imaging Validation and Evaluation
8000 diffusion-weighted volumes
10 3D FLAIR, T1-, and T2-weighted datasets of a single healthy subject
Keywords: large, MRI
1000+ subjects with fMRI and other modalities, with annotated event files; raw and preprocessed data
Keywords: medium, fMRI
List of MRI k-space datasets
601 series of CT projection data, reconstructed images, and clinical data reports
Keywords: medium, CT, reconstruction
Brain MRI images together with manual FLAIR abnormality segmentation masks
110 subjects from TCIA LGG collection with lower-grade glioma cases
Keywords: medium, brain, MRI, segmentation, LGG, FLAIR
Few subjects, but many modalities (T1, T2, SWI, angiography, DWI, fMRI during the movie Forrest Gump at 3T (audio+visual+eye tracking+physio) and 7T (audio+physio only), some audio tasks, and other important visual tasks)
Keywords: small, multi-modal
LIDC-IDRI consists of diagnostic and lung cancer screening CTs.
1018 cases with some radiologist annotations/segmentations and nodule counts
Also available through LUng Nodule Analysis (LUNA) challenge
Keywords: large, CT, labels
All imaging
Fundus imaging
Keywords: very-large
4703 CXR of COVID19 patients, manually annotated Brixia score
Keywords: large, x-ray, covid
349 CT images collected from several COVID19-related papers
Image captions
Keywords: medium, CT, covid
~5,000 X-rays
Keywords: medium, x-ray, pneumonia
998 chest X-ray examinations from 361 COVID+ patients. Annotations with appearance classification and airspace disease grading; clinical variables
Keywords: large, x-ray, covid
1350+ X-rays, 150+ CTs, 800 diagnoses
Keywords: medium, CT, covid
1000+ CTs of COVID19 patients
50 are annotated per pixel
Keywords: large, CT, covid, segmentations
~250 chest CTs with positive RT-PCR for SARS-CoV-2, annotations of COVID-19 lesions
Keywords: medium, CT, covid, annotations, segmentations
~100 segmented CT slices
Keywords: medium, CT, segmentations, covid
~150 X-rays, ongoing, some hospital data
Keywords: medium, x-ray, covid
ongoing, about 60 patients at last check, CT
paper pdf
Keywords: medium, CT, covid
1000 X-rays and 240 CTs with annotations (paper)
Keywords: large, CT, covid, segmentations
129 retinal images.
Keywords: small, fundus
40 retinal images with segmentations
Keywords: small, retinal, segmentations
500+ CT scans from 11+ countries with abdominal organ segmentations (liver, kidney, spleen, and pancreas)
Keywords: large, abdominal, CT
Various imaging (longitudinal MRI), Genetics, Clinical data
Several thousand patients
Keywords: large, MRI, genetics, clinical
~120 image volumes (whole body CT and MRI images)
more than 1900 annotated anatomical structures
Keywords: medium, MRI, CT, whole-body, manual-segmentation
Seems like 101 manually labelled brain MRIs
Keywords: medium, MRI, brain, manual-segmentation
3000 brain scans (T1w, bold, events)
Standardized tests, scores, demographics
Keywords: large, MRI, fMRI, tests
A curated dataset of digital breast tomosynthesis images from 5,060 patients.
Keywords: large, tomosynthesis, DBT, breast, detection
2600+ scanned film mammography studies
Keywords: large, x-ray
63 manually labelled brain scans.
Costs ($1500?)
Discussion
Keywords: medium, MRI, brain, manual-segmentation, costly
This is a challenge for ISBI2019
22 participants with cognitive and physiological measures, and 7T rs-fMRI
200+ subjects across several datasets (CTs, Xrays, MRIs)
20 cardiac MR images in Congenital Heart Disease
paper
~50 children (~10yo) with a single follow-up, with MRI, fMRI, and assessments
Keywords: medium, fMRI, longitudinal
paper
3T fMRI of 132 typically developing children, 2 time points, 4 tasks
Keywords: medium, fMRI, longitudinal
aggregates auditory story-listening fMRI datasets acquired over the course of roughly seven years
Keywords: medium, fMRI
229 T1-weighted MRI scans (n=220) with lesion segmentation
MNI152 standard-space T1-weighted average structural template image
A .csv file containing lesion metadata
paper
Keywords: medium, MRI, segmentations
21 canine mammary carcinoma whole slide images.
Annotated by 2/3 experts
Keywords: small, 2D, whole slide imaging
48 manually annotated in utero fetal MR
Keywords: small, mri, fetal, labels
Single volunteer, 73 sessions at multiple sites over ~17 years
MRI, at least T1 at each session, with other modalities varying by session.
Phenotype file provided
Keywords: small, MRI, longitudinal
Single volume (histological space, 100 micron) with GM/WM surfaces and cortical layers
ftp://bigbrain.loris.ca | interactive
Keywords: small, histology, high-resolution, segmentations
Single volume, ultra-high resolution MRI dataset (100-micron)
Keywords: small, MRI, brain
8-subject large-scale fMRI (40 sessions, high sampling, high resolution); T1w, T2w, T2*w MRI
Video description
Keywords: small, MRI, brain, fMRI
Ex vivo brain MRIs of different animals
Keywords: small, MRI, brain, animals
Diffusion MRI of three healthy traveling adults
Keywords: small, MRI, diffusion, brain
Prenatal brain MRI samples (looks like single subject?)
Keywords: small, MRI, fetal
This dataset, curated from MIMIC-CXR, contains 3 metadata files with pulmonary edema severity grades extracted from MIMIC-CXR in three ways: 1) by regular expression (regex) matching on radiology reports, 2) by expert labeling of radiology reports, and 3) by consensus labeling of chest radiographs.
Keywords: pulmonary edema, severity grades, chest x-ray, radiology reports, MIMIC-CXR
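The first labeling route, regular expressions over report text, can be illustrated with a small keyword matcher. This is a minimal Python sketch with hypothetical patterns and an assumed 0 to 3 scale (0 = no edema, 1 = vascular congestion, 2 = interstitial edema, 3 = alveolar edema). It is not the dataset's actual extraction code, and it handles negation only for the explicit "no edema" phrasing.

```python
import re
from typing import Optional

# Hypothetical patterns, checked from most to least severe so the worst finding wins.
# Negated mentions such as "no interstitial edema" are still matched as positive.
SEVERITY_PATTERNS = [
    (3, re.compile(r"\balveolar (pulmonary )?edema\b", re.IGNORECASE)),
    (2, re.compile(r"\binterstitial (pulmonary )?edema\b", re.IGNORECASE)),
    (1, re.compile(r"\b(pulmonary )?vascular congestion\b", re.IGNORECASE)),
    (0, re.compile(r"\bno (evidence of )?(pulmonary )?edema\b", re.IGNORECASE)),
]


def grade_report(report: str) -> Optional[int]:
    """Return an assumed 0-3 edema grade for a report, or None if no pattern matches."""
    for grade, pattern in SEVERITY_PATTERNS:
        if pattern.search(report):
            return grade
    return None


print(grade_report("Mild pulmonary vascular congestion."))
```

Returning `None` rather than 0 for reports with no match keeps "not mentioned" separate from "explicitly no edema".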
predict sepsis in an ICU population
5000 ICU patients in three separate hospital systems
detailed information about critical care stays for over 200,000 admissions at 200+ hospitals across the US.
Users with MIMIC access can access eICU-CRD immediately after signing an updated DUA.
paper
Other lists or pooling resources (relevant xkcd)
- Giorgos Sfikas: medical imaging datasets on github
- Andy Beam: medical data on github
- Christopher Madan: openMorph (open-access MRI, well structured list)
- Stephen Aylward's list of Open-Access Medical Image Repositories
- google dataset search
- grand-challenges
- academic torrents
- multiBrain
- OpenNeuro database
- The Cancer Imaging Archive (note the nice "fast preview" feature)
- Cornell Public Image Databases