DataHub

A central repository for curating and managing diverse datasets used in healthcare applications.

Apache License 2.0

Medical Datasets Repository

This repository offers curated medical datasets for AI/ML research, with an emphasis on easy access and ethical use in support of reproducible healthcare studies.

Related repositories you may find useful: Models Repository | Standards Repository

Datasets and Resources

Medical Imaging Datasets

| Title | Description | Link |
| --- | --- | --- |
| The Cancer Imaging Archive (TCIA) | A large archive of medical images of cancer accessible for public download. | TCIA |
| OpenNeuro.org | A free and open platform for sharing neuroimaging data. | OpenNeuro |
| OASIS Brains | Datasets of brain imaging for neurodegenerative diseases. | OASIS Brains |
| Human Connectome Project | Research data about human brain connectivity. | Human Connectome Project |
| PathPresenter | Digital pathology images for educational and research purposes. | PathPresenter |
| CAMELYON Challenges | Datasets for the detection of metastases in lymph node images. | CAMELYON16, CAMELYON17 |
| BioImage Archive | A repository for biological images from various modalities. | BioImage Archive |
| Medical Decathlon | A collection of medical imaging datasets for training and evaluation. | Medical Decathlon |

Physiological Datasets

| Title | Description | Link |
| --- | --- | --- |
| PhysioNet | Resource for complex physiological signals. | PhysioNet |
| MIMIC-III | A freely accessible critical care database. | MIMIC-III |

Government and Institutional Data Repositories

| Title | Description | Link |
| --- | --- | --- |
| NIH's NIMH Data Archive | A large repository of research data on mental health. | NIMH Data Archive |
| Healthdata.gov | U.S. government health data. | Healthdata.gov |
| Data.gov | U.S. government open data. | Data.gov |
| The World Health Organization | Global health observatory data repository. | WHO |
| The Human Mortality Database (HMD) | Detailed mortality and population data. | HMD |
| Data and Tools of the National Center for Health Statistics | Health statistics and data sets. | NCHS Data |
| OpenFDA | Access to various FDA datasets. | OpenFDA |
| The US Census Bureau | U.S. population and demographic data. | US Census Bureau |

Medical Knowledge and Question Answering Datasets

| Title | Description | Link |
| --- | --- | --- |
| MedQA (USMLE) | General medical knowledge questions from the US medical licensing exam. | MedQA |
| PubMedQA | Closed-domain question answering given a PubMed abstract. | PubMedQA |
| MedMCQA | General medical knowledge questions from Indian medical entrance exams. | MedMCQA |
| MedRedQA | English consumer question answering dataset containing 51,000 pairs of consumer questions and their corresponding expert answers. | MedRedQA |
| MultiMedQA (140) | Sample from HealthSearchQA, LiveQA, and MedicationQA. | MultiMedQA 140 |
| MMLU-Clinical knowledge | Clinical knowledge multiple-choice questions. | MMLU-Clinical |
| MMLU-Medical genetics | Medical genetics multiple-choice questions. | MMLU-Genetics |
| MMLU-Anatomy | Anatomy multiple-choice questions. | MMLU-Anatomy |
| MMLU-Professional medicine | Professional medicine multiple-choice questions. | MMLU-Professional |
| MMLU-College biology | College biology multiple-choice questions. | MMLU-Biology |
| MMLU-College medicine | College medicine multiple-choice questions. | MMLU-College |

References:

  1. Hugging Face Datasets
  2. Kaggle Datasets
  3. Mendeley Data

Getting Help

If you need help or have any questions, see the contributing guide, open an issue, or join our Discord Community.

Your contributions are invaluable, and together, we can build a healthier future through innovation and excellence in medical AI/ML!