mudatasets provides access to public multimodal datasets, with a primary focus on multimodal omics data.
MuData library | MuData documentation
# Stable release, including the optional muon dependency
pip install "mudatasets[muon]"
# Development version from GitHub
pip install git+https://github.com/gtca/mudatasets
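After installing, you can quickly confirm that both mudatasets and the optional muon extra are importable with a standard-library check. This is a minimal sketch: the helper name `installed_packages` is hypothetical and not part of mudatasets. It uses `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util


def installed_packages(names=("mudatasets", "muon")):
    """Hypothetical helper: report which top-level packages can be found.

    find_spec returns None for a top-level name that is not installed,
    so nothing is actually imported here.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in installed_packages().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```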
import mudatasets as mds

mds.list_datasets()  # list the available datasets
mdata = mds.load("pbmc3k_multiome")  # load a dataset by name
print(mdata)

Some common arguments for .load() are:
- data_dir= for the location where the dataset is saved (~/mudatasets/ by default)
- with_info=True to also return the dataset description as a dictionary, as a second return value (False by default)
- backed=True for reading data in backed mode, only for .h5mu and .h5ad files (True by default)
- files= for downloading specific files from the dataset
- full=True for downloading all the files defined for the dataset (False by default)
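The defaults above, and the rule that backed reading applies only to .h5mu and .h5ad files, can be summarized in a short standalone sketch. Neither `LOAD_DEFAULTS` nor `reads_backed` exists in mudatasets; both are hypothetical names that only model the behaviour described in the list.

```python
from pathlib import Path

# Documented defaults for mds.load(), collected here only for illustration;
# mudatasets does not export a dict like this. files= has no stated default.
LOAD_DEFAULTS = {
    "data_dir": Path("~/mudatasets"),  # kept unexpanded here
    "with_info": False,
    "backed": True,
    "full": False,
}

# File formats for which backed reading applies, per the list above.
BACKED_SUFFIXES = {".h5mu", ".h5ad"}


def reads_backed(filename, backed=LOAD_DEFAULTS["backed"]):
    """Hypothetical helper: would this file be read in backed mode?

    Models the documented rule that backed=True only applies to
    .h5mu and .h5ad files; other file types are not read in backed mode.
    """
    return bool(backed) and Path(filename).suffix in BACKED_SUFFIXES
```

With the real library, passing with_info=True gives two values to unpack: `mdata, info = mds.load("pbmc3k_multiome", with_info=True)`, where `info` is the dataset description dictionary.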
mds.info("pbmc3k_multiome")
mds.list_files("pbmc3k_multiome")

mds.serve_webpage(port=8000)

This command launches a server that provides a simple, temporarily created HTML page at http://localhost:8000 listing the files across all of the datasets.
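To illustrate the idea behind serve_webpage, here is a standard-library-only sketch. It writes a temporary HTML page listing files per dataset, serves it with `http.server`, fetches it once, and shuts down. This is not how mudatasets implements the command. The helper names, the example dataset and file names, and the use of a random free port instead of 8000 are all assumptions made for illustration.

```python
import html
import tempfile
import threading
import urllib.request
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path


def build_index(files_by_dataset):
    """Hypothetical helper: render an HTML page listing files grouped by dataset."""
    parts = ["<html><body>"]
    for dataset, files in files_by_dataset.items():
        parts.append(f"<h2>{html.escape(dataset)}</h2><ul>")
        parts.extend(f"<li>{html.escape(name)}</li>" for name in files)
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


def serve_index_once(files_by_dataset):
    """Write the page to a temp dir, serve it, fetch it once, then shut down."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "index.html").write_text(build_index(files_by_dataset), encoding="utf-8")
        handler = partial(SimpleHTTPRequestHandler, directory=tmp)
        server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: let the OS pick a free port
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            url = f"http://127.0.0.1:{server.server_address[1]}/"
            # Bypass any configured HTTP proxy for this local request.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            with opener.open(url) as resp:
                return resp.read().decode("utf-8")
        finally:
            server.shutdown()
            server.server_close()
```

Serving a directory that contains index.html makes `SimpleHTTPRequestHandler` return that page for `/`, which mirrors the "temporarily created" page described above.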