/abc_atlas_access

Documentation and examples demonstrating how to access data from the Allen Brain Cell Atlas

Allen Brain Cell Atlas - Data Access

The Allen Brain Cell Atlas (ABC Atlas) aims to empower researchers worldwide to explore and analyze multiple whole-brain datasets simultaneously. As the Allen Institute and its collaborators continue to add new modalities, species, and insights to the ABC Atlas, the platform will keep growing and open up new possibilities for discovery in neuroscience. With the ABC Atlas, researchers everywhere can gain new insights into the brain's complex workings.

Data associated with the ABC Atlas is hosted on Amazon Web Services (AWS) in an S3 bucket as an AWS Public Dataset, arn:aws:s3:::allen-brain-cell-atlas. No account or login is required for access. This repo provides an overview of the available data and shows how to download and use it through example use cases.
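
Because the bucket is public, objects can be fetched over plain HTTPS without credentials. The sketch below builds an S3 URL from the bucket name and streams one object to disk using only the Python standard library. It is an illustration, not the repository's own download code: the object key in the usage comment is hypothetical, and the URL form assumes the bucket is reachable through the generic virtual-hosted-style `s3.amazonaws.com` endpoint.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote

BUCKET = "allen-brain-cell-atlas"  # bucket name taken from the ARN above


def public_url(key: str, bucket: str = BUCKET) -> str:
    """Build an HTTPS URL for an object in a public S3 bucket.

    Assumes the generic virtual-hosted-style endpoint; no request signing.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key.lstrip('/'))}"


def download(key: str, dest_dir: str = ".", bucket: str = BUCKET) -> Path:
    """Stream one object into dest_dir, mirroring the key's directory layout."""
    target = Path(dest_dir) / key.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(public_url(key, bucket)) as response, \
            open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return target


# Usage (hypothetical key; substitute a path listed in a release manifest):
# download("path/to/some/file.csv", dest_dir="abc_download")
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for the larger expression matrix files.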

Data is shared under the Allen Institute Terms of Use.

The summer 2023 public beta data release includes:

  • 1.7 million single-cell transcriptomes spanning the whole adult mouse brain using 10Xv2 chemistry (WMB-10Xv2)
  • 2.3 million single-cell transcriptomes spanning the whole adult mouse brain using 10Xv3 chemistry (WMB-10Xv3)
  • Clustering analysis of 4.0 million single-cell transcriptomes spanning the whole adult mouse brain, combining the 10Xv2 and 10Xv3 datasets (WMB-10X)
  • A five-level whole adult mouse brain taxonomy of cell types (WMB-taxonomy)
  • A 4.0 million cell spatial transcriptomics dataset spanning a single adult mouse brain with a 500-gene panel, mapped to the whole mouse brain taxonomy (MERFISH-C57BL6J-638850)
  • Definition of 18 cell type neighborhoods and UMAP embeddings for fine-grained visualization and analysis of neuronal types within and between brain regions (WMB-neighborhoods)
  • An updated Allen CCFv3 with additional annotations for layers of Ammon's horn and the main olfactory bulb, and a simplified 5-level anatomical hierarchy (Allen-CCF-2020)
  • CCF mapped coordinates for cells in the whole brain spatial transcriptomics dataset (MERFISH-C57BL6J-638850-CCF)

Each release has an associated manifest.json which lists the specific versions of the directories and files that are part of the release. We recommend using the manifest as the starting point for data download and usage.
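
One way to use a manifest as a starting point is to load it and inspect its entries before downloading anything. The sketch below makes no assumption about the manifest's actual schema: it flattens any nested JSON document into (key path, value) pairs that can then be filtered. The sample manifest embedded in the code is invented for illustration and does not reflect the real file's layout.

```python
import json
from typing import Any, Iterator, Tuple

KeyPath = Tuple[str, ...]


def flatten(node: Any, prefix: KeyPath = ()) -> Iterator[Tuple[KeyPath, Any]]:
    """Yield (key_path, leaf_value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, prefix + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, prefix + (str(index),))
    else:
        yield prefix, node


# Hypothetical manifest content, NOT the real schema.
SAMPLE = json.loads("""
{
  "version": "example",
  "datasets": {
    "WMB-10Xv2": {"files": ["a.h5ad", "b.h5ad"]},
    "WMB-taxonomy": {"files": ["clusters.csv"]}
  }
}
""")

leaves = list(flatten(SAMPLE))
# Keep only the leaves that look like expression matrix files.
h5ad_files = [value for path, value in leaves if str(value).endswith(".h5ad")]
print(h5ad_files)  # ['a.h5ad', 'b.h5ad']
```

For a real release, replace `SAMPLE` with the result of `json.load` on the downloaded manifest.json and adjust the filter to the keys that file actually uses.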

Expression matrices are stored in the anndata h5ad format and need to be downloaded to a local file system for usage. To make data transfer, download, and access more efficient, the 10x transcriptomics datasets have been subdivided into smaller packages grouped by method and anatomical origin. The notebooks provide example code on how to access data across these individual files.
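
Working across the smaller packages typically means opening each h5ad file, pulling out only the genes of interest, and concatenating the results. The sketch below does this with `anndata.read_h5ad` in backed mode, so the full matrix stays on disk. It assumes the `anndata` and `pandas` packages are installed, that the files were already downloaded into one local directory (the directory name is hypothetical), and that the requested gene identifiers match each file's `var` index.

```python
from pathlib import Path
from typing import List, Sequence


def list_packages(directory: str) -> List[Path]:
    """Return every .h5ad file under directory, sorted for reproducibility."""
    return sorted(Path(directory).rglob("*.h5ad"))


def expression_for_genes(paths: Sequence[Path], genes: Sequence[str]):
    """Collect a cells x genes DataFrame across several h5ad packages.

    Imports are local so list_packages works without anndata installed.
    Assumes every file has all requested genes in its var index.
    """
    import anndata
    import pandas as pd

    frames = []
    for path in paths:
        adata = anndata.read_h5ad(path, backed="r")  # keep the matrix on disk
        frames.append(adata[:, list(genes)].to_df())  # load only selected genes
        adata.file.close()
    return pd.concat(frames)


# Usage (hypothetical directory and gene identifiers):
# packages = list_packages("abc_download")
# df = expression_for_genes(packages, ["gene_a", "gene_b"])
```

Selecting columns before converting to a DataFrame keeps the result small enough to hold in memory even when the source files are large.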

Available notebooks:

  • Getting started: learn how to use the manifest.json file to facilitate data download and usage.
  • 10x scRNA-seq clustering analysis and annotation: learn about the whole mouse brain taxonomy through some example use cases and visualization.
  • 10x scRNA-seq gene expression data
    • Part 1: learn about the 10x dataset through some example use cases and visualization of cells in the thalamus.
    • Part 2a: learn how to iterate through all the data packages, to access data for whole brain example use cases in part 2b.
    • Part 2b: explore the whole brain data through visualization and analyses of a set of genes of interest.
  • MERFISH whole brain spatial transcriptomics
    • Part 1: learn about the MERFISH dataset through some example use cases and visualization for a single brain section.
    • Part 2a: learn to access data and prepare for whole brain example use cases in part 2b.
    • Part 2b: explore the whole brain data through visualization and analyses of a set of genes of interest.
  • Cluster groups and embeddings: learn about cell type neighborhoods and neighborhood-specific UMAP embeddings through example use cases.
  • Cell type neighborhood gallery: explore and visualize a set of cell type neighborhoods.
  • Allen CCFv3 parcellation and annotation: learn about the Allen CCFv3 and a simplified 5-level anatomical hierarchy through some example use cases and visualization.
  • MERFISH CCF mapped coordinates: learn how to download and use CCF mapped coordinates through some example use cases and visualization.

Level of support

We are releasing this code to the community AS IS and are not currently able to provide any guarantees of support. The community is welcome to submit issues, but you should not expect an active response.