Custom code accompanying the single-cell and spatial transcriptomic study of the healthy human lung

5loclung (5 location lung study)

A spatially resolved atlas of the human lung characterizes a gland-associated immune niche

This repository contains the custom code used to analyse single-cell, single-nucleus and spatial transcriptomics data from the healthy human lung, now published in Nature Genetics.

Visit our CellxGene browser: https://www.lungcellatlas.org/

Code availability

Most of the code used in the manuscript comes from publicly available packages, with the specifications given in the Methods section of the study.

  • Code for fGWAS plots and for cell type proportion analysis is available here: https://github.com/natsuhiko/PHM

  • Code for marker gene dot plots with mean group expression and for expression of TCR regions was previously published (Park, J et al. Science 2020) and is available here (10.5281/zenodo.3711134)

  • Code and data from the cell2location analysis of Visium data are available here

  • Code for shared TCR clonotype analysis across donors and locations is in the file TCR-clonotypes.ipynb.

  • Code for cell type composition analysis using a linear mixed model (e.g. Figure 1e) is available here

  • Code for the explained variability analysis is described below and is in the folders Data, Explained Variability and Plots.
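To illustrate what "shared TCR clonotype analysis across donors and locations" can mean in practice, here is a minimal sketch in Python. It is not the code in TCR-clonotypes.ipynb: the record layout `(clonotype_id, donor, location)`, the function name `shared_clonotypes` and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_clonotypes(records):
    """Return clonotypes seen in more than one donor or more than one location.

    `records` is an iterable of (clonotype_id, donor, location) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    donors = defaultdict(set)
    locations = defaultdict(set)
    for clonotype, donor, location in records:
        donors[clonotype].add(donor)
        locations[clonotype].add(location)

    shared = {}
    for clonotype in donors:
        if len(donors[clonotype]) > 1 or len(locations[clonotype]) > 1:
            shared[clonotype] = {
                "donors": sorted(donors[clonotype]),
                "locations": sorted(locations[clonotype]),
            }
    return shared


# Toy records, invented for illustration only (not data from the study)
toy_records = [
    ("clone_A", "donor1", "trachea"),
    ("clone_A", "donor1", "bronchi"),
    ("clone_B", "donor2", "parenchyma"),
    ("clone_C", "donor1", "trachea"),
    ("clone_C", "donor2", "trachea"),
]
result = shared_clonotypes(toy_records)
for clonotype, info in sorted(result.items()):
    print(clonotype, info)
```

In this toy input, `clone_A` is shared across two locations within one donor, `clone_C` is shared across two donors, and `clone_B` is private to one donor and location, so it is left out.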

Data Availability

The processed scRNA-seq, snRNA-seq and Visium ST data are available for browsing and download via our website www.lungcellatlas.org. The dataset (raw data and metadata) is available on the Human Cell Atlas Data Portal and on the European Nucleotide Archive (ENA) under accession number PRJEB52292 and BioStudies accession S-SUBS17. The Visium data are publicly available on ArrayExpress under accession number E-MTAB-11640. Imaging data can be downloaded from the European Bioinformatics Institute (EBI) BioImage Archive under accession number S-BIAD570. Additional data used to support the analysis and conclusions can be accessed through the National Center for Biotechnology Information Gene Expression Omnibus (GSE136831 and GSE134174) and through the HLCA integration at https://github.com/LungCellAtlas/HLCA.

Instructions for calculating the variability explained by a metadata factor

How to Execute

First, clone the repository:

$ git clone https://github.com/elo073/5loclung.git

Next, access the data portal (https://5locationslung.cellgeni.sanger.ac.uk/cellxgene.html) and download the H5AD object under "All data". Save it in 5loclung/Data.
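Before running the later steps, it can help to confirm that the download landed in the right folder. The sketch below is an optional convenience check, not part of the repository; the helper name `find_h5ad_files` is hypothetical, and it only assumes the file keeps its `.h5ad` extension.

```python
from pathlib import Path


def find_h5ad_files(data_dir):
    """Return the sorted .h5ad files found directly inside data_dir.

    Returns an empty list if the folder does not exist.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix == ".h5ad")


if __name__ == "__main__":
    # Path relative to the directory where the repository was cloned
    files = find_h5ad_files("5loclung/Data")
    if files:
        for path in files:
            print("Found", path)
    else:
        print("No .h5ad file found in 5loclung/Data; download it first.")
```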

Finally, run the following commands:

# Access the script's folder
$ cd 5loclung/Explained\ Variability/ 

# Write count matrices
$ python convert_h5ad.py
$ python convert_h5ad_smg.py

# Execute scripts for explained variability
$ Rscript run.R
$ Rscript run_smg_sc.R
$ Rscript run_smg_sn.R

The plots will be saved in the 'Plots' folder.
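For intuition about what "variability explained by a metadata factor" means, the following sketch computes a simple one-way variance decomposition (eta-squared, the between-group sum of squares divided by the total sum of squares). This is a simplified stand-in and not the method implemented in the R scripts above, which may use a different model; the function name `fraction_explained` and the toy values are assumptions for illustration.

```python
from collections import defaultdict


def fraction_explained(values, labels):
    """Fraction of the total variance in `values` explained by grouping on `labels`.

    Computes eta-squared = SS_between / SS_total for a one-way layout.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    if not values:
        raise ValueError("values must not be empty")

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variability at all, so nothing to explain

    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return ss_between / ss_total


# Toy expression values for one gene in four cells (invented for illustration)
expression = [1.0, 1.0, 3.0, 3.0]
location = ["trachea", "trachea", "parenchyma", "parenchyma"]
donor = ["d1", "d2", "d1", "d2"]

print("location:", fraction_explained(expression, location))
print("donor:", fraction_explained(expression, donor))
```

In this toy case the expression differs only between locations, so location explains all of the variance and donor explains none of it.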