This repository contains the custom code used in the analysis of single-cell, single-nucleus and spatial transcriptomics data from the healthy human lung, now published in Nature Genetics.
Visit our CellxGene browser: https://www.lungcellatlas.org/
Most of the code used in the manuscript comes from publicly available packages, with specifications given in the Methods section of the study.
- Code for fGWAS plots and for cell type proportion analysis is available here: https://github.com/natsuhiko/PHM
- Code for marker gene dot plots with mean group expression and for expression of TCR regions was previously published (Park, J. et al., Science 2020) and is available here (DOI: 10.5281/zenodo.3711134)
- Code and data from the cell2location analysis of Visium data are available here
- Code for shared TCR clonotype analysis across donors and locations is in the file TCR-clonotypes.ipynb.
- Code for cell type composition analysis using a linear mixed model (e.g. Figure 1e) is available here
- Code for the explained variability analysis is described below and is in the folders Data, Explained Variability and Plots.
The processed scRNA-seq, snRNA-seq and Visium ST data are available for browsing and download via our website www.lungcellatlas.org. The dataset (raw data and metadata) is available on the Human Cell Atlas Data Portal and on the European Nucleotide Archive (ENA) under accession number PRJEB52292 and BioStudies accession S-SUBS17. The Visium data are publicly available on ArrayExpress under accession number E-MTAB-11640. Imaging data can be downloaded from the European Bioinformatics Institute (EBI) BioImage Archive under accession number S-BIAD570. Additional data used to support the analysis and conclusions are available from the National Center for Biotechnology Information Gene Expression Omnibus (GSE136831 and GSE134174) and from the HLCA integration at https://github.com/LungCellAtlas/HLCA.
First, clone the repository:
$ git clone https://github.com/elo073/5loclung.git
Next, access the data portal (https://5locationslung.cellgeni.sanger.ac.uk/cellxgene.html) and download the H5AD object under "All data". Save it in 5loclung/Data
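Before moving on, it can help to confirm that the download landed in the right place. The following is a minimal sketch and not part of the repository: the helper name `find_h5ad` is hypothetical, and it only assumes that the object is saved with an `.h5ad` extension directly inside `5loclung/Data`.

```python
from pathlib import Path


def find_h5ad(data_dir):
    """Return the .h5ad files found directly inside data_dir, sorted by name.

    Hypothetical helper: it does not open the files, it only checks
    that something with an .h5ad extension is present.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Data folder not found: {folder}")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".h5ad"
    )


if __name__ == "__main__":
    # Assumes the script is started from the directory that contains the clone.
    try:
        files = find_h5ad("5loclung/Data")
    except FileNotFoundError as err:
        print(err)
    else:
        if files:
            for f in files:
                print(f"Found {f.name} ({f.stat().st_size / 1e6:.1f} MB)")
        else:
            print("No .h5ad file found; download it from the data portal first.")
```

If the list comes back empty, the conversion scripts in the next step will most likely have nothing to read.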
Finally, run the following commands:
# Access the script's folder
$ cd 5loclung/Explained\ Variability/
# Write count matrices
$ python convert_h5ad.py
$ python convert_h5ad_smg.py
# Execute scripts for explained variability
$ Rscript run.R
$ Rscript run_smg_sc.R
$ Rscript run_smg_sn.R
The plots will be saved in the 'Plots' folder
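The commands above can also be driven from a single script so that a failed step stops the run. This is only a sketch under stated assumptions: `run_steps` and `STEPS` are hypothetical names, the repository does not ship such a wrapper, and `python` and `Rscript` are assumed to be on the `PATH`. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess
from pathlib import Path


def run_steps(steps, cwd, dry_run=True):
    """Run each command in order inside cwd, stopping at the first failure.

    steps   -- list of argument lists, e.g. [["python", "convert_h5ad.py"]]
    dry_run -- when True, only print the commands and run nothing
    Returns the list of commands that were printed or executed.
    """
    done = []
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps do not run after a failed one.
            subprocess.run(cmd, cwd=cwd, check=True)
        done.append(cmd)
    return done


# Commands copied from the steps above; the remaining Rscript calls
# can be appended to this list in the same way.
STEPS = [
    ["python", "convert_h5ad.py"],
    ["python", "convert_h5ad_smg.py"],
    ["Rscript", "run.R"],
]

if __name__ == "__main__":
    # Assumes the current directory contains the clone.
    run_steps(STEPS, cwd=Path("5loclung") / "Explained Variability", dry_run=True)
```

Passing `dry_run=False` executes the commands for real. Passing the folder as `cwd` avoids having to escape the space in `Explained Variability`.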