Analysis code used in Galeano Nino et al., Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. 2022
The code in this repository is organized to reflect the description in the Methods section of Galeano Nino et al., Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. 2022.
The 10x Visium scans associated with the manuscript submission have been uploaded to AWS and Zenodo.
The TIFF files can be accessed via: https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_tiff.tar.gz and https://doi.org/10.5281/zenodo.7419806
Please note that for sample CRC_16, the slide ID is V10S15-020 and the area code is D1; for sample OSCC_2, the slide ID is V11A07-022 and the area code is A1.
We also uploaded fastq files to AWS for your convenience: https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_fastq.tar.gz
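The archives above are ordinary gzip-compressed tarballs, so they can be fetched and unpacked with the Python standard library. The following is a minimal sketch, not part of the repository: the function names download_archive and extract_archive and the local paths in the usage comment are hypothetical, and the only value taken from this README is the TIFF URL.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# URL copied from this README; everything else in this sketch is illustrative.
TIFF_URL = (
    "https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/"
    "DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_tiff.tar.gz"
)


def download_archive(url, dest_path):
    """Stream a URL to a local file and return the destination path."""
    dest_path = Path(dest_path)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def extract_archive(archive_path, out_dir):
    """Unpack a .tar.gz archive into out_dir and return the member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter when this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(out_dir, **kwargs)
    return names


# Example usage (hypothetical local paths; the download may be large):
#   archive = download_archive(TIFF_URL, "downloads/CRC_OSCC_visium_tiff.tar.gz")
#   extract_archive(archive, "visium_scans")
```

The download is left as a commented usage example so that nothing is fetched unless the two functions are called explicitly.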
All of the analysis code documented in this repository was run on the shared computing cluster
maintained at the Fred Hutchinson Cancer Research Center between May 2020 and August 2022.
The software dependencies used by these scripts are provided using the EasyBuild installation
maintained by the Fred Hutch Scientific Computing group.
Those software dependencies are loaded into the environment with the ml command (e.g. ml CellRanger/6.1.1).
Prior to running the analysis scripts, reference databases were downloaded for PathSeq (December 2020)
and CellRanger (January 2022).
The locations of those reference databases are provided to the analysis scripts through the environment variables pathseqdb and cellrangerdb.
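Because the scripts read the database locations from the environment, it can help to confirm that both variables are set before launching a long job. The check below is a minimal sketch under stated assumptions: the helper find_reference_db_problems is hypothetical and not part of this repository, and it assumes each variable should point to an existing directory.

```python
import os
from pathlib import Path

# Variable names taken from this README; the check itself is illustrative.
REQUIRED_VARS = ("pathseqdb", "cellrangerdb")


def find_reference_db_problems(env=os.environ, required=REQUIRED_VARS):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name in required:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            # ASSUMPTION: each reference database is a directory on disk.
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


if __name__ == "__main__":
    for problem in find_reference_db_problems():
        print("WARNING:", problem)
```

Passing a plain dictionary as env makes the check easy to try without touching the real environment.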
- Identification of microbial reads within 10x Visium spatial transcriptomic data generated by 10x Space Ranger Count (Visium_pipeline.sh)
- Bioinformatic analysis of 10x Visium spatial transcriptomic data (Visium.R)
- Summary of the numbers of bacterial reads and UMIs in 10x Visium data (validate_and_count.py)

The folder used as outputs from the previous steps should be provided as an argument to the Visium_pipeline.sh script.
CRC_16.visium.raw_matrix.genus.csv and OSCC_2.visium.raw_matrix.genus.csv contain the bacterial UMI count matrices, which can be used as metadata when processing the Visium data.
CRC_16.visium.raw_matrix.validate.csv and OSCC_2.visium.raw_matrix.validate.csv contain validation data that can be used as the input of validate_and_count.py.
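As a quick sanity check on a count matrix before attaching it as metadata, the per-genus UMI totals can be computed with the standard library alone. This is a sketch under a loudly stated assumption about the file layout, which this README does not specify: the first column is taken to hold the spot barcode and every other column a genus with numeric UMI counts. The function genus_umi_totals is hypothetical and is not the logic of validate_and_count.py.

```python
import csv
from collections import Counter


def genus_umi_totals(csv_path):
    """Sum the numeric cells of every non-barcode column across all rows."""
    totals = Counter()
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if not reader.fieldnames:
            return totals  # empty file: nothing to count
        # ASSUMPTION: the first column holds the barcode, the rest are genera.
        barcode_column = reader.fieldnames[0]
        for row in reader:
            for genus, value in row.items():
                if genus is None or genus == barcode_column or not value:
                    continue  # skip the barcode, overflow cells and empty cells
                try:
                    totals[genus] += int(float(value))
                except (TypeError, ValueError):
                    continue  # skip cells that are not numeric
    return totals


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for genus, total in genus_umi_totals(sys.argv[1]).most_common(10):
            print(genus, total)
```

If the real files use a different orientation (genera as rows, for example), the column handling would need to be adapted accordingly.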
- All of the input data for this analysis is provided in FASTQ format generated by the CellRanger mkfastq command
- The folder containing those FASTQ files is set to the environment variable raw_data_folder
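To confirm that raw_data_folder contains the expected libraries, the FASTQ files can be grouped by sample name. The sketch below assumes the files follow the usual Illumina naming pattern produced by mkfastq (for example Sample_S1_L001_R1_001.fastq.gz); the helper group_fastqs_by_sample is hypothetical and not part of this repository.

```python
import os
import re
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: Illumina-style names, e.g. <sample>_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>[RI][12])_\d{3}\.fastq\.gz$"
)


def group_fastqs_by_sample(folder):
    """Map each sample name to the FASTQ file names found under folder."""
    groups = defaultdict(list)
    # Search recursively, since FASTQ files may sit in per-sample subfolders.
    for path in sorted(Path(folder).rglob("*.fastq.gz")):
        match = FASTQ_NAME.match(path.name)
        if match:
            groups[match.group("sample")].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    folder = os.environ.get("raw_data_folder")
    if folder:
        for sample, files in group_fastqs_by_sample(folder).items():
            print(sample, len(files), "files")
    else:
        print("raw_data_folder is not set")
```

Files whose names do not match the pattern are silently ignored, so an unexpectedly short listing is a hint that the naming assumption does not hold.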
- Identification of microbial reads within single-cell GEX libraries (patient_samples_GEX_pipeline.sh and cell_culture_samples_GEX_pipeline.sh)
- INVADEseq bacterial 16S rRNA gene libraries (patient_samples_16s_pipeline.sh and cell_culture_16s_pipeline.sh). The variable gex_bam_path should be set to the output folder from the patient_samples_GEX_pipeline.sh and cell_culture_samples_GEX_pipeline.sh scripts.
- Combination and deduplication of microbial metadata from steps 1 and 2 (merge_metadata.py and metadata_dedup.py). The folder used as outputs from the previous steps should be provided as an argument to the merge_metadata.py script.
headneck_gex_16s_mix_dedup.csv, HT_29_gex_16s_mix_dedup.csv and HCT_116_csv_gex_16s_mix_dedup.csv contain the bacterial UMI count matrices, which can be used as Seurat object metadata when processing the single-cell data.
- Seurat data processing, Harmony integration, SingleR annotation and copyKAT prediction (patient_samples_Seurat.r and cell_culture_Seurat.r)
- Differential expression analysis and GSEA (DE.r)