DBiT-seq

This is a public repository for all code connected to DBiT-seq (microfluidic Deterministic Barcoding in Tissue for spatial omics sequencing).

Please cite: Yang et al. High-Spatial-Resolution Multi-Omics Atlas Sequencing of Mouse Embryos via Deterministic Barcoding in Tissue. bioRxiv 2019: doi: https://doi.org/10.1101/788992.

Schematic workflow

foo bar

All raw and processed files are available at GEO (GSE137986).

Pre-processing

This is code for quality control and for reformatting the read file for compatibility with st-pipeline.

In our datasets, read 2 contains the barcode and UMI, so the read file must be reformatted before st-pipeline can process it.

To reformat the read file, run:

perl reformat.pl -indir 01.rawdata -outdir 02.reformatdata -sample 10t

Generate gene expression matrix

To generate the expression matrix file, run st-pipeline (v1.7.2):

sample=10t
FW=/02.reformatdata/$sample/$sample.R1.fastq.gz
RV=/02.reformatdata/$sample/$sample.R2.fastq.gz
MAP=/database/GRCm38_86/StarIndex
ANN=/database/GRCm38_86/gencode.vM11.annotation.gtf
CONT=/database/GRCm38_86/ncRNA/StarIndex
ID=barcodes.xls
OUTPUT=/03.stpipeline/$sample
TMP=$OUTPUT/tmp
mkdir -p "$TMP"  # also creates $OUTPUT
EXP=$sample

st_pipeline_run.py \
  --output-folder $OUTPUT \
  --temp-folder $TMP \
  --umi-start-position 16 \
  --umi-end-position 26 \
  --ids $ID \
  --ref-map $MAP \
  --ref-annotation $ANN \
  --expName $EXP \
  --htseq-no-ambiguous \
  --verbose \
  --mapping-threads 16 \
  --log-file $OUTPUT/${EXP}_log.txt \
  --two-pass-mode \
  --no-clean-up \
  --contaminant-index $CONT \
  --disable-clipping \
  --min-length-qual-trimming 30 \
  $FW $RV
convertEnsemblToNames.py \
  --annotation /database/GRCm38_86/gencode.vM11.annotation.gtf \
  --output /03.stpipeline/$sample/${sample}_stdata.updated.tsv \
  /03.stpipeline/$sample/${sample}_stdata.tsv
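
The UMI position flags above imply a fixed layout for the reformatted barcode read. The following is a minimal sketch of that layout, not part of the repository: it assumes the positions are 0-based and end-exclusive, and that the spatial barcode fills the 16 bases before the UMI. The function name and the example read are hypothetical.

```python
# UMI positions are copied from the st_pipeline_run.py flags above.
# The barcode span (bases 0-15) is an ASSUMPTION, not stated in this README.
BARCODE_START = 0   # assumed
BARCODE_END = 16    # assumed: the barcode ends where the UMI starts
UMI_START = 16      # --umi-start-position
UMI_END = 26        # --umi-end-position (treated here as end-exclusive)


def split_barcode_read(seq):
    """Return (spatial_barcode, umi) sliced from one reformatted read."""
    if len(seq) < UMI_END:
        raise ValueError("read is shorter than the barcode + UMI span")
    return seq[BARCODE_START:BARCODE_END], seq[UMI_START:UMI_END]


# Hypothetical 26-base read: 16 barcode bases followed by 10 UMI bases.
example_read = "AACGTGATAAACATCG" + "TTGCAGGTCA"
barcode, umi = split_barcode_read(example_read)
print(barcode, umi)
```

Under these assumptions, a reformatted read that is shorter than 26 bases cannot carry a complete barcode and UMI, which is why the sketch rejects it.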

To visualize the expression map, run the modified st_qa-new.py script:

st_qa-new.py --input-data /03.stpipeline/$sample/${sample}_stdata.updated.tsv

To derive the expression map of individual mRNA genes, we applied global normalization with “Scran” followed by a log-scale transformation:

st_data_plotter.py --normalization Scran --show-genes Notch1 --image-files 10t.png --counts-table-files 10t.under-tissue.tsv --use-log-scale --dot-size 8

Image alignment in Adobe Illustrator (AI)

Now that we have an stdata file containing all the gene expression data for all the squares, we would like to remove all the squares that are not located under the tissue.

  1. Convert each location and its read count from the expression matrix to Scalable Vector Graphics (SVG) format. SVG images and their related behaviors are defined in XML text files, which means they can be freely edited, searched, indexed and scripted.
  2. Open the image file and the SVG file in Illustrator, and manually align the squares in the SVG file with the image.
  3. Turn off visibility of the image layer, use the Selection Tool to select all the squares that are not located under the tissue, and delete them.
  4. Save the SVG image as an XML text file, which now contains the locations of only the squares that are under the tissue.
  5. Extract these locations from the expression matrix data.
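
The scripted parts of the steps above (1, 4 and 5) can be sketched roughly as follows. This is an illustrative sketch, not the code used in this repository. It assumes the expression matrix has one row per square labelled "XxY" with genes as columns, and that Illustrator preserves the `id` attribute of each square when saving. The function names, the `spot_` id prefix, and the gene names are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def spots_to_svg(spot_counts, size=10):
    """Step 1: draw one square per spot; opacity reflects its read count."""
    ET.register_namespace("", SVG_NS)
    root = ET.Element(f"{{{SVG_NS}}}svg")
    max_count = max(spot_counts.values(), default=0) or 1
    for label, count in spot_counts.items():
        x, y = (int(v) for v in label.split("x"))
        ET.SubElement(root, f"{{{SVG_NS}}}rect", {
            "id": f"spot_{label}",  # hypothetical id scheme
            "x": str(x * size), "y": str(y * size),
            "width": str(size), "height": str(size),
            "fill-opacity": f"{count / max_count:.2f}",
        })
    return ET.tostring(root, encoding="unicode")


def spots_in_svg(svg_text):
    """Step 4: collect the labels of the squares left in the edited SVG."""
    root = ET.fromstring(svg_text)
    labels = set()
    for rect in root.iter(f"{{{SVG_NS}}}rect"):
        rect_id = rect.get("id", "")
        if rect_id.startswith("spot_"):
            labels.add(rect_id[len("spot_"):])
    return labels


def filter_matrix(rows, keep):
    """Step 5: keep the header plus rows whose spot label is in `keep`."""
    return [rows[0]] + [row for row in rows[1:] if row[0] in keep]


# Tiny made-up matrix: header row, then one row per square.
rows = [["", "GeneA", "GeneB"],
        ["1x1", "3", "0"], ["1x2", "5", "2"], ["2x1", "0", "1"]]
totals = {row[0]: sum(int(v) for v in row[1:]) for row in rows[1:]}
svg = spots_to_svg(totals)

# Stand-in for the manual deletion in Illustrator: drop square 2x1.
edited_root = ET.fromstring(svg)
for rect in list(edited_root):
    if rect.get("id") == "spot_2x1":
        edited_root.remove(rect)
edited_svg = ET.tostring(edited_root, encoding="unicode")

under_tissue = filter_matrix(rows, spots_in_svg(edited_svg))
print(under_tissue)
```

Storing the square label in the `id` attribute keeps the extraction step independent of any scaling or translation applied to the squares during manual alignment.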

Spatial differential expression analysis

This is code for differential expression analysis.

Figure 2G: use the “st_qa.py” script in st-pipeline to do the quality assessment.

Figure 3B: Spatially variable genes identified by SpatialDE were used for the clustering analysis. After the raw values were log-transformed, non-negative matrix factorization (NMF) was performed using the NNLM package in R. We chose k = 11 for the mouse embryo DBiT-seq transcriptome data obtained at a 50 μm pixel size. For each pixel, the largest factor loading from NMF was used to assign cluster membership. The NMF clustering of pixels was plotted by tSNE using the “Rtsne” package in R.
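
The cluster assignment rule for Figure 3B can be illustrated with a short sketch. It is written in Python for illustration only; the analysis itself was done in R with NNLM. The loadings below are made-up values with 3 factors instead of 11, and the function name is hypothetical.

```python
def assign_clusters(loadings):
    """Assign each pixel to the factor with the largest loading.

    `loadings` holds one row per pixel and one column per NMF factor.
    Ties go to the lowest factor index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in loadings]


# Hypothetical loadings for 3 pixels and 3 factors.
example_loadings = [
    [0.1, 2.3, 0.0],
    [1.7, 0.2, 0.4],
    [0.0, 0.1, 0.9],
]
clusters = assign_clusters(example_loadings)
print(clusters)
```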