Upstream processing of single-cell ASAP-seq data for viral vector detection. This pipeline is based on a previous pipeline designed to process sequencing data with antibody-derived tags (ADTs; see scc-proc).
An example configuration is shown below.
- If ATAC mode is enabled, the sample name MUST match the sample name used to demultiplex the FASTQ files with cellranger-atac mkfastq.
- Multiple samples can be added to the config, provided they use the same antibody mixture and bead setup.
- All FASTQ files must be located in the corresponding fastq_dir path (atac_fastq_dir or adt_fastq_dir).
- The ADT FASTQ filenames for each sample are given in the samples attribute of the YAML file.
  - The order of the FASTQ files in the samples attribute corresponds to the cbc, umi, tag numbering system: each triplet is [file index, start, stop], where the file index is the 0-based position of the file in adt_fastqs (refer to the kallisto documentation).
- In the general settings:
  - run_modes toggles which modalities are run.
  - threads and memory set rule-specific thread counts and memory limits, respectively.
- This pipeline is designed for LSF-based submission, but the lsf.yml file can be edited as needed (see below).
paths:
  atac_fastq_dir: ""
  cellranger_ref_dir: ""
  cellranger_exe: "/home/wuv/pkg/cellranger-atac-2.1.0/bin/cellranger-atac"
  adt_fastq_dir: ""
  adt_catalog: "adt_panel.csv"
  allowlist: "737K-cratac-v1.revcomp.txt"
  amulet_dir: "AMULET"
  autosomes: "hg38_autosomes.txt"
  denylist: "hglft_genome_411bb_f9b580.bed"
out_dirs:
  atac: "." # this currently cannot be changed
  haystack: "haystack_out"
  adt: "adt_out"
  amulet: "amulet_out"
general:
  run_modes:
    atac: True
    atac_level: "namesort"
    adt: True
  library:
    cbc: [1,0,16]
    umi: [0,0,10]
    tag: [2,0,15]
  threads:
    cellranger_count: 12
    namesort: 8
    make_bus: 4
    sort_bus: 8
    amulet: 2
  memory:
    cellranger_count: 120000
    namesort: 96000
    make_bus: 36000
    amulet: 36000
samples:
  - name: "A"
    adt_fastqs: ["A_S2_R1_001.fastq.gz", "A_S2_R2_001.fastq.gz", "A_S2_R3_001.fastq.gz"]
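The cbc, umi and tag triplets follow kallisto's custom technology format (file, start, stop). The Python sketch below shows how they could be joined into a kallisto bus -x string ("barcode:UMI:sequence") and sanity-checked against a sample's adt_fastqs list. The helper name technology_string is hypothetical and is not taken from the Snakefile. With the example values, the cell barcode comes from file 1 (A_S2_R2_001.fastq.gz), the UMI from file 0 (R1) and the tag from file 2 (R3).

```python
from typing import List, Mapping, Sequence


def technology_string(library: Mapping[str, Sequence[int]], n_fastqs: int) -> str:
    """Join cbc/umi/tag triplets into a kallisto bus -x string (hypothetical helper).

    Each triplet is [file_index, start, stop]: file_index is the 0-based position of
    the FASTQ in a sample's adt_fastqs list, start is 0-based and stop is exclusive.
    A stop of 0 is accepted because kallisto reads it as "to the end of the read".
    """
    parts: List[str] = []
    for key in ("cbc", "umi", "tag"):  # kallisto expects barcode, then UMI, then sequence
        file_idx, start, stop = library[key]
        if not 0 <= file_idx < n_fastqs:
            raise ValueError(f"{key}: file index {file_idx} is out of range for {n_fastqs} FASTQs")
        if stop != 0 and stop <= start:
            raise ValueError(f"{key}: stop ({stop}) must be greater than start ({start})")
        parts.append(f"{file_idx},{start},{stop}")
    return ":".join(parts)


# Values copied from the example configuration above.
library = {"cbc": [1, 0, 16], "umi": [0, 0, 10], "tag": [2, 0, 15]}
adt_fastqs = ["A_S2_R1_001.fastq.gz", "A_S2_R2_001.fastq.gz", "A_S2_R3_001.fastq.gz"]
print(technology_string(library, len(adt_fastqs)))  # 1,0,16:0,0,10:2,0,15
```

Checking the file index against the length of adt_fastqs catches the most common mistake: listing the FASTQs in an order that does not match the triplets.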
The lsf.yml file sets LSF submission options per rule (__default__ applies to rules without their own entry). Example:
__default__:
  - "-q normal"
cellranger_count:
  - "-q denovo"
namesort:
  - "-q denovo"
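To illustrate how per-rule entries resolve, here is a minimal Python sketch that uses an in-memory copy of the lsf.yml above. It assumes that a rule-specific entry replaces __default__ rather than being appended to it; the actual merge behaviour depends on the LSF profile you installed, so check its documentation. The names LSF_YML and submission_options are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory copy of lsf.yml (in practice it would be read with a YAML parser).
LSF_YML: Dict[str, List[str]] = {
    "__default__": ["-q normal"],
    "cellranger_count": ["-q denovo"],
    "namesort": ["-q denovo"],
}


def submission_options(rule: str, lsf: Dict[str, List[str]] = LSF_YML) -> List[str]:
    """Return the bsub options for a rule.

    Assumption: a rule-specific entry replaces __default__, and rules without
    an entry fall back to __default__.
    """
    return list(lsf.get(rule, lsf["__default__"]))


for rule in ("cellranger_count", "namesort", "make_bus", "sort_bus", "amulet"):
    print(rule, submission_options(rule))
```

Under this assumption, make_bus, sort_bus and amulet go to the normal queue. Routing one of them elsewhere means adding an entry keyed by that rule's name.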
For config.yaml, refer to the example configuration above.
The ADT catalog (adt_catalog, e.g. adt_panel.csv) is a comma-separated file with one tag per line: the tag name followed by its barcode sequence. Example:
CD40,CTCAGATGGAGTATG
CD44,AATCCTTCCGAATGT
CD48,CTACGACGTAGAAGA
CD21,AACCTAGTAGTTCGG
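KITE's featureMap.py (listed in the acknowledgments) builds a map of single-mismatch tag sequences for kallisto from a catalog like this one. The Python sketch below only illustrates that idea and is not the KITE script. It parses the header-less catalog, checks the sequences, and maps each barcode and every one-mismatch variant to its tag name. Dropping variants that two tags can reach is a design choice of this sketch. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Set, Tuple

# The example catalog from above (no header row).
CATALOG = """CD40,CTCAGATGGAGTATG
CD44,AATCCTTCCGAATGT
CD48,CTACGACGTAGAAGA
CD21,AACCTAGTAGTTCGG
"""


def read_catalog(text: str) -> List[Tuple[str, str]]:
    """Parse 'name,sequence' rows and check that every tag is usable."""
    entries: List[Tuple[str, str]] = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, seq = row[0].strip(), row[1].strip().upper()
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"{name}: barcode {seq!r} is not a plain A/C/G/T sequence")
        entries.append((name, seq))
    if len({name for name, _ in entries}) != len(entries):
        raise ValueError("tag names must be unique")
    return entries


def mismatch_map(entries: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each barcode and all of its 1-mismatch variants to the tag name.

    Sequences reachable from more than one tag are ambiguous and are dropped.
    """
    owners: Dict[str, Set[str]] = {}
    for name, seq in entries:
        variants = {seq}
        for i, base in enumerate(seq):
            for alt in "ACGT":
                if alt != base:
                    variants.add(seq[:i] + alt + seq[i + 1:])
        for variant in variants:
            owners.setdefault(variant, set()).add(name)
    return {v: next(iter(names)) for v, names in owners.items() if len(names) == 1}


tags = read_catalog(CATALOG)
lookup = mismatch_map(tags)
print(len(lookup))                # 4 tags x (1 + 15 x 3) sequences = 184
print(lookup["ATCAGATGGAGTATG"])  # one mismatch from CD40 -> CD40
```

All four example tags differ from each other at three or more positions, so none of their one-mismatch variants collide.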
The allowlist file contains the accepted cell barcodes, one per line. Example:
GTCTGCTATGTCTA
GATGATGCATAGAA
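The filename 737K-cratac-v1.revcomp.txt suggests a reverse-complemented copy of the 10x scATAC barcode list; that is an assumption, because this README does not say so. The Python sketch below shows how such a copy could be produced. It also shows a quick orientation check: the fraction of observed barcodes found in the list, compared with the fraction found in its reverse complement. All names here are hypothetical, and the observed barcodes are made-up values chosen for the demonstration.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (N stays N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def read_allowlist(lines: Iterable[str]) -> Set[str]:
    """One barcode per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def revcomp_allowlist(lines: Iterable[str]) -> List[str]:
    """Reverse-complement every barcode, keeping the input order."""
    return [reverse_complement(line.strip()) for line in lines if line.strip()]


def fraction_allowed(observed: Iterable[str], allowlist: Set[str]) -> float:
    """Fraction of observed barcodes present in the allowlist (0.0 if none observed)."""
    barcodes = list(observed)
    if not barcodes:
        return 0.0
    return sum(bc in allowlist for bc in barcodes) / len(barcodes)


allow = read_allowlist(["GTCTGCTATGTCTA\n", "GATGATGCATAGAA\n"])
observed = ["TAGACATAGCAGAC", "TTCTATGCATCATC"]  # made-up barcodes for the demo
print(fraction_allowed(observed, allow))                          # 0.0
print(fraction_allowed(observed, set(revcomp_allowlist(allow))))  # 1.0
```

If barcodes extracted from real reads match the reverse-complemented list far better than the original, that orientation is the one to put in the allowlist setting.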
- Create the conda environment and set up LSF job submission for Snakemake (follow the instructions here).
conda env create --name <env> --file env.yml
conda activate <env>
- Run Snakemake
bsub -e snek.e -o snek.o snakemake --configfile=config.yaml --profile=lsf -s <path-to-Snakefile>
A diagram of the overall process is to be added.
- Perform downstream analysis of your choice. A description of the output files is to be added.
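Before submitting the Snakemake job, a quick pre-flight check can catch missing FASTQs and sample names that do not match the mkfastq output. The Python sketch below is hypothetical and is not part of the pipeline. It assumes the key layout of the example configuration (paths, general.run_modes, samples) and relies on the bcl2fastq-style names that mkfastq writes (<sample>_S<n>_L<lane>_<read>_001.fastq.gz).

```python
import os
from typing import Any, Dict, List


def missing_inputs(config: Dict[str, Any]) -> List[str]:
    """Return problems found with the input FASTQs; an empty list means all were found.

    Hypothetical helper: assumes the key layout of the example config.yaml.
    """
    problems: List[str] = []
    paths = config["paths"]
    run_modes = config.get("general", {}).get("run_modes", {})
    adt_dir = paths.get("adt_fastq_dir") or "."   # "" means the working directory
    atac_dir = paths.get("atac_fastq_dir") or "."

    # List the ATAC directory once; a missing directory is itself a problem.
    atac_files: List[str] = []
    if run_modes.get("atac"):
        if os.path.isdir(atac_dir):
            atac_files = os.listdir(atac_dir)
        else:
            problems.append(f"ATAC FASTQ directory not found: {atac_dir}")

    for sample in config["samples"]:
        name = sample["name"]
        if run_modes.get("adt"):
            for fq in sample.get("adt_fastqs", []):
                if not os.path.isfile(os.path.join(adt_dir, fq)):
                    problems.append(f"{name}: ADT FASTQ not found: {fq}")
        # The sample name must prefix the mkfastq output files for cellranger-atac to find them.
        if run_modes.get("atac") and os.path.isdir(atac_dir):
            prefix = f"{name}_S"
            if not any(f.startswith(prefix) and f.endswith(".fastq.gz") for f in atac_files):
                problems.append(f"{name}: no ATAC FASTQs named {prefix}*.fastq.gz in {atac_dir}")
    return problems


example = {  # hypothetical paths; replace with the values from your config.yaml
    "paths": {"adt_fastq_dir": "/path/to/adt_fastqs", "atac_fastq_dir": "/path/to/atac_fastqs"},
    "general": {"run_modes": {"atac": True, "adt": True}},
    "samples": [{"name": "A", "adt_fastqs": ["A_S2_R1_001.fastq.gz"]}],
}
for problem in missing_inputs(example):
    print(problem)
```

Snakemake's dry-run mode (snakemake -n) also reports missing input files, but only for the rules it plans to run, and it does not check the sample-name prefix that cellranger-atac relies on.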
- Conda
- Snakemake
- Kallisto
- Kallisto KITE featureMap.py
- Bustools
- kallisto | bustools KITE protocol
- I based my Snakemake pipeline heavily on their pipeline.