
Scripts for CASB: A concanavalin A-based sample barcoding strategy for single-cell sequencing

Primary language: Python. License: MIT.

CASB


Four Python scripts are provided for extracting cell barcodes and sample barcodes. Input FASTQ files may be gzip-compressed (fastq.gz).
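Because the inputs may be either plain or gzipped FASTQ, a reader can detect compression itself. The sketch below is an illustration only, not the scripts' code. It is written for Python 3, whereas the scripts target Python 2.7. The helper names `open_maybe_gzip` and `count_reads` are hypothetical. Compression is detected from the gzip magic bytes rather than from the file extension.

```python
import gzip


def open_maybe_gzip(path):
    """Open a FASTQ file for text reading, decompressing it if it is gzipped.

    Compression is detected from the gzip magic bytes (0x1f 0x8b) rather than
    from the file extension.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count FASTQ records (four lines each) in a plain or gzipped file."""
    with open_maybe_gzip(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    return n_lines // 4
```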

Requirements

  • Python 2.7
  • pysam 0.16

extract_barcodes_CASB.py for CASB 10X scRNA-seq

python ~/bin/extract_barcodes_CASB.py -h

usage: extract_barcodes_CASB.py [-h] [-r1 READ1] [-r2 READ2] [-b BARCODE] [-s SAMPLE] [-o OUT]

optional arguments:
  -h, --help            show this help message and exit
  -r1 READ1, --read1 READ1
                    reads 1
  -r2 READ2, --read2 READ2
                    reads 2
  -b BARCODE, --barcode BARCODE
                    scRNA barcodes,default generated by cellranger (outs/filtered_feature_bc_matrix/barcodes.tsv.gz)
  -s SAMPLE, --sample SAMPLE
                    sample barcodes, a tab separated txt file
  -o OUT, --out OUT     output file prefix name
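The `-b` file from cellranger lists one barcode per line with a GEM-well suffix such as `-1`. The example output shown for this script, however, uses bare 16-nt cell barcodes. The sketch below loads the cellranger file into a set, assuming the suffix should be stripped before matching. `load_cell_barcodes` is a hypothetical helper (Python 3), not part of the scripts.

```python
import gzip


def load_cell_barcodes(path):
    """Return the set of cell barcodes listed in a cellranger barcodes.tsv(.gz) file.

    Assumption: the GEM-well suffix (e.g. "-1") is stripped so that the barcodes
    match the bare 16-nt form shown in the CASB output example.
    """
    opener = gzip.open if path.endswith(".gz") else open
    barcodes = set()
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line:
                barcodes.add(line.split("-")[0])
    return barcodes
```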

The SAMPLE file is tab-separated with three columns (sample name, sample ID, and 8-nt sample barcode). An example is shown below.

HAP1-0h	No.1	GGTTTACT
HAP1-1h	No.2	TTTCATGA
HAP1-2h	No.3	CAGTACTG
HAP1-4h	No.4	TATGATTC
HAP1-6h	No.5	CTAGGTGA
HAP1-8h	No.6	CGCTATGT
HAP1-12h	No.7	ACAGAGGT
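The sketch below loads this file into a lookup from barcode to sample. It assumes exactly the three tab-separated columns shown above and rejects duplicate barcodes. `load_sample_sheet` is a hypothetical name (Python 3), not a function from the scripts.

```python
import csv


def load_sample_sheet(path):
    """Map each sample barcode to (sample name, sample ID) from a 3-column TSV."""
    mapping = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            name, sample_id, barcode = row[:3]
            if barcode in mapping:
                raise ValueError("duplicate sample barcode: " + barcode)
            mapping[barcode] = (name, sample_id)
    return mapping
```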

Run this script after cellranger has generated the 'outs/filtered_feature_bc_matrix/barcodes.tsv.gz' file. The script writes two files: OUTFILE, which contains all cell-sample mappings, and OUTFILE.scRNA_flt, which contains the filtered cell-sample mappings. Both files have four tab-separated columns (cell barcode, UMI, sample barcode, count), as shown below.

AAACGGGTCGGCCGAT        TAGAAATTAA      GCATCTCC        1
AGCATACTCCGCAAGC        TATCGTGCAG      CGCTATGT        1
AGTGTCAGTTACGCGC        GGTGTGGTTG      GGTTTACT        1
TCTGGAAAGAGCCTAG        AATGGGTGTC      AATAATGG        1
CACATTTTCAGGCGAA        TCTTAAGGGG      GCATCTCC        1
AGAGCTTGTTTGACAC        CCGCATATTG      GCATCTCC        3
TTAGGACTCCTGCCAT        AATCGAGGGT      GCATCTCC        1
ACCGTAAAGACAGACC        AAAGGTGTCC      ACAGAGGT        2
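One way to summarize this table per cell is to tally UMIs per sample barcode and take the most abundant barcode. This is an illustration only. It assumes each row is a distinct (cell barcode, UMI, sample barcode) combination, it ignores the count column, and it does not reproduce any singlet/doublet calling the authors may use. The function names are hypothetical (Python 3).

```python
import csv
from collections import Counter, defaultdict


def umi_counts_per_cell(path):
    """Tally UMIs per sample barcode for every cell barcode.

    Assumes each row is a distinct (cell barcode, UMI, sample barcode, count)
    entry; the count column is ignored, so each row contributes one UMI.
    """
    per_cell = defaultdict(Counter)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # tolerate blank lines
            cell, _umi, sample_bc, _count = row
            per_cell[cell][sample_bc] += 1
    return per_cell


def top_sample(counts):
    """Return the most abundant sample barcode and its fraction of the cell's UMIs."""
    barcode, n = counts.most_common(1)[0]
    return barcode, n / sum(counts.values())
```

Summing the count column instead of adding 1 per row would give read-level rather than UMI-level totals.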

extract_barcodes_CASB_multiplex.py for CASB combinatorial indexing 10X scRNA-seq

usage: extract_barcodes_CASB_multiplex.py [-h] [-r1 READ1] [-r2 READ2] [-b BARCODE] [-s SAMPLE] [-o OUT]

optional arguments:
  -h, --help            show this help message and exit
  -r1 READ1, --read1 READ1
                    reads 1
  -r2 READ2, --read2 READ2
                    reads 2
  -b BARCODE, --barcode BARCODE
                    scRNA barcodes,default generated by cellranger (outs/filtered_feature_bc_matrix/barcodes.tsv.gz)
  -s SAMPLE, --sample SAMPLE
                    sample barcodes
  -o OUT, --out OUT     output file prefix name

For this script, the SAMPLE file has a single column with one sample barcode per line. An example for 4x4 combinatorial indexing is shown below.

ACAGAGGT
AGTGGAAC
CGCTATGT
CTAGGTGA
GAAACCCT
GCATCTCC
GGTTTACT
GTAATCTT
GTCCGGTC
TATGATTC
TCTTAAAG
TTTCATGA
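Sample barcodes are matched against sequencing reads, so it can help to check a barcode list before use. The sketch below validates the list and computes its minimum pairwise Hamming distance d. In general, a barcode set with minimum distance d can correct up to (d - 1) // 2 mismatches unambiguously. Whether the CASB scripts tolerate mismatches is not stated here. `check_barcodes` is a hypothetical helper (Python 3).

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def check_barcodes(barcodes):
    """Validate a barcode list and return its minimum pairwise Hamming distance."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("duplicate barcodes")
    if len({len(bc) for bc in barcodes}) != 1:
        raise ValueError("barcodes differ in length")
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

A one-column SAMPLE file can be passed in as `[line.strip() for line in open(path) if line.strip()]`.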

Run this script after cellranger has generated the 'outs/filtered_feature_bc_matrix/barcodes.tsv.gz' file. The script writes two files: OUTFILE, which contains all cell-sample mappings, and OUTFILE.scRNA_flt, which contains the filtered cell-sample mappings.
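The output layout of the multiplex script is not spelled out here. Suppose per-cell counts per sample barcode can be obtained from it. A cell's combinatorial label could then be taken as the set of barcodes above a cut-off. This is a hypothetical illustration, not the authors' decoding method. The `min_fraction` default of 0.2 is arbitrary (Python 3).

```python
def barcode_combination(counts, min_fraction=0.2):
    """Return the sample barcodes carrying at least min_fraction of a cell's counts.

    counts maps sample barcode -> count for one cell; min_fraction is a
    hypothetical cut-off chosen for illustration only.
    """
    total = sum(counts.values())
    if total == 0:
        return frozenset()
    return frozenset(bc for bc, n in counts.items() if n / total >= min_fraction)
```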

parse4fastq_split_CASB_atac.py for CASB 10X snATAC-seq

python ~/bin/parse4fastq_split_CASB_atac.py -h
usage: parse4fastq_split_CASB_atac.py [-h] [-i1 READ0] [-r1 READ1] [-r2 READ2] [-r3 READ3] [-o OUTFILE] [-p PREFIX]

optional arguments:
  -h, --help            show this help message and exit
  -i1 READ0, --read0 READ0
                    reads 0
  -r1 READ1, --read1 READ1
                    reads 1
  -r2 READ2, --read2 READ2
                    reads 2
  -r3 READ3, --read3 READ3
                    reads 3
  -o OUTFILE, --outfile OUTFILE
                    output file
  -p PREFIX, --prefix PREFIX
                    output fastq prefix

No SAMPLE file is required at this step for snATAC-seq. The script generates five files: OUTFILE, which contains the cell-sample mapping information, and four CASB-free fastq.gz files for downstream analysis (e.g., cellranger-atac count).
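cellranger-atac count expects FASTQ files named in the bcl2fastq style, such as `SAMPLE_S1_L001_R1_001.fastq.gz`, with I1, R1, R2 and R3 reads. This README does not say whether the script applies this naming to its `-p PREFIX` outputs. The helpers below are therefore a hypothetical sketch (Python 3) of writing CASB-free records into files named that way.

```python
import gzip


def cellranger_atac_fastq_names(prefix, sample_number=1, lane=1):
    """Build bcl2fastq-style file names for the four reads cellranger-atac uses."""
    return [
        "%s_S%d_L%03d_%s_001.fastq.gz" % (prefix, sample_number, lane, read)
        for read in ("I1", "R1", "R2", "R3")
    ]


def write_fastq(path, records):
    """Write (name, sequence, quality) tuples to a gzip-compressed FASTQ file."""
    with gzip.open(path, "wt") as out:
        for name, seq, qual in records:
            if len(seq) != len(qual):
                raise ValueError("sequence/quality length mismatch for " + name)
            out.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))
```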

parse2fastq_CASB_atac_plate.py for plate-based CASB snATAC-seq

No SAMPLE file is required at this step for snATAC-seq. The script generates three files: OUTFILE, which contains the cell-sample mapping information, and two CASB-free fastq.gz files for downstream analysis.

python ~/bin/parse2fastq_CASB_atac_plate.py -h
usage: parse2fastq_CASB_atac_plate.py [-h] [-r1 READ1] [-r2 READ2] [-o OUTFILE] [-p PREFIX]

optional arguments:
  -h, --help            show this help message and exit
  -r1 READ1, --read1 READ1
                    reads 1
  -r2 READ2, --read2 READ2
                    reads 2
  -o OUTFILE, --outfile OUTFILE
                    output file
  -p PREFIX, --prefix PREFIX
                    output fastq prefix
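The plate-based script takes two read files, and any tool that consumes them must keep the mates in step. The sketch below is a minimal Python 3 illustration that walks R1 and R2 in lockstep. It checks that both files have the same number of records and that mate names agree, ignoring a trailing /1 or /2. `read_pairs` is a hypothetical helper, not the script's implementation.

```python
import gzip
from itertools import zip_longest


def _records(path):
    """Yield (name, sequence, quality) from a plain or .gz FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip("\n")
            yield header[1:].split()[0], seq, qual


def _mate_key(name):
    """Strip a trailing /1 or /2 mate suffix, if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_pairs(path1, path2):
    """Yield (read1, read2) record pairs, raising if the files fall out of sync."""
    for r1, r2 in zip_longest(_records(path1), _records(path2)):
        if r1 is None or r2 is None:
            raise ValueError("read files contain different numbers of records")
        if _mate_key(r1[0]) != _mate_key(r2[0]):
            raise ValueError("mate names differ: %s vs %s" % (r1[0], r2[0]))
        yield r1, r2
```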

Data

The CASB scRNA-seq and snATAC-seq data are available from NCBI GEO (accession GSE153116).

Citation

Fang, L., Li, G., Zhu, Q., Cui, H., Li, Y., Sun, Z., Liang, W., Wei, W., Hu, Y., & Chen, W. (2020). CASB: A concanavalin A-based sample barcoding strategy for single-cell sequencing. BioRxiv. https://doi.org/10.1101/2020.10.15.340844