Four Python scripts are available for extracting cell barcodes and sample barcodes. Input files in fastq.gz format are supported.
- python2.7
- pysam 0.16
python ~/bin/extract_barcodes_CSBS.py -h
usage: extract_barcodes_CSBS.py [-h] [-r1 READ1] [-r2 READ2] [-b BARCODE] [-s SAMPLE] [-o OUT]
optional arguments:
-h, --help show this help message and exit
-r1 READ1, --read1 READ1
reads 1
-r2 READ2, --read2 READ2
reads 2
-b BARCODE, --barcode BARCODE
scRNA barcodes,default generated by cellranger (outs/filtered_feature_bc_matrix/barcodes.tsv.gz)
-s SAMPLE, --sample SAMPLE
sample barcodes, a tab seperated txt file
-o OUT, --out OUT output file prefix name
The SAMPLE file has 3 columns. A demo is shown below.
HAP1-0h No.1 GGTTTACT
HAP1-1h No.2 TTTCATGA
HAP1-2h No.3 CAGTACTG
HAP1-4h No.4 TATGATTC
HAP1-6h No.5 CTAGGTGA
HAP1-8h No.6 CGCTATGT
HAP1-12h No.7 ACAGAGGT
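As a rough illustration of this layout, the SAMPLE file could be loaded into a barcode-to-sample lookup as sketched below. This is not code from the scripts themselves: the function name `load_sample_barcodes` is hypothetical, and the sketch assumes that the columns are tab-separated (as the help text states), that column 1 is the sample name, and that column 3 is the sample barcode.

```python
import csv
import io


def load_sample_barcodes(handle):
    """Map each sample barcode (assumed column 3) to its sample name (assumed column 1)."""
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        name, barcode = row[0].strip(), row[2].strip()
        lookup[barcode] = name
    return lookup


# Two lines taken from the demo above, joined with tabs (an assumption).
demo = io.StringIO("HAP1-0h\tNo.1\tGGTTTACT\nHAP1-1h\tNo.2\tTTTCATGA\n")
samples = load_sample_barcodes(demo)
print(samples["GGTTTACT"])  # HAP1-0h
```

For a real file, pass an open file handle instead of the in-memory demo.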
Run this script after cellranger has generated the 'outs/filtered_feature_bc_matrix/barcodes.tsv.gz' file. It writes 2 files: OUTFILE, which contains all cell-sample mapping info, and OUTFILE.scRNA_flt, which contains the filtered cell-sample mapping info. Both files have 4 tab-separated columns (cell barcode, UMI, sample barcode, count), as shown below.
AAACGGGTCGGCCGAT TAGAAATTAA GCATCTCC 1
AGCATACTCCGCAAGC TATCGTGCAG CGCTATGT 1
AGTGTCAGTTACGCGC GGTGTGGTTG GGTTTACT 1
TCTGGAAAGAGCCTAG AATGGGTGTC AATAATGG 1
CACATTTTCAGGCGAA TCTTAAGGGG GCATCTCC 1
AGAGCTTGTTTGACAC CCGCATATTG GCATCTCC 3
TTAGGACTCCTGCCAT AATCGAGGGT GCATCTCC 1
ACCGTAAAGACAGACC AAAGGTGTCC ACAGAGGT 2
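One possible way to summarize such a 4-column file is to sum the count column per cell and sample barcode and report the best-supported sample barcode for each cell. The sketch below is only an illustration under stated assumptions: the helper names `count_sample_barcodes` and `top_barcode` are hypothetical, summing column 4 is an assumed interpretation of the count, and the "most counts wins" rule is not taken from the scripts or the paper.

```python
from collections import Counter, defaultdict
import io


def count_sample_barcodes(handle):
    """Sum the count column per (cell barcode, sample barcode) pair."""
    per_cell = defaultdict(Counter)
    for line in handle:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        cell, _umi, sample_bc, count = fields
        per_cell[cell][sample_bc] += int(count)
    return per_cell


def top_barcode(counter):
    """Return the sample barcode with the highest summed count."""
    return counter.most_common(1)[0][0]


# Lines 1 and 3 come from the demo above; line 2 is made up for illustration.
demo = io.StringIO(
    "AGAGCTTGTTTGACAC\tCCGCATATTG\tGCATCTCC\t3\n"
    "AGAGCTTGTTTGACAC\tTTTTTTTTTT\tACAGAGGT\t1\n"
    "ACCGTAAAGACAGACC\tAAAGGTGTCC\tACAGAGGT\t2\n"
)
per_cell = count_sample_barcodes(demo)
print(top_barcode(per_cell["AGAGCTTGTTTGACAC"]))  # GCATCTCC
```

A real analysis would likely also need a threshold for ambiguous cells, which this sketch leaves out.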
usage: extract_barcodes_CASB_multiplex.py [-h] [-r1 READ1] [-r2 READ2] [-b BARCODE] [-s SAMPLE] [-o OUT]
optional arguments:
-h, --help show this help message and exit
-r1 READ1, --read1 READ1
reads 1
-r2 READ2, --read2 READ2
reads 2
-b BARCODE, --barcode BARCODE
scRNA barcodes,default generated by cellranger (outs/filtered_feature_bc_matrix/barcodes.tsv.gz)
-s SAMPLE, --sample SAMPLE
sample barcodes
-o OUT, --out OUT output file prefix name
The SAMPLE file has only 1 column. A demo for a 4x4 combinatorial indexing design is shown below.
ACAGAGGT
AGTGGAAC
CGCTATGT
CTAGGTGA
GAAACCCT
GCATCTCC
GGTTTACT
GTAATCTT
GTCCGGTC
TATGATTC
TCTTAAAG
TTTCATGA
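Before running the script, a one-column barcode list like the one above can be sanity-checked. The following is a minimal sketch, not part of the provided scripts: the function name `load_barcode_list` is hypothetical, and the checks (allowed characters, equal length, no duplicates) are assumptions about what a well-formed list looks like.

```python
import io


def load_barcode_list(handle):
    """Read one barcode per line and run basic consistency checks."""
    barcodes = []
    for line in handle:
        bc = line.strip().upper()
        if not bc:
            continue  # ignore blank lines
        if set(bc) - set("ACGTN"):
            raise ValueError("unexpected character in barcode: %r" % bc)
        barcodes.append(bc)
    if len(set(len(bc) for bc in barcodes)) > 1:
        raise ValueError("barcodes differ in length")
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("duplicate barcode in list")
    return barcodes


# First three barcodes from the demo above.
demo = io.StringIO("ACAGAGGT\nAGTGGAAC\nCGCTATGT\n")
barcodes = load_barcode_list(demo)
print(len(barcodes))  # 3
```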
Run this script after cellranger has generated the 'outs/filtered_feature_bc_matrix/barcodes.tsv.gz' file. It writes 2 files: OUTFILE, which contains all cell-sample mapping info, and OUTFILE.scRNA_flt, which contains the filtered cell-sample mapping info.
python ~/bin/parse4fastq_split_CABS_atac.py -h
usage: parse4fastq_split_CABS_atac.py [-h] [-i1 READ0] [-r1 READ1] [-r2 READ2] [-r3 READ3] [-o OUTFILE] [-p PREFIX]
optional arguments:
-h, --help show this help message and exit
-i1 READ0, --read0 READ0
reads 0
-r1 READ1, --read1 READ1
reads 1
-r2 READ2, --read2 READ2
reads 2
-r3 READ3, --read3 READ3
reads 3
-o OUTFILE, --outfile OUTFILE
output file
-p PREFIX, --prefix PREFIX
output fastq prefix
A SAMPLE file is not required in this step for snATAC-seq. This script generates 5 files: OUTFILE, which contains the cell-sample mapping info, and four CASB-free fastq.gz files for subsequent analysis (e.g., cellranger-atac count).
A SAMPLE file is not required in this step for snATAC-seq. This script generates 3 files: OUTFILE, which contains the cell-sample mapping info, and two CASB-free fastq.gz files for subsequent analysis.
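Before passing the CASB-free fastq.gz files to a downstream tool, it may be worth confirming that they all contain the same number of records. The sketch below is one way to do that using only the standard library. The function name `count_fastq_records` and the demo file names are hypothetical, the code assumes standard 4-line FASTQ records, and the demo files are written to a temporary directory rather than taken from real script output.

```python
import gzip
import os
import tempfile


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError("%s: %d lines is not a multiple of 4" % (path, n_lines))
    return n_lines // 4


# Write two small demo files (hypothetical names) with three records each.
tmpdir = tempfile.mkdtemp()
record = "@read1\nACGT\n+\nIIII\n"
paths = []
for name in ("demo_R1.fastq.gz", "demo_R2.fastq.gz"):
    path = os.path.join(tmpdir, name)
    with gzip.open(path, "wt") as out:
        out.write(record * 3)
    paths.append(path)

counts = [count_fastq_records(p) for p in paths]
print(counts)  # [3, 3]
```

If the counts differ between files, the reads are no longer paired and the files should be regenerated.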
python ~/bin/parse2fastq_CABS_atac_plate.py -h
usage: parse2fastq_CABS_atac_plate.py [-h] [-r1 READ1] [-r2 READ2] [-o OUTFILE] [-p PREFIX]
optional arguments:
-h, --help show this help message and exit
-r1 READ1, --read1 READ1
reads 1
-r2 READ2, --read2 READ2
reads 2
-o OUTFILE, --outfile OUTFILE
output file
-p PREFIX, --prefix PREFIX
output fastq prefix
The CASB scRNA-seq and snATAC-seq data are available at NCBI GEO (GSE153116).
Fang, L., Li, G., Zhu, Q., Cui, H., Li, Y., Sun, Z., Liang, W., Wei, W., Hu, Y., & Chen, W. (2020). CASB: A concanavalin A-based sample barcoding strategy for single-cell sequencing. bioRxiv. https://doi.org/10.1101/2020.10.15.340844