single-cell-genetics/cellsnp-lite

Generating Single VCF file for Each scRNA-Seq Sample

Opened this issue · 3 comments

Hello,
I am relatively new to variant calling from scRNA-seq data. I have 17 datasets from 17 patients, and I want to call variants for each patient. I only need the list of variants in each sample.
Can I use the cellranger output BAM file "possorted_genome_bam.bam" as pseudo-bulk input, as suggested in the manual?

    # 10x scRNA-seq sample in a pseudo-bulk manner
    cellsnp-lite -s $BAM -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag UB --gzip

Thank you in advance

hxj5 commented

Hi, thanks for the question.

Yes, you can call variants in a pseudo-bulk manner on the cellranger BAM file. However, it is recommended to subset the BAM file first, to filter out reads from invalid cells with poor sequencing quality. I quote the manual:

"To genotype 10x scRNA-seq data in a pseudo-bulk manner with cellsnp-lite mode 1b (or mode 2b), it is recommended to subset the BAM file first, by extracting the alignment records with valid cell barcodes only. Here the valid cell barcodes are typically the cell barcodes stored in the cellranger output folder filtered_gene_bc_matrices, which are the cells with high-quality sequencing data."
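The subsetting step described in the quote could be sketched roughly as follows. This is not taken from the manual: it assumes samtools 1.12 or newer (for the `-D`/`--tag-file` option of `samtools view`), that cell barcodes sit in the `CB` tag, and that the barcode list matches the `CB` values exactly (including any `-1` suffix). The function names and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes samtools >= 1.12 (for `view -D/--tag-file`) and that
# cell barcodes are stored in the CB tag, as in cellranger BAM files.

# Write the valid barcodes, one per line, from barcodes.tsv or barcodes.tsv.gz.
extract_barcodes() {
    local src="$1" out_txt="$2"
    case "$src" in
        *.gz) gzip -dc "$src" > "$out_txt" ;;
        *)    cp "$src" "$out_txt" ;;
    esac
}

# Keep only alignments whose CB tag appears in the barcode list, then index.
subset_bam() {
    local in_bam="$1" barcodes_txt="$2" out_bam="$3"
    samtools view -b -D "CB:${barcodes_txt}" -o "$out_bam" "$in_bam"
    samtools index "$out_bam"
}

# Example for one sample (paths are hypothetical; repeat per patient):
#   extract_barcodes sample1/outs/filtered_feature_bc_matrix/barcodes.tsv.gz sample1.barcodes.txt
#   subset_bam sample1/outs/possorted_genome_bam.bam sample1.barcodes.txt sample1.filtered.bam
#   cellsnp-lite -s sample1.filtered.bam -O sample1_cellsnp -p 10 --minMAF 0.1 \
#       --minCOUNT 20 --cellTAG None --UMItag UB --gzip
```

The filtered BAM then takes the place of `$BAM` in the pseudo-bulk command from the question. Note that the matrix folder name differs between cellranger versions, so the barcode path above may need adjusting.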

Thank you for your reply. I will filter the barcodes.
I have another question: I want to use a reference FASTA (indexed with faidx) with cellsnp-lite. Is the FASTA file enough by itself, or is a specific version required? I will use the same FASTA that I used as the cellranger reference, but because of the "--refseq" option I wanted to be sure.

Thank you

hxj5 commented

The same FASTA file that was used as the cellranger reference works for the --refseq option. In general, the genome build of the FASTA file should match that of the BAM file, e.g., both hg38 or both hg19.
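
One rough way to sanity-check that a FASTA and a BAM share a build is to compare contig names and lengths between the BAM header and the FASTA index. The sketch below is an assumption-laden illustration, not part of cellsnp-lite: the helper names are hypothetical, the header is expected to have been dumped beforehand with `samtools view -H`, and matching contigs are a necessary but not sufficient sign of the same build.

```shell
#!/usr/bin/env bash
# Sketch only: compare @SQ lines of a SAM/BAM header with a .fai index.

# stdin: SAM header text -> "name<TAB>length" per @SQ line, sorted.
bam_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }' | LC_ALL=C sort
}

# $1: .fai path -> "name<TAB>length" per contig, sorted.
fai_contigs() {
    cut -f1,2 "$1" | LC_ALL=C sort
}

# Print contigs present in only one of the two inputs (or with different
# lengths). Empty output means names and lengths agree.
contig_mismatches() {
    local header_file="$1" fai="$2" tmp_dir
    tmp_dir=$(mktemp -d)
    bam_contigs < "$header_file" > "$tmp_dir/bam.txt"
    fai_contigs "$fai" > "$tmp_dir/fai.txt"
    LC_ALL=C comm -3 "$tmp_dir/bam.txt" "$tmp_dir/fai.txt"
    rm -rf "$tmp_dir"
}

# Example (paths are hypothetical):
#   samtools view -H possorted_genome_bam.bam > header.sam
#   samtools faidx genome.fa            # creates genome.fa.fai if missing
#   contig_mismatches header.sam genome.fa.fai
```

If nothing is printed, the FASTA can be passed to cellsnp-lite through the --refseq option mentioned above.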