/iCoV19

Intra-host variability in SARS-CoV-2 transcriptomes.

MIT LicenseMIT

iCoV19

DOI:10.1093/nar/gkab1297

Spatio-temporal catalouging of intra-host variability in SARS-CoV-2 transcriptomes to provide insights on the consequence of intra-host Single Nucleotide Variations (iSNVs) on virus evolution. Read the manuscript in Nucleic Acids Research.

Spectrum of iSNVs in samples across populations

Pipeline

Trim adaptor sequences and filter low-quality reads from downstream analysis

$ Trimmomatic-0.39/trimmomatic-0.39.jar PE <Sample_1.fastq> <Sample_2.fastq> <Sample_1_paired.fastq> <Sample_1_unpaired.fastq> <Sample_2_paired.fastq> Sample_2_unpaired.fastq> ILLUMINACLIP:Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:100

Get reads quality metrics using FastQC

$ fastqc <Sample_1_paired.fastq> -o <Output Dir>
$ fastqc <Sample_2_paired.fastq> -o <Output Dir>

Align reads to the human genome (GRCh37)

$ hisat2 -x <GRCh37.primary_assembly.genome> -1 <Sample_1_paired.fastq> -2 <Sample_2_paired.fastq> -S <Sample_Human_Aligned.sam> -p 16 --dta-cufflinks --summary-file <Log File>

Convert human genome aligned SAM to BAM

$ samtools sort -@ 8 <Sample_Human_Aligned.sam> -o <Sample_Human_Aligned.bam> -O BAM

Filter out the unmapped reads

$ samtools view -u -f 4 -F 264 <Sample_Human_Aligned.bam> > <temp1.bam>
$ samtools view -u -f 8 -F 260 <Sample_Human_Aligned.bam> > <temp2.bam>
$ samtools view -u -f 12 -F 256 <Sample_Human_Aligned.bam> > <temp3.bam>
$ samtools merge -u - <temp[123].bam> | samtools sort -n -o <Sample_Human_Unaligned.bam>

Convert unmapped reads to FASTQ

$ bamToFastq -i <Sample_Human_Unaligned.bam> -fq <Sample_Human_Unaligned_1.fastq> -fq2 <Sample_Human_Unaligned_2.fastq>

Align unmapped reads to the SARS-CoV-2 genome (NC_045512.2)

$ bwa mem <NC_045512.2.fa> <Sample_Human_Unaligned_1.fastq> <Sample_Human_Unaligned_2.fastq> > <Sample_SARSCoV2_Aligned.sam>

Convert SARS-CoV-2 genome aligned SAM to BAM

$ samtools sort -@ 8 <Sample_SARSCoV2_Aligned.sam> -o <Sample_SARSCoV2_Aligned.bam> -O BAM

Remove duplicate reads

$ picard.jar MarkDuplicates I=<Sample_SARSCoV2_Aligned.bam> O=<Sample_SARSCoV2_Aligned_deduplicated.bam> M=<Sample_SARSCoV2_Aligned_duplicates.txt> REMOVE_DUPLICATES=true USE_JDK_DEFLATER=true USE_JDK_INFLATER=true

Generate SARS-CoV-2 genome aligned consensus FASTA

$ bcftools mpileup -f <NC_045512.2.fa> <Sample_SARSCoV2_Aligned_deduplicated.bam> | bcftools call -c | vcfutils.pl vcf2fq | seqtk seq -aQ64 -q20 -n N > <Sample.fasta>

Get reads quality metrics using QualiMap

$ qualimap_v2.2.1/qualimap bamqc -bam <Sample_SARSCoV2_Aligned_deduplicated.bam> -outdir <Output Dir>

Get reads quality metrics using Flagstat

$ samtools flagstat <Sample_SARSCoV2_Aligned_deduplicated.bam> > <Sample_SARSCoV2_Aligned.stats>

Get SARS-CoV-2 genome aligned sample coverage

$ genomeCoverageBed -ibam <Sample_SARSCoV2_Aligned_deduplicated.bam> > <Sample_SARSCoV2_Aligned.coverage>

Call iSNVs in SARS-CoV-2 genome aligned reads

$ samtools index <Sample_SARSCoV2_Aligned_deduplicated.bam> <Sample_SARSCoV2_Aligned_deduplicated.bam.bai>
$ python2.7 reditools2.0/src/cineca/reditools.py -f <Sample_SARSCoV2_Aligned_deduplicated.bam> -S -s 0 -os 4 -r <NC_045512.2.fa> -m <homopolymers_NC_045512.2.txt> -c <homopolymers_NC_045512.2.txt> -q 33 -bq 30 -mbp 15 -Mbp 15 -o <Sample_REDItools.txt>

Filter samples and bases to get high-quality iSNVs

Sample qualifying criterion:
 - Error rate < 0.005

iSNV qualifying criteria:
 - Number of minor allele reads ≥ 5
 - Base coverage ≥ 20
 - Minor allele frequency ≥ 0.01