deepomicslab/SpecHLA

full-length or exon typing

bsb2014 opened this issue · 9 comments

The SpecHLA publication suggests that full-length typing outperforms exon typing. I am wondering whether the reconstructed full-length gene sequences (-u 0) are better than the reconstructed exon sequences (-u 1). Do the reads from noncoding regions (introns) improve phasing? Thanks

Do I need to worry about the message below, which appeared during full-length typing (-u 0)? Thanks

Use of uninitialized value $hash{"HLA_DRB1_1"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318.
Use of uninitialized value $hash{"HLA_DRB1_2"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318.

Hi, reads from noncoding regions (introns) can provide linkage information between exons, thereby improving typing performance. And don't worry about the warning message; it has no impact.
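To illustrate why intronic reads help, here is a simplified sketch (not SpecHLA's actual phasing algorithm): only reads, or read pairs, that cover heterozygous sites in two different exons can show which alleles sit on the same haplotype, and such reads necessarily extend through the intron between them. The read data, site positions, and the `link_sites` helper are hypothetical.

```python
from collections import Counter

# Hypothetical toy reads: each maps a heterozygous site position to the
# allele the read (or read pair) shows there. Site 100 stands for a variant
# in one exon and site 400 for a variant in the next exon.
READS = [
    {100: "A", 400: "G"},  # spans the intron: links the two exons
    {100: "A", 400: "G"},
    {100: "C", 400: "T"},
    {100: "A"},            # exon-only read: carries no linkage information
    {400: "T"},            # exon-only read: carries no linkage information
]


def link_sites(reads, site_a, site_b):
    """Count the allele pairs seen on reads that cover both sites."""
    pairs = Counter()
    for read in reads:
        if site_a in read and site_b in read:
            pairs[(read[site_a], read[site_b])] += 1
    return pairs


print(link_sites(READS, 100, 400))
# Counter({('A', 'G'): 2, ('C', 'T'): 1})
```

Only the three intron-spanning reads contribute, and they support the haplotypes A-G and C-T; with exon-only reads the two sites could not be phased relative to each other.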

The warning message below often occurred when DRB1 typing failed. Could you please let me know what it means? Thanks

Use of uninitialized value $hash{"HLA_DRB1_1"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318.
Use of uninitialized value $hash{"HLA_DRB1_2"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318.

Could you also explain what the "Bowtie", "Exon", "Whole.norealign", "Whole", and "Whole.SV" modes mean? Thanks

I found the answer, but it is not clear to me whether Exon = Novoalign + exon. (It would be better if Novoalign, which is not free, could be replaced by another aligner.)

With Bowtie2 read binning + exon typing, 15-20x read coverage, and 150 bp reads, what accuracy can be expected for 2-field HLA typing? Thanks

Hi,

  • The warning comes from Perl's strict checks; we have removed it in the latest commit.
  • The default parameters are Novoalign + whole + realign + no SV, and each mode name indicates how that mode differs from the default. E.g., Exon means Novoalign + exon + realign + no SV. "realign" means using the database to link the unphased blocks.
  • We have not evaluated Bowtie2 + exon typing, but the accuracy of Bowtie2 + whole typing at 20x is roughly 0.8 on simulated data.
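The naming rule above can be written out as a small lookup table. This is a sketch inferred from that rule, not taken from SpecHLA's code: it assumes "Whole" is the default itself and that "Bowtie" differs from the default only in using Bowtie2; the `Settings` type and `describe` helper are hypothetical.

```python
from typing import NamedTuple


class Settings(NamedTuple):
    aligner: str
    region: str    # "whole" (full-length) or "exon"
    realign: bool  # use the database to link unphased blocks
    sv: bool       # structural-variant detection


# Default settings as stated in the reply: Novoalign + whole + realign + no SV.
DEFAULT = Settings(aligner="Novoalign", region="whole", realign=True, sv=False)

# Assumed from the naming rule: each mode changes one setting of the default.
MODES = {
    "Whole": DEFAULT,
    "Exon": DEFAULT._replace(region="exon"),
    "Bowtie": DEFAULT._replace(aligner="Bowtie2"),
    "Whole.norealign": DEFAULT._replace(realign=False),
    "Whole.SV": DEFAULT._replace(sv=True),
}


def describe(mode: str) -> str:
    """Render a mode in the 'A + B + C + D' notation used in this thread."""
    s = MODES[mode]
    return " + ".join([
        s.aligner,
        s.region,
        "realign" if s.realign else "no realign",
        "SV" if s.sv else "no SV",
    ])


for name in MODES:
    print(f"{name:<16}{describe(name)}")
```

`describe("Exon")` reproduces the reply's own example, Novoalign + exon + realign + no SV.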

Many thanks for your helpful replies. I tested SpecHLA with Novoalign 4, and Novoalign seems to treat the Illumina reads as Sanger FASTQ (see below). Is this normal? Thanks.

"# Interpreting input files as Sanger FASTQ."
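For context, Sanger FASTQ and Illumina FASTQ from CASAVA 1.8 onward both encode base qualities as Phred+33, whereas older Illumina pipelines (1.3 to 1.7) used Phred+64, so a "Sanger" interpretation is what one would expect for current Illumina data. Below is a generic heuristic sketch for checking which offset a file uses; it is not Novoalign's own detection logic, it assumes an uncompressed FASTQ with 4 lines per record, and the helper names and file name are hypothetical.

```python
from itertools import islice


def quality_lines(path, max_records=10_000):
    """Yield the quality line (4th line) of each record in a plain 4-line FASTQ."""
    with open(path) as fh:
        for i, line in enumerate(islice(fh, 4 * max_records)):
            if i % 4 == 3:
                yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Heuristically guess the quality offset: 33, 64, or None if unclear."""
    codes = [ord(c) for q in qualities for c in q]
    if not codes:
        return None
    if min(codes) < 59:   # ';' (59) is the lowest character any +64 scheme uses
        return 33         # Sanger / Illumina 1.8+ (Phred+33)
    if max(codes) > 75:   # above 'K' (Q42 in Phred+33) suggests a +64 scheme
        return 64         # Illumina 1.3 to 1.7 (Phred+64)
    return None           # all characters in the overlap range: undetermined


# Usage (hypothetical file name):
# print(guess_phred_offset(quality_lines("sample_R1.fastq")))
```

A result of 33 is consistent with Novoalign's "Sanger FASTQ" message.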