HiFi reads: Is it better to perform assembly before taxonomic and functional identification?
Closed this issue · 4 comments
Hello
I am a beginner and I have a question about metagenomic analysis using PacBio HiFi long reads. In short-read metagenomics, I have seen some papers suggest doing taxonomic and functional profiling after assembly to increase precision. I was wondering whether, with long reads, we can profile the raw reads directly, or whether it is still better to perform assembly first.
Thank you
For profiling, I believe HiFi reads are accurate and long enough to be used directly. I think short-read studies often profile after assembly for better gene recovery, whereas most HiFi reads already span the full length of genes or operons. I haven't done functional analysis myself. My first try would be to assemble and analyze the high-quality MAGs, then map the reads back to them, extract the reads the MAGs do not represent, and check those reads separately.
For getting MAGs, assemble with all reads. Partitioning reads based on profiling and then assembling each partition will probably do worse. In non-human gut samples, a substantial portion of reads will have no taxonomic annotation anyway (e.g. SRR14289618 has 58% of reads marked as unidentified on SRA).
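The "map and extract the reads not represented by them" step could be sketched roughly as below. This is only an illustration, not a tested pipeline: it assumes the reads were aligned to the MAG contigs with a mapper that writes PAF (for example minimap2), and the function name `reads_not_in_mags` and the 0.8 coverage cutoff are hypothetical choices, not something from this thread.

```python
from collections import defaultdict


def reads_not_in_mags(paf_lines, all_read_ids, min_cov=0.8, min_mapq=0):
    """Return IDs of reads whose best single alignment to the MAGs
    covers less than `min_cov` of the read length.

    PAF columns used (0-based): 0 query name, 1 query length,
    2 query start, 3 query end, 11 mapping quality.
    The thresholds are illustrative assumptions.
    """
    best_cov = defaultdict(float)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        mapq = int(fields[11])
        if qlen == 0 or mapq < min_mapq:
            continue
        # Fraction of the read covered by this alignment
        cov = (qend - qstart) / qlen
        best_cov[qname] = max(best_cov[qname], cov)
    # Reads with no alignment at all count as coverage 0.0
    return [r for r in all_read_ids if best_cov.get(r, 0.0) < min_cov]


# Toy example with made-up read and contig names
paf = [
    "read1\t1000\t0\t950\t+\tmag1_ctg1\t50000\t100\t1050\t940\t950\t60",
    "read2\t1000\t0\t300\t+\tmag1_ctg2\t40000\t10\t310\t290\t300\t60",
]
print(reads_not_in_mags(paf, ["read1", "read2", "read3"]))
# ['read2', 'read3']
```

Here `read2` is only partly aligned and `read3` has no alignment, so both would go into the "check separately" set. Using the best single alignment rather than the union of all alignments is a simplification.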
Thanks for your response. I understand that it is OK to use the raw HiFi reads directly for profiling. I was wondering whether assembling the HiFi reads before functional profiling would increase the precision of the profiling (as is done for short reads), or whether it has no effect. That is, if I want to do both functional profiling and MAG reconstruction, is it better to assemble first and then do the functional profiling and MAG reconstruction, or is it better to do them separately?
The HiFi reads should already contain the complete k-mer- and gene-level information. If your profiling does not need long-range information (e.g. whether two operons belong to the same species, or structural variations) or variant calling without external reference genomes, maybe there is no need to assemble? I could be wrong...