Read correction with different depths but same results
tianjio opened this issue · 6 comments
Hi,
After applying the same correction parameters (keeping only reads longer than 100 kb), I observed that the corrected output of both the 68X ONT dataset and the 45X ONT dataset drops to approximately 22X~23X coverage. I don't quite understand the reason for this result; could you please help me understand it?
The default correction targets 40x corrected coverage based on the genome size you provided, which always ends up a bit lower due to trimming during correction. The report output by Canu gives details on read quality, read overlaps, and expected corrected read lengths. If you want to target more data for correction, increase the corOutCoverage parameter from its default of 40 to whatever coverage you would like to target.
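For example, corOutCoverage can be set directly on the command line (a sketch only: the prefix, output directory, genome size, and read file below are placeholders for your own run; corOutCoverage, minReadLength, and genomeSize are real Canu options):

```shell
# Sketch only: substitute your own genomeSize, paths, and coverage target.
canu -correct \
  -p asm -d ont-correct \
  genomeSize=3.2g \
  minReadLength=100000 \
  corOutCoverage=100 \
  -nanopore reads.fastq.gz
```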
--                            original      original
--                           raw reads     raw reads
-- category                w/ overlaps  w/o overlaps
-- -------------------- ------------- -------------
-- Number of Reads            1290708         32623
-- Number of Bases       199546206431    3526326367
-- Coverage                    62.358         1.102
-- Median                      137243        104854
-- Mean                        154602        108093
-- N50                         153475        105431
-- Minimum                     100000             0
-- Maximum                     896875        266020
--
--                                     --------corrected---------  ----------rescued----------
--                           evidence                    expected                     expected
-- category                     reads           raw     corrected           raw     corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads            1314802        680694        680694          2804          2804
-- Number of Bases       202115581569  128633150876  128000057219     325824003     321230356
-- Coverage                    63.161        40.198        40.000         0.102         0.100
-- Median                      136305        171657        170633        117679        114659
-- Mean                        153723        188973        188043        116199        114561
-- N50                         152569        185228        184456        119088        116162
-- Minimum                     100000        131592        131591        100021        100008
-- Maximum                     896875        851143        851130        229161        131016
--
--                       --------uncorrected--------
--                                          expected
-- category                       raw      corrected
-- -------------------- ------------- -------------
-- Number of Reads             639833        639833
-- Number of Bases        74113557919   54806947791
-- Coverage                    23.160        17.127
-- Median                      114557        107468
-- Mean                        115832         85658
-- N50                         115908        114395
-- Minimum                          0             0
-- Maximum                     896875        896874
--
-- Maximum Memory 9872844314
The above is my correction report. For example, if I increase the corOutCoverage parameter to 63, will that raise the raw read coverage used for correction to 63?
Yes. It looks like the reads do have overlaps; they are just shorter and are being excluded from correction. There's no harm in setting the coverage higher than your input (e.g. 100), as Canu will simply stop adding reads once it has used the total set.
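The capping behavior described above can be sketched with a little arithmetic. The genome size here is inferred from the report's bases/coverage ratio (~3.2 Gb); the real value is whatever genomeSize was passed to Canu, and `target_bases` is a hypothetical helper, not Canu code:

```python
# Canu reports Coverage = bases / genomeSize, so the genome size used for
# this run can be recovered from the "original reads w/ overlaps" row.
RAW_BASES = 199_546_206_431   # Number of Bases, reads w/ overlaps
RAW_COVERAGE = 62.358         # Coverage from the same row

genome_size = RAW_BASES / RAW_COVERAGE   # ~3.2e9 bp

def target_bases(cor_out_coverage, available_bases, genome_size):
    """Bases targeted for correction: corOutCoverage * genomeSize,
    capped by what is actually available (hypothetical helper)."""
    return min(cor_out_coverage * genome_size, available_bases)

# Default corOutCoverage=40 selects roughly 40x of raw bases:
cov_default = target_bases(40, RAW_BASES, genome_size) / genome_size

# corOutCoverage=63 (or 100) is capped at the ~62.4x actually present,
# so it effectively selects the whole input set:
cov_63 = target_bases(63, RAW_BASES, genome_size) / genome_size
```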
Thanks. Following your suggestion, I increased the depth of reads used for correction. It still needs some time to run, so there are no results yet.
I have another question. I am comparing the correction results of different tools on ONT haplotype data from a homologous polyploid. Could you suggest some parameters for Canu? The accuracy of the ONT data is 96% or above.
Correction will always collapse similar haplotypes, typically those below 1-2% divergence. I wouldn't recommend correction if your final goal is a diploid assembly and your haplotype divergence is below that. Given the high accuracy of the data, I'd instead use the uncorrected ONT data, following the quick-start info here: https://canu.readthedocs.io/en/latest/quick-start.html#assembling-with-multiple-technologies-and-multiple-files. That pipeline still does some much more conservative correction of the ONT data that is haplotype-preserving. Unfortunately, the corrected reads in this case are in homopolymer-compressed space.
Thanks.