-m leads to 0 reads mapped in output
kgaonkar6 opened this issue · 3 comments
Hello,
I ran my tumor and normal BAM files through telseq in per-read-group mode and in -m mode to compare the outputs, and it looks like I have an issue with both the -m mode and the read-group-based analysis:
- In the -m mode I get only 0 or 1 reads in the Mapped column, which seems odd since I do see reads mapped in the read-group mode analysis.
- In the per-read-group mode I get UNKNOWN in the LENGTH_ESTIMATE column. From previous issues, it looks like this happens because that read group has no reads in TEL5 and higher. Isn't that odd, since there should be random reads aligned with TEL repeats in interstitial telomeric sequences that would show up here?
Please find the corresponding output files with this issue:
Hi there,
Thanks for your interest in telseq and sending through some example data.
- This looks like a bug. I would recommend running telseq without the "-m". This way all the results are written out and merging can be done afterwards.
- Yes, when there are no telomeric reads in a library, the estimate will be UNKNOWN. It does look odd to me that there are no reads with 5 telomeric repeats in libraries that have tens of millions of reads. It would be worth checking whether any preprocessing was applied to those libraries. The duplication rates look comparable within each sample, with a few exceptions. I guess you could take a weighted average of the length estimates using only those read groups that have a similar duplication rate.
Zhihao
Thank you for the detailed reply.
I was wondering whether the duplication issue with the few read groups is unique to our samples, or whether it is generally observed in telomere length estimation. Could you also elaborate on how you would define a comparable duplication rate here?
Thanks,
Krutika
Sure. Those libraries with a duplication rate of 0 or above 25% look like outliers to me. They could be fine for other analyses but are problematic for the telseq approach. I think you could take the libraries whose duplication rate falls within the interquartile range across all libraries, and compute a weighted average using just those.
Zhihao
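For anyone landing here later, the filtering suggested above could be sketched roughly as follows. This is only an illustration, not part of telseq: the function name and the dict keys (`total`, `duplicates`, `length`) are hypothetical stand-ins for values you would parse from the per-read-group output. Weighting by total read count and using the "inclusive" quartile method are assumptions, since the thread does not specify either.

```python
from statistics import quantiles


def weighted_length_estimate(read_groups):
    """Weighted mean length over read groups with a typical duplication rate.

    read_groups: list of dicts with hypothetical keys
      'total'      - total reads in the read group
      'duplicates' - duplicate reads in the read group
      'length'     - length estimate, or None when telseq reported UNKNOWN
    Returns None when no read group survives the filter.
    """
    # Duplication rate for every read group that has reads at all.
    rated = [(rg, rg["duplicates"] / rg["total"])
             for rg in read_groups if rg["total"] > 0]
    if len(rated) < 2:
        return None  # quartiles need at least two data points

    # Interquartile range of the duplication rate across all read groups.
    q1, _, q3 = quantiles([rate for _, rate in rated], n=4, method="inclusive")

    # Keep read groups inside the IQR that also have a usable estimate.
    kept = [rg for rg, rate in rated
            if q1 <= rate <= q3 and rg["length"] is not None]
    if not kept:
        return None

    # Assumption: weight each estimate by the read group's total read count.
    weight_sum = sum(rg["total"] for rg in kept)
    return sum(rg["length"] * rg["total"] for rg in kept) / weight_sum
```

Read groups with a duplication rate of 0 or well above the rest fall outside the interquartile range and are dropped before averaging, and UNKNOWN estimates are skipped rather than treated as zero.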