Input for cartesian plot

Question

Closed this issue 3 years ago · 2 comments

I haven't run Merfin on my Illumina data yet, and I wasn't entirely clear on the usage of the cartesian plot scripts. Is the input for cartesian_plot.R the output of simplify_dump.sh, and should the input for that be $1=illumina.dump and $2=hifi.dump?

Answer 1 · 2021-03-31T09:50:42.000Z

I tried it anyway with cut ... $illum.dump | paste $hifi.dump - | ..., so the axes may be flipped relative to the labels.

This was using the merged hap1 + hap2 fasta file with HiFi and short reads, but the short reads had fairly low coverage (~16x).

There is an approximate R of -0.03, but the top three values below account for ~61% of all points, and so probably bias it heavily.

3185301570      0.00    0.00
348440924       0.00    -1.00
143715045       0.00    1.00
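For reference, a rough sketch of how a count-weighted R and the share of the top rows could be computed from a table like this. It is not part of the Merfin scripts, and it assumes each row is (count, first value, second value), which is my reading of the three columns above; `weighted_pearson` and `top_share` are hypothetical helper names.

```python
import math

def weighted_pearson(rows):
    """Pearson r over (count, x, y) rows, weighting each (x, y) by its count."""
    rows = list(rows)
    total = sum(c for c, _, _ in rows)
    mean_x = sum(c * x for c, x, _ in rows) / total
    mean_y = sum(c * y for c, _, y in rows) / total
    cov = sum(c * (x - mean_x) * (y - mean_y) for c, x, y in rows)
    var_x = sum(c * (x - mean_x) ** 2 for c, x, _ in rows)
    var_y = sum(c * (y - mean_y) ** 2 for c, _, y in rows)
    if var_x == 0 or var_y == 0:
        # r is undefined when one variable never varies
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def top_share(rows, n=3):
    """Fraction of all points held by the n rows with the largest counts."""
    counts = sorted((c for c, _, _ in rows), reverse=True)
    return sum(counts[:n]) / sum(counts)

# The three rows quoted above (assumed column order: count, x, y).
quoted = [
    (3185301570, 0.00, 0.00),
    (348440924, 0.00, -1.00),
    (143715045, 0.00, 1.00),
]
```

On the three quoted rows alone the first value is always 0.00, so they contribute nothing to the covariance and R is undefined for them by themselves; any correlation has to come from the remaining points.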

It is interesting that the two axes are pretty heavily populated, but not the diagonal. I guess this may demonstrate that k-mer bias for HiFi is pretty independent of k-mer bias for short reads?

Answer 2 · 2021-07-14T03:51:44.000Z

Hi @ASLeonard, just saw this now. Sorry for the silence!

Yes, as far as I can tell, the k-mer bias was independent, so to speak.
The different error modes in HiFi and Illumina seem to be the cause of this;
we found homopolymer and microsatellite contraction in HiFi reads and the long-known GC biases in Illumina reads as shown here in T2T-CHM13.