wdecoster/NanoPlot

Confusion regarding arrow formatted files

TBradley27 opened this issue · 6 comments

Hello,

The documentation for cramino states that it can produce a file in arrow format, which can then be used with NanoPlot.

However, the documentation for NanoPlot does not describe how this arrow file can be used.

Hi,

Oh yes, I see that it is poorly described. Thanks for letting me know. The arrow files are, confusingly enough, the same as feather, but I have now specified that in the documentation.
So you can use NanoPlot with --arrow to specify the arrow input files.
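
As a minimal sketch of what such a call might look like when scripted, the snippet below only assembles the command line and prints it. The `--arrow` flag is the one named above and `--outdir` is NanoPlot's output directory option; the helper name `build_nanoplot_command`, the file name `sample.arrow`, and the directory `nanoplot_report` are placeholders invented for illustration.

```python
import shlex


def build_nanoplot_command(arrow_files, outdir):
    """Assemble a NanoPlot argument list for one or more arrow/feather files.

    Hypothetical helper: it only builds the argument list and does not run anything.
    """
    return ["NanoPlot", "--arrow", *arrow_files, "--outdir", outdir]


# Placeholder file and directory names, not taken from the thread.
cmd = build_nanoplot_command(["sample.arrow"], "nanoplot_report")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where NanoPlot is installed.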

Best,
Wouter

Many thanks for this!

That is very helpful.

Just one quick, minor note: it would also be helpful if the table in the 'plots generated' section of the README had a column for arrow/feather formatted data.

Many thanks again!
Thomas

Hmm, no, that wouldn't be accurate. An arrow file is essentially the dataframe of features, so which plots can be generated depends on how the file was created.
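
To illustrate the idea that the available plots follow from the columns present in the dataframe, here is a rough sketch. The mapping and the column names (`lengths`, `quals`, `mapQ`, `percentIdentity`) are assumptions made up for this example and are not a confirmed description of NanoPlot's internals or of the arrow file's schema.

```python
# Hypothetical mapping from a plot group to the dataframe columns it needs.
# Column names are assumptions for illustration, not a confirmed schema.
REQUIRED_COLUMNS = {
    "read length": {"lengths"},
    "read quality": {"lengths", "quals"},
    "mapping quality": {"mapQ"},
    "percent identity": {"percentIdentity"},
}


def available_plot_groups(columns):
    """Return the plot groups whose required columns are all present."""
    present = set(columns)
    return [group for group, needed in REQUIRED_COLUMNS.items() if needed <= present]


# Columns as they might look in a file that lacks quality and mapQ data.
example_columns = ["lengths", "percentIdentity"]
print(available_plot_groups(example_columns))
```

For a real file, the column names could be listed with `pandas.read_feather(path).columns` and passed to the function above.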

Thanks, that makes sense

I generated an arrow formatted file from a sorted bam file. When I ran that file through NanoPlot using --feather, the resulting report did not include plots relating to read quality scores or mapping quality scores. This differs from the report I get when I pass the sorted bam file directly to NanoPlot using --bam.

Yes, that is as expected. In my opinion, read quality scores are less informative than sequence identity scores. Therefore, cramino doesn't extract/calculate them, and they're not in the arrow file. It is a matter of being efficient. If you care a lot about mapping quality, you could also use https://github.com/wdecoster/make_arrow

Thanks for that, I will check it out. As the original issue has been fixed, I am happy for this issue to be closed.