Confusion regarding arrow formatted files
TBradley27 opened this issue · 6 comments
Hello,
In the documentation for cramino, it states that a file in arrow format can be produced which can then be used with NanoPlot.
However, the documentation for NanoPlot does not describe how this arrow file can be used.
Hi,
Oh yes, I see that it is poorly described. Thanks for letting me know. The arrow files are, confusingly enough, the same as feather, but I have now specified that in the documentation.
So you can use NanoPlot with --arrow to specify the arrow input files.
Best,
Wouter
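For anyone landing here later, a minimal sketch of the two-step workflow described above, written as Python helpers that only build the command lines. The file names (`sorted.bam`, `reads.arrow`, `nanoplot_out`) and the helper names are hypothetical, and the exact cramino invocation is an assumption based on its `--arrow` option, so check `cramino --help` and `NanoPlot --help` for your installed versions.

```python
import shlex


def cramino_command(bam_path, arrow_path):
    """Build a cramino call that writes per-read features to an arrow file.

    Assumes cramino takes the alignment file as a positional argument
    and the output path via --arrow.
    """
    return ["cramino", "--arrow", arrow_path, bam_path]


def nanoplot_command(arrow_paths, outdir):
    """Build a NanoPlot call that reads one or more arrow (feather) files."""
    return ["NanoPlot", "--arrow", *arrow_paths, "--outdir", outdir]


if __name__ == "__main__":
    # Print the commands rather than running them, so this works
    # even when neither tool is installed.
    print(shlex.join(cramino_command("sorted.bam", "reads.arrow")))
    print(shlex.join(nanoplot_command(["reads.arrow"], "nanoplot_out")))
```

The lists can be passed to `subprocess.run(..., check=True)` once both tools are on the `PATH`.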
Many thanks for this!
That is very helpful.
Just a quick minor note: it would also be helpful if the table in the 'plots generated' section of the README had a column for arrow/feather formatted data.
Many thanks again!
Thomas
Hmm, no, that wouldn't be accurate. An arrow file is essentially a dataframe of features, and which plots can be generated depends on how the file was created.
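Since the available plots depend on which features are in the file, one way to see what a given arrow file contains is to list its columns. This is a rough sketch that assumes the file can be read as feather with pandas (with pyarrow installed); `list_arrow_columns` and `reads.arrow` are hypothetical names, and the actual column names depend on the tool that wrote the file.

```python
import sys


def list_arrow_columns(path):
    """Return the column names stored in an arrow/feather file."""
    # Imported lazily so the function can be defined without pandas;
    # calling it requires both pandas and pyarrow.
    import pandas as pd

    return list(pd.read_feather(path).columns)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_columns.py reads.arrow
    for name in list_arrow_columns(sys.argv[1]):
        print(name)
```

If a feature such as read quality is not among the listed columns, the corresponding plots cannot be produced from that file.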
Thanks, that makes sense
I generated an arrow formatted file from a sorted bam file. When I ran the arrow formatted file through NanoPlot using --feather, the report I got back didn't include plots relating to read quality scores or mapping quality scores, which differs from what I get when I pass the sorted bam file directly to NanoPlot using --bam.
Yes, that is as expected. In my opinion, read quality scores are less informative than sequence identity scores. Therefore, cramino doesn't extract/calculate them, and they're not in the arrow file. It is a matter of being efficient. If you care a lot about mapping quality, you could also use https://github.com/wdecoster/make_arrow
Thanks for that, I will check it out. As the original issue has been fixed, I am happy for this issue to be closed.