juliema/aTRAM

atram_framer.py summary stats


Hi Rafe and Julie

Could you add a function to atram_framer.py, like the one in atram_stitcher.py, that generates summary stats (summary_stats_per_taxon.csv and summary_stats_per_gene.csv)? That function was very useful, and if the stitcher is not run, those stats are not available. Here the stats could be even more useful for informing hypotheses about ploidy and paralogy.

The summary_stats_per_taxon.csv file would count the genes with X or more long contigs, rather than reporting % exons, e.g.:

Taxon,NumGenes_with_at_least_one_contig,Num_genes_with_2+_long_contigs*,Num_genes_with_3+_long_contigs,Num_genes_with_4+_long_contigs
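A minimal sketch of how these per-taxon columns could be aggregated, assuming the number of long contigs for each (taxon, gene) pair has already been computed. The input layout and the names `per_taxon_rows` and `write_per_taxon_csv` are hypothetical and not part of aTRAM.

```python
import csv
import sys
from typing import TextIO

# Hypothetical input: number of long contigs for each (taxon, gene) pair.
# Genes with no contigs can be omitted or given a count of 0.
LongCounts = dict[tuple[str, str], int]

# Column names from the proposal above (footnote marker dropped).
HEADER = [
    "Taxon",
    "NumGenes_with_at_least_one_contig",
    "Num_genes_with_2+_long_contigs",
    "Num_genes_with_3+_long_contigs",
    "Num_genes_with_4+_long_contigs",
]


def per_taxon_rows(long_counts: LongCounts) -> list[list]:
    """For each taxon, count genes with at least 1, 2, 3 and 4 long contigs."""
    by_taxon: dict[str, list[int]] = {}
    for (taxon, _gene), n_long in long_counts.items():
        by_taxon.setdefault(taxon, []).append(n_long)
    rows = []
    for taxon in sorted(by_taxon):
        counts = by_taxon[taxon]
        # A gene with any contig has at least one long contig (its longest),
        # so ">= 1 long contig" is the same as "at least one contig".
        rows.append([taxon] + [sum(1 for n in counts if n >= k) for k in (1, 2, 3, 4)])
    return rows


def write_per_taxon_csv(long_counts: LongCounts, handle: TextIO) -> None:
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(per_taxon_rows(long_counts))


example = {("TaxonA", "gene1"): 1, ("TaxonA", "gene2"): 3, ("TaxonB", "gene1"): 2}
write_per_taxon_csv(example, sys.stdout)
# Prints the header row, then:
# TaxonA,2,1,1,0
# TaxonB,1,1,0,0
```

Counting from per-gene totals keeps the per-taxon file a pure summary of the per-gene data, so the two files cannot disagree.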

For summary_stats_per_gene.csv, the most useful columns would be:

Locus,Taxon,Number_of_long_contigs*
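One possible way to produce these rows, sketched under the assumption that the lengths of the contigs kept by framer are available per (locus, taxon) pair. The input layout, `per_gene_rows`, `write_per_gene_csv` and the `min_fraction` parameter are hypothetical; the 70% cutoff follows the footnote below.

```python
import csv
import sys
from typing import TextIO

# Hypothetical input: lengths of the contigs that survive framer's filtering,
# keyed by (locus, taxon).
ContigLengths = dict[tuple[str, str], list[int]]

HEADER = ["Locus", "Taxon", "Number_of_long_contigs"]


def per_gene_rows(contigs: ContigLengths, min_fraction: float = 0.7) -> list[list]:
    """One row per (locus, taxon) with its number of long contigs."""
    rows = []
    for (locus, taxon), lengths in sorted(contigs.items()):
        n_long = 0
        if lengths:
            # For positive lengths and min_fraction < 1, the longest contig
            # always passes, since it exceeds 70% of its own length.
            cutoff = min_fraction * max(lengths)
            n_long = sum(1 for n in lengths if n > cutoff)
        rows.append([locus, taxon, n_long])
    return rows


def write_per_gene_csv(contigs: ContigLengths, handle: TextIO) -> None:
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(per_gene_rows(contigs))


example = {
    ("gene1", "TaxonA"): [1000, 800, 650],  # cutoff 700: 1000 and 800 count
    ("gene2", "TaxonA"): [500],
    ("gene1", "TaxonB"): [900, 700],  # cutoff 630: both count
}
write_per_gene_csv(example, sys.stdout)
# Locus,Taxon,Number_of_long_contigs
# gene1,TaxonA,2
# gene1,TaxonB,2
# gene2,TaxonA,1
```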

*We could define long contigs as the longest contig plus any other contigs longer than 70% of the longest contig's length (counting only the contigs that remain after framer filters for uniqueness and low coverage).
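The footnote's definition can be sketched as a small filter. This assumes the filtered contigs are available as a name-to-sequence mapping (however framer actually stores them) and reads ">70%" as a strict inequality; `long_contigs` and `min_fraction` are hypothetical names.

```python
def long_contigs(contigs: dict[str, str], min_fraction: float = 0.7) -> list[str]:
    """Return the names of the long contigs, longest first.

    Long contigs are the longest contig plus any other contig whose length is
    strictly greater than min_fraction (70% by default) of the longest length.
    """
    if not contigs:
        return []
    # Rank contig names by sequence length; ties keep their input order.
    ranked = sorted(contigs, key=lambda name: len(contigs[name]), reverse=True)
    cutoff = min_fraction * len(contigs[ranked[0]])
    return [ranked[0]] + [name for name in ranked[1:] if len(contigs[name]) > cutoff]


# Toy contigs; real ones would come from framer's filtered output.
toy = {"c1": "A" * 300, "c2": "A" * 1000, "c3": "A" * 750, "c4": "A" * 600}
print(long_contigs(toy))       # ['c2', 'c3']
print(len(long_contigs(toy)))  # 2
```

Returning the contig names rather than just a count makes it easy to inspect which contigs drive a gene's count when weighing ploidy against paralogy.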