arzwa/wgd

Extract CDS for WGD events

Closed this issue · 2 comments

This tool helped my analysis a lot, Arthur. I have a question about the output files. How can I derive the CDS assigned to WGD events from the wgd mix output TSV? I see the gene families, but I don't see an obvious way to extract the corresponding CDS pairs for each row.

Concretely, on my data, this is the content of the GMM mix output:
[image: wgd mix output table]
And this is the content of the ksd output for gene family 1:
[image: ksd output table for gene family 1]
Can I extract the CDS pairs in the ksd output that correspond to the rows in the mix output? (I could compare statistics such as alignment coverage, identity and length, but is there a more direct and unambiguous way of doing it?)

Let me know if you need more information.

arzwa commented

The mixture modeling tools use the node-averaged Ks values as data, i.e. the Ks values estimated for nodes in the gene family trees. So each Family-Node combination (row) in the wgd mix output corresponds to the set of gene pairs in the relevant family that have this node as their most recent common ancestor. You can find the associated pairs in the ksd output. So the way to get the pairs for a mixture component (which I guess corresponds to a putative WGD) is to identify the relevant rows of the mixture output and then look up the gene pairs for those Family-Node combinations in the ksd output. Does that make it somewhat clear?
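To illustrate the lookup described above, here is a minimal sketch using only the Python standard library. The column names (`family`, `node`, `component`, `pair`, `ks`), the gene IDs and the toy tables are all made up for illustration and are not taken from the actual wgd output, so check the headers of your own files and adapt accordingly. The idea is just to collect the Family-Node combinations assigned to a component and keep the ksd pairs that share them.

```python
import csv
import io

# Toy stand-ins for the two TSV files. All column names and IDs here are
# hypothetical placeholders, not the real wgd headers.
MIX_TSV = """family\tnode\tcomponent
GF_000001\t12\t1
GF_000001\t15\t2
GF_000002\t7\t1
"""

KSD_TSV = """pair\tfamily\tnode\tks
g1__g2\tGF_000001\t12\t0.81
g1__g3\tGF_000001\t12\t0.79
g4__g5\tGF_000001\t15\t2.10
g6__g7\tGF_000002\t7\t0.85
g8__g9\tGF_000002\t\t
"""


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def pairs_for_component(mix_rows, ksd_rows, component):
    """Return the gene pairs whose (family, node) belongs to `component`."""
    # Family-Node combinations assigned to the requested mixture component.
    wanted = {
        (row["family"], row["node"])
        for row in mix_rows
        if row["component"] == str(component)
    }
    # Keep ksd pairs with a matching (family, node); rows with an empty
    # node value cannot be matched and are skipped.
    return [
        row["pair"]
        for row in ksd_rows
        if row["node"] and (row["family"], row["node"]) in wanted
    ]


mix_rows = read_tsv(MIX_TSV)
ksd_rows = read_tsv(KSD_TSV)
wgd_pairs = pairs_for_component(mix_rows, ksd_rows, component=1)
print(wgd_pairs)
```

With real files you would replace the inline strings with `open(path)` handles passed to `csv.DictReader`. If the node identifiers are formatted differently in the two files (for example `12` versus `12.0`), normalise them before comparing.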

Thanks a lot, that clarified it. Just to let you know, in case this is not intended: for one of the genomes I'm looking at, quite a few entries in the ksd output have empty values in the 'node' and 'distance' columns.