Extract CDS for WGD events
This tool helped my analysis a lot, Arthur. I have a question about the output files. How can I derive the CDS assigned to WGD events from the wgd mix output tsv? I see the gene families, but I don't see an obvious way to extract the corresponding CDS pair for each row.
Concretely, on my data:
Content of the GMM mix output:
Content of the ksd output for gene family 1:
Can I extract the CDS pairs in the ksd output from the rows in the mix output? (I could compare statistics such as alignment coverage, identity and length, but is there a more unambiguous way of doing it?)
Let me know if you need more information.
The mixture modeling tools use the node-averaged Ks values as data, i.e. the Ks values estimated for nodes in the gene family trees. So each Family-Node combination (row) in the wgd mix output corresponds to a set of gene pairs in the relevant family that have this node as their most recent common ancestor. You can find the associated pairs in the ksd output. So the way to get the pairs for a mixture component (which I guess corresponds to a putative WGD) is to identify the relevant rows of the mixture output and then look up the gene pairs for those Family-Node combinations in the ksd output. Does that make it somewhat clear?
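A minimal sketch of that lookup in Python, using only the standard library. The column names Pair and Component and the toy rows below are assumptions made for illustration (only Family and Node are taken from the description above), so adjust them to the headers in your own files. The node identifier is normalised because it may be written as 4 in one file and 4.0 in the other; whether that happens in your data is also an assumption.

```python
import csv
import io

# Toy stand-ins for the two TSV files; values and the Pair/Component
# column names are hypothetical, not real wgd output.
MIX_TSV = (
    "Family\tNode\tKs\tComponent\n"
    "GF_000001\t5\t0.82\t1\n"
    "GF_000001\t4\t2.10\t2\n"
)
KSD_TSV = (
    "Pair\tFamily\tNode\tKs\n"
    "geneA__geneB\tGF_000001\t4.0\t2.05\n"
    "geneA__geneC\tGF_000001\t4.0\t2.15\n"
    "geneB__geneC\tGF_000001\t5.0\t0.82\n"
    "geneD__geneE\tGF_000002\t\t0.10\n"
)


def read_tsv(handle):
    """Read a tab-separated file into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def node_key(value):
    """Normalise a node identifier so that '4' and '4.0' compare equal.

    Returns None for empty values, so such rows can be skipped.
    """
    value = (value or "").strip()
    if not value:
        return None
    try:
        return str(int(float(value)))
    except ValueError:
        return value  # non-numeric identifier: use it unchanged


def pairs_for_component(mix_rows, ksd_rows, component, component_col="Component"):
    """Return the ksd rows whose (Family, Node) belongs to one mixture component."""
    # Step 1: collect the Family-Node combinations assigned to the component.
    wanted = {
        (row["Family"], node_key(row["Node"]))
        for row in mix_rows
        if row[component_col] == component
    }
    # Step 2: keep the gene pairs that share one of those combinations.
    selected_rows = []
    for row in ksd_rows:
        key = node_key(row["Node"])
        if key is not None and (row["Family"], key) in wanted:
            selected_rows.append(row)
    return selected_rows


mix_rows = read_tsv(io.StringIO(MIX_TSV))
ksd_rows = read_tsv(io.StringIO(KSD_TSV))
selected = pairs_for_component(mix_rows, ksd_rows, component="2")
for row in selected:
    print(row["Pair"], row["Family"], row["Node"], row["Ks"])
```

With real files you would replace the io.StringIO objects with open(path, newline=""). Rows in the ksd output with an empty Node value cannot be matched to any mixture row, so the sketch skips them.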
Thanks a lot, that clarified it. Just to let you know, in case this is not intended: in the ksd output of one of the genomes I am looking at, quite a few entries have empty values in the 'node' and 'distance' columns.