heche-psb/wgd

Suggestions on the result

Closed this issue · 14 comments

Thank you for the convenient tool!

I ran the analysis successfully; however, I am a bit confused about how to interpret the results and how to present them in a standard way.

I ran these commands:

  1. wgd dmd --globalmrbh SPECIES_cds Zea_mays_cds Amborella_trichopoda_cds Musa_acuminata_cds --cds -n 90
  2. wgd ksd wgd_dmd/global_MRBH.tsv --extraparanomeks ../wgd_ksd/SPECIES_cds.tsv.ks.tsv -sp speciestree.nw -o wgd_globalmrbh_ks --spair "SPECIES_cds;Musa_acuminata_cds" --spair "SPECIES_cds;Amborella_trichopoda_cds" --spair "SPECIES_cds;Zea_mays_cds" --spair "SPECIES_cds;SPECIES_cds" --reweight --plotkde
  3. wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv -sp speciestree.nw --extraparanomeks ../wgd_ksd/SPECIES_cds.tsv.ks.tsv --spair "SPECIES_cds;Musa_acuminata_cds" --spair "SPECIES_cds;Amborella_trichopoda_cds" --spair "SPECIES_cds;Zea_mays_cds" --spair "SPECIES_cds;SPECIES_cds" --reweight --plotkde

The results from the 2nd and 3rd commands are attached.

I wanted to know why I am not getting the SPECIES_CDS paranome in the 2nd figure (SPECIES_cds_Corrected.ksd.averaged.pdf). Also, can we use this 2nd figure to infer that SPECIES_CDS and Musa_acuminata_cds share the same WGD event, which happened after the divergence of SPECIES_CDS from Zea mays and Amborella?

Thanks
SPECIES_cds_Corrected.ksd.weighted.pdf
SPECIES_cds_Corrected.ksd.averaged.pdf

I guess you didn't get the SPECIES_CDS paranome in the 2nd figure because the safe gene IDs in the Ks file do not match the file name "SPECIES_CDS". Could you make sure that, for instance, if your file name is "SPECIES_CDS", the safe gene IDs in your Ks file look like "SPECIES_CDS_0/1/2/.."?

Thank you, I will check that. However, looking at the figure, I conclude that the Y-axis is not scaled well, as the SPECIES_CDS paranome has a much higher number of homologous pairs than the species-pair orthologs. Isn't that so?
Further, by "safe gene ids", do you mean that my gene IDs should start with the name of the file? For example, for the Musa_acuminata_cds file, should the gene IDs look like >Musa_acuminata_cds_pt00012?

I have already changed the y limit to 1.1 times the maximum height of the histogram in this repository. It's hard to tell whether this change is better than the original, though, because it might truncate the fitted curve. You may give it a try. No, your original gene IDs can take any form. The safe gene IDs are generated by the program itself. Issues might emerge when you infer the ksd using the file name "SPECIES_CDS" and then run other analyses using a new file name such as "SPECIES_CDS1/2". My point is that it's better to keep your CDS file name unchanged across all analyses.
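As a quick way to check the naming point above, here is a minimal sketch that flags gene IDs not starting with "<file name>_". The helper name `ids_missing_prefix` and the example IDs are hypothetical; how you extract the IDs from your Ks file is left out, since it depends on that file's layout.

```python
def ids_missing_prefix(gene_ids, file_name):
    """Return the gene IDs that do not start with '<file_name>_'."""
    prefix = file_name + "_"
    return [g for g in gene_ids if not g.startswith(prefix)]


# Hypothetical safe gene IDs, for illustration only.
ids = ["SPECIES_cds_0", "SPECIES_cds_1", "SPECIES_CDS_2"]

# The comparison is case-sensitive, so "SPECIES_CDS_2" is reported
# when the file name is "SPECIES_cds".
mismatched = ids_missing_prefix(ids, "SPECIES_cds")
print(mismatched)
```

An empty list would mean every ID carries the expected prefix.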

Actually, I have tried to limit the Y-axis, but it truncated the plot. Further, regarding my earlier question: can we use this 2nd figure to infer that SPECIES_CDS and Musa_acuminata_cds share the same WGD event, which happened after SPECIES_CDS diverged from Zea mays and Amborella?

Both the node-weighted and node-averaged Ks results can be used to shed light on the placement of the WGD. The 2nd figure seems to contain no information about the WGD peak. If the WGD peak is indisputably older than 1.17, you can claim with confidence that SPECIES_CDS and Musa_acuminata_cds share one WGD.

Thank you for the clarification. By plotting the Ks distribution of the SPECIES_CDS paranome, I observed that its Ks peak is at 0.5, which is younger than the SPECIES_CDS and Musa_acuminata_cds ortholog pair. Does this mean that SPECIES_CDS has undergone a WGD that is not shared with its close relative Musa_acuminata_cds?

Thanks

Yes.
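The reasoning in this exchange can be written as a small comparison. This is only a sketch: it assumes that 1.17 is the (corrected) SPECIES_CDS and Musa_acuminata_cds ortholog Ks peak and that 0.5 is the paranome WGD peak, as the thread implies. The function name `place_wgd` and the `margin` parameter are hypothetical, with `margin` standing in for "indisputably" in a crude way.

```python
def place_wgd(paralog_peak_ks, ortholog_peak_ks, margin=0.1):
    """Place a WGD relative to a speciation event by comparing Ks peaks.

    A higher Ks means an older event. `margin` is an arbitrary buffer
    (an assumption) below which the two peaks are treated as too close
    to call.
    """
    diff = paralog_peak_ks - ortholog_peak_ks
    if diff > margin:
        return "shared"            # WGD predates the divergence
    if diff < -margin:
        return "lineage-specific"  # WGD postdates the divergence
    return "ambiguous"


# Values quoted in the thread.
result = place_wgd(0.5, 1.17)
print(result)
```

In practice the peaks come with uncertainty, so a fixed margin is a stand-in for comparing confidence intervals rather than a substitute for it.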

Is there a way to convert the homologous pairs plot to a density plot?

Yes, but I'm not sure whether many users really need it.
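For readers who want to try this outside wgd, a count histogram can be turned into a density with NumPy. This is a generic sketch on made-up Ks values, not the wgd plotting code; the bin range and bin count are arbitrary choices.

```python
import numpy as np

# Made-up Ks values, for illustration only.
ks = np.array([0.1, 0.2, 0.2, 0.5, 0.9, 1.4])

# Raw counts per bin (what a "number of pairs" axis shows).
counts, edges = np.histogram(ks, bins=5, range=(0, 2.5))

# Same bins, normalized so that the bars integrate to 1.
density, _ = np.histogram(ks, bins=edges, density=True)

# Area under the density histogram: sum of bar height times bar width.
area = (density * np.diff(edges)).sum()
print(counts, area)
```

Each distribution normalized this way has unit area, but the bar heights of two distributions can still differ widely when one is much more concentrated than the other.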

Could you please add it as an option? In the case of my species, the homologous pairs scale is far too high, which might be the case for others as well.

Thanks

Hi,
Is there any update on the density plot?

I implemented two more options in wgd viz, --adjustortho (default False) and --adjustfactor (default 0.5), with which you can adjust the height of the orthologous Ks distribution relative to the height of the paralogous Ks distribution according to the ratio of their respective highest bars. Simply transforming to density cannot avoid the scaling issue, because very dense regions will still rise far above the others unless everything is standardized to a specific scale, for instance that of the whole paranome, which in turn would be no different from the adjustment above. Thus, I prefer to let users adapt the relative height themselves with these two options. Another minor new option, --okalpha (default 0.5), sets the opacity of the orthologous Ks distribution in the mixed plot.
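To make the idea of scaling by the ratio of the highest bars concrete, here is a minimal sketch. It is not the wgd implementation: the function name `rescale_ortholog_counts`, the exact formula (tallest ortholog bar set to `factor` times the tallest paralog bar), and the counts are all assumptions for illustration.

```python
import numpy as np


def rescale_ortholog_counts(para_counts, ortho_counts, factor=0.5):
    """Scale ortholog bars so the tallest equals factor * tallest paralog bar."""
    para_counts = np.asarray(para_counts, dtype=float)
    ortho_counts = np.asarray(ortho_counts, dtype=float)
    top_ortho = ortho_counts.max()
    if top_ortho == 0:
        # No ortholog pairs in any bin: return unchanged to avoid dividing by zero.
        return ortho_counts
    return ortho_counts * (factor * para_counts.max() / top_ortho)


# Hypothetical per-bin counts on a shared set of Ks bins.
para = [40, 128, 80, 30]
ortho = [4, 2048, 1024, 16]

scaled = rescale_ortholog_counts(para, ortho, factor=0.5)
print(scaled)
```

The shape of the ortholog distribution is preserved; only its height changes, so the y-axis values for the rescaled bars no longer read as pair counts.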

Thanks for the update, but running with these parameters gives this error:
ValueError: array must not contain infs or NaNs

Can you share the full command, log and perhaps data?
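The cause of the error is not established in this thread. The message itself matches the one raised by NumPy's `asarray_chkfinite`, which rejects arrays containing NaN or infinite values, so one thing worth checking before sharing the data is whether the Ks values being plotted are all finite. The sketch below uses a hypothetical helper, `finite_only`, on made-up values.

```python
import numpy as np


def finite_only(values):
    """Return (array of finite values, number of NaN/inf entries dropped)."""
    arr = np.asarray(values, dtype=float)
    mask = np.isfinite(arr)
    return arr[mask], int((~mask).sum())


# Made-up Ks values including a NaN and an inf, for illustration only.
ks = [0.5, float("nan"), 1.17, float("inf")]

clean, dropped = finite_only(ks)
print(clean, dropped)
```

A non-zero `dropped` count would point to the input values; a zero count would suggest the non-finite numbers arise later, during plotting or fitting.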