gtonkinhill/panstripe

Questions about the core estimate

Closed this issue · 1 comment

Dear author,

Thank you for developing such a great tool for downstream pan-genome analysis. I have a few questions I would like to ask:

  1. In the output (e.g. fit$summary), does "core estimate" refer to the rate of gene gain and loss, i.e. the slope in the plot generated by the "plot_pangenome_curve" function? If so, why does the value of "core estimate" in the plot generated by the "plot_pangenome_params" function differ from the value in fit$summary? From reading your source code, it seems that a data transformation is applied. Which value do you recommend using (e.g. for reporting in a paper)?
  2. When running the "plot_pangenome_curve" function, I found that the range of the number of gene gain and loss events seems too small (1-4) for my dataset (Staphylococcus aureus genomes). Do you think this is plausible? The corresponding figure is below.

Thank you very much for your time and help. I look forward to hearing back from you.

Best regards,
Tonny
[Image: output of plot_pangenome_curve]

Hi,

  1. At the moment, the fit$summary table reports the parameter estimates directly from the GLM. The default Tweedie model assumes a log-linear relationship, so the parameters are reported on the log scale. To make them more interpretable in the figures, we currently take the exponential, so that the plotted value describes the multiplicative change in the expected number of gene gain and loss events for a one-unit increase in the core branch length.

  2. This is a bit trickier to answer. The core parameter focuses on the gene gain and loss rates you might expect at internal branches and is less affected by rare (singleton) genes that appear only at the terminal branches. The changes observed at terminal branches are incorporated into the tip parameter, which can also account for differences in annotation error rates between datasets.

In general, I would advise using the plot_pangenome_params function instead, as it is usually more informative.
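
To make the log-scale point from answer 1 concrete, here is a minimal sketch of how a log-scale coefficient relates to its exponentiated value. It is written in plain Python rather than with panstripe itself. The coefficient value and the helper `expected_events` are hypothetical, and the sketch keeps only an intercept and a core term, ignoring the tip parameter and anything else the real model includes.

```python
import math

# Hypothetical log-scale estimate, standing in for the "core" row of fit$summary.
# This number is made up for illustration and does not come from any real dataset.
core_log_estimate = 0.8

# Exponentiating gives the factor by which the expected number of events is
# multiplied for each one-unit increase in core branch length.
core_rate_ratio = math.exp(core_log_estimate)


def expected_events(intercept, slope, core_length):
    """Expected event count under a log link with only an intercept and a core term.

    Simplifying assumption: all other model terms are left out.
    """
    return math.exp(intercept + slope * core_length)


# Compare two branches whose core branch lengths differ by exactly one unit.
shorter = expected_events(0.0, core_log_estimate, 1.0)
longer = expected_events(0.0, core_log_estimate, 2.0)
ratio = longer / shorter

print(f"log-scale estimate:     {core_log_estimate:.3f}")
print(f"exponentiated estimate: {core_rate_ratio:.3f}")
print(f"ratio of expectations:  {ratio:.3f}")
```

The last two printed values agree, which is why the log-scale number in the table and the exponentiated number in the figure describe the same fit. When reporting either one, state which scale it is on.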