awohns/unified_genealogy_paper

List of to-dos

Opened this issue · 2 comments

Just a place to collect things to improve about the current paper. They can be ticked off / deleted when done, or if we decide any are bad ideas. Please add any below, or in follow-up comments.

  • Should we mention the possibility of using tsdate for e.g. viral sequences with historical samples (very relevant)? We certainly don't want to imply that it's only for human data. Allied to this, should we use the term "historical samples" rather than "ancient samples"? We could say "historical samples (for example, from ancient DNA)" on the first mention.
  • Where possible, supplementary plots should use blue points for tsinfer/tsdate, red for GEVA, and green for Relate. The scaling plot should follow the same scheme for its line colours: perhaps blue dotted for tsinfer, blue dashed for tsdate, and blue solid for tsinfer+tsdate, compared to solid red for GEVA and solid green for Relate. Then in the supplementary plots we can say "colours as per fig S2". Figs where this is relevant include tgp_muts_frequency_wbackmutations and supp_figure9_mutation_average_age.
  • We need to check through the text so that we aren't implying that we are actively aiming for inference of ancestral positions in the maps. This will be a subtle thing, and needs careful wording.
  • The intensity of the coast outlines on the maps should be reduced substantially, or even omitted altogether, as the black lines obscure the ancestor locations.
  • The colour scale on the line plot (map b) of ancestor locations needs rescaling so that it is clear that the lightest (central) origin of the lines is within Africa, and so that the blue-purple is not simply at the tips of the lines.
  • Should we have an actual distribution plotted for one slice through the "prior evaluation" plot? I find it convincingly normal-looking when the actual conditional times are plotted on a log scale, and this is a strong point in favour of using the log-normal.
  • Plot titles on the ancient constraints plots need to be changed so that they are obviously "Tsdate", "Relate", and "GEVA". They currently repeat "Ancient Derived Variant Lower Bound", which is a bit tedious. Perhaps that text could be used along the diagonal line?
  • Add a "running mean" for the age of alleles in the ancient constraints plots.
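
For the "running mean" item, here is a minimal sketch of one way it could be computed, assuming the ages are positive and plotted on a log scale, so that the average is taken over log(age) (i.e. a geometric mean). The function name, the choice of x-axis value, and the window size are all hypothetical and are not taken from the paper's plotting code.

```python
import math

def running_geometric_mean(xs, ages, window=5):
    """Centred moving average of log(age) over points sorted by x.

    xs     : x-axis values of the points (hypothetical; e.g. the lower bound)
    ages   : positive age estimates, one per point
    window : number of neighbouring points averaged (shrinks at the edges)
    Returns a list of (x, smoothed_age) pairs in increasing order of x.
    """
    pairs = sorted(zip(xs, ages))
    logs = [math.log(age) for _, age in pairs]
    half = window // 2
    smoothed = []
    for i, (x, _) in enumerate(pairs):
        lo = max(0, i - half)
        hi = min(len(logs), i + half + 1)
        mean_log = sum(logs[lo:hi]) / (hi - lo)
        # Back-transform so the line can be drawn on the same log axis.
        smoothed.append((x, math.exp(mean_log)))
    return smoothed

# Illustrative values only, not real data.
line = running_geometric_mean([3, 1, 2], [8.0, 2.0, 4.0], window=3)
```

Averaging in log space keeps the line from being dominated by a few very old alleles; an arithmetic mean would be a one-line change if that is preferred.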

Items which would be nice to have (not 100% necessary)

  • TMRCAs in the past 10,000 years (similar to clustermap already produced)
  • 1kg chr20 date estimates using gamma instead of lognormal
  • Emphasise that figure 4b is not a map of the routes of human migration, but an average of ancestor positions, so there might be a huge variance in some cases.
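
For the gamma versus lognormal item above, one way to keep the comparison like-for-like would be to give both distributions the same mean and variance. The sketch below shows only that moment matching. The function names and example numbers are hypothetical, and it makes no claim about how tsdate parameterises its priors.

```python
import math

def lognormal_params(mean, var):
    """Return (mu, sigma2) of a lognormal with the given mean and variance."""
    sigma2 = math.log(1.0 + var / mean**2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, sigma2

def gamma_params(mean, var):
    """Return (shape, rate) of a gamma with the given mean and variance."""
    shape = mean**2 / var
    rate = mean / var
    return shape, rate

# Illustrative numbers only: a node time with mean 1000 and s.d. 500.
mu, sigma2 = lognormal_params(1000.0, 500.0**2)
shape, rate = gamma_params(1000.0, 500.0**2)
```

With the first two moments fixed, any difference in the resulting date estimates would come from the shape of the distributions, mainly the heavier right tail of the lognormal.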