petrelharp/ftprime_ms

Discussion lacks narrative

The discussion doesn't really have any narrative flow at the moment, and is pretty obviously a collection of things that we didn't know where to put.

  1. What do we do with the ana-fits paragraph? We clearly need to say something like this, but it's out of place in the discussion at the moment. Where else could it go? Perhaps at the end of the 'Recording the pedigree forwards in time' section?

  2. The parallelisation paragraph is sticking out a bit too. Unless we fill this out with some further discussion about why we'd need to do these large simulations or something, I think we should find a different place for it.

I don't have a concrete idea for the actual narrative of the discussion yet; I thought I'd start the discussion here and see what you guys thought.

Number 1 could go in the intro, as it is part of an overview of the existing literature.

For number 2, GWAS-sized samples are one application. And population-genetic datasets are just getting bigger, at least in some systems.

Currently, the introduction (1 pg) does:

  1. set-up: coalescent theory and forwards simulation
  2. big simulations are still hard
  3. recording the pedigree and adding neutral mutations later might help (sketched below)
  4. overview of the paper

and the discussion (1 pg) does:

  1. here's what we did
  2. previous work: ana-fits (and I propose merging the ARG bit with this)
  3. advantages to tree sequences: storage, speed, prior history
  4. this is easy to parallelize
  5. possible application to phylogenetics
  6. "nedigree"

"The ARG bit" being this ?

The idea of a tree sequence is closely related to the \emph{ancestral recombination graph},
or {ARG} \citep{griffiths1991two,griffiths1997ancestral},
which also describes the embellished pedigree.
The ARG has been the subject of substantial study
under the assumptions of coalescent
theory~\citep{wiuf1997number,wiuf1999ancestry,marjoram2006coalescent,wilton2015smc}.
However, the properties of the ARG as a computational structure have not
been studied and, despite several efforts to standardise a common
format~\citep{morin2006netgen,mcgill2013graphml}, % TODO check these refs against others in msprime paper
ARGs are rarely used in practice.
In contrast, the algorithmic properties of tree sequences
have been explored in detail~\citep{kelleher2016efficient},
contributing substantially to the efficiency of the \msprime{} coalescent simulator.

good call

I guess the alternatives are these:

  • in the intro: add a point 3a saying that anafits and the 08 paper have tried this but are insufficient because X,Y
  • in the discussion: merge the ARG bit into point 2, to say "tree sequences are related to ARGs; people have tried to record ARGs, but they don't do Z, W"

I'm fine with either.

Oh, and the phrase "much more limited" in the discussion of the 08 paper would be nicer if it were a bit more concrete:

A similar but much more limited method for
discarding this information also appears in \citet{padhukasahasram2008exploring}.

I like the suggestion of moving the prior work to the intro. I think the reason it ended up in the discussion was that we wanted to contrast what they did with what we do, which is easier after the reader knows what we do.

I could write more in the parallelization section about why big simulations are important. But I think the point is fairly tangential: it's in the category of "things I generally want to say" that aren't actually important for this paper.

So: I think my proposal is to move the 'previous work' to the Intro, as Kevin says, and remove the 'parallelization' bit. Then the discussion would be much more tightly focused. I'll give this a go in a PR.

This sounds good.

+1

done; thanks all.