petrelharp/ftprime_ms

Discussion lacks narrative

The discussion doesn't really have any narrative flow at the moment, and is pretty obviously a collection of things that we didn't know where to put.

  1. What do we do with the ana-fits paragraph? We clearly need to say something like this, but it's out of place in the discussion at the moment. Where else could it go? Perhaps at the end of the 'Recording the pedigree forwards in time' section?

  2. The parallelisation paragraph is sticking out a bit too. Unless we fill this out with some further discussion about why we'd need to do these large simulations or something, I think we should find a different place for it.

I don't have a concrete idea for the actual narrative of the discussion yet; I thought I'd start the discussion here and see what you guys thought.

Number 1 could go in the intro, as it is part of an overview of the existing literature.

For number 2, GWAS-sized samples are one application. And population-genetic datasets are just getting bigger, at least in some systems.

Currently, the introduction (1 pg) does:

  1. set-up: coalescent theory and forwards simulation
  2. big simulations are still hard
  3. recording the pedigree and adding neutral mutations later might help (sketched below)
  4. overview of the paper

and the discussion (1 pg) does:

  1. here's what we did
  2. previous work: ana-fits (and I propose merging the ARG bit with this)
  3. advantages to tree sequences: storage, speed, prior history
  4. this is easy to parallelize
  5. possible application to phylogenetics
  6. "nedigree"

"The ARG bit" being this ?

The idea of a tree sequence is closely related to the \emph{ancestral recombination graph},
or {ARG} \citep{griffiths1991two,griffiths1997ancestral},
which also describes the embellished pedigree.
The ARG has been the subject of substantial study
under the assumptions of coalescent
theory~\citep{wiuf1997number,wiuf1999ancestry,marjoram2006coalescent,wilton2015smc}.
However, the properties of the ARG as a computational structure have not
been studied and, despite several efforts to standardise a common
format~\citep{morin2006netgen,mcgill2013graphml}, % TODO check these refs against others in msprime paper
ARGs are rarely used in practice.
In contrast, the algorithmic properties of tree sequences
have been explored in detail~\citep{kelleher2016efficient},
contributing substantially to the efficiency of the \msprime{} coalescent simulator.

good call

I guess the alternatives are these:

  • in the intro: add a point 3a saying that anafits and the 08 paper have tried this but are insufficient because X,Y
  • in the discussion: merge the ARG bit into point 2, to say "tree sequences are related to ARGs; people have tried to record ARGs, but they don't do Z, W"

I'm fine with either.

Oh, and the phrase "much more limited" in the discussion of the 08 paper would be nicer if it were a bit more concrete:

A similar but much more limited method for
discarding this information also appears in \citet{padhukasahasram2008exploring}.

I like the suggestion of moving the prior work to the intro. I think the reason it ended up in the discussion was that we wanted to contrast what they did with what we do, which is easier after the reader knows what we do.

I could write more in the parallelization section about why big simulations are important. But I think the point is fairly tangential: it's in the category of "things I generally want to say" that aren't actually important for this paper.

So: I think my proposal is to move the 'previous work' to the Intro, as Kevin says, and remove the 'parallelization' bit. Then the discussion would be much more tightly focused. I'll give this a go in a PR.

This sounds good.

+1

done; thanks all.