*BEAST issues

Question

rvosa opened this issue 10 years ago · 5 comments

There are some issues with the *BEAST implementation:

1. It appears that we never actually have clade alignments with multiple sequences per species, which defeats the purpose of the multispecies, multilocus coalescent. One way to address this is to add an option so that, during smrt clademerge, for each merged sequence we look for up to X similar sequences (i.e. same species, same seed gi) and profile-align those against the existing clade alignment.
2. There should be an option smrt cladeinfer --append, which appends log files and tree files to those of a previous run, so that users can raise the ESS incrementally. Two things to keep in mind here: i) check that the translation tables in the tree files are sorted identically, so that the tree descriptions can be appended verbatim to the existing nexus file; ii) make sure the burn-in is properly removed before appending.

Answer 1 · 2015-05-28T10:53:32.000Z

We set sortTranslationTable="true" in the BEAST xml, which means we should be able to append tree descriptions verbatim.
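
For anyone who wants to double-check this before appending, here is a minimal sketch of such a check in Python. It is not the smrt code; the function names are made up for illustration, and it assumes taxon labels contain no whitespace, commas or semicolons (quoted labels would need a proper NEXUS parser).

```python
import re

def translation_table(nexus_text):
    """Return the (number, label) pairs of the first translate block, in file order."""
    match = re.search(r"\btranslate\b(.*?);", nexus_text,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    entries = []
    for item in match.group(1).split(","):
        parts = item.split()
        if len(parts) == 2:  # skip anything that is not "number label"
            entries.append((parts[0], parts[1]))
    return entries

def can_append_verbatim(existing_text, new_text):
    """True if both tree files number their taxa identically and in the same order."""
    return translation_table(existing_text) == translation_table(new_text)
```

If the two tables differ, the tree descriptions would have to be relabelled rather than copied over as-is.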

Answer 2 · 2015-05-28T20:41:10.000Z

With --append, the new run writes to the next available suffix of the *.nex and *.log files (e.g. *.nex.1). At the end of the run the additional tree results are appended to the stem file, with the --burnin portion removed.
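
Roughly, the logic looks like the following Python sketch. This is an illustration rather than the actual smrt implementation: the function names are hypothetical, --burnin is assumed to be a fraction between 0 and 1, and each tree is assumed to sit on its own line starting with "tree".

```python
import os

def next_suffixed_path(stem):
    """Return the first of stem.1, stem.2, ... that does not exist yet."""
    n = 1
    while os.path.exists(f"{stem}.{n}"):
        n += 1
    return f"{stem}.{n}"

def is_tree_line(line):
    """A tree statement, e.g. 'tree STATE_1000 = (1,2);'."""
    return line.strip().lower().startswith("tree ")

def append_trees(stem_lines, new_lines, burnin):
    """Insert the post-burn-in trees of the new run before the last 'End;' of the stem file."""
    new_trees = [line for line in new_lines if is_tree_line(line)]
    kept = new_trees[int(len(new_trees) * burnin):]  # drop the burn-in fraction
    end_index = max(
        (i for i, line in enumerate(stem_lines) if line.strip().lower() == "end;"),
        default=len(stem_lines),  # no closing 'End;': append at the very end
    )
    return stem_lines[:end_index] + kept + stem_lines[end_index:]
```

The *.log files could be handled the same way, skipping the header row and the burn-in rows of the new file.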

Answer 3 · 2015-05-29T21:19:38.000Z

The problem with including more haplotypes in *BEAST is that we have a separate CLADE_MAX_DISTANCE setting, so we can't re-use the results from smrt orthologize, because these are filtered using BACKBONE_MAX_DISTANCE. We basically need to redo the orthologize step, which is why most of its functionality has been moved to SequenceGetter, as of e1f4973.

Answer 4 · 2015-05-31T16:33:24.000Z

Perhaps the easiest way to do this is to implement something like smrt clademerge --enrich, which adds up to CLADE_HAPLOTYPES (e.g. 3) similar sequences to each species while verifying that CLADE_MAX_DISTANCE is not exceeded.
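
To make the idea concrete, here is a rough Python sketch. It is only an assumption about how --enrich could behave, not the implementation: the function names are hypothetical, candidates are assumed to be already aligned to the clade alignment, CLADE_HAPLOTYPES is read as the number of sequences added per species, and CLADE_MAX_DISTANCE is read as a cap on the uncorrected pairwise distance to every sequence already in the alignment.

```python
def p_distance(a, b):
    """Proportion of differing sites over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no overlapping columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)

def enrich(alignment, candidates, max_haplotypes=3, max_distance=0.1):
    """Add up to max_haplotypes candidate sequences per species.

    alignment:  dict of species -> list of aligned sequences already merged
    candidates: dict of species -> list of aligned candidate sequences
    """
    enriched = {species: list(seqs) for species, seqs in alignment.items()}
    for species, seqs in enriched.items():
        added = 0
        for cand in candidates.get(species, []):
            if added >= max_haplotypes:
                break
            if cand in seqs:
                continue  # already present
            existing = [s for group in enriched.values() for s in group]
            if all(p_distance(cand, s) <= max_distance for s in existing):
                seqs.append(cand)
                added += 1
    return enriched
```

For example, `enrich({"sp1": ["ACGT"]}, {"sp1": ["ACGA", "TTTT"]}, max_distance=0.25)` would keep the first candidate and reject the second.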

Answer 5 · 2015-06-03T12:27:19.000Z

As of e5be3f9 this now more or less works.