bbdecompose minimum number of Markers
cromanpa94 opened this issue · 7 comments
Dear all,
I'm having trouble running smrt bbdecompose. My backbone tree has 150 tips, and bbdecompose recognizes three clades. Unfortunately, there's no output file for any of these clades:
INFO: Species in clade0:
INFO: Species in clade1:
INFO: Species in clade2:
INFO: DONE, results written into working directory .
I have made the parameters less strict by changing the CLADE_MAX_DISTANCE and CLADE_MIN_DENSITY values, but the problem continues:
...INFO: Filtering clade for clade 2 alignments, min density : 1, max distance : 0.8
WARN: Could not find sufficient data for species in clade 2. Skipping clade with taxa ...
Finally, the only solution to this issue that I have found is in the following link, where they suggest changing the CLADE_MIN_MARKERS default to 1. Is that the right solution? Where can I find this option? I can't find it in smrt-config.
https://groups.google.com/forum/#!msg/supersmart-users/roGoPTl-9YY/BHjVaNW6GQAJ
Thanks in advance!
Cristian
Dear Cristian,
Which value did you choose for CLADE_MIN_DENSITY? In your output, the value is 1, which is high: it means that the markers have to be shared by all species in the clade.
The default is 0.2, so I wonder whether you get species for the clades with that value.
Best Regards,
Hannes
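As an illustration of the density threshold described above, here is a minimal Python sketch. It assumes that "density" means the fraction of a clade's species that have a sequence in a given alignment, which is one reading of the statement that a value of 1 requires markers shared by all species. The function names and the toy data are hypothetical and are not part of SUPERSMART.

```python
def marker_density(alignment_species, clade_species):
    """Fraction of the clade's species that have a sequence in this alignment."""
    clade = set(clade_species)
    if not clade:
        return 0.0
    return len(clade & set(alignment_species)) / len(clade)


def filter_by_density(alignments, clade_species, min_density):
    """Keep only the alignments whose density reaches min_density."""
    return {
        name: species
        for name, species in alignments.items()
        if marker_density(species, clade_species) >= min_density
    }


# Hypothetical toy data: four species, three markers.
clade = ["sp_a", "sp_b", "sp_c", "sp_d"]
alignments = {
    "marker1": ["sp_a", "sp_b", "sp_c", "sp_d"],  # present in all species
    "marker2": ["sp_a", "sp_b"],                  # present in half
    "marker3": ["sp_c"],                          # present in one species
}

strict = filter_by_density(alignments, clade, 1.0)   # threshold from the log above
relaxed = filter_by_density(alignments, clade, 0.2)  # default mentioned by Hannes
print(sorted(strict))
print(sorted(relaxed))
```

Under this reading, a threshold of 1 keeps only markers sampled for every species, which may explain why sparsely sampled clades end up with no alignments at all.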
Hi Hannes,
Thanks for your answer! I have just changed the CLADE_MIN_DENSITY value:
CLADE_MAX_DISTANCE=0.8
CLADE_MIN_DENSITY=0.1
CLADE_MIN_COVERAGE=1
CLADE_MAX_COVERAGE=10
CLADE_MAX_HAPLOTYPES=3
I now get output folders for clades 1 and 2, but not for clade 0. I have attached the command line from my smrt bbdecompose analysis (Command_line.txt, https://github.com/naturalis/supersmart/files/317237/Command_line.txt). I'm very confused about this issue. Is there any alternative?
Sincerely,
Cristian
Hi Cristian,
just to clarify: the wider issue is that there is simply comparatively little data available for your taxa. Alignments are only going to be written if all the conditions are met:
- enough homologous sequences to cover at least the requested number of taxa in the genus
- good alignments: not too conserved, not too divergent
When I look at what's in NCBI for some of these taxa and see only about 3 nucleotide records for a particular species, you have to be pretty fortunate for these records to end up in a usable alignment. Ultimately, nothing is written because there is not enough data, which is something the pipeline can't "fix".
(Of course, tweaking the parameters might help, but only to a point.)
Rutger
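The "not too conserved, not too divergent" condition can be sketched in Python as a check on mean pairwise distance. This is only an illustration: it uses the simple p-distance (proportion of differing sites), which may not be the measure SUPERSMART uses, and the lower bound `min_distance` is a hypothetical parameter added here to represent "not too conserved". Only the upper bound corresponds to a setting named in this thread (CLADE_MAX_DISTANCE).

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping sites with a gap or ambiguity."""
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-N?" or b in "-N?":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def mean_pairwise_distance(sequences):
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def is_usable(sequences, min_distance, max_distance):
    """True if the alignment is neither too conserved nor too divergent."""
    distance = mean_pairwise_distance(sequences)
    return min_distance < distance <= max_distance


# Hypothetical toy alignments.
variable = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
identical = ["ACGTACGT", "ACGTACGT"]

print(is_usable(variable, 0.0, 0.8))
print(is_usable(identical, 0.0, 0.8))
```

With only a handful of records per species, few alignments contain enough sequences to pass both this kind of check and the taxon-coverage requirement, which is the limitation described above.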
Hi Rutger,
Your answer makes sense. I thought that the problem might be due to data availability (PhyLoTA). Thank you for clarifying. I have two more questions that might resolve this issue:
- Is there any way to estimate the backbone tree with all sequences? I mean, it would only require estimating the topology using the complete superalignment, but smrt bbmerge uses only two representatives per genus (~90 tips). If I use the -e -1 option, I only obtain about 150 tips.
- If the answer to the previous question is no, can I generate the BEAST XML without having decomposed the backbone, run BEAST, and then use smrt cladegraft to reconstruct the species-level phylogeny?
Thanks for being so attentive, and sorry if I'm asking something wrong. I'm just starting to work with and understand SUPERSMART!
Sincerely,
Cristian
Hi Cristian,
To get the maximum number of tips and sequences for the backbone supermatrix with the -e -1 option, you could set BACKBONE_MAX_DISTANCE to a high value and BACKBONE_MIN_COVERAGE to 1. Depending on your data, Bayesian inference may converge poorly if your matrix turns out to be very sparse.
At the moment, it is unfortunately not possible within SUPERSMART to generate a BEAST XML from all alignments that are generated.
Did you use a root taxon or a list of species names as input for smrt taxize? If so, could you post it? From your output I see that your backbone decomposed into one huge clade and two very small ones, so I suppose the posterior/bootstrap values in your backbone must be quite low.
Best,
Hannes
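To gauge how sparse a supermatrix is before running a Bayesian analysis, one rough option is to measure the fraction of gap or missing-data characters. The Python sketch below assumes the matrix has already been read into a dictionary of taxon name to aligned sequence; the function name and the toy matrix are hypothetical, and SUPERSMART itself does not necessarily report this statistic.

```python
def missing_fraction(supermatrix):
    """Fraction of all cells in the matrix that are gaps ('-') or missing ('?')."""
    total = sum(len(seq) for seq in supermatrix.values())
    if total == 0:
        return 0.0
    missing = sum(seq.count("-") + seq.count("?") for seq in supermatrix.values())
    return missing / total


# Hypothetical toy supermatrix: two concatenated markers of four sites each.
toy_matrix = {
    "sp_a": "ACGTACGT",  # both markers present
    "sp_b": "ACGT----",  # second marker missing
    "sp_c": "----ACGT",  # first marker missing
    "sp_d": "AC??----",  # partial first marker only
}

print(missing_fraction(toy_matrix))
```

A high value suggests that relaxing the coverage settings has added many taxa with little data, which is the situation in which poor convergence is a concern.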
Dear Hannes,
That's a nice alternative. For now, I'll try to estimate the topology using ExaML or RAxML in SUPERSMART and use it as the starting tree for a BEAST analysis (outside SUPERSMART). Thank you for your suggestion.
And sure! I'm trying to estimate the Brassicaceae phylogeny, rooting only with Capparis. I'm only testing, but you're right: I'll use a species list. Clade 0 could be Capparis...
smrt taxize --root_taxa Brassicaceae,Capparis --binomials_only
Thank you! I'll post my results shortly.
Sincerely,
Cristian
Nothing to do here, closing this.