Re-run with subset of samples

Question

shlomobl opened this issue 2 years ago · 2 comments

Hi,

Is there a way to run part of the pipeline on a subset of the initial genomes, without running it all over again?

For example, if there are some "outliers" that I want to remove from the final analysis/report/trees, and run only the core-SNP etc. parts?

Thanks!

Answer 1 · 2022-06-17T07:40:26.000Z

Hi @shlomobl

I see... You can definitely do that, but it requires some expertise in working with the rmarkdown files generated by tormes (if your aim is for those samples not to appear in the interactive report). In any case, some analyses (such as the trees) would need to be redone, as they are strongly affected by each sample, so samples cannot simply be removed from the output.

Maybe the easiest way for now is to remove those samples from the metadata file and speed up the tormes run by:

- marking all samples as GENOME in that file (as you already have the assemblies, if this would be your second run)
- using the option only_gene_prediction to skip annotation (as you might already have the annotation results, if you need them)
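
For anyone wanting to script the metadata edit, here is a minimal Python sketch of the two steps above: dropping the unwanted samples and pointing the remaining ones at their existing assemblies. It rests on assumptions not stated in this thread, so please check them against your own files: that the metadata is tab-separated with the columns Samples, Read1 and Read2 first, that an assembled sample is declared by writing GENOME in the Read1 column and the assembly path in the Read2 column, and that the assemblies from the first run are named `<sample>.fasta` inside a single directory. The helper names `read_tsv`, `subset_metadata` and `write_tsv` are made up for this example and are not part of tormes.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def subset_metadata(rows, exclude, assemblies_dir, suffix=".fasta"):
    """Drop excluded samples and mark the remaining ones as assembled genomes.

    Assumed layout (verify against your own metadata file):
    column 0 = sample name, column 1 = Read1, column 2 = Read2,
    any further columns = free description fields, kept unchanged.
    """
    header, *samples = rows
    kept = [header]
    for row in samples:
        sample = row[0]
        if sample in exclude:
            continue  # outlier: leave it out of the new run
        # Assumed naming of the assemblies produced by the first run.
        genome_path = f"{assemblies_dir}/{sample}{suffix}"
        kept.append([sample, "GENOME", genome_path, *row[3:]])
    return kept


def write_tsv(rows):
    """Serialize rows back to tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

The resulting text can be saved as a new metadata file and used for a second tormes run, together with the only_gene_prediction option mentioned above. Writing a new file rather than overwriting the original keeps the first run reproducible.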

As I told you in #56, we are developing a major release of the pipeline, and this is one of the improvements we are aiming for, as more users have requested similar features.

Sorry for not being more helpful at this time! I will update you as soon as we release the new version!

Best,
Narciso

Answer 2 · 2022-10-11T07:27:11.000Z

Hi, Good news (major release)! I will try with both options you suggested. Thanks! S.
