nmquijada/tormes

Re-run with subset of samples

shlomobl opened this issue · 2 comments

Hi,

Is there a way to run part of the pipeline on a subset of the initial genomes, without running it all over again?

For example, if there are some "outliers" that I want to remove from the final analysis/report/trees, can I re-run only the affected parts (core-SNP, etc.)?

Thanks!

Hi @shlomobl

I see. You can definitely do that, but it requires some expertise in editing the rmarkdown files generated by tormes (if your aim is for those samples not to appear in the interactive report). In any case, some analyses (such as the trees) have to be re-done, as they are strongly affected by each sample, so samples cannot simply be removed from the output.

Maybe the easiest way for now is to remove those samples from the metadata file and speed up the new tormes run by:

  • marking all remaining samples as GENOME in that file (you already have the assemblies if this is your second run)
  • using the option only_gene_prediction to skip annotation (you might already have the annotation results if you need them)
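
The metadata edit described above can be scripted. The following is only a rough sketch in Python, not part of tormes: it assumes the metadata file is tab-separated with the sample name in the first column, the literal word GENOME in the second column for assembled samples, and the path to the assembly in the third column. Please check that layout against your own metadata file. The function name `subset_metadata` and the `assembly_paths` mapping are hypothetical helpers introduced for illustration.

```python
import csv
import io


def subset_metadata(metadata_text, outliers, assembly_paths):
    """Drop outlier samples and mark every remaining sample as GENOME.

    metadata_text  : content of the original tab-separated metadata file
    outliers       : set of sample names to exclude from the new run
    assembly_paths : dict mapping sample name -> path of its existing assembly
                     (hypothetical; point it at the assemblies from the first run)
    """
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter="\t"))
    header, body = rows[0], rows[1:]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)

    for row in body:
        if not row:
            continue  # skip blank lines
        sample = row[0]
        if sample in outliers:
            continue  # leave this sample out of the re-run
        # Assumed layout: column 2 = "GENOME", column 3 = assembly path;
        # any further description columns are kept unchanged.
        writer.writerow([sample, "GENOME", assembly_paths[sample]] + row[3:])

    return out.getvalue()
```

You would then write the returned text to a new metadata file and give that file to the new tormes run.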

As I told you in #56, we are developing a major release of the pipeline, and this is one of the improvements we are aiming for, as more users have requested similar features.

Sorry for not being more helpful at this time! I will update you as soon as we release the new version!

Best,
Narciso