NorwegianVeterinaryInstitute/ALPPACA

Add fastani - evaluation diversity - and clustering

Closed this issue · 6 comments

Hi, as discussed yesterday:

We can add a fastani clustering

What I think would be nice is to set it up so that it can guide users toward one of the phylogenetic tracks if they are uncertain about the diversity within their original dataset.

I have some R scripts lying around to do that:

  • getting fastani in, getting stats out, and clustering in 2 different ways
    So I can clean them up, make a simple nf script out of that, and integrate it with alppaca.
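To make the clustering step concrete, here is a minimal Python sketch (not the R scripts mentioned above). It assumes the standard fastANI tab-separated output (query, reference, ANI, mapped fragments, total fragments) and uses single-linkage clustering at a fixed ANI threshold, which is just one possible approach. The function names, the example file names, and the 99.0 threshold are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_fastani(handle):
    """Yield (query, reference, ani) tuples from fastANI tab-separated output."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 3:
            yield row[0], row[1], float(row[2])


def cluster_by_ani(pairs, threshold):
    """Single-linkage clustering: two genomes are linked if ANI >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, ref, ani in pairs:
        root_q, root_r = find(query), find(ref)
        if ani >= threshold and root_q != root_r:
            parent[root_r] = root_q

    clusters = defaultdict(set)
    for genome in list(parent):
        clusters[find(genome)].add(genome)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example rows, only to show the expected input shape.
example = (
    "a.fasta\tb.fasta\t99.5\t900\t1000\n"
    "a.fasta\tc.fasta\t92.0\t700\t1000\n"
    "b.fasta\tc.fasta\t91.8\t690\t1000\n"
)
pairs = list(read_fastani(io.StringIO(example)))
print(cluster_by_ani(pairs, threshold=99.0))
```

Pairs that fastANI does not report at all (it omits pairs with very low ANI) simply stay unlinked here, so very divergent genomes end up in separate clusters.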

What we can also consider in the future is to provide some helper scripts,

E.g. for automatically splitting the dataset into subsets and creating sub-datasets on which you can rerun the phylogenetic analyses (e.g. I had one that automatically created all the new nf configs, though I'm not sure how we could integrate it with the way the pipeline is built now)

But the problem will be to keep it KISS for users.

Let's hope for many bad weather Sundays this winter ;)

conda version: bioconda::fastani-1.33
docker image: quay.io/repository/biocontainers/fastani:1.33--h0fdf51a_1

Hmmm, did we check whether FastANI gives high enough resolution within the same species to show meaningful differences? Wouldn't the percentages be too high within the same species anyway?

Medium CPU/Memory requirements

One against all is the best option, with one process per isolate. The input should be the normal list of all genomes through the regular input channel, with the same list reused as the reference list.
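As a rough illustration of the one-against-all layout (outside Nextflow, and not the process that ended up in the pipeline), the sketch below builds one fastANI command per isolate, each using the full genome list as the reference list. It relies on fastANI's `-q`, `--rl`, `-o` and `-t` options. The function name, output directory, and file names are hypothetical.

```python
from pathlib import Path


def one_vs_all_commands(genomes, ref_list, out_dir="fastani_out", threads=1):
    """Build one fastANI command (as an argv list) per isolate."""
    commands = []
    for genome in genomes:
        # Hypothetical naming scheme: one output file per query genome.
        out_file = Path(out_dir) / f"{Path(genome).stem}_fastani.txt"
        commands.append([
            "fastANI",
            "-q", str(genome),     # single query genome
            "--rl", str(ref_list), # file listing all genomes, used as references
            "-o", str(out_file),
            "-t", str(threads),
        ])
    return commands


# Made-up paths, only to show the shape of the generated commands.
genomes = ["assemblies/iso1.fasta", "assemblies/iso2.fasta"]
for cmd in one_vs_all_commands(genomes, "genome_list.txt"):
    print(" ".join(cmd))
```

Each command is independent of the others, which is what allows the comparison to run as one process per isolate.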

Could not get the input channel right from the input file, as the fastANI process only runs on the first element of the tuple. I asked on Nextflow discussions for a solution:
nextflow-io/nextflow#3667

Added now in version 2.1.0