Add bakta
fmalmeida opened this issue · 8 comments
Study the best way to implement Bakta in the pipeline.
It would be nice to let users choose between Prokka and Bakta for the base annotation, depending on their needs.
Check if it will be possible to add it.
Bakta's outputs are very similar to Prokka's, but its annotation is more reliable. Adding it therefore seems straightforward:
- Create a module for bakta so users can use either prokka or bakta
- If using bakta, select the outputs that match the ones produced by prokka and used throughout the pipeline. The rest of the pipeline would then stay exactly the same, using the GFF and TSV from either bakta or prokka
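As a rough illustration of the output selection described above, the sketch below builds the paths of the two files the rest of the pipeline would consume. It is not the pipeline's actual code: the function name `select_annotation_outputs` is hypothetical, and the file extensions (`.gff` for prokka, `.gff3` for bakta, `.tsv` for both) are assumptions that should be checked against the tool versions in use.

```python
from pathlib import Path

# Assumed GFF extensions per tool; verify against the installed versions.
GFF_EXTENSION = {"prokka": ".gff", "bakta": ".gff3"}


def select_annotation_outputs(outdir, prefix, tool):
    """Return the GFF and TSV paths expected from the chosen annotator.

    This only builds the paths; it does not check that the files exist.
    """
    if tool not in GFF_EXTENSION:
        raise ValueError(f"unknown annotation tool: {tool!r}")
    base = Path(outdir)
    return {
        "gff": base / f"{prefix}{GFF_EXTENSION[tool]}",
        "tsv": base / f"{prefix}.tsv",
    }


if __name__ == "__main__":
    for tool in ("prokka", "bakta"):
        print(tool, select_annotation_outputs("annotation", "sample", tool))
```

Keeping the mapping in one place means downstream steps only ever ask for "gff" and "tsv", regardless of which annotator produced them.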
One thing to consider:
- Bakta depends on a heavy database, so it would not be appropriate to put it into the docker image
- Therefore, to add bakta, the pipeline itself must be reconfigured to have a module that creates all the databases used throughout the pipeline
- Then, make the pipeline receive a parameter setting the path to this database directory, which would make it easier for users to keep the databases up to date
- This would also leave the docker images with only the tools and not the database files, making them smaller and making it possible to use the pipeline with different profiles such as conda, docker or singularity
Recapitulating:
To add bakta it would be necessary to:
- make the pipeline use tools from conda, docker or singularity with the databases being set in a custom user path
- create a module to automatically download and format the databases for the pipeline
- re-configure the pipeline to use the database files from this database directory provided by the user
- add bakta
Now that the pipeline has been restructured, this issue can become a reality.
Since the bakta database is huge, the pipeline will not download and format it. Users will have to download it themselves, as each system or institute will have its own way of handling such a massive download.
Thus, users who want to annotate with bakta will simply have to:
- Download the database
- Set the path to the bakta database with --bakta_db
When this parameter is used, the pipeline should automatically trigger bakta instead of prokka.
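The selection rule described here can be sketched as follows. This is only an illustration of the intended behaviour, not the pipeline's own code: the function name `choose_annotator` is hypothetical, and treating an empty string the same as an unset parameter is an assumption.

```python
from typing import Optional


def choose_annotator(bakta_db: Optional[str]) -> str:
    """Return the base annotation tool implied by the --bakta_db value."""
    # An unset (None) or empty path means bakta was not requested,
    # so the default annotator, prokka, is kept.
    if bakta_db:
        return "bakta"
    return "prokka"


if __name__ == "__main__":
    print(choose_annotator(None))
    print(choose_annotator("/path/to/bakta_db"))
```

With this rule no separate on/off switch is needed: providing the database path is what enables bakta.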
Finally, after a long time, the workflow now runs properly from top to bottom when using bakta. For release, it is now required to:
- Update the docs to explain the bakta option. How to use it? What to expect?
- Update the version in the manifest
- Update the automatic reports so they understand whether the user used prokka or bakta. Check that everything is rendered well.
- The automatic report, when using prokka, must understand when the pipeline was run with additional HMM libraries for prokka, and which ones were used (from the ones possible when building databases).
- To think about: if using bakta, is there additional parsing of the outputs that we can do to give users more information?
Almost ready.
- Requires running at least two annotations to evaluate what the final results look like, so the changes can be merged
- Make sure the docs are up to date
Will try to wrap this up in the next 3 days.
Something is wrong with the bakta docker image. When running it, it complains about diamond and fails with a -9 exit code.
Execution tests are finished. Now building new docker images to check whether the scripts and reports are properly updated, so the release can be made.
Finally done 🥳