BioContainers/tools-metadata

The tools metadata repo

Closed this issue · 1 comment

Dear @bgruening @osallou and Biocontainers contributors:

During the implementation of the registry and API we have noticed a lack of metadata for some of our tools. This lack of metadata makes it difficult to search for and find the right tool. So far we have two approaches to increase the metadata in Biocontainers:

1- Promote more annotations inside the dockerfile or the conda recipe. This approach will make the conda/dockerfile difficult to read and maintain. In addition, not everyone has the metadata available when the conda/docker recipe is created, and we should not block the creation process on it.

2- Provide an interface for people to annotate the containers/tools using bio.tools. This is a good approach for non-developers, but it doesn't scale for the developer who wants to quickly annotate a file with the metadata in, for example, GitHub, without going to another web page to perform the annotation.

I'm proposing a third solution to complement these two approaches and increase the level of metadata. We can promote in both communities (biocontainers & bioconda) a descriptor file, descriptor.yaml, that ideally should live with the conda/biocontainers recipe but that, for now, we can store in this repository. I propose the following roadmap:

1- The first thing we should do is discuss and agree on the file format (see an example, https://github.com/BioContainers/tools-metadata/tree/master/locarna). Then we can start promoting it and adding some descriptor files.

2- Look into existing file formats for tool descriptors and see which one we can reuse (CWL, Galaxy, or bio.tools are possible options).

3- Once the structure is settled, we need to agree on how to involve both communities (conda and biocontainers) and on how to update the documentation and pipelines of the project. For example, the build process could include an additional validation step that asks for this file.
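To make the validation idea in step 3 concrete, here is a minimal sketch of what such a check could look like. The field names (`name`, `description`, `license`, `keywords`) and the function `validate_descriptor` are hypothetical and only for illustration; the actual descriptor format is still to be agreed. The sketch assumes the file has already been parsed into a dictionary (for example with a YAML library such as PyYAML's `yaml.safe_load`).

```python
# Hypothetical required fields and their expected types.
# The real descriptor.yaml format is still under discussion.
REQUIRED_FIELDS = {
    "name": str,
    "description": str,
    "license": str,
    "keywords": list,
}


def validate_descriptor(descriptor):
    """Return a list of problems; an empty list means the descriptor passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in descriptor:
            problems.append(f"missing field: {field}")
        elif not isinstance(descriptor[field], expected_type):
            problems.append(
                f"field {field} should be of type {expected_type.__name__}"
            )
        elif not descriptor[field]:
            problems.append(f"field {field} is empty")
    return problems


# A made-up descriptor, already parsed into a dict.
example = {
    "name": "example-tool",
    "description": "Placeholder description of the tool.",
    "license": "MIT",
    "keywords": ["alignment"],
}

if __name__ == "__main__":
    issues = validate_descriptor(example)
    print("descriptor OK" if not issues else "\n".join(issues))
```

A build pipeline could run a check like this and either fail or only warn when the list of problems is not empty. Warning instead of failing would keep the point from approach 1: missing metadata should not stop a recipe from being created.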

Please let me know your thoughts.

I am not really in favor of this.
The Dockerfile/conda recipe already provides basic metadata, and I do not see what kind of additional metadata is needed. Any extra metadata should go in bio.tools, to avoid an additional annotation effort and the risk of duplication.

Developers will only provide basic metadata; the rest will be provided by users via bio.tools.