gdcc/swc.gdcc.io

Biogrids

Biogrids (https://biogrids.org), based at HMS, curates, distributes, and supports research software for bioinformatics, imaging, and scientific visualization on Linux and macOS.
This project has an ongoing effort to improve research reproducibility by integrating Dataverse into existing software distribution infrastructure.
At present, this allows users to install software titles and dependencies by version and easily report version and dependency information used for a research project.
For example, Researcher A uses the default version of ExampleResearchTitle for their project. When writing it up, Researcher A can use Biogrids tooling to report the versions used (e.g. ExampleResearchTitle@0.1). Researcher B can then install that same version along with its required dependencies simply by specifying ExampleResearchTitle@0.1: that version depends on java@1.8.0_144 and OlderResearchTitle@0.8, which in turn depends on php@5.6.19.
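
A minimal sketch of the dependency behavior described in this example, not the actual Biogrids tooling. The `DEPENDENCIES` table and the `resolve` function are hypothetical names made up for illustration; only the titles and versions come from the example above.

```python
# Hypothetical dependency table mirroring the example in this issue.
# The real Biogrids tooling and its data format are not shown here.
DEPENDENCIES = {
    "ExampleResearchTitle@0.1": ["java@1.8.0_144", "OlderResearchTitle@0.8"],
    "OlderResearchTitle@0.8": ["php@5.6.19"],
    "java@1.8.0_144": [],
    "php@5.6.19": [],
}


def resolve(title, table=DEPENDENCIES):
    """Return an install order: dependencies first, requested title last."""
    order = []
    seen = set()

    def visit(name):
        # Skip anything already handled so shared dependencies appear once.
        if name in seen:
            return
        seen.add(name)
        for dep in table.get(name, []):
            visit(dep)
        # A title is appended only after all of its dependencies.
        order.append(name)

    visit(title)
    return order


if __name__ == "__main__":
    for item in resolve("ExampleResearchTitle@0.1"):
        print(item)
```

Researcher B only names ExampleResearchTitle@0.1; the transitive dependency php@5.6.19 is picked up through OlderResearchTitle@0.8.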

This does not currently allow a researcher to cite the software version used via a globally persistent identifier; the objective of this Biogrids effort is to change that.
We've developed a provisional metadata block (inspired partly by CodeMeta), and used existing Dataverse APIs to integrate the software distribution pipeline with software dataset creation, update and publication in Dataverse.
With future use in mind, this metadata block already supports Singularity (https://www.sylabs.io/singularity/) container URLs for software titles and dependencies.
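
As a rough illustration of the integration with existing Dataverse APIs mentioned above, the sketch below builds (but does not send) requests for two native API calls: creating a dataset in a collection and publishing it by persistent identifier. This is not the Biogrids pipeline code. The function names, the example host, the collection alias, and the DOI are placeholders, and the dataset JSON is left to the caller because the provisional metadata block is not shown in this issue.

```python
import json
import urllib.parse
import urllib.request


def create_dataset_request(base_url, collection_alias, api_token, dataset_json):
    """Build a POST request that creates a dataset in a Dataverse collection."""
    alias = urllib.parse.quote(collection_alias)
    url = f"{base_url}/api/dataverses/{alias}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(dataset_json).encode("utf-8"),
        headers={
            "X-Dataverse-key": api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def publish_dataset_request(base_url, persistent_id, api_token, release_type="major"):
    """Build a POST request that publishes a dataset by persistent identifier."""
    query = urllib.parse.urlencode(
        {"persistentId": persistent_id, "type": release_type}
    )
    url = f"{base_url}/api/datasets/:persistentId/actions/:publish?{query}"
    return urllib.request.Request(
        url,
        headers={"X-Dataverse-key": api_token},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = publish_dataset_request(
        "https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE", "API-TOKEN"
    )
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call against a real installation; the update step is omitted here for brevity.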