Encourage the use of COMBINE archives as exchange format for the model and its execution
Opened this issue · 6 comments
Description of the issue:
SBML is a declarative file format that specifies model components, structure, and the interaction of those components. But it does not directly specify how to run that model or how to directly reproduce the figures in a scientific paper from the model. Depending on which solver is used to run a model or in which framework a model is interpreted, the results may diverge.
By using the additional format SED-ML (Simulation Experiment Description Markup Language), it becomes possible to specify how to interpret and run a model, including the typical steps in a simulation life cycle.
To make the use of two separate files less cumbersome for the user, the COMBINE archive format allows wrapping both in a ZIP-based archive together with a manifest file that specifies the relationship between model and SED-ML script. Further data can be added to that archive, e.g., annotation glossaries, original publications, image files with pathways, or SBGNML files for defining pathway maps.
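For illustration, a minimal manifest.xml for such an archive could look like the sketch below (the file names model.xml and simulation.sedml are placeholders; the format identifiers follow the COMBINE specification URIs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <!-- the archive itself -->
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <!-- the SBML model (placeholder file name) -->
  <content location="./model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
  <!-- the SED-ML script describing how to run the model -->
  <content location="./simulation.sedml" format="http://identifiers.org/combine.specifications/sed-ml" master="true"/>
</omexManifest>
```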
Expected feature/value/output:
Instead of SBML, the exchange format of COBRA tools would become a COMBINE archive file (typically with extension OMEX). It would contain the SBML file with the model, possibly annotations in a separate file, a SED-ML file that specifies how to execute the model, and perhaps more.
Current feature/value/output:
The steps to run the model would be encoded in the SED-ML file, allowing third-party software to execute the same steps, hence improving the interoperability of various software and the reproducibility of the results.
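As a rough sketch of what such a SED-ML file could contain (schematic only; the model source name and the choice of KISAO:0000437, the KiSAO term for flux balance analysis, are illustrative assumptions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sedML xmlns="http://sed-ml.org/sed-ml/level1/version3" level="1" version="3">
  <listOfModels>
    <!-- points at the SBML file shipped alongside this script -->
    <model id="model1" language="urn:sedml:language:sbml" source="model.xml"/>
  </listOfModels>
  <listOfSimulations>
    <!-- KISAO:0000437 denotes flux balance analysis -->
    <steadyState id="sim1">
      <algorithm kisaoID="KISAO:0000437"/>
    </steadyState>
  </listOfSimulations>
  <listOfTasks>
    <task id="task1" modelReference="model1" simulationReference="sim1"/>
  </listOfTasks>
</sedML>
```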
Reproducing these results:
There are implementations available in Python and other languages to access content within COMBINE archives and to read/write the manifest file.
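Since a COMBINE archive is a ZIP with a well-known manifest file, a minimal reader needs nothing beyond the Python standard library. A hedged sketch (the helper name `read_manifest` is my own; dedicated libraries offer far richer functionality):

```python
# Sketch: listing the contents of a COMBINE archive using only the
# Python standard library. A COMBINE archive is a ZIP, so zipfile
# plus an XML parser suffices for simple inspection.
import zipfile
import xml.etree.ElementTree as ET

# default namespace used by the OMEX manifest format
NS = {"omex": "http://identifiers.org/combine.specifications/omex-manifest"}

def read_manifest(omex_path):
    """Return (location, format) pairs from the archive's manifest.xml."""
    with zipfile.ZipFile(omex_path) as archive:
        with archive.open("manifest.xml") as fh:
            root = ET.parse(fh).getroot()
    return [(c.get("location"), c.get("format"))
            for c in root.findall("omex:content", NS)]
```

The same approach works for writing: build the XML tree, serialize it into the archive, and add the remaining files.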
Interesting idea @draeger.
As a concept, a COMBINE archive is a great step forward for solving problems in modelling. However, being a ZIP limits what it can achieve when compared to versioning (git) and infrastructure (GitHub). I see some advantages if there were a way to combine (no pun intended) the two approaches.
For situations like these, I default to the 6 thinking hats method. It's easier in person, but in my experience it works well in writing too.
White hat - facts:
- the COMBINE archive is a ZIP
- a COMBINE archive needs hosting
- the SBML format is XML based
- the SED-ML format is XML based
- XML formats can be versioned by git
- the SBML format is a requirement of standard-GEM
- a release on GitHub is a zip (of the repository state)
- hosting a release on GitHub is free
Red hat - emotion:
- standard-GEM is very lightweight; the addition of COMBINE might be too much
Black hat - judgement:
- git is not meant for versioning binaries (zip)
- git LFS can version binaries but it adds requirements/complexity
- releases on GitHub are not permanent (but with Zenodo they could be #14)
Contributions are needed; it would be great if you could label ideas with a hat color, too.
You can create additional artefacts that can become part of a release. I could envision each release (tag that is also on Zenodo then) to provide the following separately:
- Stand alone model as SBML (plus whatever formats are desired)
- COMBINE archive that wraps the model, key data (such as growth and essentiality), and instructions for reproducing key simulation results
- A zip of the repository state at release
Green hat - possibilities (building on what @Midnighter described above):
- presently, the standard requires the use of a model/ folder as the location for various model files, including SBML, but this could be expanded to include other files that would normally belong in a COMBINE archive, particularly SED-ML
- when creating a new release on GitHub, a COMBINE archive could be supplied as an additional artifact, which would be automatically published on Zenodo
Looking at the contents of the COMBINE archive (section 3.3), Table 1 in the showcase and the example repository, the archive consists of:
- manifest.xml
  This file contains essentially a listing of the file tree with the file formats. standard-GEM imposes a requirement regarding the main directories, extensions, and some file names. Adopting a similar manifest in standard-GEM would be redundant.
- authorship information
  In any git-based versioning system, this information is provided by author or committer, and is deeply embedded on platforms such as GitHub. Moreover, as models are curated over time, a list of authors/contributors would not be rich enough to be linked to actual contributions (commits).
- fixed file tree
  There is some overlap here, and we should aim to increase the compatibility if possible. The directories specified by COMBINE are:
3.1. documentation/ : files that describe and document the model and/or experiment
In standard-GEM, documentation is provided more closely with the element it documents, i.e., within the data/ and code/ folders.
3.2 model/ : files that encode and visualise the biological system
Essentially the same approach here.
3.3 experiment/ : files that encode the in silico setup of the experiment
3.4 result/ : files that result from running the experiment
Like mentioned in the previous post, I think something should be done regarding 3.3 and 3.4. @yahanma has taken a similar approach by creating an analysis/ directory over at vna-GEM.
Also a follow-up on the idea of automatically creating COMBINE archives: it feels like work in this direction has already started through CombineArchiveWeb, where instead of uploading file by file, one could point directly to a repository that follows standard-GEM.
Following up on the CombineArchiveWeb idea, it looks like it is possible to create archives from a Git repository:
Here is what I think would need to be done in order to close the issue:
- create an empty folder called analysis with a Readme saying that the folder is meant to contain experiments and results
- add a reference (can) to .standard-GEM.md in the Releases section to encourage attaching a COMBINE archive to a release
@draeger what else would you recommend so this issue can be resolved?
Are there any thoughts from the watchers of this issue?
I think this is very nice. Have you tried it out? Possibly, a build script could also wrap a bunch of files in an archive and write the manifest file during a local execution. But a web service can certainly do the same (note: it will require data transmission).
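A minimal sketch of what such a local build step could look like, assuming only the Python standard library; the function name, file layout, and contents below are illustrative, not prescribed by standard-GEM:

```python
# Sketch: wrap a set of files into a COMBINE archive and generate the
# manifest on the fly. "files" maps each manifest location (e.g.
# "./model.xml") to a (format identifier, file contents) pair.
import zipfile
import xml.etree.ElementTree as ET

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"

def build_archive(omex_path, files):
    """Write a COMBINE archive with an auto-generated manifest.xml."""
    root = ET.Element("omexManifest", xmlns=OMEX_NS)
    # the archive itself is listed first, per OMEX convention
    ET.SubElement(root, "content", location=".",
                  format="http://identifiers.org/combine.specifications/omex")
    for location, (fmt, _) in files.items():
        ET.SubElement(root, "content", location=location, format=fmt)
    with zipfile.ZipFile(omex_path, "w") as archive:
        archive.writestr("manifest.xml", ET.tostring(root, encoding="unicode"))
        for location, (_, data) in files.items():
            # store entries without the leading "./" used in the manifest
            archive.writestr(location.removeprefix("./"), data)
```

Run as part of a release workflow, this would produce the archive locally without any data transmission to a web service.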