openforcefield/protein-ligand-benchmark

Versioning

richardjgowers opened this issue · 3 comments

I think versioning is a little tricky for this repo, as it serves two functions: providing software and providing data. Ideally, once #18 is done, the version of this repo would refer to the API that you use to access the data, and the version of the data would tell you whether it is "the same" or has been modified/fixed in a way that could alter results (e.g. results from data v1.x should all be comparable with each other, but not with results from a 2.x version of the data). We'll likely also want to start partitioning the data so that the versions are scoped onto the data correctly.
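To make the "comparable within a major version" idea concrete, here is a minimal sketch. It assumes data versions are dotted numeric strings such as "1.2"; the function names are hypothetical and not part of this repo's API.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '1.2' or 'v1.2' into integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def results_comparable(data_version_a: str, data_version_b: str) -> bool:
    """Treat results as comparable only when the major data version matches.

    This encodes the proposed rule: a fix that could alter results bumps the
    major version of the data, independently of the software/API version.
    """
    return parse_version(data_version_a)[0] == parse_version(data_version_b)[0]


if __name__ == "__main__":
    print(results_comparable("1.0", "1.3"))
    print(results_comparable("1.3", "2.0"))
```

Under this rule the software version never enters the comparison, which is the point of keeping the two version numbers separate.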

In crafting the first version of the manuscript, there was also discussion about annotating the test systems with labels that highlight distinct challenges, so that different subsets of systems and ligands could be chosen if, for example, one wants to include or exclude charge changes, ring-opening/closing transformations, or binding sites with metals. In principle, there could be a mixture of ways we select systems, such as:

  • specific versions for each system
  • labels or tags that apply to different systems and a certain accession date
  • named and versioned groupings of systems that may additionally be assigned DOIs
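As a rough sketch of how tag-based selection could work, assuming each system carries a version and a set of tags: the `System` class, the `select_systems` function, the tag names, and the system names below are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class System:
    """Hypothetical record for one benchmark system."""
    name: str
    version: str
    tags: FrozenSet[str] = frozenset()


def select_systems(
    systems: Iterable[System],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[System]:
    """Keep systems that have every `include` tag and none of the `exclude` tags."""
    include_set, exclude_set = set(include), set(exclude)
    return [
        s for s in systems
        if include_set <= s.tags and not (exclude_set & s.tags)
    ]


if __name__ == "__main__":
    # Placeholder systems; names and tags are invented for illustration.
    systems = [
        System("target_a", "1.0", frozenset({"charge_change"})),
        System("target_b", "1.1", frozenset({"metal_binding_site"})),
        System("target_c", "1.0", frozenset()),
    ]
    no_metals = select_systems(systems, exclude=["metal_binding_site"])
    print([s.name for s in no_metals])
```

A named, versioned grouping could then simply be a stored list of (name, version) pairs produced by such a query at a given date.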

This might be a bit tedious, but what about dropping the individual datasets on Zenodo? That way, updates to the datasets are independent of updates to this repo, and adding a new dataset is as simple as:

  1. Adding the dataset in the desired structure to Zenodo
  2. Adding a small YAML file here with a link to the Zenodo location and a short description (authors, contents, etc.)
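For illustration, the small YAML file could be validated on load along these lines. This is only a sketch: the field names (`name`, `zenodo_url`, `description`, `authors`) are assumptions rather than an agreed schema, the URL is a placeholder, and the dict literal stands in for what a YAML parser (e.g. `yaml.safe_load` from PyYAML) would return.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

# Assumed field names; the actual schema was not specified in this thread.
REQUIRED_FIELDS = ("name", "zenodo_url", "description", "authors")


@dataclass(frozen=True)
class DatasetPointer:
    """Hypothetical in-memory form of one dataset's pointer file."""
    name: str
    zenodo_url: str
    description: str
    authors: List[str]


def load_pointer(entry: Mapping[str, Any]) -> DatasetPointer:
    """Build a DatasetPointer from a parsed mapping, failing on missing fields."""
    missing = [key for key in REQUIRED_FIELDS if key not in entry]
    if missing:
        raise ValueError(f"pointer file is missing fields: {missing}")
    return DatasetPointer(
        name=str(entry["name"]),
        zenodo_url=str(entry["zenodo_url"]),
        description=str(entry["description"]),
        authors=[str(author) for author in entry["authors"]],
    )


if __name__ == "__main__":
    # Stand-in for the parsed YAML; every value here is a placeholder.
    parsed = {
        "name": "example_dataset",
        "zenodo_url": "https://zenodo.org/records/<record-id>",
        "description": "Short description of the contents",
        "authors": ["A. Author"],
    }
    print(load_pointer(parsed))
```

Keeping the pointer this small means the repo only tracks where each dataset lives, while the data itself is versioned on Zenodo.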

The approach we decided to take on 2022.07.26 is to:

  • have a single release version for the whole repository
  • maintain a single changelog for the repository, using consistent directory names to reference targets as changes are made to them or their ligands