VascoSch92/sequentium

CI and packaging suggestions

Opened this issue · 3 comments

Some suggestions on packaging and CI, for discussion.

development flow

  • move to GitHub flow, main plus dev, and release on GitHub trigger
  • requires changing the GHA trigger in release.yml workflow to on: release: types: [published] (that's it)
  • ensure CI triggers on PR to main, I think that's anyway how it is at current

packaging

  • I would move all information from setup.py to pyproject.toml, it is easier to maintain there
  • should __version__ not be exported by sequence, e.g., in sequence.__init__? Right now the version number is sequence.__version__.__version__, but it is espected by a single __version__ afaik

testing

I would refactor the tests to make the extension contract simpler, i.e., not require users to add a test, at the same level of coverage. This could be done as follows:

  • SequenceTestSuite is already great, but it requires the developer to add a test in a file somewhere.
  • instead, it should automatically collect all sequences from sequentium, so developers need to add only the sequence class
  • the ground_truth data etc could be added to the sequence class itself, as it is specific to the sequence class. It is useful not just for testing, but can be used for caching, in cases where numbers are hard to compute.
    • on a side note, sequences can have a cache mode and a compute mode which could be switched up to configuration. State-of-art frameworks which handle sequences - e.g., digits of pi - such as wolfram/mathematica do this as well, it's a good idea.

requires changing the GHA trigger in release.yml workflow to on: release: types: [published] (that's it)

This trigger make the release happens when the PR is closed (and mergerd) ?

I would move all information from setup.py to pyproject.toml, it is easier to maintain there

Yes I was thinking in doing that and possibly use poetry to release

should version not be exported by sequence, e.g., in sequence.init? Right now the version number is sequence.version.version, but it is espected by a single version afaik

Probably it is better just to have the version number in the init.py

About Testing:
My idea was more or less as you said: there is a file (yml or toml or python script) where you add just the informations about your sequence (name, class name, is_finite, ground_truth) and then a script create all the tests using the SequenceTestSuite, i.e. for every sequence a sequenceTestSuite is triggered. But i didn't menage to make it happen. So you have to add your test class extending from SequenceTestSuite.

This trigger make the release happens when the PR is closed (and mergerd) ?

No, it happens when you go in the "releases" manager and press the green button on a release, details here:
https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository

It's a separate, graphical user interface based workflow, so it is near impossible to trigger by accident (e.g., merge/push/git console magic).

But i didn't menage to make it happen.

I wrote machinery for this idea precisely in scikit-base, it's fully automated and applicable to any base class with a few lines, see example in #45 (tsbootstrap). In scikit-base, the magic happens in skbase.testing boilerplate, it collects all classes from the package and runs them through a test class via pytest fixtures.

We use this in sktime and skpro for regular testing, and 3rd parties running the check_estimator utility for checking 3rd party module extensions use this under the hood, so it's quite near the end of the debugging curve (although "optimization" is still ongoing).