scverse/scverse-io

Single or multiple packages?

grst opened this issue · 2 comments

grst commented

Option 1: one package that contains functions for multiple modalities (e.g. transcriptomics, AIRR, ATAC, ...).
Option 2: one package per modality, e.g. gex-io, airr-io, epigenetics-io

Pro one package

  • joint maintenance -> reduced overhead and a higher bus factor
  • central place for users to look for IO functions
  • potential for multimodal IO functions that generate MuData objects directly

Pro multiple packages

  • clear who's responsible for releasing the package
  • no need to deal with optional dependencies
  • packages can be released independently and more quickly

Optional dependencies

  • Could be handled with dependency groups (extras).

    pip install scverse-io

    installs only the basic functions.

    Then there would be, e.g.,

    scverse-io[airr]

    and

    scverse-io[all]

    The latter would be broadly advertised in the README.

  • Maybe it's not as bad after all. Most IO functions should get away with the same packages for reading csv/mtx/json/h5.

  • @flying-sheep made the point that if an optional dependency is not installed, the error should be just a plain
    message with installation instructions rather than a full stack trace, which would be intimidating for beginners.
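
A minimal sketch of how such a message could be produced, assuming a hypothetical helper `require` and a hypothetical exception class `MissingOptionalDependency` (neither exists in any package yet). The extra name passed in, e.g. `airr`, is taken from the dependency groups proposed above. Note that `raise ... from None` only hides the chained import error. To drop the traceback entirely, a command-line entry point would still have to catch the exception and print its message.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Hypothetical error type: a reader needs a package that is not installed."""


def require(module_name: str, extra: str, package: str = "scverse-io"):
    """Import and return `module_name`, or fail with install instructions.

    `extra` is the assumed dependency group (e.g. "airr") that would
    pull in the missing module.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        message = (
            f"This function needs the optional package '{module_name}'. "
            f"Install it with: pip install '{package}[{extra}]'"
        )
        # `from None` suppresses the original ImportError in the output,
        # so the user sees one short error instead of a chained traceback.
        raise MissingOptionalDependency(message) from None
```

A reader function would then call something like `require("loompy", "loom")` at the top of its body instead of importing the module at package import time, so a plain `pip install scverse-io` keeps working without it. The extra name `loom` is again only an assumption.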

grst commented

One observation here: even a transcriptomics-only package would already have optional dependencies, e.g. for loom or Excel (see #5).

So we need a mechanism for dealing with that anyway, and I don't think optional dependencies are a good argument against a single package.

I added one pro for multiple packages (edited), but I really think that a single package will be easier for everyone. scanpy also doesn't ship leidenalg, and it hasn't really been an issue. The same goes for PAGA and some others.