Single or multiple packages?

Question

grst opened this issue a year ago · 2 comments

Option 1: one package that contains functions for multiple modalities (e.g. transcriptomics, AIRR, ATAC, ...).
Option 2: one package per modality, e.g. gex-io, airr-io, epigenetics-io

Pro one package

joint maintenance -> reduced overhead and a higher bus factor
central place for users to look for IO functions
potential for multimodal IO functions that generate MuData objects directly

Pro multiple packages

clear who is responsible for releasing each package
no need to deal with optional dependencies
packages can be released independently and therefore more quickly

Optional dependencies

Optional dependencies could be handled with dependency groups (pip "extras").
```
pip install scverse-io
```
installs only the basic functions.

On top of that, there would be, e.g.,
```
scverse-io[airr]
```
and
```
scverse-io[all]
```
The latter would be broadly advertised in the README.
Maybe optional dependencies are not that bad after all. Most IO functions should get away with the same packages for reading csv/mtx/json/h5.
@flying-sheep made the point that if an optional dependency is not installed, the error should be a plain message with installation instructions rather than a full stack trace, which would intimidate beginners.
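
A minimal sketch of how such an error message could look, assuming a helper that each reader calls before using an optional dependency. The names `import_optional` and `MissingOptionalDependency` are hypothetical and not part of any existing package; `scverse-io` and the `airr` extra are taken from the proposal above.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Hypothetical error raised when an optional dependency is not installed."""


def import_optional(module_name: str, extra: str, package: str = "scverse-io"):
    """Import `module_name`, or raise a short error naming the extra to install.

    `package` and `extra` are only used to build the installation hint.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        # Only translate the error if the requested module itself is missing,
        # not if one of its own imports failed.
        if (err.name or "").split(".")[0] != module_name.split(".")[0]:
            raise
        msg = (
            f"This function requires the optional dependency '{module_name}'. "
            f"Install it with: pip install '{package}[{extra}]'"
        )
        # `from None` hides the original import traceback from the output.
        raise MissingOptionalDependency(msg) from None
```

A reader would then call, for example, `import_optional("loompy", extra="all")` at the top of its function body. Note that `from None` only suppresses the chained import error; an uncaught exception still prints the traceback of the calling code, so a fully plain message would additionally require catching the error at the entry point.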

Answer 1 · 2023-04-12T10:52:43.000Z

One observation here is that even a transcriptomics-only package would have optional dependencies, e.g. for loom or Excel (see #5).

So we need a mechanism for dealing with them anyway, and I don't think optional dependencies are a good argument against a single package.

Answer 2 · 2023-04-12T11:18:25.000Z

I added one pro for multiple packages (edited), but I really think that a single package will be easier for everyone. scanpy also doesn't ship leidenalg, and that hasn't really been an issue. The same goes for PAGA and some others.