Reduction of strong dependencies in Project.toml

Question

Reduction of strong dependencies in Project.toml

Closed this issue 10 months ago · 9 comments

I think the number of strong dependencies in the package needs to be reduced to a minimum to avoid compatibility problems, based on what you want the core package to do. As @felixhorger mentioned, one way of doing this is to separate the package into various simpler packages with minimal dependencies (which I am a fan of!). Another way of doing this, which is easier to implement, is to use package extensions.

Having said that, there are some packages that ultimately should not be in the Project.toml of the main package, like Literate.jl and Documenter.jl as these correspond to the Project.toml of the docs (the same for the tests, and the generation of the figures for the paper). This comment also applies to MRIGradients.jl (whose name is a little confusing because it only appears to do GIRF correction).

The main point that I want to come across is that if, as a user, I want to do only a GIRF-corrected recon, having my k-space, GIRF, and coils estimated my way, I do not expect a package called GIRFReco.jl to use DL tools like Flux.jl for B0 estimation and coil estimation, or to require Nifti.jl, MAT.jl, Unitful.jl, and others. In the specific case of Flux.jl, it may not be needed at all, and the gradient could be computed manually (I only saw it being used in pcg_ml_est_fieldmap and sens_smooth). Arguably, functions that do B0 and coil sensitivity estimation should not be part of this package at all.

Going back to the package extensions, the idea is to define a function only if the package is explicitly imported and added as a weak dependency. A simple example is the function readGIRFFile (from MRIGradients.jl) that depends on MAT.jl (for some reason, MAT.jl is also a dependency of GIRFReco.jl?). Only if the user does this

using MRIGradients, MAT

Then you get the function readGIRFFile defined in /ext/MRIGradientsMATExt.jl as

module MRIGradientsMATExt
    import MRIGradients
    using MAT
    # definition of readGIRFFile with more arguments, inside MRIGradients the function 
    # should be defined as an empty function: function readGIRFFile end
    MRIGradients.readGIRFFile( ...)
        ...
    end
end

that should also be added to the Project.toml

[weakdeps]
MAT = "..."

[extensions]
MRIGradientsMATExt = "MAT"

For the case of Flux.jl, if it is truly necessary, I think the best way of doing this would be to add Flux as a weak dependence of MRIFieldmaps.jl and MRICoilSensitivities.jl. In that way, if I want to do a field map estimation, I do

using MRIFieldmaps
# My code that uses MRIFieldmaps
data = ... # read data somehow
B0 = estimate_field_map(data, ..) #It is probably not called like this

and if I want to use the functions defined with Flux

using MRIFieldmaps, Flux
# My code that uses MRIFieldmaps
data = ... # read data somehow
B0 = estimate_field_map(data, ..) #I can estimate B0 the "traditional way"
B0 = estimate_field_map_flux(data, ..) #But also using Flux for the gradient!

Hope this is useful,
Cheers!

Answer 1 · 2023-10-19T21:34:15.000Z

Dear Carlos @cncastillo,

Thank you so much for this detailed feedback, we will go through the dependencies carefully and see which ones can be dropped/transformed into weak dependencies.

Answer 2 · 2023-10-20T03:28:01.000Z

Dear Lars @mrikasper,

I will reply to the JOSS comment here.

First of all, I just want to clarify that this package has done a fantastic job, and I understand what a thankless job it is to publish an open-source package. Making open-source software useful for the community requires some painful, and not really publishable, laborious steps: documentation, testing, following the language guidelines, etc.

I do not think the modularization/simplification of packages is Julia-specific or reconstruction-specific, just that people have realized that making a package more flexible and small (with fewer dependencies) makes it more useful and easier to maintain in the long run. This also makes packages easier to understand and increases the possibility of them being used (if properly documented). Just take a look at the package in NPM is-odd that has over 18M downloads. That happened with MRIReco.jl, which started as a big package to be then subdivided into useful simpler submodules, and my package KomaMRI.jl, which I also subdivided and I continue to do so to accommodate for more use cases (specifically moving IO to a submodule and GPU-related packages as a package extension). Both cases could be similar to yours, with an umbrella package that does everything in the pipeline, and specific packages that do each step, giving freedom to the user to install just a part of the pipeline. As you have improved the coil sensitivity and B0 map estimation from preexisting packages, those could be their own submodules.

Besides the philosophical view about coding practices and beliefs, there are some practical benefits, maybe Julia-specific, of developing packages in this way. If a package updates and breaks one part of your code, that error is restrained within the module, making it easier to fix and patch. Moreover, it makes dealing with package compatibility much easier ([compat] section of Project.toml), and avoids problems with installation conflicts (I can selectively install what I need). Finally, it also has some performance benefits, as a package with fewer dependencies uses less disk space, installs faster, compiles faster, and loads faster (especially if you use a large package like CUDA).

In my experience, every new dependency adds a new possible point of failure, and every test finds a new bug.

Cheers!

Answer 3 · 2023-10-21T21:49:48.000Z

First of all, I just want to clarify that this package has done a fantastic job, and I understand what a thankless job it is to publish an open-source package. Making open-source software useful for the community requires some painful, and not really publishable, laborious steps: documentation, testing, following the language guidelines, etc.

I can only repeat that! :)

Answer 4 · 2023-10-22T02:22:58.000Z

Thanks @felixhorger and @cncastillo, @mrikasper I will start going through the dependencies. Many of these dependencies arise from how we use the pipeline and package ourselves (i.e we use its Project.toml as a base environment for our reconstruction work and ad-hoc development, we should probably stop doing this). I think this should be straightforward to clean, I should have it done early this week.

Answer 5 · 2023-10-22T18:05:03.000Z

Relevant to this is also @mrikasper's comment in the JOSS review

This is what our submission is mainly about (a pipeline for...)

I think a reasonable thing to do is to modularise your pipeline into abstract steps, potentially outsource into separate packages (e.g. the B0 field estimation), and provide an overarching function which starts the whole pipeline, given the dictionary of parameters. It can then read from ISMRMRD and write into NIFTI files, exactly as it is now. I recommend to add one version of that function where arrays are provided instead of reading them from file, and arrays are returned instead of writing to NIFTI. Further, I recommend that all subparts of the pipeline follow the same principle of working on arrays rather than files.

With this setup, beginners are happy because they have a plug and play version and advanced users are happy because it's modular and they can exchange parts of the pipeline and are not forced to provide data in a specific format or always write to disk. Happy to hear your opinions on this :)

Answer 6 · 2023-12-19T08:30:36.000Z

@felixhorger @cncastillo We’ve been thinking hard about how best to implement your suggestions surrounding packageization and dependencies into our pipeline. As our repo is intending to provide a complete pipeline for image reconstruction, further separating out into many small subpackages will atomize it, making it difficult to meet the needs of implementation of our application, and the needs of access of the potential users. There are already many subpackages in the MRI Julia ecosystem; our repo has tried the best to utilize these subpackages, and it may not be necessary for further compartmentalization. To provide a concise demo as quick-start for the new users, we need a comprehensive recipe or example of integrating everything together for real-world reconstructions.

We have elected to pursue Option 1 as suggested in Issue #14 for reducing the dependencies and also to streamline the running of the examples. This will achieve the target of a significantly more lightweight GIRFReco.jl.

Meanwhile, we also considered the suggestions of strictly using Arrays in our pipeline. For the purpose of reusing the currently available modules/classes, we chose to facilitate all k-space data with RawAcquisitionData and AcquisitionData types, which are provided by MRIBase in MRIReco.jl. They are well-built modules providing all necessary functionality methods and interfaces for processing k-space data, along with their metadata which is necessary for our pipeline.

Furthermore, MRIReco.jl, one of the packages that our pipelines depends on, is providing the flexibility for users to generate their own coil sensitivities, B0 maps, etc. based on their own data. Our repo’s major target is to provide a pipeline or recipe through some brief demos, as well as the utility functions to glue the recipe together. The users are free to provide their own (coil and/or B0) maps, and to replace the processing methods outlined in the demos to extend them for their own use cases.

Would this approach be alright?

Answer 7 · 2023-12-21T09:56:01.000Z

Option 1 sounds good to me!
Using RawAcquisitionData and AcquisitionData is fine because MRIReco.jl made this kind of a standard.
One point that I nonetheless want to flag is the utils folder (as the name suggests it's functionality GIRFReco.jl needs but doesn't fit into a clear category):

utils/Utils.jl: this file contains quite diverse functionality, you already noticed that because you named it "Utils", i.e. it's hard to find a matching name (contains e.g. plotting, manipulating trajectory and raw data, high-level functionality for applying GIRF corrections). I think this should be easy to split into separate files within GIRFReco.jl. For example utils/plot.jl, ismrmrd.jl, and utils/applyGIRF.jl.
utils/fieldMapEstimator.jl: I suggest you outsource that because it is quite self-contained (apart from estimateB0Maps() which has to remain in GIRFReco.jl) and it would be nice to have this functionality outside of GIRFReco.jl. This functionality probably has to go somewhere else eventually (MRIReco?), but for now I propose it's alright to put it in a separate package. Is there any barrier to this I'm not seeing?
utils/variationalSmoother.jl: Same as before, it's self-contained and could go into another package, until the opportunity arises to incorporate it into e.g. MRIReco's MRICoilSensitivities.jl.

I think doing that won't be a great effort (~1h?), and doesn't disagree with the major target of providing the pipeline as the main functionality but would have a positive effect on new users/developers of GIRFReco.jl.

Happy holidays everyone! :)

Answer 8 · 2023-12-26T22:19:31.000Z

Hi! I hope everyone had a good Christmas 🎄.

We have elected to pursue Option 1 as suggested in Issue #14 for reducing the dependencies and also to streamline the running of the examples. This will achieve the target of a significantly more lightweight GIRFReco.jl.

Option 1 sounds good to me too.

There are already many subpackages in the MRI Julia ecosystem; our repo has tried the best to utilize these subpackages, and it may not be necessary for further compartmentalization.

Regarding the package separation, I agree with @felixhorger that B0- and coil-related functions should not be in this package, and as said, the purpose of the package remains intact.

Cheers!

Answer 9 · 2024-01-12T23:36:23.000Z

We have worked on this in #14 and have been able to remove quite a few dependencies, and are now in the process of consolidating our changes with the documentation workflow and writing tests for CI and coverage. We have also elected to remove the variational smoothing code from the repo. More or less finished, closing for now until we confirm that we can further reduce dependencies.