Load and Save Function
Closed this issue · 8 comments
For the purposes of reproducibility and applying the same fit model to data between sessions, we need a way to export/import a model to disk. The data involved should look something like:
- A `json` file with model settings
- A `.npy` file that stores the original flatfield/darkfield images

The `save` function should take one required and two optional inputs:
- `path: Path` - The directory to save the files to
- `model_params: str = "BaSiC.json"` - The name of the `json` file to export
- `model_weights: str = "BaSiC.npy"` - The name of the `npy` file to export
The `load` function should take two required inputs:
- `model_params: Path` - The full path to the `json` file
- `model_weights: Path` - The full path to the `npy` file
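Under these assumptions, the proposed signatures could be sketched roughly as follows. This is illustrative only: the settings dict and profile array are placeholders, not the real BaSiC internals.

```python
import json
from pathlib import Path

import numpy as np


def save(path: Path,
         model_params: str = "BaSiC.json",
         model_weights: str = "BaSiC.npy") -> None:
    """Sketch: write model settings to a json file and profiles to an npy file."""
    path = Path(path)
    settings = {"epsilon": 0.1}        # placeholder settings (illustrative)
    profiles = np.zeros((4, 4, 2))     # placeholder flatfield/darkfield stack
    (path / model_params).write_text(json.dumps(settings))
    np.save(path / model_weights, profiles)


def load(model_params: Path, model_weights: Path):
    """Sketch: read settings and profiles back from disk."""
    settings = json.loads(Path(model_params).read_text())
    profiles = np.load(model_weights)
    return settings, profiles
```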
Should a single `.npy` file store both profiles (I'm thinking about a case where only `flatfield` is calculated)? Maybe it would be cleaner to save one file per profile.
Are you thinking of this as a standalone function or as a method of `BaSiC`?
I see that `stardist` and `cellpose` use something like `model_dir`. Should we use a folder to organize settings/profiles and just load from that?
E.g.

```python
# being explicit with `save_model` may help the user know that
# they are saving the settings and profiles, not just the profiles
# or even the corrected images
basic.save_model(folder_path)
basic.load_model(folder_path)
```
There are a few ways we could do it. One file could hold both the flatfield and darkfield, just in different layers of the array. When you say one file per profile, do you mean combining the flatfield/darkfield images into one file? Or do you mean including the json parameters as well?
I think load and save should be class methods, so you would do `BaSiC.load` and `BaSiC.save`. Alternatively, `save` could just be an instance method, but if `load` were an instance method it would override whatever the instance was initialized with, and I don't like that idea.
I like the idea of using a model directory. That's what pytorch/tensorflow effectively do. I'd be on board with that.
I was considering the case where a model did not calculate `darkfield`.
Here are the options I see:
- Save `profiles.npy` where `profiles.shape` is `(X, Y, 2)`. `profiles[..., 0]` is always the flatfield, and `profiles[..., 1]` is always the darkfield. If darkfield is not calculated, it could be all zeros.
- Same as above, except `profiles.shape` is `(X, Y)` when only flatfield is calculated.
- `flatfield.npy` and optionally a `darkfield.npy` if it was calculated.
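The first option could be sketched like this. The helper names are hypothetical; the only assumption is that a missing darkfield is stored as a layer of zeros so the saved array always has the same shape.

```python
from typing import Optional, Tuple

import numpy as np


def stack_profiles(flatfield: np.ndarray,
                   darkfield: Optional[np.ndarray] = None) -> np.ndarray:
    """Stack profiles into shape (X, Y, 2).

    Layer 0 is always the flatfield and layer 1 is always the darkfield.
    If no darkfield was calculated, a layer of zeros is stored instead,
    so the on-disk shape is predictable.
    """
    if darkfield is None:
        darkfield = np.zeros_like(flatfield)
    return np.stack([flatfield, darkfield], axis=-1)


def unstack_profiles(profiles: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Recover (flatfield, darkfield) from the stacked array."""
    return profiles[..., 0], profiles[..., 1]
```

The stacked array can then be written with a single `np.save` call.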
I was thinking about the latter (instance methods). What about the settings override is troublesome?
```python
basic = BaSiC()
basic.fit(data)

# save, class method
BaSiC.save_model(folder_path, basic.settings, [basic.flatfield, basic.darkfield])

# save, instance method
basic.save_model(folder_path)

# load option 1, class method
basic = BaSiC.load_model(folder_path)  # returns an instance of BaSiC

# load option 2, instance method
basic = BaSiC()
basic.load_model(folder_path)

# load option 3, __init__ method
basic = BaSiC(model_dir=folder_path)
```
That looks good to me. My preference for saving data is to go with option 1, since it's fewer files and you always know what the shape of the array should be. The extra storage cost of saving a layer of zeros is also trivial.
Alright, I'm happy with option 1 for file saving.
Which of the load options are you thinking?
I think the class method makes the most sense intuitively. Otherwise you're initializing a model and then overwriting its parameters when loading.
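A classmethod loader along these lines avoids the init-then-overwrite pattern. This is a minimal stand-in sketch, not the real BaSiC class; the file names and attributes are assumptions based on the discussion above.

```python
import json
from pathlib import Path

import numpy as np


class BaSiC:
    """Minimal stand-in; settings/attribute names are assumptions."""

    def __init__(self, **settings):
        self.settings = settings
        self.flatfield = None
        self.darkfield = None

    def save_model(self, folder: Path) -> None:
        # Write settings and the (X, Y, 2) profile stack into one folder.
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "settings.json").write_text(json.dumps(self.settings))
        np.save(folder / "profiles.npy",
                np.stack([self.flatfield, self.darkfield], axis=-1))

    @classmethod
    def load_model(cls, folder: Path) -> "BaSiC":
        # Build a fresh instance from disk instead of mutating an existing one.
        folder = Path(folder)
        model = cls(**json.loads((folder / "settings.json").read_text()))
        profiles = np.load(folder / "profiles.npy")
        model.flatfield, model.darkfield = profiles[..., 0], profiles[..., 1]
        return model
```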
Is there some task remaining for this? @tdmorello