peng-lab/BaSiCPy

Load and Save Function

Closed this issue · 8 comments

For the purposes of reproducibility, we should be able to apply the same fit model to data between sessions. The data involved in exporting/importing a model to disk should look something like:

  1. A JSON file with model settings
  2. A .npy file that stores the original flatfield/darkfield images

The save function should take one required and two optional inputs:

  1. path: Path - The directory to save the files to
  2. model_params: str = "BaSiC.json" - The name of the JSON file to export
  3. model_weights: str = "BaSiC.npy" - The name of the .npy file to export

The load function should take two required inputs:

  1. model_params: Path - The full path to the json file
  2. model_weights: Path - The full path to the npy file

Should a single .npy store both profiles (I'm thinking about a case where only flatfield is calculated)? Maybe it would be cleaner to save one file per profile.

Are you thinking as a standalone function or as a method of BaSiC?

I see that stardist and cellpose use something like model_dir. Should we use a folder to organize settings/profiles and just load from that?

E.g.

# being explicit with `save_model` may help the user know that
# they are saving the settings and profiles,  not just the profiles
# or even the corrected images
basic.save_model(folder_path)
basic.load_model(folder_path)

There are a few ways we could do it. One file could hold both flatfield/darkfield, just in different layers of the array. When you say one file per profile, do you mean combining the flatfield/darkfield images into one file? Or do you mean including the json parameters as well?

I think load and save should be class methods. So you would do BaSiC.load and BaSiC.save. Alternatively, save could just be an instance method, but if load was an instance method it would override whatever the instance was initialized with and I don't like that idea.

I like the idea of using a model directory. That's what pytorch/tensorflow effectively do. I'd be on board with that.

I was considering the case where a model did not calculate darkfield.

Here are the options I see:

  1. save profiles.npy where profiles.shape is (X, Y, 2). profiles[..., 0] is always the flatfield, and profiles[..., 1] is always the darkfield. If darkfield is not calculated, it could be all zeros.
  2. same as above except profiles.shape is (X, Y) when only flatfield is calculated.
  3. flatfield.npy and optionally a darkfield.npy if it was calculated
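
Option 1 amounts to packing both profiles into one fixed-shape array and zero-filling the darkfield when it was not calculated. A minimal sketch (the `pack_profiles` helper name is hypothetical):

```python
from typing import Optional

import numpy as np


def pack_profiles(flatfield: np.ndarray,
                  darkfield: Optional[np.ndarray]) -> np.ndarray:
    """Stack profiles into an (X, Y, 2) array; zero-fill darkfield when absent."""
    if darkfield is None:
        darkfield = np.zeros_like(flatfield)
    # layer 0 is always flatfield, layer 1 is always darkfield
    return np.stack([flatfield, darkfield], axis=-1)
```

Saving is then a single `np.save("profiles.npy", pack_profiles(flatfield, darkfield))`, and a reader always knows the array's shape in advance.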

I was thinking about the latter (instance methods). What about the settings-override is troublesome?

basic = BaSiC()
basic.fit(data)

# save, class method
BaSiC.save_model(folder_path, basic.settings, [basic.flatfield, basic.darkfield])

# save, instance method
basic.save_model(folder_path)

# load option 1, class method
basic = BaSiC.load_model(folder_path) # returns instance of basic

# load option 2, instance method
basic = BaSiC()
basic.load_model(folder_path)

# load option 3, __init__ method
basic = BaSiC(model_dir=folder_path)

That looks good to me. My preference for saving data is to go with option 1, since it's fewer files and you always know what the shape of the array should be. The extra storage cost of saving a layer of zeros is also trivial.

Alright, I'm happy with option 1 for file saving.

Which of the load options are you thinking?

I think the class method makes the most sense intuitively. Otherwise it's initializing a model then overwriting parameters when loading.

Is there some task remaining for this? @tdmorello

Closed by #43