mehta-lab/dynamorph

create a docs folder with detailed info on each pipeline stage

bryantChhun opened this issue · 1 comments

Before 1.0-beta release:

A useful resource would be a centralized folder that contains multiple documents, one for each stage of the pipeline.

  • preprocess
  • segmentation
  • patch
  • VAE training and encoding
  • dimensionality reduction

For example, the dim_reduction module has the following usage:
config file
The configuration file now accepts input_dirs and output_dirs, two lists of full directory paths that are processed as pairs. file_name_prefixes is a list of string prefixes. weights_dir is a single directory to which pca_model.pkl is written when PCA is fitted (fit_model: True). conditions is a list of strings describing experimental conditions; it is used only for plotting after fitting.
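
As an illustration of the fields described above, here is a minimal Python sketch of the config contents plus a sanity check. The key names are taken from this issue; the on-disk config format is not stated here, so the dict form, the placeholder paths and values, and the validate_config helper are all assumptions for illustration, not part of dynamorph.

```python
# Hypothetical in-memory form of the dim_reduction config. The key names come
# from the description above; every path and value is a placeholder.
config = {
    "input_dirs": ["/data/exp1/latent", "/data/exp2/latent"],
    "output_dirs": ["/data/exp1/pca", "/data/exp2/pca"],
    "file_name_prefixes": ["well_A", "well_B"],
    "weights_dir": "/data/models/pca",
    "fit_model": True,
    "conditions": ["control", "treated"],
}


def validate_config(cfg):
    """Return a list of problems found in cfg (an empty list means it looks usable)."""
    problems = []
    for key in ("input_dirs", "output_dirs", "file_name_prefixes", "conditions"):
        if not isinstance(cfg.get(key), list):
            problems.append(f"{key} must be a list")
    if not isinstance(cfg.get("weights_dir"), str):
        problems.append("weights_dir must be a single directory path")
    if not isinstance(cfg.get("fit_model"), bool):
        problems.append("fit_model must be True or False")
    # Input and output directories are consumed as pairs, so lengths must match.
    ins, outs = cfg.get("input_dirs"), cfg.get("output_dirs")
    if isinstance(ins, list) and isinstance(outs, list) and len(ins) != len(outs):
        problems.append("input_dirs and output_dirs must have the same length")
    return problems
```

Calling validate_config(config) on the example above returns an empty list; a config whose input_dirs and output_dirs differ in length is reported as a problem, since the two lists are walked in pairs.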

details
For fit_model: True:

  1. loops over all directories listed in the config's input_dirs
  2. loops over all prefixes in the config's file_name_prefixes
  3. [aggregate all data]: searches the input dirs for <prefix>_latent_space_after.pkl files and concatenates their contents into a single list of vectors for subsequent PCA fitting
  4. fitting writes the model pca_model.pkl to the config's weights_dir directory
  5. fitting writes the figure PCA.png to the config's weights_dir directory
  6. finally, loops over all pairs of input_dirs and output_dirs in the config
  7. for each pair, runs inference on every <prefix>_latent_space_<suffix>.pkl in the input_dir folder, where suffix='after' is hardcoded, using the model generated in step 4
  8. saves each inference output as <prefix>_latent_space_after_PCAed.pkl in the corresponding output_dir from step 6
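
The aggregation in steps 1 to 3 could be sketched roughly as follows. This is not the module's actual code: the function name aggregate_latent_vectors is hypothetical, and it assumes each pickle holds a sequence of vectors and that missing directory/prefix combinations are simply skipped. The PCA fit itself (steps 4 and 5) is left out.

```python
import os
import pickle


def aggregate_latent_vectors(input_dirs, file_name_prefixes):
    """Collect vectors from every <prefix>_latent_space_after.pkl that exists."""
    vectors = []
    for input_dir in input_dirs:
        for prefix in file_name_prefixes:
            path = os.path.join(input_dir, f"{prefix}_latent_space_after.pkl")
            if not os.path.isfile(path):
                # Assumption: a prefix that is absent from a directory is skipped.
                continue
            with open(path, "rb") as f:
                # Assumption: each file unpickles to a sequence of vectors.
                vectors.extend(pickle.load(f))
    return vectors
```

The returned list is what would then be handed to the PCA fit, which writes pca_model.pkl and PCA.png to weights_dir.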

For fit_model: False:

  1. loops over all pairs of directories listed in config's input_dirs / output_dirs
  2. loops over all prefixes in config's file_name_prefixes
  3. assumes the weights_dir supplied in the config is a directory, and looks for the pca_model.pkl file there.
  4. runs inference on <prefix>_latent_space_<suffix>.pkl, where suffix='after' is hardcoded
  5. writes the transformed vectors to <prefix>_latent_space_<suffix>_PCAed.pkl in the corresponding output_dir directory
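
The pairing of input files with output files in the inference steps can be sketched like this. The helper name plan_inference is hypothetical; it only computes paths following the naming described above and does not load the model or transform any data.

```python
import os

SUFFIX = "after"  # hardcoded in the module, according to the steps above


def plan_inference(input_dirs, output_dirs, file_name_prefixes):
    """Return (input_path, output_path) pairs for PCA inference."""
    if len(input_dirs) != len(output_dirs):
        raise ValueError("input_dirs and output_dirs must have the same length")
    pairs = []
    # Directories are walked as (input_dir, output_dir) pairs, in config order.
    for input_dir, output_dir in zip(input_dirs, output_dirs):
        for prefix in file_name_prefixes:
            stem = f"{prefix}_latent_space_{SUFFIX}"
            pairs.append(
                (
                    os.path.join(input_dir, stem + ".pkl"),
                    os.path.join(output_dir, stem + "_PCAed.pkl"),
                )
            )
    return pairs
```

For one directory pair and one prefix, the result is a single tuple mapping <prefix>_latent_space_after.pkl in the input directory to <prefix>_latent_space_after_PCAed.pkl in the output directory.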