create a docs folder with detailed info on each pipeline stage
bryantChhun opened this issue · 1 comments
bryantChhun commented
Before 1.0-beta release:
A useful resource would be a centralized folder that contains multiple documents, one for each stage of the pipeline.
- preprocess
- segmentation
- patch
- VAE training and encoding
- dimensionality reduction
bryantChhun commented
For example, the dim_reduction module has the following usage:
config file
The configuration file now accepts lists of full paths to the input and output directories (input_dirs and output_dirs). file_name_prefixes is a list of string prefixes. weights_dir is a single directory to which pca_model.pkl is written as a result of PCA fitting (fit_model = True). conditions is a list of strings describing the experimental conditions; this value is used only for plotting after fitting.
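As a rough illustration of the fields described above, here is a minimal sketch that writes them out as a Python dict and checks their types. Only the key names come from the description; the paths, prefixes, and condition labels are hypothetical, the real config file may use a different format or nest these keys under a section, and check_dim_reduction_config is an illustrative helper, not part of the module. The equal-length check is an assumption based on the directories being processed in pairs.

```python
# Hypothetical values; only the key names come from the description above.
dim_reduction_config = {
    "input_dirs": ["/data/expt_A/latent", "/data/expt_B/latent"],
    "output_dirs": ["/data/expt_A/pca", "/data/expt_B/pca"],
    "file_name_prefixes": ["well_1", "well_2"],
    "weights_dir": "/data/models/pca",
    "fit_model": True,
    "conditions": ["control", "treated"],
}


def check_dim_reduction_config(cfg):
    """Return a list of problems found in cfg (an empty list means none)."""
    problems = []
    for key in ("input_dirs", "output_dirs", "file_name_prefixes", "conditions"):
        if not isinstance(cfg.get(key), list):
            problems.append(f"{key} should be a list")
    if not isinstance(cfg.get("weights_dir"), str):
        problems.append("weights_dir should be a single directory path")
    if not isinstance(cfg.get("fit_model"), bool):
        problems.append("fit_model should be True or False")
    # Assumption: input_dirs and output_dirs are consumed pairwise,
    # so the two lists need the same length.
    ins, outs = cfg.get("input_dirs"), cfg.get("output_dirs")
    if isinstance(ins, list) and isinstance(outs, list) and len(ins) != len(outs):
        problems.append("input_dirs and output_dirs should have the same length")
    return problems
```

Calling check_dim_reduction_config(dim_reduction_config) on the example above returns an empty list.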
details
For fit_model: True:
1. loops over all directories listed in config's input_dirs
2. loops over all prefixes in config's file_name_prefixes
3. [aggregate all data]: searches for <prefix>_latent_space_after.pkl files in the input dirs and concatenates them in a vector list for subsequent PCA fitting
4. Fitting will write a model pca_model.pkl to the config's weights_dir directory.
5. Fitting will write a figure PCA.png to the config's weights_dir directory
6. finally, will loop over all pairs of input_dirs and output_dirs in the config:
7. will run inference on all individual <prefix>_latent_space_<suffix>.pkl in the input_dir folder, where suffix='after' is hardcoded and the supplied model is the one generated in step 4 above.
8. output of each inference is <prefix>_latent_space_after_PCAed.pkl and is saved to each corresponding output_dir from 6
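The aggregation and fitting steps above could be sketched roughly as follows. This is not the module's actual code: collect_latent_vectors and fit_and_save are hypothetical names, each pickle file is assumed to hold a sequence of latent vectors, missing files are assumed to be skipped, and the estimator is assumed to be any object with a fit method (for example a scikit-learn PCA instance). Writing PCA.png is left out.

```python
import os
import pickle


def collect_latent_vectors(input_dirs, file_name_prefixes, suffix="after"):
    """Gather vectors from every <prefix>_latent_space_<suffix>.pkl found."""
    vectors = []
    for input_dir in input_dirs:
        for prefix in file_name_prefixes:
            path = os.path.join(input_dir, f"{prefix}_latent_space_{suffix}.pkl")
            if not os.path.isfile(path):
                # Assumption: a prefix need not exist in every input directory.
                continue
            with open(path, "rb") as f:
                # Assumption: each file holds a sequence of latent vectors.
                vectors.extend(pickle.load(f))
    return vectors


def fit_and_save(vectors, weights_dir, estimator):
    """Fit the estimator on all vectors and pickle it as pca_model.pkl."""
    estimator.fit(vectors)
    os.makedirs(weights_dir, exist_ok=True)
    with open(os.path.join(weights_dir, "pca_model.pkl"), "wb") as f:
        pickle.dump(estimator, f)
    return estimator
```

Keeping collection separate from fitting makes it easy to inspect how many vectors were found before committing to a fit.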
For fit_model: False:
1. loops over all pairs of directories listed in config's input_dirs / output_dirs
2. loops over all prefixes in config's file_name_prefixes
3. assumes the weights_dir supplied in the config is a directory, and looks for the pca_model.pkl file there.
4. runs inference on <prefix>_latent_space_<suffix>.pkl, where suffix=after is hardcoded.
5. writes the transformed vectors to <prefix>_latent_space_<suffix>_PCAed.pkl in the corresponding output_dir directory
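The inference-only path above might look something like the sketch below. The function names load_model and transform_all are hypothetical, the loaded model is assumed to expose a transform method (as a fitted scikit-learn PCA does), and skipping missing input files and creating missing output directories are assumptions rather than documented behavior.

```python
import os
import pickle


def load_model(weights_dir):
    """Load pca_model.pkl from the weights directory."""
    with open(os.path.join(weights_dir, "pca_model.pkl"), "rb") as f:
        return pickle.load(f)


def transform_all(input_dirs, output_dirs, file_name_prefixes, model, suffix="after"):
    """Transform each input file and write <stem>_PCAed.pkl to the paired output dir."""
    written = []
    # input_dirs and output_dirs are walked in pairs.
    for input_dir, output_dir in zip(input_dirs, output_dirs):
        os.makedirs(output_dir, exist_ok=True)
        for prefix in file_name_prefixes:
            stem = f"{prefix}_latent_space_{suffix}"
            in_path = os.path.join(input_dir, stem + ".pkl")
            if not os.path.isfile(in_path):
                # Assumption: prefixes without a matching file are skipped.
                continue
            with open(in_path, "rb") as f:
                vectors = pickle.load(f)
            out_path = os.path.join(output_dir, stem + "_PCAed.pkl")
            with open(out_path, "wb") as f:
                pickle.dump(model.transform(vectors), f)
            written.append(out_path)
    return written
```

A typical call under these assumptions would be transform_all(cfg["input_dirs"], cfg["output_dirs"], cfg["file_name_prefixes"], load_model(cfg["weights_dir"])), where cfg is the parsed config.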