# Nuclei Metric Learning

Metric learning for worm nuclei.
## Scripts

### consolidate_worms_dataset

Call: `./src/scripts/consolidate_worms_dataset -c default.toml -i path_to_30WormsImagesGroundTruthSeg`
Creates one .hdf dataset per worm. It gets rid of worm names and mismatched label numbering by unifying them via `universe.txt` and `worm_names.txt`. Most settings are taken from the .toml config file (check `default.toml`), such as `worms_dataset`, which here is used as the output dataset path but throughout the rest of the project is used as an input path (see the config sketch below).
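A minimal sketch of how such a config could be read, assuming the `toml` package and a top-level `worms_dataset` key; the actual scripts may parse `default.toml` differently:

```python
import toml

# Load the project config (the path passed via -c, e.g. default.toml).
cfg = toml.load("default.toml")

# For consolidate_worms_dataset, `worms_dataset` is the *output* directory for
# the per-worm .hdf files; everywhere else in the project it is read as the
# *input* dataset path.
worms_dataset_path = cfg["worms_dataset"]
print("Consolidated worms dataset path:", worms_dataset_path)
```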
HDF keys (a short reading sketch follows the list):

- `volumes/raw`: raw input, [140x140x1166] uint8, without any modification (e.g. normalization)
- `volumes/nuclei_seghyp`: instance labeling, [140x140x1166] uint16; the labels carry no meaning, they are just numbers to distinguish between instances
- `matrix/con_seghyp`: centers of nuclei; each row corresponds to the label in `volumes/nuclei_seghyp`, [max(nuclei_instances), 3] float32
- `volumes/gt_nuclei_labels`: ground truth labels; the same segmentations as `volumes/nuclei_seghyp`, but with invalid segmentations removed and relabeled according to `universe.txt`, [140x140x1166] uint16
- `matrix/gt_con_labels`: same as `matrix/con_seghyp` but for `gt_nuclei_labels`, fixed size [559x3] float32; missing labels are np.array([0.0, 0.0, 0.0])
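A minimal sketch of reading one consolidated worm file with `h5py`, assuming the keys listed above (the file name `worm01.hdf` is only illustrative):

```python
import h5py

# Open one consolidated worm volume (the file name is illustrative).
with h5py.File("data/processed/worm01.hdf", "r") as f:
    raw = f["volumes/raw"][()]                     # (140, 140, 1166) uint8, unnormalized
    seghyp = f["volumes/nuclei_seghyp"][()]        # (140, 140, 1166) uint16 instance labels
    con_seghyp = f["matrix/con_seghyp"][()]        # (n_instances, 3) float32 nucleus centers
    gt_labels = f["volumes/gt_nuclei_labels"][()]  # (140, 140, 1166) uint16, relabeled via universe.txt
    gt_con = f["matrix/gt_con_labels"][()]         # (559, 3) float32; missing labels are all zeros

print(raw.shape, seghyp.dtype, con_seghyp.shape, gt_con.shape)
```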
### consolidate_cpm_dataset

Call: `./src/scripts/consolidate_cpm_dataset -c default.toml -i path_to_root_kolmogorov_sol_format_both_directions -i2 path_to_nucleinames_corresponding_to_QAP_sols_labeling -i3 path_to_nuclei_name_labels_in_30WormsImageGroundTruthInstanceSeg`

Creates the CPM dataset, by default in ./data/processed (defined in `default.toml`): a .pkl file containing a dictionary whose keys are '{w1id}-{w2id}' with w1id < w2id and whose values are dicts of consistent pairwise matchings.
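A minimal sketch of inspecting the resulting pickle; the file name `cpm_dataset.pkl` is an assumption (the actual name and location are configured in `default.toml`):

```python
import pickle

# Load the consolidated CPM matchings (file name is an assumption).
with open("data/processed/cpm_dataset.pkl", "rb") as f:
    cpm = pickle.load(f)

# Keys are '{w1id}-{w2id}' with w1id < w2id; each value is a dict of
# consistent pairwise matchings between the two worms.
pair_key = next(iter(cpm))
matchings = cpm[pair_key]
print(f"pair {pair_key}: {len(matchings)} consistent matchings")
```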
## Models

- convnet_models: (no use for now) a conventional VGG-style network with some conv layers + some fc layers, for extracting embeddings from patches.
- unet: U-Net model for pixel-wise embeddings (see the toy sketch below).
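For orientation only, a toy sketch of the pixel-wise embedding idea; this is not the project's U-Net (which lives under `src/lib/modules`), and PyTorch plus the embedding dimension are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEmbeddingNet(nn.Module):
    """Toy stand-in for the U-Net: maps an image to one embedding vector per pixel."""
    def __init__(self, in_channels=1, emb_dim=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, emb_dim, 1),  # 1x1 conv: emb_dim-dimensional embedding per pixel
        )

    def forward(self, x):
        emb = self.body(x)
        # L2-normalize so pixel embeddings lie on the unit sphere (common in metric learning).
        return F.normalize(emb, dim=1)

# Example: one single-channel 140x140 slice -> (1, 16, 140, 140) pixel embeddings.
net = TinyEmbeddingNet()
print(net(torch.randn(1, 1, 140, 140)).shape)
```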
## Project Organization

```
├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── experiments        <- keeps experiment results
│
├── experiments_cfg    <- config files to reproduce experiments
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes the project pip installable (pip install -e .) so src can be imported
│
└── src                <- Source code for use in this project.
    ├── lib            <- useful code for the project
    │   ├── data       <- data-related code: dataset-creation code chunks and typical DL datasets
    │   ├── modules    <- model definitions, together with train, fit, and evaluate methods when
    │   │                 they require specialized code
    │   └── utils      <- general utility functions, summary writing, plotting and visualizations
    ├── scripts        <- end-point scripts for different tasks
    │   ├── consolidate_dataset.py
    │   ├── train.py
    │   └── evaluate.py
    └── test           <- scripts to help write and test code; not the typical UnitTest but something similar
```
Project based on the [cookiecutter data science project template](https://drivendata.github.io/cookiecutter-data-science/). #cookiecutterdatascience