Pathology Search Comparison
To run a model, first change the working directory to that model's directory (e.g. cd yottixel).
database
Here you can find two notebooks: gdc_search.ipynb and sample.ipynb. gdc_search.ipynb uses NCI's GDC API to retrieve all the .SVS slides in the TCGA project that have a primary_site of "Breast", "Brain", "Bronchus and lung", "Colon", or "Liver and intrahepatic bile ducts". sample.ipynb randomly samples 50 to 75 slides from each category to create the dataset used to build each method's database. The sampled dataset is stored as sampled_metadata.csv.
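The two notebook steps can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the filter field names (cases.project.program.name, data_format, cases.primary_site) are assumed to match the public GDC files endpoint, and build_gdc_filters, sample_per_site, and the record layout are hypothetical names introduced here. How the exact count within 50 to 75 is chosen is not stated above, so the sketch draws it uniformly at random.

```python
import json
import random

PRIMARY_SITES = [
    "Breast",
    "Brain",
    "Bronchus and lung",
    "Colon",
    "Liver and intrahepatic bile ducts",
]


def build_gdc_filters(primary_sites=PRIMARY_SITES):
    """Build a GDC-style filter for TCGA .SVS files at the given primary sites."""
    def is_in(field, values):
        return {"op": "in", "content": {"field": field, "value": list(values)}}

    return {
        "op": "and",
        "content": [
            is_in("cases.project.program.name", ["TCGA"]),
            is_in("data_format", ["SVS"]),
            is_in("cases.primary_site", primary_sites),
        ],
    }


def sample_per_site(records, low=50, high=75, seed=0):
    """Randomly pick between `low` and `high` records for each primary site."""
    rng = random.Random(seed)
    by_site = {}
    for record in records:
        by_site.setdefault(record["primary_site"], []).append(record)

    sampled = []
    for site in sorted(by_site):
        group = by_site[site]
        # Never ask for more slides than the site actually has.
        k = min(rng.randint(low, high), len(group))
        sampled.extend(rng.sample(group, k))
    return sampled


# The filter would be sent as the `filters` parameter of a request to
# https://api.gdc.cancer.gov/files (the request itself is omitted here).
params = {"filters": json.dumps(build_gdc_filters()), "format": "JSON"}
```

Fixing the seed makes the sampled set reproducible, which matters because every method's database is built from the same sampled_metadata.csv.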
Yottixel
First, generate the mosaics from the database slides and the test slides. To do so, use either patching.py or parallel_patching.py. The only difference is that the latter uses multiprocessing for a faster runtime.
python patching.py --data_dir [TCGA_DATA_DIR] --metadata_path './sampled_metadata.csv' --save_dir './PATCHES'
Or
python parallel_patching.py --data_dir [TCGA_DATA_DIR] --metadata_path './sampled_metadata.csv' --save_dir './PATCHES' --num_processes 16
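As a rough illustration of the difference between the two scripts, the sketch below runs the same per-slide function sequentially and through a multiprocessing.Pool. process_slide is a hypothetical stand-in for the real patching logic, and run_sequential and run_parallel are names assumed for this example rather than the scripts' actual functions.

```python
from multiprocessing import Pool


def process_slide(slide_path):
    """Hypothetical stand-in for per-slide mosaic generation."""
    # The real script would open the slide and save patches; here we only
    # return a marker so the control flow can be followed.
    return f"{slide_path}: done"


def run_sequential(slide_paths):
    """Process slides one after another (the patching.py style)."""
    return [process_slide(path) for path in slide_paths]


def run_parallel(slide_paths, num_processes=16):
    """Process slides across worker processes (the parallel_patching.py style)."""
    with Pool(processes=num_processes) as pool:
        # Pool.map preserves input order, so results match run_sequential.
        return pool.map(process_slide, slide_paths)
```

Because slides are independent of one another, both functions return the same results. On platforms that spawn worker processes, run_parallel should be called from under an `if __name__ == "__main__":` guard.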
The next step is to use KimiaNet to extract features from each patch in the mosaics. This can be done by running:
python feature_extraction.py --patch_dir './PATCHES/' --extracted_features_save_adr './extracted_features.pickle' --batch_size 256 --use_gpu True
This saves the extracted features to extracted_features.pickle.
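Before moving on, it can help to inspect the saved file. The sketch below loads a pickle and reports its top-level type and size. The internal layout of extracted_features.pickle is not documented here, so summarize_pickle (a hypothetical helper) assumes nothing beyond the file being a standard Python pickle.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (object, short description)."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)

    # Describe the top-level container without assuming its structure.
    description = type(obj).__name__
    if hasattr(obj, "__len__"):
        description += f" with {len(obj)} entries"
    return obj, description
```

For example, `features, info = summarize_pickle('./extracted_features.pickle')` followed by `print(info)` shows what kind of object was saved. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.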
For the slides in the test set, follow the same two steps to compute and save extracted_test_features.pickle. The rest of the process can be followed in the search.ipynb notebook.
RetCCL
python feature_extraction.py --batch_size 256
python parallel_patching.py --metadata_path ./sampled_metadata_okay.csv --num_processes 16