Hyperparameter selection

Question

RYY0722 opened this issue 2 years ago · 10 comments

Hi there, thanks a lot for the great work!

I am adapting your algorithm to a mixture of TCGA-GBM and some private data. Could you give me some suggestions on choosing hyperparameters? Since extensive hyperparameter searches take a lot of time, could you point me in a direction to try first? Thank you very much!

Answer 1 · 2023-04-18T11:50:18.000Z

Hi, thanks for your interest in ZoomMIL! If you want to avoid extensive experiments, I would use the default learning rate, optimizer & scheduler as given in the code and mainly tune the hyperparameter K (number of selected patches).

You can compute the average number of patches N (at the lowest magnification) across the images of your dataset and play around with some values for K, e.g., in the range [N/16, N/8, N/4, N/2] to get an intuition for the parameter. Then you can go finer if desired.
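As a rough illustration of the suggestion above, here is a minimal sketch that derives the coarse K candidates from the average patch count. The function name `candidate_k_values` and the per-slide patch counts are made up for this example and are not part of the ZoomMIL code.

```python
from statistics import mean

def candidate_k_values(patch_counts, divisors=(16, 8, 4, 2)):
    """Return the average patch count N and the coarse K candidates N/d.

    patch_counts: number of patches per slide at the lowest magnification.
    """
    n_avg = mean(patch_counts)
    # Floor each N/d to an integer and keep K at least 1.
    candidates = [max(1, int(n_avg // d)) for d in divisors]
    return n_avg, candidates

# Hypothetical per-slide patch counts (illustrative values only).
patch_counts = [820, 1040, 960, 1180]
n_avg, ks = candidate_k_values(patch_counts)
print(f"average N = {n_avg:.0f}, K candidates = {ks}")
```

Once one of these coarse values looks promising, the same helper can be called with finer divisors around it.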

Answer 2 · 2023-04-18T11:56:09.000Z

Thanks a lot for your reply and suggestions!
I notice that you use different settings for the three datasets. May I know the reasoning behind them? Which settings should I start with? Thank you!

Answer 3 · 2023-04-18T12:09:24.000Z

I assume you are referring to which and how many magnification levels to choose. It depends on your data. If the areas & patterns of interest are large and can already be detected at low magnifications, you can use a setting similar to the one we chose for BRIGHT. If the areas are small, you may have to start at higher magnifications (as in CAMELYON16).

Answer 4 · 2023-04-18T12:10:13.000Z

I see. Thank you very much!!

Answer 5 · 2023-05-09T03:48:15.000Z

Hi @kevthan, I tried your code out and it works great! Thanks a lot for the great work!
I am working on a problem that focuses more on clinical practicality, which means that I need to push the accuracy as high as possible. Therefore, I am starting to tune the hyperparameters. I notice that apart from the magnifications, the parameters in GaussianTissueMask and tissue_threshold vary greatly between datasets as well. My dataset might differ from the ones you used, so settings chosen for them may hurt performance. May I know how to select these parameters?

In addition, I have another question regarding the magnification. Current knowledge about the problem I am working on suggests that smaller structures, like cell morphology and tissue characteristics, are more important. However, focusing only on those features in the hyperparameter settings might prevent us from finding novel biomarkers. Therefore, I plan to use a large range of magnifications and then use the attention weights to figure out which ones are important and to summarize the important structures or appearance (hopefully not only local but also global). Instead of trying different combinations, may I use a dense and wide range of magnifications, like [1.25, 2.5, 5, 10, 20, 40]? Apart from a higher computational burden, could this also harm the model performance? Or have you observed that the model can handle this kind of wide-range information with proper attention? Have you tested a wide range of magnification settings? Would you please share some insights on this?

Thank you very much!!!

Answer 6 · 2023-05-10T14:23:27.000Z

Hi @RYY0722,

We tuned the parameters for GaussianTissueMask and tissue_threshold mainly based on visual inspection. I would suggest visualizing the masked images and then tuning tissue_threshold (and/or kernel_size in GaussianTissueMask) such that the masking is tight but doesn't remove relevant tissue.

Your approach is interesting. We haven't tested the model with this many magnification levels. I guess it may be more difficult to train because, from a combinatorial perspective, each added magnification level / top-k selection increases the total number of possible "patch selections". Alternatively, you could try the same range but without some intermediate magnifications. In general, I would choose a rather high K to reduce the complexity and avoid discarding too much information already at a low magnification.
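To get a feel for the combinatorial argument, here is a small back-of-the-envelope sketch. It is not taken from the ZoomMIL code. It assumes, purely for illustration, that each selected patch expands into 4 candidate patches at the next magnification (a 2x zoom step) and that a top-k selection happens at every level except the last; the function name and the numbers are hypothetical.

```python
import math

def log10_selection_count(n_lowest, ks, children_per_patch=4):
    """log10 of the number of distinct sequences of patch selections.

    n_lowest: number of candidate patches at the lowest magnification.
    ks: number of patches kept at each level where a selection happens.
    children_per_patch: ASSUMED number of candidates each kept patch
        contributes at the next magnification (4 for a 2x zoom step).
    """
    total = 0.0
    candidates = n_lowest
    for k in ks:
        k = min(k, candidates)  # cannot keep more patches than exist
        total += math.log10(math.comb(candidates, k))
        candidates = k * children_per_patch
    return total

# Hypothetical numbers: N = 1000 patches at the lowest level, K = 250.
three_levels = log10_selection_count(1000, [250, 250])   # 2 selections
six_levels = log10_selection_count(1000, [250] * 5)      # 5 selections
print(f"3 levels: ~10^{three_levels:.0f} possible selections")
print(f"6 levels: ~10^{six_levels:.0f} possible selections")
```

Under these assumptions, every additional selection step multiplies the count by another binomial factor, which is the growth the reply above refers to.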

Answer 7 · 2023-05-11T00:58:50.000Z

Got it! Thank you very much for the sharing!

Answer 8 · 2023-05-11T03:06:14.000Z

Hi @kevthan! If I change k substantially, should I change k_sigma accordingly? When you were selecting k_sigma, what kind of fold change or step size did you use? Thank you!

Answer 9 · 2023-05-11T05:10:39.000Z

Hi @RYY0722, you don't necessarily have to change k_sigma. We always used the default value of 0.05.

Answer 10 · 2024-04-10T03:19:07.000Z

Got it! Thank you very much for the sharing!
Hello, could you help me take a look at issue #12? Thank you.