isyangshu/MambaMIL

CLAM Preprocessing


I'd like to ask about the parameters you used with CLAM to extract the feature vectors.

I'm referring specifically to the BLCA dataset.
If I use the default parameters, some WSIs have fewer than 437 patches. 437 is the number of patches sampled for survival analysis in Table 4.
This is an example output log:
batch 99, loss: 0.8604, label: 3, event_time: 29.5300, risk: -2.5116, bag_size: 274
batch 199, loss: 0.2883, label: 2, event_time: 18.7900, risk: -3.0559, bag_size: 313
Epoch: 0, train_loss_surv: 1.4153, train_loss: 1.4153, train_c_index: 0.4685
Epoch: 0, val_loss_surv: 1.3301, val_loss: 1.3301, val_c_index: 0.4093
Validation loss decreased (inf --> 0.409305). Saving model ...

batch 99, loss: 0.1626, label: 2, event_time: 21.0200, risk: -3.2254, bag_size: 335
batch 199, loss: 1.8531, label: 0, event_time: 6.7700, risk: -2.7776, bag_size: 268
Epoch: 1, train_loss_surv: 1.2493, train_loss: 1.2493, train_c_index: 0.5632
Epoch: 1, val_loss_surv: 1.2451, val_loss: 1.2451, val_c_index: 0.4983
Validation loss decreased (0.409305 --> 0.498314). Saving model ...

batch 99, loss: 0.2588, label: 1, event_time: 11.9600, risk: -2.6213, bag_size: 292
batch 199, loss: 1.1709, label: 1, event_time: 8.9400, risk: -2.2951, bag_size: 206
Epoch: 2, train_loss_surv: 1.1817, train_loss: 1.1817, train_c_index: 0.5791
Epoch: 2, val_loss_surv: 1.2481, val_loss: 1.2481, val_c_index: 0.5563
Validation loss decreased (0.498314 --> 0.556305). Saving model ...

The 437 in Table 4 is the number of WSIs we use for the BLCA dataset, not the number of patches.

BLCA has 373 DX1-type WSIs, so I assume you also use some of the other types of WSIs.
In general, do you have one WSI per patient?

Most works use only the DX1 WSIs, so I'm not sure how to benchmark my results against yours.

Also, do you sample the same number of patches per WSI?

Not only DX1, but also DX2 and DX3.

So for some patients, are you using multiple WSIs (i.e., DX1, DX2, and DX3)?

We use not only DX1 but also DX2 and DX3. You can check which WSIs we use in dataset_csv. We do not sample the same number of patches per WSI, so the batch size during training is always 1.
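Because one patient can have several slides (DX1, DX2, DX3), a slide-level CSV has to be grouped by patient before bags are built, and the number of WSIs can exceed both the number of patients and the number of DX1 slides. A minimal sketch, assuming hypothetical column names `case_id` (patient) and `slide_id` (one WSI); check the actual headers in the repository's dataset_csv files:

```python
import csv
import io
from collections import defaultdict


def slides_per_patient(csv_text):
    """Group slide IDs by patient ID.

    Assumes hypothetical columns `case_id` (patient) and `slide_id` (one WSI).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["case_id"]].append(row["slide_id"])
    return dict(groups)


# Toy CSV with made-up IDs: patient_A has two slides, patient_B has one.
toy_csv = """case_id,slide_id
patient_A,patient_A_DX1
patient_A,patient_A_DX2
patient_B,patient_B_DX1
"""

groups = slides_per_patient(toy_csv)
num_slides = sum(len(s) for s in groups.values())
print(len(groups), "patients,", num_slides, "slides")
```

Counting rows (slides) and counting unique `case_id` values answer different questions, which is one reason WSI counts reported in different papers may not match.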

Okay, I understand the point about DX1, DX2, and DX3.
So do you use all the patches that each .pt file contains?

Is the loading operation done by the class Generic_MIL_Survival_Dataset?

Yes, we use all the patches that each .pt file contains, and we load them with the Generic_MIL_Survival_Dataset class. For more details, please check the code.
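Using every patch in each slide's feature file and joining a patient's slides produces bags whose lengths differ, which is why the batch size stays at 1. The sketch below is not the repository's Generic_MIL_Survival_Dataset: it uses NumPy arrays in place of the tensors that loading a .pt file would return, and the function name `load_patient_bag`, the feature dimension, and the patch counts are hypothetical.

```python
import numpy as np


def load_patient_bag(slide_ids, feature_store):
    """Concatenate all patch features of a patient's slides into one bag.

    `feature_store` maps slide_id -> (num_patches, feat_dim) array; it is a
    hypothetical stand-in for reading each slide's .pt file. No subsampling.
    """
    return np.concatenate([feature_store[s] for s in slide_ids], axis=0)


feat_dim = 1024  # hypothetical feature dimension
rng = np.random.default_rng(0)
# Toy feature store: each slide has a different number of patches.
feature_store = {
    "patient_A_DX1": rng.standard_normal((274, feat_dim)),
    "patient_A_DX2": rng.standard_normal((39, feat_dim)),
    "patient_B_DX1": rng.standard_normal((206, feat_dim)),
}
patients = {
    "patient_A": ["patient_A_DX1", "patient_A_DX2"],
    "patient_B": ["patient_B_DX1"],
}

# Batch size 1: bags have different lengths, so they cannot be stacked into a
# single (batch, patches, dim) array; each bag is processed on its own.
for case_id, slide_ids in patients.items():
    bag = load_patient_bag(slide_ids, feature_store)
    print(case_id, "bag_size:", bag.shape[0])
```

Padding or subsampling bags to a common size would allow larger batches, but that is a different design from keeping every patch, as described in this thread.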

I understand. One more question:
If I run the same model with the same random seed, I get the same c-index.
To obtain the standard deviations in the results, do you use different random seeds, and if so, how many?

Actually, we use 10-fold cross-validation with the same random seed.
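With a fixed seed, rerunning the same configuration gives the same folds (and hence the same c-index), so the standard deviation reflects variation across the 10 folds rather than across seeds. A minimal sketch, assuming a patient-level split (all slides of one patient stay in one fold) and Harrell's concordance index; the helper names, the seed value, and the toy survival data are hypothetical and not taken from the repository:

```python
import random
import statistics


def patient_kfold(patient_ids, k=10, seed=1):
    """Split patients into k folds; the same seed always gives the same folds."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs whose risk ordering matches survival order.

    A pair (i, j) is comparable when i had an observed event (events[i] == 1)
    and times[i] < times[j]; it is concordant when risks[i] > risks[j].
    Ties in risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


patient_ids = [f"patient_{n}" for n in range(50)]
folds_a = patient_kfold(patient_ids, k=10, seed=1)
folds_b = patient_kfold(patient_ids, k=10, seed=1)
assert folds_a == folds_b  # same seed -> identical folds

# Per-fold c-index on toy data, then mean and standard deviation across folds.
rng = random.Random(0)
fold_scores = []
for fold in folds_a:
    times = [rng.uniform(1, 60) for _ in fold]
    events = [1 if idx % 3 else 0 for idx in range(len(fold))]  # toy censoring
    risks = [-t + rng.gauss(0, 10) for t in times]  # toy risk: higher = earlier event
    fold_scores.append(harrell_c_index(times, events, risks))
print(f"c-index: {statistics.mean(fold_scores):.4f} ± {statistics.stdev(fold_scores):.4f}")
```

Because every fold is evaluated once, the 10 per-fold scores are the samples behind the reported mean and standard deviation; changing the seed would change the folds and therefore the numbers.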

Okay, I understand. Thank you.