Slide prediction in Camelyon16
szc19990412 opened this issue · 6 comments
Hello, thanks for your amazing work! While reproducing the results of the paper, I ran into some problems that I hope you can help with.
For the slide-level prediction on Camelyon16, I couldn't find the code that goes from the heatmap to a slide-level prediction. Following the paper, I referred to the code here: for each slide, I extracted 28 features from the heatmap and fed them into a random forest for training, but did not get a good result. Are there any tricks to training the RandomForestClassifier? If you could open-source the code for this part, I believe it would be of great help!
Looking forward to your reply!
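To make the heatmap-to-features step concrete, here is a minimal sketch of how a tumor-probability heatmap could be summarized into a few slide-level features. The function name `heatmap_features`, the 0.5 threshold, and the six features chosen are my own assumptions for illustration; they are not the 28 features from the paper.

```python
import numpy as np
from scipy import ndimage


def heatmap_features(heatmap, threshold=0.5):
    """Summarize a 2-D tumor-probability heatmap as a small feature vector.

    Illustrative features only (assumption), not the paper's 28 features.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    mask = heatmap >= threshold
    # Label connected regions of above-threshold pixels (4-connectivity by default).
    labels, n_regions = ndimage.label(mask)
    if n_regions > 0:
        # Pixel count per region; index 0 of bincount is the background.
        areas = np.bincount(labels.ravel())[1:]
        largest = int(np.argmax(areas)) + 1
        largest_area = float(areas[largest - 1])
        largest_mean_prob = float(heatmap[labels == largest].mean())
    else:
        largest_area = 0.0
        largest_mean_prob = 0.0
    return np.array([
        heatmap.max(),       # highest tumor probability on the slide
        heatmap.mean(),      # average tumor probability
        mask.mean(),         # fraction of pixels above the threshold
        float(n_regions),    # number of candidate tumor regions
        largest_area,        # area (in pixels) of the largest region
        largest_mean_prob,   # mean probability inside the largest region
    ])


# Toy heatmap: one 2x2 high-probability region and one isolated pixel.
demo = np.zeros((8, 8))
demo[1:3, 1:3] = 0.9
demo[6, 6] = 0.7
features = heatmap_features(demo)
```

Each slide would then contribute one such row to the feature matrix passed to the classifier.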
I used a basic RandomForestClassifier and found that its performance on the test set was poor:
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
# X, y: training features and labels; X_test, y_test: held-out test set
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf.fit(X, y)  # train
y_train_pred = clf.predict(X)
y_test_pred = clf.predict(X_test)
# Accuracy: how often is the classifier correct?
print("Train Accuracy:", metrics.accuracy_score(y, y_train_pred))
print("Test Accuracy:", metrics.accuracy_score(y_test, y_test_pred))
Train Accuracy: 1.0
Test Accuracy: 0.6363636363636364
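A train accuracy of 1.0 next to a much lower test accuracy suggests the forest is memorizing the training slides. One thing that might help diagnose this is a cross-validated AUC with a more constrained forest. The sketch below uses synthetic data from `make_classification` as a stand-in for the real 28-feature table, and the `max_depth` and `min_samples_leaf` values are guesses, not tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the slide-level table (assumption): 28 features per slide.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=28, n_informative=5, random_state=42
)

# Shallower trees and larger leaves limit how closely the forest can fit
# individual training slides.
clf = RandomForestClassifier(
    n_estimators=100, max_depth=4, min_samples_leaf=5, random_state=42
)

# Stratified folds keep the tumor/normal ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc_scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="roc_auc")
print("CV AUC: %.3f +/- %.3f" % (auc_scores.mean(), auc_scores.std()))
```

If the cross-validated AUC on the training slides is also low, the features themselves are likely the bottleneck rather than the classifier settings.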
Hi, thanks for your reply!
For the patch-level classifier, I used a ViT-Tiny model initialized from ImageNet-pretrained weights.
I also tried an SVM classifier, but again I didn't get good results. Maybe it's because I'm not using self-supervised weights?
Here is the SVM training code:
import pandas as pd
from sklearn import metrics
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
data = pd.read_csv('/data111/shaozc/Camelyon16/data_sheet_for_random_forest.csv', index_col=0)
#---->split train/test: slide names starting with 'test' form the test set
data.index = data['name']
data['name'] = data['name'].map(lambda x: x.split('_')[0])
mask = data['name'] == 'test'
df_test = data.loc[mask].reset_index(drop=True)
df_train = data.loc[~mask].reset_index(drop=True)
# column 1 holds the label; columns 2 onward hold the features
X_train = df_train.iloc[:, 2:].values
y_train = df_train.iloc[:, 1].values
X_test = df_test.iloc[:, 2:].values
y_test = df_test.iloc[:, 1].values
# standardize the features, then fit an SVM with probability outputs
clf = make_pipeline(StandardScaler(), SVC(gamma='auto', probability=True, random_state=42))
clf.fit(X_train, y_train)
print("Train Accuracy:", metrics.accuracy_score(y_train, clf.predict(X_train)))
print("Test Accuracy:", metrics.accuracy_score(y_test, clf.predict(X_test)))
print("Train AUC:", metrics.roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1]))
print("Test AUC:", metrics.roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
Here is the CSV file: data_sheet_for_random_forest.csv
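Since the SVM above runs with default `C` and `gamma='auto'`, a small grid search over those two parameters might be worth trying before concluding that the features are the problem. This is only a sketch: the data is synthetic (`make_classification` standing in for the CSV), and the grid values are arbitrary starting points.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 28-feature slide table (assumption).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=28, n_informative=5, random_state=0
)

# make_pipeline names the SVC step "svc", so its parameters are
# addressed as "svc__<param>" in the grid.
pipe = make_pipeline(StandardScaler(), SVC(random_state=42))
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

# AUC is computed from decision_function, so probability=True is not needed here.
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

Because the scaler sits inside the pipeline, it is refit on each training fold, which avoids leaking validation statistics into the search.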
By the way, I think there is a problem in the third step, semi-supervised training. Because train_tumor_idx includes all training indices, unlabeled_train_idx should be kept separate from tumor_labeled_train_idx (and likewise for the normal class):
tumor_unlabeled_train_sampler = SubsetRandomSampler(list(set(train_tumor_idx)-set(tumor_labeled_train_idx)))
normal_unlabeled_train_sampler = SubsetRandomSampler(list(set(train_normal_idx)-set(normal_labeled_train_idx)))
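To illustrate the point of the set difference, here is a small standalone sketch with made-up indices. The helper `unlabeled_indices` is hypothetical and only shows that the resulting unlabeled pool no longer overlaps the labeled one; in the real code the result would be passed to SubsetRandomSampler as above.

```python
def unlabeled_indices(all_idx, labeled_idx):
    """Return the indices in all_idx that are not labeled, preserving order."""
    labeled = set(labeled_idx)
    return [i for i in all_idx if i not in labeled]


# Made-up example indices (assumption), not taken from the repository.
train_tumor_idx = [0, 1, 2, 3, 4, 5]
tumor_labeled_train_idx = [1, 4]

tumor_unlabeled_train_idx = unlabeled_indices(train_tumor_idx, tumor_labeled_train_idx)
print(tumor_unlabeled_train_idx)  # [0, 2, 3, 5]

# The labeled and unlabeled pools are disjoint and together cover all indices.
assert not set(tumor_unlabeled_train_idx) & set(tumor_labeled_train_idx)
assert set(tumor_unlabeled_train_idx) | set(tumor_labeled_train_idx) == set(train_tumor_idx)
```

Without the difference, labeled patches would also be drawn as unlabeled samples, so the same patch could appear in both branches of the semi-supervised loss.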
Have you solved this issue?