JoyHuYY1412/DDE_CIL

There seems to be information leakage during testing

libo-huang opened this issue · 3 comments

My concrete question is:
Is it an appropriate setting to use all training data from previous tasks when computing the accuracy of the current incremental model?

As shown in lines 497 and 502 below, class_means is obtained from the matrices D and D2, and DDE_CIL uses class_means to compute the final reported accuracy.
However, D and D2 are computed from prototypes (line 474), which contains the full training data of both the learned tasks and the current task.
We can trace this in two steps.

evalset.test_data = prototypes[iteration2*args.nb_cl+iter_dico].astype('uint8')
evalset.test_labels = np.zeros(evalset.test_data.shape[0]) # zero labels
evalloader = torch.utils.data.DataLoader(evalset, batch_size=eval_batch_size,
                                         shuffle=False, num_workers=2)
num_samples = evalset.test_data.shape[0]
mapped_prototypes = compute_features(tg_feature_model, evalloader, num_samples, num_features)
D = mapped_prototypes.T
D = D/np.linalg.norm(D,axis=0)
# Flipped version also
evalset.test_data = prototypes[iteration2*args.nb_cl+iter_dico][:,:,:,::-1].astype('uint8')
evalloader = torch.utils.data.DataLoader(evalset, batch_size=eval_batch_size,
                                         shuffle=False, num_workers=2)
mapped_prototypes2 = compute_features(tg_feature_model, evalloader, num_samples, num_features)
D2 = mapped_prototypes2.T
D2 = D2/np.linalg.norm(D2,axis=0)
# iCaRL
alph_icarl = alpha_dr_herding[iteration2,:,iter_dico]
alph_icarl = (alph_icarl>0)*(alph_icarl<nb_protos_cl+1)*1.
X_protoset_cumuls.append(prototypes[iteration2*args.nb_cl+iter_dico,np.where(alph_icarl==1)[0]])
Y_protoset_cumuls.append(order[iteration2*args.nb_cl+iter_dico]*np.ones(len(np.where(alph_icarl==1)[0])))
alph_icarl = alph_icarl/np.sum(alph_icarl)
class_means[:,current_cl[iter_dico],0] = (np.dot(D,alph_icarl)+np.dot(D2,alph_icarl))/2
class_means[:,current_cl[iter_dico],0] /= np.linalg.norm(class_means[:,current_cl[iter_dico],0])
# Normal NCM
alph_NCM = np.ones(dictionary_size)/dictionary_size
class_means[:,current_cl[iter_dico],1] = (np.dot(D,alph_NCM)+np.dot(D2,alph_NCM))/2
class_means[:,current_cl[iter_dico],1] /= np.linalg.norm(class_means[:,current_cl[iter_dico],1])
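
(For context, here is a minimal standalone sketch, not the repository's code, of how such class_means are typically used at test time in iCaRL/LUCIR-style evaluation; the names and shapes are my assumptions based on the snippet above.)

import numpy as np

def ncm_predict(features, class_means):
    # features:    (num_samples, num_features), rows L2-normalized
    # class_means: (num_features, num_classes), columns L2-normalized
    # With unit vectors, the class mean nearest in Euclidean distance
    # is exactly the one with the highest cosine similarity.
    scores = features @ class_means        # (num_samples, num_classes)
    return np.argmax(scores, axis=1)       # predicted class per sample

# Toy usage with random unit vectors
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
means = rng.normal(size=(8, 3))
means /= np.linalg.norm(means, axis=0, keepdims=True)
print(ncm_predict(feats, means))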

1. Firstly, prototypes (line 474) is constructed from X_train_total, as shown in line 182 below:

prototypes = np.zeros((args.num_classes, dictionary_size, X_train_total.shape[1], X_train_total.shape[2], X_train_total.shape[3]))
for orde in range(args.num_classes):
    prototypes[orde,:,:,:,:] = X_train_total[np.where(Y_train_total==order[orde])]
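
To make the shapes concrete, a minimal sketch with CIFAR-100 defaults assumed from the snippets in this issue:

import numpy as np

# Assumed defaults: 100 classes, 500 training images per class, 32x32x3 images
num_classes, dictionary_size = 100, 500
H, W, C = 32, 32, 3

# Same construction as above: one (500, 32, 32, 3) slab per class,
# holding *all* training images of that class, old tasks included
prototypes = np.zeros((num_classes, dictionary_size, H, W, C))
print(prototypes.shape)   # (100, 500, 32, 32, 3)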

2. Further, X_train_total (line 138) is obtained from the full CIFAR-100 trainset (line 128):

trainset = torchvision.datasets.CIFAR100(root='./data', train=True,
                                         download=True, transform=transform_train)
testset = torchvision.datasets.CIFAR100(root='./data', train=False,
                                        download=True, transform=transform_test)
evalset = torchvision.datasets.CIFAR100(root='./data', train=False,
                                        download=False, transform=transform_test)
# Initialization
dictionary_size = 500
nb_inc = int((args.num_classes - args.nb_cl_fg) / args.nb_cl + 1)
X_train_total = np.array(trainset.train_data)
Y_train_total = np.array(trainset.train_labels)

I hold that using all training data from previous tasks to compute the evaluation accuracy amounts to information leakage.
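
To illustrate the concern with a toy example (not the repository's code; the feature dimension and the exemplar count of 20 are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 500 feature vectors of one *old* class
full_class_feats = rng.normal(size=(500, 64))

# Under the incremental protocol, only a small exemplar subset is kept
exemplar_idx = rng.choice(500, size=20, replace=False)

mean_full = full_class_feats.mean(axis=0)                    # needs data that is no longer stored
mean_exemplar = full_class_feats[exemplar_idx].mean(axis=0)  # computable from the memory buffer

# The two means differ, so an evaluation built on mean_full relies on
# information that is unavailable at this stage of training.
print(np.linalg.norm(mean_full - mean_exemplar))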

Thank you for reading our code in detail; you must have found our paper novel and interesting.

From the code segments you showed, I believe this code is copied from LUCIR (https://github.com/hshustc/CVPR19_Incremental_Learning), where the matrices D and D2 are used to calculate the NCM accuracy. The NCM accuracy is usually computed as an ideal-case (oracle) reference and is not reported as the final result.
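
To spell out the difference in that snippet with a small sketch (the dictionary_size and nb_protos_cl values are assumed for illustration): the iCaRL weights zero out everything except the herded exemplars, while the NCM weights average over all training samples of the class, which is why the latter is only an ideal-case reference:

import numpy as np

dictionary_size, nb_protos_cl = 500, 20   # assumed values

# Fake herding ranks (1 = picked first); the real alpha_dr_herding
# stores such per-sample ranks for each class
rank = np.random.default_rng(0).permutation(dictionary_size) + 1

# iCaRL weighting, as in the snippet: nonzero only for the top
# nb_protos_cl herded samples, i.e. exactly the stored exemplars
alph_icarl = ((rank > 0) & (rank < nb_protos_cl + 1)).astype(float)
alph_icarl /= alph_icarl.sum()
print(int((alph_icarl > 0).sum()))   # 20 -> exemplar-only class mean

# NCM weighting, as in the snippet: uniform over *all* samples
alph_NCM = np.ones(dictionary_size) / dictionary_size
print(int((alph_NCM > 0).sum()))     # 500 -> full-training-data class mean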

So I hold that I did not make any unfair comparisons :)

Thanks for your rapid reply and your interesting causal perspective on CIL.

Another question is whether the proposed method's experiments with no replay data are achieved by setting the number of prototypes per class at the end to zero.
For example, setting the parameter --nb_protos in the file cifar100-class-incremental/class_incremental_cosine_cifar100.py to 0 for the experiments on CIFAR-100.

Yes, you are right.