covartech/PRT

M-ary dataSet in feature selection

anaritam opened this issue · 9 comments

Hi,

Is there a way to use feature selection with a dataSet with 3 classes (besides prtFeatSelStatic)?

I'm using a DataSet with 7 features and 3 classes, and I would like my code to choose, from all 7 features, the ones that work best.

Thanks,
Ana

Hi Ana,

Something like the following should help:

ds1 = prtDataGenMarysSimpleSixClass;
ds2 = prtDataGenMarysSimpleSixClass;
dsTotal = catFeatures(ds1,ds2); %total of 4 features, 6 classes
nFolds = 3;

% Find the best 3 features using a KNN classifier:
knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dsTotal);

dsSelected = featSel.run(dsTotal);
plot(dsSelected) %Has the best 3 features!

Hi,
I can't get what you suggested to work with my data :x
This is what I have:

dataSet = prtDataSetClass(features_train,labels_train);
nStdRemove = prtOutlierRemovalNStd('runMode','removeObservation');
nStdRemove = nStdRemove.train(dataSet);
dataSetNew = nStdRemove.run(dataSet);

featSel = prtFeatSelSfs; % Create a feature selection object
featSel.nFeatures = 3; % Select three features of the data
featSel = featSel.train(dataSetNew); % Train the feature selection object
outDataSet = featSel.run(dataSetNew);

features_train is an nSamples x 7 matrix and labels_train is an nSamples x 1 vector.

My code fails on the last line of code with this error:

"Error using prtClass/determineMaryOutput (line 310)
M-ary classification is not supported by this classifier. You will need to use prtClassBinaryToMaryOneVsAll() or an equivalent M-ary emulation classifier."

Hello,

I think you need to do two things:

  1. Specify a classifier that can handle M-ary data (e.g., prtClassKnn)
  2. Specify an evaluation that scores multi-class outputs (e.g., prtEvalPercentCorrect)

For example:

knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dsTotal);

OK, I managed to do this. I used

featSel = prtFeatSelSfs('nFeatures',nFeatures_used,'evaluationMetric',@(ds)prtEvalPercentCorrect(prtClassMap,ds));
featSel = featSel.train(dataSet);
outDataSet = featSel.run(dataSet);

My question now is, can I use this outDataSet like this:
classifier_7 = prtClassMap + prtDecisionMap;
classifier_7 = classifier_7.train(outDataSet); % Train
classified_7 = run(classifier_7, dataSet_test);

in order to test the classifier?

Hi,

You need to also run:

OutDataSet_test = featSel.run(dataSet_test);
[...]
classified_7 = run(classifier_7, OutDataSet_test);

This applies the feature selection to your test data set; otherwise the two data sets will have different numbers of features.
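
As a rough end-to-end sketch of that idea: the selection is learned from the training set only and then run on both sets. The data here is a synthetic stand-in (an assumption, not Ana's data), and the odd/even split and the variable name outDataSet_test are hypothetical; the PRT calls are the ones already used in this thread.

```matlab
% Stand-in data (assumption): 7 features, 3 classes; only feature 1 is informative.
rng(0);
nPerClass = 60;
labels = repmat((1:3)', nPerClass, 1);
features = randn(3*nPerClass, 7);
features(:,1) = features(:,1) + 4*labels;

% Hypothetical split: odd rows for training, even rows for testing.
dataSet      = prtDataSetClass(features(1:2:end,:), labels(1:2:end));
dataSet_test = prtDataSetClass(features(2:2:end,:), labels(2:2:end));

% Learn which 3 features to keep, using the training data only.
nFolds = 3;
knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dataSet);

% Apply the same selection to both data sets.
outDataSet      = featSel.run(dataSet);
outDataSet_test = featSel.run(dataSet_test);

% Train on the reduced training set, then run on the reduced test set.
classifier_7 = prtClassMap + prtDecisionMap;
classifier_7 = classifier_7.train(outDataSet);
classified_7 = run(classifier_7, outDataSet_test);
```

Both reduced data sets then have the same three columns, which is what the trained classifier expects.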

-Pete

What I did was actually this

selectedFeatures = featSel.selectedFeatures;
dataSet_test = retainFeatures(dataSet_test, selectedFeatures); % keep only the selected features of the test set

classifier_7 = prtClassMap + prtDecisionMap;
classifier_7 = classifier_7.train(outDataSet); % Train
classified_7 = run(classifier_7, dataSet_test);

It's the same thing, right?

Yes, that looks right.

I keep getting this error:

Error using prtRvMvn/logPdf (line 184)
SIGMA must be symmetric and positive definite;

Error in prtRv/runAction (line 249)
DataSet = DataSet.setObservations(Obj.logPdf(DataSet));

Error in prtAction/run (line 250)
dsOut = runAction(self, dsOut);

Error in prtClassMap/runAction (line 119)
logLikelihoods(:,iY) = getObservations(run(self.rvs(iY), ds));

Error in prtAction/run (line 250)
dsOut = runAction(self, dsOut);

Error in prtAction/crossValidate (line 369)
outputDataSetCell{uInd} = trainedAction.run(testDs);

Error in prtAction/kfolds (line 553)
[outputs{:}] = self.crossValidate(ds,keys);

Error in prtUtilEvalParseAndRun (line 35)
Results = classifier.kfolds(dataSet,nFolds);

Error in prtEvalPercentCorrect (line 58)
results = prtUtilEvalParseAndRun(classifier,dataSet,nFolds);

Error in @(ds)prtEvalPercentCorrect(prtClassMap,ds)

Error in prtFeatSelSfs/trainAction (line 149)
cPerformance(i) = Obj.evaluationMetric(tempDataSet);

Error in prtAction/train (line 221)
self = trainAction(self, ds);

This happens whenever I try to select more than 2 features with prtFeatSelSfs. I don't understand it, so I can't solve it...

Thanks for your help,
Ana

Hello,

This is technically a new issue, so please open a new issue for further comments. That said, it sounds like your features are not linearly independent, or you have too few observations for at least one class in your data set.

prtRvMvn is trying to learn a covariance matrix from your data - e.g.,

cov(X(Y == 1,:))

And the result of this needs to be positive definite (not merely positive semi-definite), or the multivariate normal (Gaussian) density can't be evaluated...
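
One way to check this yourself, as a minimal sketch in base MATLAB (cov, rank, eig): compute the covariance of each class and look at its rank and smallest eigenvalue. X and Y below are synthetic stand-ins (an assumption) in which feature 3 is deliberately built as the sum of features 1 and 2; substitute your own feature matrix and label vector.

```matlab
% Stand-in data (assumption): 3 features, 3 classes, 20 observations per class.
rng(0);
X = randn(60, 3);
X(:,3) = X(:,1) + X(:,2);        % feature 3 is a linear combination of 1 and 2
Y = repmat([1; 2; 3], 20, 1);

classes = unique(Y);
for iY = 1:numel(classes)
    Xc = X(Y == classes(iY), :);     % observations of one class
    S = cov(Xc);                     % per-class covariance estimate
    minEig = min(eig((S + S')/2));   % symmetrize to suppress round-off asymmetry
    fprintf('class %d: n = %d, rank = %d of %d, min eigenvalue = %g\n', ...
        classes(iY), size(Xc,1), rank(S), size(S,1), minEig);
end
```

A rank below the number of features, or a smallest eigenvalue at or near zero, means the covariance for that class is not positive definite. The same thing happens when a class has too few observations: a sample covariance computed from n observations has rank at most n - 1.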

You might try using a simpler classifier - e.g., KNN, which does not require a full-rank covariance matrix...
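
For instance, a sketch along these lines uses KNN as the classifier inside the evaluation metric, on stand-in data that contains a redundant feature. The data, the variable name dsDemo, and nFolds = 3 are assumptions for illustration; the PRT calls are the ones already shown earlier in this thread.

```matlab
% Stand-in data (assumption): 4 features, 3 classes; feature 4 is redundant.
rng(0);
nPerClass = 40;
Y = repmat((1:3)', nPerClass, 1);
X = randn(3*nPerClass, 4);
X(:,1) = X(:,1) + 4*Y;       % make feature 1 informative
X(:,4) = X(:,1) + X(:,2);    % linear combination of features 1 and 2
dsDemo = prtDataSetClass(X, Y);

% KNN does not estimate a covariance matrix, so redundant features do not break it.
nFolds = 3;
knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dsDemo);

disp(featSel.selectedFeatures)   % indices of the 3 selected features
```

If you still want prtClassMap as the final classifier, it is worth checking that the selected features give a full-rank covariance for every class before training it.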