IvanoLauriola/MKLpy

Need a suggestion.

Yuvi-416 opened this issue · 9 comments

I know that the cross_val_score function internally uses K-fold cross-validation, but I want to use the leave-one-out cross-validation method instead. So my question is: how can I do it?
Any suggestion would help me.

Hi,
that part (i.e. model selection) is currently not documented at all.
We'll provide a concrete set of examples by tomorrow. Thanks for the notification!
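In the meantime, here is a rough sketch of a manual leave-one-out loop over a list of precomputed kernels. This is only an illustration, not the documented API: KL, Y, and the lam value are placeholders, and the kernels are assumed to be numpy arrays of shape n x n.

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score
from MKLpy.algorithms import EasyMKL

def loo_accuracy(KL, Y, lam=0.1):
    # KL: list of (n x n) precomputed kernel matrices, Y: array of n labels
    predictions = []
    for tr, te in LeaveOneOut().split(Y):
        KL_tr = [K[np.ix_(tr, tr)] for K in KL]   # training rows/columns
        KL_te = [K[np.ix_(te, tr)] for K in KL]   # test row vs training columns
        mkl = EasyMKL(lam=lam).fit(KL_tr, Y[tr])
        predictions.append(mkl.predict(KL_te)[0])
    return accuracy_score(Y, predictions)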

Hi there,
Thanks for your quick reply.
I am facing another problem (please have a look when you can). I have uploaded a zip file for you; in that folder you will find two code files (MKL-try-3 and MKL-try-4). When I run MKL-try-3 (using sklearn) I don't get any error and it shows me the output, but whenever I try to run the same code with the EasyMKL function, as in MKL-try-4, I get the following error:

Traceback (most recent call last):
File "C:/Users/yubra/PycharmProjects/FRO_COM_NEUROSCI/MKL-CODE-TRy/MKL-try-4.py", line 68, in
clf1 = clf.fit(X_train, y_train)
File "C:\Users\yubra\Anaconda3\envs\FRO_COM_NEUROSCI_1\lib\site-packages\MKLpy\algorithms\base.py", line 65, in fit
self._prepare(KL, Y)
File "C:\Users\yubra\Anaconda3\envs\FRO_COM_NEUROSCI_1\lib\site-packages\MKLpy\algorithms\base.py", line 58, in _prepare
self.KL, self.Y = check_KL_Y(KL,Y)
File "C:\Users\yubra\Anaconda3\envs\FRO_COM_NEUROSCI_1\lib\site-packages\MKLpy\utils\validation.py", line 66, in check_KL_Y
KL = check_KL(KL)
File "C:\Users\yubra\Anaconda3\envs\FRO_COM_NEUROSCI_1\lib\site-packages\MKLpy\utils\validation.py", line 58, in check_KL
check_squared(KL[0])
File "C:\Users\yubra\Anaconda3\envs\FRO_COM_NEUROSCI_1\lib\site-packages\MKLpy\utils\validation.py", line 31, in check_squared
check_array(K)
File "C:\Users\yubra\Anaconda3\envs\FRO_COM_NEUROSCI_1\lib\site-packages\sklearn\utils\validation.py", line 556, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[2.5781695 3.50904135 2.9517885 2.9519358 3.50891378 2.95198938
2.95210093 3.5088704 3.50895293 3.50883719 2.95186438 3.50894411
2.95194918 3.50902544 2.95173438 3.50876015 2.9519695 2.95178601
3.50895847 3.50926653 1.03961973 2.95191633 2.95209571 1.0396713
3.33200838 2.95190355 2.95163525 3.33205359 3.50886603 2.95196699
2.95200147 2.95175799 3.48493449 3.30411415 3.48201502 2.92988108
3.47990248 1.02861174 3.47834003 3.48379858 3.47438345 3.478586
3.48432221 2.93096387 3.48224838 3.30813808 3.48426933 3.46642392
2.9280971 3.48186123 3.47941113 2.92747446 2.92832666 2.92819417
2.92562444 2.92159013 2.92417815 3.48161812 2.92830958 2.92641703
3.48098095 3.4754099 3.47435292 2.92645064 3.48468625 3.48613185
1.03198013 3.47150035].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Process finished with exit code 1

MKL-CODE-TRy.zip

Please give me some suggestion.

Ok, I quickly checked your code, and there are two main errors.

Firstly, in both of your files you apply train_test_split to a kernel. That function (sklearn.model_selection.train_test_split) is not designed to split a kernel matrix; it splits a samples matrix. The results are very different.

correct way
Assume you have 10000 data-points with 100 features.
If you want to split them into training (70%) and test (30%) you can use that method, receiving two samples matrices of 7000x100 and 3000x100.
Then, you can compute the training and test kernels obtaining two kernel matrices of 7000x7000 and 3000x7000.
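For instance, a minimal sketch of this pipeline (I use sklearn's rbf_kernel only as a stand-in for whatever kernel you actually compute; the data below is random, with the shapes from the example above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.randn(10000, 100)          # samples matrix: n_samples x n_features
y = np.random.randint(0, 2, 10000)       # labels

# split the SAMPLES, not the kernel
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3)

Ktr = rbf_kernel(Xtr, Xtr)               # 7000 x 7000 training kernel
Kte = rbf_kernel(Xte, Xtr)               # 3000 x 7000 test-vs-training kernel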

what I found in the file
If you compute the kernel with the whole samples matrix you have a kernel matrix of 10000x10000 entries.
If you pass that matrix to train_test_split you receive two matrices of 7000x10000 and 3000x10000.
You can easily understand that the shapes are wrong, but what happens?
Answer: train_test_split treats the input as a matrix of shape n_samples x n_features, but your kernel is n_samples x n_samples! So you cannot use train_test_split to divide a kernel matrix!

Additionally, if you want to use a precomputed kernel with the sklearn.svm.SVC classifier, you need to specify this!

SVC(C=10, kernel='precomputed')

Otherwise, the SVC classifier treats the input as a samples matrix, whereas in your code the input is supposed to be a kernel. In other words, your code in MKL-try-3.py does not crash when you run it, but the computed result is wrong.
Check the scikit-learn tutorials on how to combine precomputed kernels with GridSearchCV! It may not be trivial.
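For the simplest case, the pattern looks roughly like this (Ktr and Kte as in the sketch above; the C grid is only an example). When the estimator is created with kernel='precomputed', sklearn's cross-validation slices both rows and columns of the training kernel in each fold:

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Ktr: n_tr x n_tr training kernel, Kte: n_te x n_tr test kernel (see above)
svc = SVC(kernel='precomputed')
grid = GridSearchCV(svc, param_grid={'C': [0.1, 1, 10, 100]}, cv=5)
grid.fit(Ktr, ytr)            # each fold slices rows AND columns of Ktr
y_pred = grid.predict(Kte)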

Concerning the second error, I found that you're passing a single matrix to EasyMKL (remember that you also have the same splitting error described above). An MKL algorithm wants a list of kernels, not a single one.
I suggest you check out our tutorials on readthedocs.
Even if the complete documentation is still not available, I think those tutorials are mature enough to clarify your doubts.
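As a quick taste of what those tutorials show, a list of kernels can be built and passed to EasyMKL roughly like this (degrees 1-3 and lam=0.1 are just example values; Xtr, Xte, ytr come from a sample-level split as described above):

from MKLpy.metrics import pairwise
from MKLpy.algorithms import EasyMKL

# one kernel per polynomial degree -> a LIST of kernels
KLtr = [pairwise.homogeneous_polynomial_kernel(Xtr, degree=d) for d in range(1, 4)]
KLte = [pairwise.homogeneous_polynomial_kernel(Xte, Xtr, degree=d) for d in range(1, 4)]

mkl = EasyMKL(lam=0.1).fit(KLtr, ytr)   # fit on the list of training kernels
y_pred = mkl.predict(KLte)              # predict with the list of test kernels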

Thanks for your report; your error helps us improve our error messages and exception handling!

Hello there,
Once again, thank you for your great explanation; it really helped me a lot.
Until today I was working with version 0.4.3, so I had no problem loading my CSV file and everything was fine. But today I wanted to use the polynomial metrics from MKLpy, and for that I needed to upgrade MKLpy to 0.5 or 0.5.1. I upgraded it, but now the problem is that I can't load the CSV file anymore.
So, please, could you suggest some steps on how to load the CSV file in a simple way?
Looking forward to your reply.

Please provide us with additional details, i.e. the code that generates the error and the error message.

That error is really annoying. I carefully checked the code and tried to replicate the error.
Specifically, I used the latest version of MKLpy with torch 1.5.0.

I tried to read a CSV file to simulate your code, and everything worked.
My suggestion is to check the values of X1 after
X1 = dataset1.iloc[:, 1:].values
Be sure that X1 is not an ndarray of objects.
If you can, print X1 and show me the output.
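For reference, the minimal check I would run looks like this (the file name is a placeholder for yours):

import numpy as np
import pandas as pd

dataset1 = pd.read_csv('your_data.csv')      # placeholder file name
X1 = dataset1.iloc[:, 1:].values

print(X1.dtype)                               # should be a numeric dtype, NOT 'object'
X1 = np.asarray(X1, dtype=np.float64)         # raises an error if some cells are not numeric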

Concerning your last question, yes, you can.
MKL is widely used in information fusion to combine kernels (linear, polynomial, or whatever) built on different feature sets. So basically you can create a kernel for each view of your problem and then use an MKL algorithm to combine them.
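A tiny toy sketch of that fusion idea (the two views and labels below are random placeholders, and the kernel choices are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from MKLpy.algorithms import EasyMKL

# toy example: two different feature sets (views) describing the same 100 samples
X_view1 = np.random.randn(100, 20)
X_view2 = np.random.randn(100, 5)
y = np.random.randint(0, 2, 100)

K1 = linear_kernel(X_view1)              # kernel built on the first view
K2 = rbf_kernel(X_view2, gamma=0.1)      # kernel built on the second view

mkl = EasyMKL(lam=0.1).fit([K1, K2], y)  # MKL learns how to combine the views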

I noticed that you use something like k = k1+k2+k3+... multiple times in your code.
That's the crux of the problem!
You built a kernel that is a simple summation of base kernels, and then you tried to use an MKL algorithm like a plain SVM. The MKL algorithm wants a list of kernels as input, not a single one.
In other words, you do not have to fix the combination (k7 = k1+k2) a priori.
The correct way is
EasyMKL().fit([k1, k2], Ytr1)

Does this solve your problem?

N.B.: check that your grid search treats the input of the SVC as a precomputed kernel!

I tried your code and I found the error.
Basically, in the script MKL-using-homogenous-polynomial you define k1 and k2 as two lists of kernels. The cross-validation works because you're passing k1 as input.
However, this instruction
clf = EasyMKL(learner=base_learner, lam=best_results['lam']).fit([k1, k2], Ytr1)
contains a mistake. Specifically, you're passing [k1, k2], which is a list of lists of kernels, not a list of kernels.
If you want to use the kernels from both k1 and k2, you can concatenate the two lists:
clf = EasyMKL(learner=base_learner, lam=best_results['lam']).fit(k1+k2, Ytr1)
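Side note: k1+k2 is correct here only because k1 and k2 are Python lists, so + concatenates them into a longer list of kernels; on numpy (or torch) matrices the same operator sums them elementwise into a single pre-combined kernel, which is exactly the earlier mistake. A toy illustration with made-up matrices:

import numpy as np

K_a, K_b, K_c = np.eye(3), 2 * np.eye(3), 3 * np.eye(3)   # three toy 3x3 "kernels"

k1 = [K_a, K_b]           # a list of two kernels
k2 = [K_c]                # a list of one kernel

KL = k1 + k2              # list concatenation -> [K_a, K_b, K_c], what MKL wants
K_sum = K_a + K_c         # elementwise sum -> ONE pre-combined matrix, the original bug
print(len(KL), K_sum.shape)   # 3 (3, 3)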

In the other script, i.e. MKL-using-polynomial, you're using cross-validation with a single kernel and not with a list of kernels (in that script, k1 is a single kernel).