Gscorreia89/pyChemometrics

PLSDA seems to lose "X" at some point

bpsut opened this issue · 2 comments

bpsut commented

Hello, I just found your repo and was trying to use it to run some cross-validation on multiclass PLS-DA, but I'm running into some issues. I get the following error:

Error

TypeError                                 Traceback (most recent call last)
Cell In[100], line 24
     22 #clf = PLSRegression()
     23 clf = pyChemometrics.ChemometricsPLSDA(ncomps=2)
---> 24 clf.fit(X_train, y_train2)
     25 clf.predict(X_train)

File ~/mambaforge/envs/pirc/lib/python3.9/site-packages/pyChemometrics/ChemometricsPLSDA.py:180, in ChemometricsPLSDA.fit(self, x, y, **fit_params)
    178 if self.n_classes > 2:
    179     R2Y = ChemometricsPLS.score(self, x=x, y=dummy_mat, block_to_score='y')
--> 180     R2X = ChemometricsPLS.score(self, x=x, y=dummy_mat, block_to_score='x')
    181 else:
    182     R2Y = ChemometricsPLS.score(self, x=x, y=y, block_to_score='y')

File ~/mambaforge/envs/pirc/lib/python3.9/site-packages/pyChemometrics/ChemometricsPLS.py:386, in ChemometricsPLS.score(self, x, y, block_to_score, sample_weight)
    384 xscaled = deepcopy(self.x_scaler).fit_transform(x)
    385 # Calculate total sum of squares of X and Y for R2X and R2Y calculation
--> 386 xpred = self.x_scaler.transform(ChemometricsPLS.predict(self, x=None, y=y))
    387 tssx = np.sum(np.square(xscaled))
    388 rssx = np.sum(np.square(xscaled - xpred))

File ~/mambaforge/envs/pirc/lib/python3.9/site-packages/pyChemometrics/ChemometricsPLS.py:431, in ChemometricsPLS.predict(self, x, y)
    428 # Predict X from Y
    429 elif y is not None:
    430     # Going through calculation of U and then X = Ub_uW'
--> 431     u_scores = ChemometricsPLS.transform(self, x=None, y=y)
    432     predicted = np.dot(np.dot(u_scores, self.b_u), self.weights_w.T)
    433     if predicted.ndim == 1:

TypeError: wrapped() missing 1 required positional argument: 'X'

when I try to run the following dummy code:

import numpy as np
import pyChemometrics

X_train = np.array([[1,1,1,0,0,0],
                    [1,1,1,0,0,0],
                    [0,0,1,1,0,0],
                    [0,0,1,1,0,0],
                    [0,0,0,0,1,1],
                    [0,0,0,0,1,1],
                    [0,1,0,1,0,1],
                    [0,1,0,1,0,1]])
y_train = np.array([[1,0,0,0],
                    [1,0,0,0],
                    [0,1,0,0],
                    [0,1,0,0],
                    [0,0,1,0],
                    [0,0,1,0],
                    [0,0,0,1],
                    [0,0,0,1]])
y_train2 = np.array([0,0,1,1,2,2,3,3])

#clf = PLSRegression()
clf = pyChemometrics.ChemometricsPLSDA(ncomps=2)
clf.fit(X_train, y_train2)
clf.predict(X_train)
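
For what it's worth, the TypeError itself looks like a wrapper problem rather than a data problem: transform is being called with x=None as a keyword, while whatever wraps it expects an uppercase X positionally. A minimal standalone sketch of that failure mode (the validate_input decorator below is hypothetical, not the actual pyChemometrics code):

import functools

def validate_input(fn):
    # Hypothetical decorator that requires the first argument positionally as 'X'
    @functools.wraps(fn)
    def wrapped(self, X, **kwargs):
        return fn(self, X, **kwargs)
    return wrapped

class Model:
    @validate_input
    def transform(self, x=None, y=None):
        return x, y

Model().transform(x=None, y=[1, 2])
# TypeError: wrapped() missing 1 required positional argument: 'X'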

I also noticed that using a one-hot encoded array for y does not seem to work, because np.unique() flattens the array instead of comparing unique rows. As it stands, the code keeps reporting that y_train has 2 unique classes (the values 0 and 1) rather than 4.
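
A quick demonstration of the flattening behaviour (getting unique rows needs the axis argument):

import numpy as np

y_onehot = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]])

print(np.unique(y_onehot))                   # [0 1] -> counted as "2 classes"
print(np.unique(y_onehot, axis=0))           # the 4 distinct one-hot rows
print(np.unique(y_onehot, axis=0).shape[0])  # 4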

gettingthestars commented

When I used the plain ChemometricsPLS class, I ran into the same issue: "ChemometricsPLS.transform() missing 1 required positional argument: 'X'."
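
For reference, a minimal sketch of a call pattern that goes through the same code path, reusing X_train and y_train from the snippet above and assuming ChemometricsPLS takes the same ncomps argument as ChemometricsPLSDA. Per the traceback, predicting X from Y calls transform(x=None, y=...), which is where the missing positional 'X' comes from:

import pyChemometrics

# ncomps assumed here, mirroring the ChemometricsPLSDA example above
pls = pyChemometrics.ChemometricsPLS(ncomps=2)
pls.fit(X_train, y_train)       # may already fail here if fit's R2X scoring takes the same path
pls.predict(x=None, y=y_train)  # predicting X from Y -> transform(x=None, y=...)
# TypeError: ... missing 1 required positional argument: 'X'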

Gscorreia89 commented

@gettingthestars I pushed a fix for this, let me know if it's working for you now. I am also doing some patching for the multi-class setting, but that is not yet fully fixed.