Polynomial ridge approximation for large numbers of samples
Hi @jeffrey-hokanson, I found this library last week and I think it's great. I would like to compare the polynomial approximation to machine learning approaches such as neural networks. However, when I tried running the library with 50,000 samples I got an error: the SVD in PolynomialRidgeApproximation._finish requires too much memory, and I was wondering whether it is really necessary. Could it be replaced with linalg.lstsq? I don't fully understand what is going on there, but from the description it is trying to find the coefficients c that match the optimal U, isn't it?
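To illustrate what I have in mind (just a rough sketch of the idea, not the actual internals of _finish): with U held fixed, the coefficients c should come out of an ordinary least-squares solve on a tall, skinny design matrix, e.g. for a one-dimensional subspace:

import numpy as np
from scipy.linalg import lstsq

def fit_coefficients(X, fX, U, degree=2):
    # project the samples onto the (fixed) ridge subspace; here U has a single column
    y = (X @ U).ravel()
    # monomial (Vandermonde) design matrix in the reduced coordinate
    V = np.vander(y, degree + 1)
    # least-squares solve for the coefficients c; the matrix is only n x (degree + 1)
    c, _, _, _ = lstsq(V, fX)
    return c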
This SVD isn't really necessary; it is used to standardize the output so that the active subspace points in a consistent direction between iterations. You can now turn it off by setting the option rotate=False in PolynomialRidgeApproximation.
That being said, there was a small bug in this code: only the short-form (thin) SVD is necessary. With that option requested in commit 0060000, the memory requirements drop substantially, so it should work fine with the defaults now.
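For example (the degree and subspace_dimension values here are just illustrative; adjust them to your problem):

import psdr
# rotate=False skips the SVD-based re-orientation of the active subspace
pra = psdr.PolynomialRidgeApproximation(degree = 2, subspace_dimension = 1, rotate = False)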
Would you be willing to share your test data so I can further explore performance on this data set?
I'd also like to note that the new initializer for polynomial ridge approximation may be slow at this scale of data. In that case, it is best to initialize the subspace yourself by calling:
import psdr
from psdr.initialization import initialize_subspace

# X: input samples (n_samples x n_dims array), fX: corresponding function values
pra = psdr.PolynomialRidgeApproximation(degree = 2, subspace_dimension = 1)
U0 = initialize_subspace(X, fX, n_grads = 10)
pra.fit(X, fX, U0 = U0[:, :pra.subspace_dimension])
Only compute U0 once and reuse it in further experiments (see the sketch below). This new initializer was added to make the code more robust on small problems, but it is expensive for problems at the scale of MNIST.
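Something along these lines, for instance (a rough sketch; pra2 is just a hypothetical second model fit on the same X and fX):

# reuse the already-computed U0 when fitting another model
pra2 = psdr.PolynomialRidgeApproximation(degree = 3, subspace_dimension = 1)
pra2.fit(X, fX, U0 = U0[:, :pra2.subspace_dimension])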
Hi! Thanks for such a quick reply, and for actually changing the code!
So far I have only performed a "smoke test" to see whether the code returns reasonable results:
from psdr import PolynomialRidgeApproximation as PRA
import numpy as np

# training data: 50,000 samples in 6 dimensions
sims = 50000
X = np.random.normal(0, 1, (sims, 6))
y = X[:, 1]**2 + 3*(2*X[:, 0] + 5*X[:, 2])**3 + X[:, 4]

# independent validation set drawn from the same distribution
X_val = np.random.normal(0, 1, (sims, 6))
y_val = X_val[:, 1]**2 + 3*(2*X_val[:, 0] + 5*X_val[:, 2])**3 + X_val[:, 4]

pra = PRA(degree=3, subspace_dimension=3, basis='hermite', scale=True, rotate=False)
pra.fit(X, y)
np.linalg.norm(pra.eval(X_val) - y_val, 2)  # returns 536457.4715138731
It works well! The results are similar to those of a neural network of comparable size:
from sklearn.neural_network import MLPRegressor

# baseline: a small MLP with a single hidden layer of 20 units
nn = MLPRegressor(hidden_layer_sizes=(20,), solver='lbfgs')
nn.fit(X, y)
np.linalg.norm(nn.predict(X_val) - y_val, 2)  # returns 536228.1016175864
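As a side note, for the full comparison I will probably use a relative error rather than the raw 2-norm, something like:

# relative L2 error of each model on the validation set
rel_pra = np.linalg.norm(pra.eval(X_val) - y_val) / np.linalg.norm(y_val)
rel_nn = np.linalg.norm(nn.predict(X_val) - y_val) / np.linalg.norm(y_val)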
I will start comparing against other methods. Before finding your paper last week, I had already coded another way of doing something similar (linear dimensionality reduction on a polynomial basis) based on Riemannian BFGS. If you are interested, I can share the comparison when it is ready.
By the way, would you accept a PR to add a predict method to PolynomialRidgeApproximation?
This was an easy feature to add and one I should have added a while ago. I try to follow the sklearn API fairly closely, so this is a good feature for interoperability. If you have any other suggestions or improvements, please feel free to let me know or open a PR.
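Conceptually the new method is just a thin sklearn-style alias for the existing eval method, roughly like the sketch below (illustrative only, not the exact code in the repository):

def predict(self, X):
    # sklearn-compatible alias: delegate to the existing eval method
    return self.eval(X)

With this in place, pra.predict(X_val) gives the same result as pra.eval(X_val), which is what lets the model slot into sklearn-style workflows.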
Thanks! I'll let you know if I think of anything else as I use it more.