SynodicMonth/ChebyKAN

Check out KAL-Nets

Opened this issue · 5 comments

1ssb commented

Would really love your feedback: https://github.com/1ssb/torchkan

Using Legendre Polynomials instead; ~98% on MNIST.
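For reference, a layer along these lines might look roughly like the sketch below. This is not the torchkan code; the class name, initialization, and the tanh squashing are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class LegendreKANLayer(nn.Module):
    """Hypothetical sketch of a Legendre-basis layer (not the torchkan implementation)."""
    def __init__(self, in_dim, out_dim, degree=3):
        super().__init__()
        self.degree = degree
        # one learnable coefficient per (input, output, basis function) triple
        self.coeffs = nn.Parameter(torch.randn(in_dim, out_dim, degree + 1) * 0.01)

    def forward(self, x):
        # squash inputs into [-1, 1], the natural domain of the Legendre polynomials
        x = torch.tanh(x)
        # Bonnet recurrence: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)
        P = [torch.ones_like(x), x]
        for n in range(1, self.degree):
            P.append(((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1))
        basis = torch.stack(P[: self.degree + 1], dim=-1)  # (batch, in_dim, degree+1)
        # each output is a learned linear combination of all basis responses
        return torch.einsum("bid,iod->bo", basis, self.coeffs)
```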

Wow, I see the recent 99.5% on MNIST! That's really impressive! I haven't tested the difference between Legendre and Chebyshev (or other polynomials) yet. Also, normalizing x with min-max might be better! Really appreciate it!
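Roughly what swapping the input squashing for per-batch min-max normalization could look like before the Chebyshev expansion (a sketch only; the exact ChebyKAN forward pass may differ):

```python
import torch

def minmax_to_unit_interval(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Rescale each feature to [-1, 1] using the batch min/max.
    Note: ranges estimated from a single batch can shift between batches,
    so running statistics may be a safer choice in practice."""
    x_min = x.min(dim=0, keepdim=True).values
    x_max = x.max(dim=0, keepdim=True).values
    x01 = (x - x_min) / (x_max - x_min + eps)  # map to [0, 1]
    return 2.0 * x01 - 1.0                     # map to [-1, 1], the Chebyshev domain
```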

1ssb commented

If you take a look, I am using monomial bases; so does this mean MLPs have actually been doing this all along? It blurs the fine line of representational equivalence. I am actively working on other topics and will keep updating as I make progress.
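One concrete way to see the "MLPs all along" point: with a monomial basis of degree 1, each per-edge function is an affine map, so the whole layer collapses to an ordinary linear layer. A sketch (the class and its parameterisation are hypothetical, not the torchkan code):

```python
import torch
import torch.nn as nn

class MonomialLayer(nn.Module):
    """Hypothetical monomial-basis layer: y_o = sum_i sum_k W[i, o, k] * x_i ** k."""
    def __init__(self, in_dim, out_dim, degree):
        super().__init__()
        self.degree = degree
        self.W = nn.Parameter(torch.randn(in_dim, out_dim, degree + 1) * 0.01)

    def forward(self, x):
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        return torch.einsum("bik,iok->bo", powers, self.W)

# With degree=1 the basis is {1, x}, so the layer computes
#   y = x @ W[:, :, 1] + W[:, :, 0].sum(dim=0)
# i.e. a plain nn.Linear with a particular bias parameterisation.
layer = MonomialLayer(4, 3, degree=1)
x = torch.randn(2, 4)
equiv = x @ layer.W[:, :, 1] + layer.W[:, :, 0].sum(dim=0)
assert torch.allclose(layer(x), equiv, atol=1e-6)
```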

I'm also thinking about that. I think monomial bases span the same function space as those orthogonal polynomials. As mentioned in #3, without a grid, KAN = LAN + custom activation function. I'm not 100% sure whether it's equal to an MLP with GLU, but it's similar enough. I've tested ChebyKAN against an equivalent (same parameter count) MLP on MNIST, and no obvious advantage was observed. ChebyKAN even performs worse when the degree is high.
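For anyone wanting to reproduce that kind of comparison, the matching can be done on parameter counts: a Chebyshev-basis layer stores in*out*(degree+1) coefficients, so the MLP hidden width can be chosen to land near the same budget. The architectures below are illustrative only, not the exact configurations tested above:

```python
# Illustrative parameter budgets (not the exact configs used in the MNIST test above).
in_dim, hidden, out_dim, degree = 784, 128, 10, 4

# Two-layer Chebyshev-basis network: each layer has in*out*(degree+1) coefficients.
kan_params = in_dim * hidden * (degree + 1) + hidden * out_dim * (degree + 1)

# Plain one-hidden-layer MLP with biases.
def mlp_params(mlp_hidden):
    return (in_dim + 1) * mlp_hidden + (mlp_hidden + 1) * out_dim

# Pick the hidden width whose parameter count best matches the KAN budget.
mlp_hidden = min(range(1, 4096), key=lambda h: abs(mlp_params(h) - kan_params))
print(kan_params, mlp_hidden, mlp_params(mlp_hidden))
```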

1ssb commented

I mean GLU creates a switching effect, which is studied in signal theory as having a very specific effect on transformations... but yes, I agree about the non-gated custom activation; that's practically all I am doing as well. Now the real question is: if that is the case, could we describe them as mathematical kernel operations, like a transform and an inverse transform? Because if so, we could literally start treating networks with a systems approach.
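For context on the "switching effect": a gated linear unit multiplies one linear projection by a sigmoid of another, so the gate branch acts as a soft per-feature switch on the value branch. A minimal sketch:

```python
import torch
import torch.nn as nn

class GLU(nn.Module):
    """Gated Linear Unit: value(x) * sigmoid(gate(x)).
    The sigmoid gate smoothly switches each value dimension on or off."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.value = nn.Linear(in_dim, out_dim)
        self.gate = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.value(x) * torch.sigmoid(self.gate(x))
```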

(Sorry, I'm not quite familiar with GLU, so I might be wrong on that.)
I'm not sure whether that's possible; my poor math knowledge can't find a way to turn it into a kernel operation.
But that's a brilliant idea. It makes me feel like it's related to some essence of MLPs.