ZPZhou-lab/tfkan

About implementing functions from tf and tfKAN interleaved

Opened this issue · 2 comments

Thank you for the library. I am curious whether we can use both libraries to build functions in the same model (say, many FC layers with the output layer being a DenseKAN). If this is not possible, would you mind suggesting an alternative?
Thanks

In my opinion, the FC layer with an activation function (tf.keras.layers.Dense()) and the KAN layer (tfkan.layers.DenseKAN()) enhance the non-linear expressive power of the model through two different approaches. The former mainly involves matrix multiplication, while the latter moves the computation into spline functions, only requiring adjustment of the spline coefficients.
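For illustration, here is a minimal sketch calling the two layer types on the same input. It assumes DenseKAN takes the number of output units as its first positional argument, analogous to Dense; check the tfkan README for the exact signature:

```python
import tensorflow as tf
from tfkan.layers import DenseKAN

x = tf.random.normal((32, 16))                               # batch of 32 samples, 16 features

dense_out = tf.keras.layers.Dense(8, activation="relu")(x)   # matrix multiply + fixed activation
kan_out = DenseKAN(8)(x)                                      # learnable spline functions on each edge (assumed signature)

print(dense_out.shape, kan_out.shape)                         # both (32, 8)
```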

Therefore, using them is like using different bases to represent the target you want to fit. For example, take any function f. If you use the polynomial space as the basis (i.e. 1, x, x^2, x^3, ...), you get the classical Taylor expansion of f:

  • f(x) = \sum_{n=0}^{\infty} (n!)^{-1} f^{(n)}(0) * x^n

and if you use the trigonometric function space as the basis (i.e. 1, sin(x), cos(x), sin(2x), cos(2x), ...), you get the Fourier expansion of f:

  • f(x) = a_0 + \sum_{n=1}^{\infty} (a_n * cos(nx) + b_n * sin(nx))
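As a toy numerical illustration of this "different bases, same target" point (not from the thread), the same function can be fitted by least squares with both a polynomial and a trigonometric design matrix:

```python
import numpy as np

# Target function sampled on a grid (an arbitrary example)
x = np.linspace(-np.pi, np.pi, 200)
f = np.exp(np.sin(x))

# Polynomial basis: 1, x, x^2, ..., x^5 (Taylor-like expansion, fitted by least squares)
poly_design = np.vander(x, 6, increasing=True)
poly_coef, *_ = np.linalg.lstsq(poly_design, f, rcond=None)
poly_fit = poly_design @ poly_coef

# Trigonometric basis: 1, sin(x), cos(x), ..., sin(5x), cos(5x) (Fourier-like expansion)
four_design = np.column_stack(
    [np.ones_like(x)] + [g(n * x) for n in range(1, 6) for g in (np.sin, np.cos)]
)
four_coef, *_ = np.linalg.lstsq(four_design, f, rcond=None)
four_fit = four_design @ four_coef

# Both bases approximate the same target, just with different coefficients
print("poly RMSE:   ", np.sqrt(np.mean((f - poly_fit) ** 2)))
print("fourier RMSE:", np.sqrt(np.mean((f - four_fit) ** 2)))
```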

The classical MLP layer Dense() and the KAN layer DenseKAN() are like two different bases for fitting the functional relationships in the data. KAN shifts the complexity of the model into the spline computation, which enhances the smoothness and flexibility of KAN's bases. Therefore, in the KAN authors' introduction and experiments, KAN may require fewer model parameters to achieve good performance.

Here are some ideas about constructing a model that mixes tf and tfkan layers (a minimal code sketch follows the list):

  • If your task does not require a large model and you want a good, interpretable view of the network's connection structure (which is also the main focus of the KAN authors), then constructing a small, shallow DenseKAN network is a suitable choice (in the authors' introduction, such networks mostly have at most 3 layers)
  • If your task requires a large model to achieve good approximation ability, then in my observation DenseKAN() will not have a performance advantage over the FC Dense(). Deep MLPs with residual connections in their architecture also have excellent non-linear expressive power. In addition, it should be noted that for large-scale neural networks, the training and inference efficiency of DenseKAN() has not been optimized yet
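As a concrete starting point for the mix asked about in the question (several FC layers with a DenseKAN output), here is a minimal, untested sketch; it again assumes DenseKAN's first argument is the number of output units, analogous to Dense:

```python
import tensorflow as tf
from tfkan.layers import DenseKAN

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),  # classical FC layers
    tf.keras.layers.Dense(64, activation="relu"),
    DenseKAN(1),  # spline-based KAN layer as the output (assumed signature)
])

model.compile(optimizer="adam", loss="mse")
model.summary()
```

Since DenseKAN() is implemented as a Keras layer, it should drop into Sequential or functional models alongside Dense() like any other layer.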

Hope this can help u~ 🤗

I am so sorry I was confused about the use of the buttons. Thanks for answering my question.