What is the training and inference speedup compared with vanilla Transformers?
JohnHerry opened this issue · 2 comments
Hi, is there any research comparing KAN-based Transformers vs. vanilla Transformers? To get the same performance (or better), how much memory reduction or inference speedup can we expect?
Hello @JohnHerry,
Regarding KAN-based Transformers vs. vanilla Transformers, I don't have much additional information beyond what we've discussed. Some sources mention using KAN-based Transformers, but they often apply KAN to only select layers rather than the entire model. This paper, however, claims that KAN-based Transformers offer improved performance, though with a slight trade-off in speed.

Before our work, there hadn't been much successful research on this, mainly because implementing KAN fully throughout a Transformer has been challenging, which makes a fair comparison difficult.
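To make the comparison concrete, here is a minimal sketch of the core KAN idea being swapped into a Transformer's feed-forward path: instead of a linear layer with one scalar weight per edge, each input-output edge gets its own learnable univariate function. The basis mix below (identity, SiLU, tanh) and the class name are illustrative simplifications, not the B-spline parameterization used in actual KAN implementations.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

class ToyKANLinear:
    """Toy KAN-style layer: each input-output edge has its own learnable
    univariate function phi_ij, here a weighted mix of fixed basis functions
    (identity, SiLU, tanh) rather than the splines used in real KANs."""
    def __init__(self, in_dim, out_dim, rng):
        # one weight per (output, input edge, basis function)
        self.w = rng.standard_normal((out_dim, in_dim, 3)) / np.sqrt(in_dim)

    def __call__(self, x):  # x: (batch, in_dim)
        # evaluate all basis functions on every input coordinate
        basis = np.stack([x, silu(x), np.tanh(x)], axis=-1)  # (batch, in_dim, 3)
        # output_i = sum_j phi_ij(x_j): contract input and basis dims
        return np.einsum("bjk,ojk->bo", basis, self.w)

rng = np.random.default_rng(0)
layer = ToyKANLinear(8, 4, rng)   # could stand in for an MLP sub-layer
y = layer(rng.standard_normal((2, 8)))
print(y.shape)  # (2, 4)
```

This also hints at the speed trade-off mentioned above: each edge evaluates several basis functions instead of one multiply, so a KAN layer does strictly more work per parameter than a plain linear layer.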
Got it, thank you very much.