What is the training and inference speedup compared with vanilla Transformers?
JohnHerry opened this issue · 2 comments
Hi, is there any research comparing KAN-based Transformers vs. vanilla Transformers? To get the same performance (or better), how much memory reduction or inference speedup can we expect?
Hello @JohnHerry,
Regarding KAN-based Transformers vs. vanilla Transformers, I don't have much additional information beyond what we've discussed. Some sources mention using KAN-based Transformers, but they often apply KAN to only select layers rather than the entire model. This paper, however, claims that KAN-based Transformers offer improved performance, though with a slight trade-off in speed.

Before our work, there hadn't been much successful research on this, mainly because implementing KAN fully throughout a Transformer has been challenging, which makes a fair comparison difficult.
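To make the comparison concrete, here is a minimal sketch of the core KAN idea being swapped into a Transformer's feed-forward path: instead of a linear layer with one scalar weight per edge, each input-output edge gets its own learnable univariate function. The basis mix below (identity, SiLU, tanh) and the class name are illustrative simplifications, not the B-spline parameterization used in actual KAN implementations.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

class ToyKANLinear:
    """Toy KAN-style layer: each input-output edge has its own learnable
    univariate function phi_ij, here a weighted mix of fixed basis functions
    (identity, SiLU, tanh) rather than the splines used in real KANs."""
    def __init__(self, in_dim, out_dim, rng):
        # one weight per (output, input edge, basis function)
        self.w = rng.standard_normal((out_dim, in_dim, 3)) / np.sqrt(in_dim)

    def __call__(self, x):  # x: (batch, in_dim)
        # evaluate all basis functions on every input coordinate
        basis = np.stack([x, silu(x), np.tanh(x)], axis=-1)  # (batch, in_dim, 3)
        # output_i = sum_j phi_ij(x_j): contract input and basis dims
        return np.einsum("bjk,ojk->bo", basis, self.w)

rng = np.random.default_rng(0)
layer = ToyKANLinear(8, 4, rng)   # could stand in for an MLP sub-layer
y = layer(rng.standard_normal((2, 8)))
print(y.shape)  # (2, 4)
```

This also hints at the speed trade-off mentioned above: each edge evaluates several basis functions instead of one multiply, so a KAN layer does strictly more work per parameter than a plain linear layer.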
Got it, thank you very much.