Better inference for M1 (text-to-semantic) model
rasenganai opened this issue · 0 comments
M1 is a decoder-only model built on GPT, so we can leverage existing work on LLM inference to speed up its outputs.
KV caching and an efficient attention implementation would be a good starting point.