Try scheduling FlashAttention for "Chexo"
yamaguchi1024 opened this issue · 1 comment
yamaguchi1024 commented
Many neural network optimization and quantization methods may be really good motivating examples for "Chexo", because we probably never want to reason about the soundness of their numerical stability, etc.
rachitnigam commented
Couple of comments from @gilbo:
- There is no filed “Chexo” proposal in the issue tracker or written up anywhere, so it’s unreasonable for a general Exo developer to know what’s being referred to by “Chexo”
- The description of the idea here is not sufficiently fleshed out for someone else to understand. I’m not really sure what the idea is here and I’m familiar with all the terms. This reads a lot more like a note from Yuka to herself than an attempt to communicate with the rest of the team.
- This is not a clear bug report or actionable proposal for a feature/improvement
- There are actually multiple distinct ideas here: (1) FlashAttention is hard to schedule because it requires using algebraic identities for e^x that don’t exist in Exo; (2) there is some sort of issue/concern about quantization schemes; maybe there are more ideas.
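
For context on the first idea, here is a minimal sketch of the kind of e^x identity involved, presumably exp(x - m_new) = exp(x - m_old) * exp(m_old - m_new), which lets a blockwise softmax rescale its running sum when the running maximum changes. This is plain Python, not Exo code, and it is not taken from the FlashAttention implementation; the function names `softmax_two_pass` and `softmax_online` and the `block` parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax_two_pass(xs):
    """Reference softmax: find the global max first, then normalize."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_online(xs, block=2):
    """Blockwise softmax that keeps a running max and a running sum.

    Illustrative only: a real FlashAttention kernel fuses this with the
    matrix multiplies and tiles over two dimensions.
    """
    m = -math.inf  # running maximum seen so far
    s = 0.0        # running sum of exp(x - m) over elements seen so far
    for i in range(0, len(xs), block):
        chunk = xs[i:i + block]
        m_new = max(m, max(chunk))
        # Rescale the old sum using exp(x - m_new) = exp(x - m) * exp(m - m_new).
        # On the first block m is -inf, so the factor is exp(-inf) == 0.0.
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in chunk)
        m = m_new
    return [math.exp(x - m) / s for x in xs]
```

The two functions agree up to floating-point rounding, but showing that the blockwise loop is a valid rewrite of the two-pass one depends on the exponential identity in the comment, which is the sort of algebraic fact the comment above says Exo's rewrites do not currently have.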