fla-org/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Python · MIT
Issues
[RFC] Add RWKV7 kernels and models
#105 opened by yzhangcs - 1
[Bug]: KV Cache exploded
#91 opened by rakkit - 6
Where can I find the training subset of SlimPajama dataset for your gla-340M-15B and gla-1.3B-100B models?
#118 opened by zmj1203 - 8
[Bug] failed to use deepspeed during training GLA
#112 opened by zmj1203 - 16
[RFC] Varlen training
#115 opened by sustcsonglin - 0
[RFC] Remove `head_first` kernel
#114 opened by sustcsonglin - 2
Consider support macOS?
#113 opened by MonolithFoundation - 8
[Bug]: Grad_norm & Loss are NAN when training Gated_Deltanet on fineweb-edu-10BT
#111 opened by Chris-city - 4
[Bug]: Gated DeltaNet Recurrence with constant Gate at 1 vs. DeltaNet Recurrence
#104 opened by JulienSiems - 0
[RFC] add Griffin and RecurrentGemma kernels
#110 opened by sustcsonglin - 0
[RFC] Add xLSTM kernels
#109 opened by sustcsonglin - 0
[RFC] Add TTT and Titans kernel
#107 opened by sustcsonglin - 0
[RFC] Add YOCO models
#106 opened by yzhangcs - 1
Question about the pseudocode in DeltaNet paper
#100 opened by xffxff - 1
Issue with Unpack
#101 opened by ahatamiz - 5
[Bug]: Bunches of Issues in Mamba and Mamba2
#90 opened by WorldEditors - 10
[Bug]: Can not run ./training/preprocess.py, said that "ValueError: BuilderConfig 'sample-10BT' not found. Available: ['default']"
#96 opened by zmj1203 - 2
training details for your gla-340M-15B models
#94 opened by zmj1203 - 1
Code Explained | What does this code mean?
#95 opened by guoguo1314 - 2
Will FLA further speedup Mamba or Mamba2?
#93 opened by codingWla - 1
[Bug]: `recurrent_states` are difference between `chunk` and `fused_recurrent` in GSA
#92 opened by zml24 - 5
[Bug]: GSA backward error
#86 opened by zml24 - 1
Varlen Support
#82 opened by zml24 - 3
Feature Request (or Guidance Request): Frame-by-Frame Processing with ONNX Export in Flash Linear Attention Model
#79 opened by LarocheC - 5
[Bug]: AttributeError: 'tuple' object has no attribute 'backward' in `benchmark_gla.py`
#80 opened by zml24 - 0
[Bug]: ImportError: cannot import name 'chunk_gated_abc' from 'fla.ops.abc' in `benchmark_gsa.py`
#81 opened by zml24 - 6
Does GSA layer support backward with caches?
#73 opened by WorldEditors - 6
Questions About custom attention mask
#71 opened by Rbrq03 - 7
why delta_net so slow in inference ?
#61 opened by ching-sui1995 - 4
[Bug]: H100 memory access violations in chunk_gla
#68 opened by SmerkyG - 4
Checkpoints for 340M models
#53 opened by 0205090923 - 1