fla-org/flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python · MIT license

Issues

[RFC] Add RWKV7 kernels and models
#105 opened 15 days ago by yzhangcs
5
[RFC] Autotune should consider batch size and number of heads
#117 opened 10 days ago by sustcsonglin
1
[Bug]: KV Cache exploded
#91 opened a month ago by rakkit
6
[RFC] 🔥 Flame: a minimal training framework based on torchtitan
#108 opened 7 days ago by yzhangcs
6
Where can I find the training subset of SlimPajama dataset for your gla-340M-15B and gla-1.3B-100B models?
#118 opened 7 days ago by zmj1203
4
[Bug] Failed to use DeepSpeed when training GLA
#112 opened 7 days ago by zmj1203
8
[Bug]: GSA and RWKV6 Occasionally Report NaN Gradients During the Backward Pass
#77 opened 2 months ago by WorldEditors
16
[RFC] Varlen training
#115 opened 11 days ago by sustcsonglin
1
[RFC] Remove `head_first` kernel
#114 opened 11 days ago by sustcsonglin
0
Consider supporting macOS?
#113 opened 12 days ago by MonolithFoundation
2
[Bug]: Grad_norm & Loss are NaN when training Gated_Deltanet on fineweb-edu-10BT
#111 opened 15 days ago by Chris-city
8
[Bug]: Gated DeltaNet Recurrence with constant Gate at 1 vs. DeltaNet Recurrence
#104 opened 13 days ago by JulienSiems
4
[RFC] add Griffin and RecurrentGemma kernels
#110 opened 15 days ago by sustcsonglin
0
[RFC] Add xLSTM kernels
#109 opened 15 days ago by sustcsonglin
0
[RFC] Add TTT and Titans kernel
#107 opened 15 days ago by sustcsonglin
0
[RFC] Add YOCO models
#106 opened 15 days ago by yzhangcs
0
[Bug]: Input with offsets (varlen) cannot be trained normally
#103 opened 16 days ago by YufangMo
1
Question about the pseudocode in DeltaNet paper
#100 opened 20 days ago by xffxff
2
Issue with Unpack
#101 opened 21 days ago by ahatamiz
1
[Bug]: Multiple Issues in Mamba and Mamba2
#90 opened a month ago by WorldEditors
5
[Bug]: NaN Values in `fwd_prepare_wy_repr` Output in `GatedDeltaNet`
#99 opened 24 days ago by xffxff
10
[Bug]: Shape mismatch, can't divide axis of length 1334 in chunks of 83
#98 opened a month ago by zmj1203
4
[Bug]: Triton Error with `GatedDeltaNet` in Triton 2.2.0 and 2.3.x
#97 opened a month ago by xffxff
3
[Bug]: Cannot run ./training/preprocess.py: "ValueError: BuilderConfig 'sample-10BT' not found. Available: ['default']"
#96 opened a month ago by zmj1203
1
training details for your gla-340M-15B models
#94 opened a month ago by zmj1203
2
Code Explained | What does this code mean?
#95 opened a month ago by guoguo1314
1
Will FLA further speedup Mamba or Mamba2?
#93 opened a month ago by codingWla
2
[Bug]: `recurrent_states` differ between `chunk` and `fused_recurrent` in GSA
#92 opened a month ago by zml24
1
[Bug]: Calling RWKV6Attention reports an environment error
#74 opened a month ago by synbol
5
[Bug]: Triton Compiler Error when Sequence Length <= 8
#88 opened a month ago by WorldEditors
4
[Bug]: GSA backward error
#86 opened a month ago by zml24
3
[Bug]: Addressing compounding errors at inference time
#83 opened 2 months ago by WorldEditors
1
Varlen Support
#82 opened 2 months ago by zml24
2
Feature Request (or Guidance Request): Frame-by-Frame Processing with ONNX Export in Flash Linear Attention Model
#79 opened 2 months ago by LarocheC
3
[Bug]: multi-GPU, TypeError: 'NoneType' object is not a mapping
#66 opened 2 months ago by Spray-N
5
[Bug]: AttributeError: 'tuple' object has no attribute 'backward' in `benchmark_gla.py`
#80 opened 2 months ago by zml24
1
[Bug]: ImportError: cannot import name 'chunk_gated_abc' from 'fla.ops.abc' in `benchmark_gsa.py`
#81 opened 2 months ago by zml24
0
[Bug]: IndexError: map::at when using GatedLinearAttention
#75 opened 3 months ago by hieugiaosu
6
Does GSA layer support backward with caches?
#73 opened 3 months ago by WorldEditors
4
Reproduce T2R Experiments in Gated Slot Attention Paper
#69 opened 3 months ago by ching-sui1995
6
[Bug]: Backward pass with AMP not working with GLA and GSA
#70 opened 3 months ago by Niccolo-Ajroldi
4
Questions about custom attention masks
#71 opened 3 months ago by Rbrq03
2
Why is delta_net so slow in inference?
#61 opened 3 months ago by ching-sui1995
7
[Bug]: H100 memory access violations in chunk_gla
#68 opened 4 months ago by SmerkyG
4
[Bug]: New autotune error when running chunked simple GLA
#67 opened 4 months ago by SmerkyG
4
[Bug]: H100 Triton 3.0.0 compile crash when using num_warps=8 in autotune
#58 opened 4 months ago by SmerkyG
1
RuntimeError: Triton Error [CUDA]: invalid argument
#64 opened 4 months ago by TiminHu
4
Checkpoints for 340M models
#53 opened 4 months ago by 0205090923
1
[Bug]: Mamba2 incorrect inference time behavior
#63 opened 4 months ago by zhixuan-lin
1
About `rescale_prenorm_residual` default value in Mamba 2
#60 opened 4 months ago by zhixuan-lin
1