forhaoliu/ringattention

Transformers with Arbitrarily Large Context

Python · Apache-2.0

Issues

Llama 3 ring attention implementation for inference
#21 opened 6 months ago by joshpopelka20gmail
1
This work doesn't change kernel, but utilize dependency to compute a whole line?
#20 opened 6 months ago by ziyuhuang123
0
Could you provide GPU code like A100?
#19 opened 6 months ago by ziyuhuang123
0
Incorrect project requirements
#16 opened 6 months ago by hadipash
1
vmem OOM on TPU
#11 opened 6 months ago by hxssgaa
2
Pretrained models?
#10 opened 6 months ago by matteoguarrera
1
Question: Has this been tested against the Triton Flash Attention version?
#2 opened 2 years ago by casper-hansen
10
scripts/jax2hf.py error
#17 opened 8 months ago by liuxpro
1
Questions about the paper
#14 opened 10 months ago by hiroshinoji
2
PyTorch Implementation
#4 opened 2 years ago by conceptofmind
10
Test Script Issues
#15 opened 10 months ago by djbyrne
0
[Question] Add a normalization layer between Attention and FFN?
#8 opened a year ago by findmyway
4
fine-tuning model mismatch - KeyError
#13 opened a year ago by chenwuperth
0
JAX partitioning error when attempting to run with sequence parallelism factor not a power of 2
#9 opened a year ago by exists-forall
0
train_dataset.download
#5 opened 2 years ago by lljjgg
1
How to combine BPT with sequence parallel?
#1 opened 2 years ago by fanghgit
2