FMInference/H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python

Issues

Question about the reproduction of COPA results
#36 opened 9 days ago by hyeseongshin
0
Recent Budget Simulation is incorrect
#35 opened 15 days ago by gopikrishnajha
0
Eviction doesn't happen at all if recent_budget=0
#33 opened 16 days ago by foreverpiano
0
No support of GQA of Llama in real_drop
#32 opened a month ago by Tomorrowdawn
1
The setting of experiments
#31 opened a month ago by YingYellow
0
Where is the streaming_utils
#30 opened 2 months ago by gawainx
0
Package Error when I reproduce as https://github.com/FMInference/H2O/tree/main/h2o_hf
#19 opened 6 months ago by KylinC
4
Supporting new versions of transformers & batch size > 1
#29 opened 2 months ago by huangyuxiang03
0
Question about the reproduction of XSUM results
#20 opened 5 months ago by SherrySwift
14
h2o is slower and not optimized in memory
#12 opened 7 months ago by ZhuoruiLiu12
3
Can not reproduce results by LLAMA-7B on OpenBook QA
#24 opened 3 months ago by AkideLiu
8
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
#25 opened 3 months ago by hasanar1f
3
H2O is slower than full
#10 opened 7 months ago by haiasd
2
Got error when running scripts/generation/llama_h2o.sh
#22 opened 3 months ago by ThisisBillhe
1
Replication of results with h2o_flexgen flex_opt.py
#23 opened 4 months ago by g-x-w
1
Question about the numerical values in Figure 3 of the paper
#3 opened 7 months ago by 0x00-pl
3
Is the flash attention version ready now?
#21 opened 4 months ago by York-Cheung
0
When running HELM's xsum (llama-7b), the accuracy in the local results is empty
#7 opened 8 months ago by oujieww
3
TASK=xsum HH_SIZE=256 RECENT_SIZE=256 Model=llama-7b and the rouge2 of h2o is low
#17 opened 6 months ago by duyuxuan1486
3
About flash attention
#18 opened 6 months ago by imh966
0
HH scores summed along batch dimension
#14 opened 6 months ago by yeoedward
4
Question about the order of softmax and mask
#2 opened 6 months ago by 0x00-pl
0
Cannot run at all when following the README
#16 opened 6 months ago by seeyourcell
0
H2O single-task inference issue (inference time and GPU memory usage do not seem to be optimized)
#6 opened 6 months ago by SUSTechBruce
3
Error with running run_xsum_flexgen.py
#15 opened 6 months ago by gushu333
0
the problem of transformers package
#13 opened 6 months ago by lylcyl
0
ModuleNotFoundError: No module named 'lost_in_the_middle'
#9 opened 7 months ago by haiasd
1
ModuleNotFoundError: No module named 'streaming_llm'
#8 opened 7 months ago by haiasd
2
h2o is not working when the input is short.
#11 opened 7 months ago by mutonix
0
Cannot reproduce the results
#5 opened 8 months ago by li2haipeng
1
OOM when reproducing h2o_20_a100_80
#4 opened 10 months ago by machilusZ
0