FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Python
Issues
Recent Budget Simulation is incorrect
#35 opened by gopikrishnajha - 0
No support of GQA of Llama in real_drop
#32 opened by Tomorrowdawn - 0
The setting of experiments
#31 opened by YingYellow - 0
Where is the streaming_utils
#30 opened by gawainx - 4
Package Error when I reproduce as https://github.com/FMInference/H2O/tree/main/h2o_hf
#19 opened by KylinC - 0
h2o is slower and not optimized in memory
#12 opened by ZhuoruiLiu12 - 8
H2O is slower than full
#10 opened by haiasd - 1
Replication of results with h2o_flexgen flex_opt.py
#23 opened by g-x-w - 3
Question about the numerical values in Figure 3 of the paper
#3 opened by 0x00-pl - 0
Is the flash attention version ready now?
#21 opened by York-Cheung - 3
TASK=xsum HH_SIZE=256 RECENT_SIZE=256 Model=llama-7b and the rouge2 of h2o is low
#17 opened by duyuxuan1486 - 0
About flash attention
#18 opened by imh966 - 4
HH scores summed along batch dimension
#14 opened by yeoedward - 0
Question about the order of softmax and mask
#2 opened by 0x00-pl - 0
Cannot run at all when following the README
#16 opened by seeyourcell - 3
H2O single-task inference issue (inference time and GPU memory usage do not seem to improve)
#6 opened by SUSTechBruce - 0
Error with running run_xsum_flexgen.py
#15 opened by gushu333 - 0
the problem of transformers package
#13 opened by lylcyl - 1
h2o is not working when the input is short.
#11 opened by mutonix - 1
Cannot reproduce the results
#5 opened by li2haipeng - 0
OOM when reproducing h2o_20_a100_80
#4 opened by machilusZ