FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Jupyter Notebook · Apache-2.0
Issues
- PPL compute (#106, opened by yuyangxie96, 2 comments; see the perplexity sketch after this list)
- Token-wise the same generalization? (#99, opened by Ageliss, 2 comments)
- Some questions about sampling strategy (#63, opened by qianxiao1111, 2 comments)
- Using Medusa with Whisper (#100, opened by AvivSham, 0 comments)
- Containerization with a Dockerfile to set up Medusa (#104, opened by gangooteli, 0 comments)
- Medusa training loss (#95, opened by TomYang-TZ, 0 comments; see the loss sketch after this list)
- [Bug] Fix preprocess function (#101, opened by xiezipeng-ML, 1 comment)
- ImportError: cannot import name 'is_flash_attn_available' from 'transformers.utils' (#98, opened by imneov, 3 comments)
- Is there no way to run inference without training? (#77, opened by MoOo2mini, 1 comment)
- Is there a bug in gen_model_answer_baseline.py? (#96, opened by qspang, 1 comment)
- Training Medusa stage 2 (#94, opened by smartliuhw, 0 comments)
- mistral.json (#93, opened by Git-L1, 0 comments)
- Training Medusa heads (#70, opened by mmilunovic-mdcs, 2 comments)
- Can it support ChatGLM? (#91, opened by PeterXiaTian, 0 comments)
- HYDRA support? (#90, opened by arunpatala, 0 comments)
- Misleading LLM name: MEDUSA (#89, opened by Pittconnect, 0 comments)
- About Medusa mask details (#88, opened by dhcode-cpp, 1 comment; see the tree-mask sketch after this list)
- Release medusa-llm v1.0 (#84, opened by zhyncs, 0 comments)
- [Dynamic Batching] Concerns about features that may not be supported when using Medusa (#82, opened by Ageliss, 0 comments)
- Encountered a CUDA error when setting the Medusa heads (#81, opened by 1649759610, 2 comments)
- DeepSpeed support (#78, opened by jiangix-paper, 1 comment)
- CUBLAS_STATUS_EXECUTION_FAILED when training Medusa heads with the base model set to Llama 2 7B (#45, opened by void-main, 2 comments)
- Medusa-1 and Medusa-2 speedup (#73, opened by LotuSrc, 3 comments)
- About changing the LLM from LLaMA to Llama 2 (#68, opened by dydrkfl06, 2 comments)
- Sparse candidate generation confusion (#64, opened by zankner, 1 comment)
- [New feature] More sampling schemes (#39, opened by Jokoe66, 1 comment)
- Question about heads warmup (#74, opened by eloooooon, 11 comments)
- vLLM support (#41, opened by MichaelJayW, 5 comments)
- Clarifications on models and batch size (#66, opened by RonanKMcGovern, 1 comment)
- Can I apply AWQ quantization? (#65, opened by RonanKMcGovern, 3 comments)
- Results for different configs (#62, opened by zankner, 0 comments)
- FasterTransformer support (#57, opened by niyunsheng, 1 comment)
- [Feature request] Qwen model support (#52, opened by JianbangZ, 3 comments)
- How to measure latency of Medusa vs. the baseline (#49, opened by YixinSong-e, 2 comments; see the benchmark sketch after this list)
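Notes on a few recurring technical questions

For #106 (PPL compute): a minimal sketch of perplexity computation with Hugging Face Transformers, assuming a standard causal LM. The model name below is illustrative (Medusa's released heads were trained on Vicuna bases), not something prescribed by this repo.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative base model choice, not taken from the repo's scripts.
model_name = "lmsys/vicuna-7b-v1.3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over the shifted next-token targets.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

print(perplexity("Medusa accelerates decoding with extra heads."))
```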
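For #95 (Medusa training loss): the Medusa paper trains each extra head with cross-entropy against a further-shifted target and sums the heads with a decaying weight. A minimal sketch, assuming head i (0-indexed) predicts the token i+2 positions ahead of its input position, since the base LM head already covers the next token; the decay value is illustrative.

```python
import torch
import torch.nn.functional as F

def medusa_loss(head_logits, labels, decay=0.8):
    """head_logits: one [batch, seq, vocab] tensor per Medusa head (0-indexed).
    labels: [batch, seq] token ids, with -100 marking positions to ignore."""
    total = torch.zeros((), device=labels.device)
    for i, logits in enumerate(head_logits):
        shift = i + 2  # assumption: head i targets the token i+2 positions ahead
        ce = F.cross_entropy(
            logits[:, :-shift, :].reshape(-1, logits.size(-1)).float(),
            labels[:, shift:].reshape(-1),
            ignore_index=-100,  # skip padding / prompt tokens
        )
        total = total + (decay ** i) * ce  # later (harder) heads weighted down
    return total
```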
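For #88 (Medusa mask details): Medusa scores many candidate continuations in a single forward pass using tree attention, where each candidate token may attend only to the tokens on its own root-to-node path (the shared prefix attends causally as usual). A minimal sketch of building such a mask; the tuple-based path encoding is an illustrative assumption, not the repo's exact buffer layout.

```python
import torch

def build_tree_mask(paths):
    """paths: each candidate-tree node encoded as a tuple of branch choices
    from the root, e.g. (0,), (1,), (0, 0), (0, 1). Returns an [n, n] boolean
    mask where mask[i, j] is True iff node j is node i or one of its ancestors."""
    nodes = sorted(set(paths), key=lambda p: (len(p), p))
    index = {p: i for i, p in enumerate(nodes)}
    mask = torch.zeros(len(nodes), len(nodes), dtype=torch.bool)
    for p in nodes:
        for k in range(1, len(p) + 1):
            mask[index[p], index[p[:k]]] = True  # every prefix of p, including p
    return mask

# Depth-2 example: two candidates at step 1, two children under the first.
print(build_tree_mask([(0,), (1,), (0, 0), (0, 1)]).int())
```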
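For #49 (latency vs. the baseline): a minimal wall-clock tokens-per-second harness. `generate_fn` is a hypothetical callable returning generated token ids; you would run the harness once wrapped around the baseline model's generate call and once around the Medusa-enabled one.

```python
import time

import torch

def tokens_per_second(generate_fn, prompt, n_runs=5):
    generate_fn(prompt)  # warmup run to exclude one-time allocation cost
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    total_tokens, total_time = 0, 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        out = generate_fn(prompt)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # flush async CUDA kernels before stopping the clock
        total_time += time.perf_counter() - start
        total_tokens += len(out)
    return total_tokens / total_time

# The speedup ratio is then simply
# tokens_per_second(medusa_generate, p) / tokens_per_second(baseline_generate, p),
# where both generate functions are placeholders for your own wrappers.
```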