horseee/LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Python · Apache-2.0
Issues
Llama3 reports shape error after pruning
#69 opened by WentaoTan - 0
evaluate PPL with the post-training model
#79 opened by VincentZ-2020 - 3
No such file or directory: pytorch_model.bin
#74 opened by yaolu-zjut - 0
About `consecutive_groups`
#78 opened by VincentZ-2020 - 4
Evaluation:UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
#58 opened by manlenzzz - 0
Taylor pruner under-utilizing resources
#76 opened by marianbasti - 0
Does it support qwen2?
#71 opened by yangxue-1 - 5
Custom Model pruning
#72 opened by saidineshpola - 2
ConnectionError: Couldn't reach https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.train.txt (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=100)")))
#48 opened by qxpBlog - 0
Loading pruned model for causal llm
#68 opened by sriyachakravarthy - 7
Adaptation of GQA
#64 opened by junzhang-zj - 0
Can ordinary transformer models be pruned?
#62 opened by SKY072410 - 0
Can ChatGLM3 pruning be supported?
#61 opened by Franklin-L - 0
Difference in Perplexity Values
#60 opened by nikhil-ghosh-berkeley - 0
Pruning llama3
#57 opened by yinwangsong - 0
Is this method implementable on multi-GPUs?
#54 opened by LeonCheng0129 - 0
How to prune the embedding and lm_head?
#55 opened by L-hongbin - 0
Unable to reproduce the results for param_first and param_second in the paper after finetuning.
#52 opened by danyal97 - 0
a post-training issue
#35 opened by cmnfriend - 0
The quantization of the compressed models
#49 opened by lihuang258 - 0
Cannot use Hugging Face to load
#46 opened by coderchem - 1
401 Client Error: Unauthorized for url: https://huggingface.co/decapoda-research/llama-7b-hf/resolve/main/tokenizer_config.json
#43 opened by azuryl - 1
Latency code
#33 opened by tuidan - 3
Supporting device_map = 'auto' similar to the one in .from_pretrained method from Huggingface
#36 opened by Ahmed-Roushdy - 4
Question related to the model tuning
#39 opened by shawnricecake - 0
After pruning some layers, the model cannot be loaded directly via TGI
#41 opened by coderchem - 0
Pruning MQA?
#40 opened by jianyuheng - 2
Why does `num_examples` default to 10?
#38 opened by coderchem - 6
Reproducing paper results
#34 opened by grigorn - 1
Can not import LlamaConfig
#32 opened by Ahmed-Roushdy - 0
Examples on the Huggingface Hub
#31 opened by vgoklani - 0
When will you support ChatGLM?
#30 opened by AboveParadise - 1
Force even pruning across layers
#29 opened by thedarkzeno - 2
Calculating Importance of 'param_mix'
#28 opened by kiucho - 1
When would the code for GPT-J-6B be released?
#27 opened by mumuyeye