princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Python · MIT License
Issues
The dtype of tokenized data should be uint32
#65 opened by ZhiYuanZeng - 1
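The point in #65 can be illustrated with a small numpy check (the token ids below are hypothetical, not from this repo): ids above 65,535 silently wrap when stored as uint16, so uint32 is the safe dtype for tokenized data once the vocabulary can exceed that range.

```python
import numpy as np

# Hypothetical token ids; 70_000 exceeds the uint16 maximum of 65_535.
ids = np.array([31_999, 70_000])

as_u16 = ids.astype(np.uint16)   # wraps modulo 2**16: 70_000 -> 4_464
as_u32 = ids.astype(np.uint32)   # all ids preserved exactly
```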
about shearing params config
#67 opened by LoverLost - 1
Can LLM-Shearing be used on ViT models?
#68 opened by n9s8a - 2
Open source the pruning mask.
#70 opened by Achazwl - 1
Support for Llama-3 / GQA?
#69 opened by LorrinWWW - 7
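For context on the GQA question in #69: grouped-query attention stores fewer key/value heads than query heads and expands them at attention time. A minimal numpy sketch of that expansion (the `repeat_kv` pattern common in Llama-style code; shapes here are illustrative, not taken from this repo):

```python
import numpy as np

def repeat_kv(kv, n_rep):
    """Expand grouped key/value heads so each query head has a matching
    K/V head (the GQA -> MHA expansion used in Llama-style models).

    kv: array of shape (num_kv_heads, seq_len, head_dim)
    """
    if n_rep == 1:
        return kv
    # Each KV head is repeated n_rep times along the head axis.
    return np.repeat(kv, n_rep, axis=0)

# Hypothetical Llama-3-style layout: 8 KV heads serving 32 query heads.
kv = np.random.randn(8, 4, 16)
expanded = repeat_kv(kv, n_rep=32 // 8)   # shape (32, 4, 16)
```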
Why the rope params are ignored while converting hf checkpoint to composer checkpoint?
#66 opened by ZhiYuanZeng - 2
When should we apply hidden_z?
#50 opened by sbwww - 5
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?
#58 opened by rzr002 - 0
Problem converting Composer model to Pythia
#64 opened by rzr002 - 1
LlamaRMSNorm() layer differs from original llama
#63 opened by suhmily - 3
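For reference on #63, a minimal numpy sketch of RMSNorm as defined for LLaMA: unlike LayerNorm, it does not subtract the mean and has no bias, only a learned per-channel gain. The `eps` default here is an assumption (Hugging Face's LlamaRMSNorm uses 1e-6).

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # LLaMA-style RMSNorm: divide by root-mean-square of the last axis
    # (no mean subtraction, no bias), then apply a per-channel gain.
    ms = np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True)
    return (x / np.sqrt(ms + eps)) * weight

x = np.array([3.0, 4.0])                 # RMS = sqrt((9 + 16) / 2)
out = rms_norm(x, weight=np.ones(2))
```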
How much compute will this take?
#22 opened by fakerybakery - 2
Instruction tuning dataset
#57 opened by kiucho - 2
Pruning fine-tuned model
#55 opened by kiucho - 1
Problem when saving model
#56 opened by 18140663659 - 0
Is there a way to run pruning without Slurm?
#59 opened by Beatlesso - 0
Training starts but nothing continues
#53 opened by logan-zou - 2
KeyError: 'state'
#49 opened by changheecho - 0
Mismatched shape
#52 opened by coderchem - 4
Release sheared model without re-training?
#44 opened by sbwww - 1
Avoid OOM using deepspeed zero-stage
#47 opened by gywlssww - 3
Duplicate mean values during mask initialization
#45 opened by czhang99 - 1
Training hangs at the "Building trainer" step
#46 opened by coderchem - 1
Docker Request
#28 opened by TonyZhanghm - 5
Finetuning using LoRA
#25 opened by Nimisha-Pabbichetty - 6
sample data generate name
#21 opened by sunzhe09 - 2
Metric Scores and NQ Evaluation
#41 opened by Spico197 - 6
The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper
#42 opened by YWMditto - 6
Pruning crashes at iteration 592.
#32 opened by lippman1125 - 1
Drive dress error
#39 opened by YanxiZSQ - 6
meta-llama/Llama-2-7b-hf Model Preparation failed
#37 opened by rzr002 - 3
ShearedCodeLLama
#35 opened by SinanAkkoyun - 1
AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'
#33 opened by YanxiZSQ - 2
Path not used in continue_pretrain.sh
#24 opened by Longyichen - 1
Flash-attn dependency issues
#27 opened by Forival - 2
Please share the alpaca generate and eval code and script to reproduce the results shared in
#26 opened by sanyalsunny111