princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Python · MIT License
Issues
The dtype of tokenized data should be uint32
#65 opened by ZhiYuanZeng - 1
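The point in #65 can be illustrated with a small numpy check (the token ids below are hypothetical, not from this repo): ids above 65,535 silently wrap when stored as uint16, so uint32 is the safe dtype for tokenized data once the vocabulary can exceed that range.

```python
import numpy as np

# Hypothetical token ids; 70_000 exceeds the uint16 maximum of 65_535.
ids = np.array([31_999, 70_000])

as_u16 = ids.astype(np.uint16)   # wraps modulo 2**16: 70_000 -> 4_464
as_u32 = ids.astype(np.uint32)   # all ids preserved exactly
```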
about shearing params config
#67 opened by LoverLost - 1
Can LLM-Shearing be used on ViT models?
#68 opened by n9s8a - 2
Open source the pruning mask.
#70 opened by Achazwl - 1
Support for Llama-3 / GQA?
#69 opened by LorrinWWW - 7
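For context on the GQA question in #69: grouped-query attention stores fewer key/value heads than query heads and expands them at attention time. A minimal numpy sketch of that expansion (the `repeat_kv` pattern common in Llama-style code; shapes here are illustrative, not taken from this repo):

```python
import numpy as np

def repeat_kv(kv, n_rep):
    """Expand grouped key/value heads so each query head has a matching
    K/V head (the GQA -> MHA expansion used in Llama-style models).

    kv: array of shape (num_kv_heads, seq_len, head_dim)
    """
    if n_rep == 1:
        return kv
    # Each KV head is repeated n_rep times along the head axis.
    return np.repeat(kv, n_rep, axis=0)

# Hypothetical Llama-3-style layout: 8 KV heads serving 32 query heads.
kv = np.random.randn(8, 4, 16)
expanded = repeat_kv(kv, n_rep=32 // 8)   # shape (32, 4, 16)
```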
Why the rope params are ignored while converting hf checkpoint to composer checkpoint?
#66 opened by ZhiYuanZeng - 2
When should we apply hidden_z?
#50 opened by sbwww - 5
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?
#58 opened by rzr002 - 0
Problem converting Composer model to Pythia
#64 opened by rzr002 - 1
LlamaRMSNorm() layer differs from original llama
#63 opened by suhmily - 3
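For reference on #63, a minimal numpy sketch of RMSNorm as defined for LLaMA: unlike LayerNorm, it does not subtract the mean and has no bias, only a learned per-channel gain. The `eps` default here is an assumption (Hugging Face's LlamaRMSNorm uses 1e-6).

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # LLaMA-style RMSNorm: divide by root-mean-square of the last axis
    # (no mean subtraction, no bias), then apply a per-channel gain.
    ms = np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True)
    return (x / np.sqrt(ms + eps)) * weight

x = np.array([3.0, 4.0])                 # RMS = sqrt((9 + 16) / 2)
out = rms_norm(x, weight=np.ones(2))
```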
How much compute will this take?
#22 opened by fakerybakery - 2
Instruction tuning dataset
#57 opened by kiucho - 2
Pruning fine-tuned model
#55 opened by kiucho - 1
Problem when saving model
#56 opened by 18140663659 - 0
Is there a way to run pruning without Slurm?
#59 opened by Beatlesso - 0
Training starts but nothing continues
#53 opened by logan-zou - 2
KeyError: 'state'
#49 opened by changheecho - 0
Mismatched shape
#52 opened by coderchem - 4
Release sheared model without re-training?
#44 opened by sbwww - 1
Avoid OOM using deepspeed zero-stage
#47 opened by gywlssww - 3
Duplicate mean values during mask initialization
#45 opened by czhang99 - 1
Training hangs at the "Building trainer" step
#46 opened by coderchem - 1
Docker Request
#28 opened by TonyZhanghm - 5
Finetuning using LoRA
#25 opened by Nimisha-Pabbichetty - 6
sample data generate name
#21 opened by sunzhe09 - 2
Metric Scores and NQ Evaluation
#41 opened by Spico197 - 6
The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper
#42 opened by YWMditto - 6
Pruning crashes at iteration 592.
#32 opened by lippman1125 - 1
Drive dress error
#39 opened by YanxiZSQ - 6
meta-llama/Llama-2-7b-hf Model Preparation failed
#37 opened by rzr002 - 3
ShearedCodeLLama
#35 opened by SinanAkkoyun - 1
AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'
#33 opened by YanxiZSQ - 2
Path not used in continue_pretrain.sh
#24 opened by Longyichen - 1
Flash-attn dependency issues
#27 opened by Forival - 2
Please share the alpaca generate and eval code and script to reproduce the results shared in
#26 opened by sanyalsunny111