Issues
Error when trying to run Llama2-7b: attention_mask and position_ids are None
#60 opened by Philippe-Guyard - 0
Breaks under Numpy 2
#58 opened by time-less-ness - 0
Llama2 `ExpectedMoreSplits` Exception
#57 opened by time-less-ness - 0
How to save the pruned model?
#56 opened by botox-100 - 0
Which dataset do you use for jailbreak?
#55 opened by lucywang720 - 2
Why choose Vicuna as the tokenizer?
#53 opened by hainingfang - 3
Ambiguous result for LLAMA-2-13b
#54 opened by Qian2333 - 5
Cannot reproduce Llama2 results
#52 opened by taratt - 2
Compressing a Finetuned llama2 model with lora
#50 opened by bkhanal-11 - 3
llama_7b wikitext perplexity 7.0915350914
#45 opened by xiaopengaia - 3
Perplexity is off for Llama 2-7b
#47 opened by taratt - 1
Where is cache['attention_mask']?
#48 opened by JohnneyQin - 1
Support for LLaMA-2
#23 opened by junzhang-zj - 0
Change sparsity rates
#42 opened by Lexlum - 0
Need clarification regarding prune_deit() in https://github.com/locuslab/wanda/blob/main/image_classifiers/main.py
#41 opened by solomonmanuelraj - 0
Wanda Pruning for Zero-Shot Object Detection Model - google/owlv2-base-patch16-ensemble
#40 opened by solomonmanuelraj - 1
Pruned model loads slowly
#28 opened by kfchenhn - 1
Some questions about the code
#38 opened by liuxiaozhu01 - 0
Can this be used for pruning Whisper?
#37 opened by caroljoyv - 2
Running 70B fails: RuntimeError: shape '[1, 4096, 64, 128]' is invalid for input of size 4194304
#33 opened by JiaQuan1203 - 1
Cannot load the c4 dataset
#36 opened by simlaharma - 1
Issue with Mixed Device Tensors (cuda:0 and cuda:1)
#35 opened by pprp - 5
Can Wanda be applied to ConvNets for CV tasks?
#32 opened by frankinwi - 2
HellaSwag numbers?
#31 opened by forresti - 1
Fine-tune the pruned model
#24 opened by guozhiyu - 2
Pruned model is same size as original
#29 opened by virentakia - 2
Questions about sub-networks of LLMs
#27 opened by JiwenJ - 4
Publish the Llama2 sparsified models
#30 opened by egeor - 4
Question about the latency speedup!
#26 opened by ybai62868 - 1
error in loading datasets
#25 opened by Ahmed-Roushdy - 3
Can Wanda speed up LLM inference?
#14 opened by ifromeast - 1
What are the dependencies of this project? Which version of transformers?
#17 opened by guanchuwang - 3
Is it possible to prune GPTQ models?
#11 opened by GrailFinder - 3
LoRA fine-tuning hyper-parameters
#8 opened by ryusaeba - 1
calibration data seq_length
#22 opened by kiucho - 2
Some questions about the codes.
#21 opened by kiseliu - 2
Cannot reproduce the results of Figure 3
#19 opened by HaihangWu - 2
Structured 2:4 sparsity pattern support on GPU
#20 opened by kiucho - 1
Does this work for other models, such as GPT-2?
#18 opened by anhdang000 - 3
Some questions.
#12 opened by nikitabalakin - 1
Falcon 7b / 40b
#13 opened by BaiqingL - 1
How can a pruned model with sparse matrices reduce model size and computation cost?
#9 opened by JiachuanDENG