locuslab/wanda

tokenizers `use_fast = False`

eggie5 opened this issue · 3 comments

eggie5 commented

Why is this set to `False`? It prevents using GPT-NeoX variants.
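For reference, the tokenizer load in main.py looks roughly like the following (paraphrased, not the exact line; GPT-NeoX-based models such as dolly-v2 only ship a fast tokenizer, so the slow-tokenizer path fails for them):

from transformers import AutoTokenizer

# As in the repo: force the slow tokenizer (GPT-NeoX models have no slow tokenizer, so this breaks)
tokenizer = AutoTokenizer.from_pretrained(args.model, use_fast=False)

# What I changed it to so dolly-v2-3b loads:
# tokenizer = AutoTokenizer.from_pretrained(args.model, use_fast=True)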

I set it to True and get this error:

python main.py \
    --model databricks/dolly-v2-3b \
    --prune_method wanda \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save out/llama_7b/unstructured/wanda/
torch 2.0.1
transformers 4.30.2
accelerate 0.20.3
# of gpus:  1
loading llm model databricks/dolly-v2-3b
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Downloading (…)/main/tokenizer.json: 2.11MB [00:00, 7.60MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 228/228 [00:00<00:00, 665kB/s]
use device  cuda:0
pruning starts
loading calibdation data
Downloading readme: 2.38kB [00:00, 4.36MB/s]
Downloading and preparing dataset json/allenai--c4 to /Users/eggie5/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 319M/319M [00:13<00:00, 23.4MB/s]
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00, 14.20s/it]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.27s/it]
Dataset json downloaded and prepared to /Users/eggie5/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.
Downloading and preparing dataset json/allenai--c4 to /Users/eggie5/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40.5M/40.5M [00:01<00:00, 23.6MB/s]
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.20s/it]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.33it/s]
Dataset json downloaded and prepared to /Users/eggie5/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.
Traceback (most recent call last):
  File "/Users/eggie5/Development/wanda/main.py", line 88, in <module>
    main()
  File "/Users/eggie5/Development/wanda/main.py", line 65, in main
    prune_wanda(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
  File "/Users/eggie5/Development/wanda/lib/prune.py", line 132, in prune_wanda
    dataloader, _ = get_loaders("c4",nsamples=args.nsamples,seed=args.seed,seqlen=2048,tokenizer=tokenizer)
  File "/Users/eggie5/Development/wanda/lib/data.py", line 73, in get_loaders
    return get_c4(nsamples, seed, seqlen, tokenizer)
  File "/Users/eggie5/Development/wanda/lib/data.py", line 55, in get_c4
    i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
  File "/Users/eggie5/.pyenv/versions/3.10.0/lib/python3.10/random.py", line 370, in randint
    return self.randrange(a, b+1)
  File "/Users/eggie5/.pyenv/versions/3.10.0/lib/python3.10/random.py", line 353, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, 0, 0)

Hi, thanks for your interest in our work. Our repo is research motivated and currently mainly supports the LLaMA model family reported in our paper. To use it with GPT-NeoX, I think the forwarding function used to compute activations may depend on the specific model architecture.

However, it seems that the error here is not related to that. Could you check the value of `trainenc.input_ids.shape[1] - seqlen - 1` at lib/data.py, line 55 in get_c4 (`i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)`) and see whether `seqlen` is set correctly for your model? That does seem to be the cause of the failure.
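For reference, a minimal sketch of that sampling with a length guard (names follow the traceback; the exact code in lib/data.py may differ):

import random

def sample_c4_example(traindata, tokenizer, seqlen):
    # Resample documents until one tokenizes to more than seqlen tokens;
    # otherwise input_ids.shape[1] - seqlen - 1 is negative and randint()
    # raises "empty range for randrange()".
    while True:
        doc = traindata[random.randint(0, len(traindata) - 1)]
        trainenc = tokenizer(doc["text"], return_tensors="pt")
        if trainenc.input_ids.shape[1] > seqlen:
            break
    i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
    return trainenc.input_ids[:, i:i + seqlen]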

It may be fixed by this recent commit. This error was not observed when pruning the LLaMA models. Also remember to set the context length correctly here for a different LLM.
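For example, something along these lines after loading the model (a sketch, assuming the config exposes max_position_embeddings, which dolly-v2 / GPT-NeoX does, and that the calibration code reads the length from model.seqlen):

# Use the model's own context length instead of a hard-coded 2048
model.seqlen = model.config.max_position_embeddings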

eggie5 commented

I was able to get it to work by, in addition to your latest commit, renaming all `model.model` calls to `model.base_model`.
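Roughly, the change was along these lines (a sketch; base_model resolves to the underlying decoder for both LLaMA and GPT-NeoX, so the LLaMA path keeps working):

# Before (LLaMA-specific), used throughout lib/prune.py:
# layers = model.model.layers

# After, so the GPT-NeoX-based dolly-v2 also resolves its decoder layers:
layers = model.base_model.layers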

Thanks for your help and this work overall.