Speculative Decoding: TypeError: list indices must be integers or slices, not tuple (Apple M1 MacOS Sonoma 14.6.1)
solitaryangler commented
Hi,
I am trying to run speculative decoding from the example given here: huggingface.co/distil-whisper/distil-large-v2#speculative-decoding, using the following code:
```python
from transformers import pipeline, AutoModelForCausalLM, AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Distilled assistant (draft) model used for speculative decoding
assistant_model_id = "distil-whisper/distil-large-v2"
assistant_model = AutoModelForCausalLM.from_pretrained(
    assistant_model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
assistant_model.to(device)

# Main Whisper model
model_id = "openai/whisper-large-v2"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    generate_kwargs={"assistant_model": assistant_model},
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample, return_timestamps=True)
print(result["text"])
```
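One note for the M1 context: `torch.cuda.is_available()` is False on Apple Silicon, so the snippet above runs on CPU in float32. As an aside (purely a sketch, untested here, and whether the assisted-generation path works on MPS at all is a separate question), the device selection could target PyTorch's Metal (MPS) backend instead:

```python
import torch

# Sketch only: prefer Apple's MPS backend when available, else fall back to CPU.
# torch.backends.mps.is_available() is the standard PyTorch check for Metal support.
if torch.backends.mps.is_available():
    device = "mps"
    torch_dtype = torch.float16
else:
    device = "cpu"
    torch_dtype = torch.float32
```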
My environment has Python 3.10.13 with (non-exhaustive list):

```
torch==2.6.0.dev20240925
torchaudio==2.5.0.dev20240925
torchvision==0.20.0.dev20240925
ffmpeg-python==0.2.0
future==1.0.0
librosa==0.10.2.post1
transformers==4.45.0
accelerate==0.34.2
```
I am running everything on an Apple M1 chip with macOS Sonoma 14.6.1. I am getting the following error:
```
miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.
```
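The last warning above points at the cache-format change and suggests `return_legacy_cache=True`. Purely as a hedged sketch (I have not verified that this avoids the crash), that flag could be forwarded through the pipeline's `generate_kwargs`:

```python
# Sketch: forward return_legacy_cache through generate_kwargs, alongside the
# assistant model. Untested; included only because the deprecation warning
# above names the cache-format change.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    generate_kwargs={"assistant_model": assistant_model, "return_legacy_cache": True},
    torch_dtype=torch_dtype,
    device=device,
)
```

The run then ends with this traceback: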
```
Traceback (most recent call last):
  File "test_specdec.py", line 41, in <module>
    result = pipe(sample, return_timestamps=True)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 284, in __call__
    return super().__call__(inputs, **kwargs)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1260, in __call__
    return next(
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1175, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 512, in _forward
    tokens = self.model.generate(
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 671, in generate
    ) = self.generate_with_fallback(
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 834, in generate_with_fallback
    seek_outputs = super().generate(
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/generation/utils.py", line 1992, in generate
    result = self._assisted_decoding(
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/generation/utils.py", line 4015, in _assisted_decoding
    candidate_input_ids, candidate_logits = candidate_generator.get_candidates(input_ids)
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/generation/candidate_generator.py", line 207, in get_candidates
    self.assistant_kwargs["past_key_values"] = _crop_past_key_values(
  File "miniconda3/envs/py3.10.13/lib/python3.10/site-packages/transformers/generation/candidate_generator.py", line 404, in _crop_past_key_values
    past_key_values[idx][0][:, :, :max_length, :],
TypeError: list indices must be integers or slices, not tuple
```
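In case it helps with triage: the failing line indexes `past_key_values[idx][0]` with a tuple of slices, which only works on a `torch.Tensor`, so the message suggests the assistant model's cache entry is still a plain Python list at that point. A minimal stand-in (not the library code) that reproduces the same TypeError:

```python
import torch

# Tuple-of-slices indexing is fine on tensors...
t = torch.zeros(1, 2, 8, 4)
t[:, :, :2, :]

# ...but raises exactly this error on a plain list standing in for a cache entry.
cache_entry = [[0.0] * 4]
cache_entry[:, :, :2, :]  # TypeError: list indices must be integers or slices, not tuple
```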
Kindly help!
Thanks.