amrrs/LLM-QA-Bot

Runtime error with https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1


Setting pad_token_id to eos_token_id:0 for open-end generation.

RuntimeError Traceback (most recent call last)
in <cell line: 2>()
1 query_engine = index.as_query_engine()
----> 2 response = query_engine.query( "What's the cost of Whisper model?")

37 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py in _attn(self, query, key, value, attention_mask, head_mask)
217 # Need to be on the same device, otherwise RuntimeError: ..., x and y to be on the same device
218 mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
--> 219 attn_scores = torch.where(causal_mask, attn_scores, mask_value)
220
221 if attention_mask is not None:

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3
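
A likely reading of this error: the causal mask in the traceback covers 2048 positions, and the attention scores reached 2049, which suggests that the prompt plus the generated tokens grew past the model's context window. The sketch below shows one way to keep the total within that window by trimming the prompt before generation. It works on a plain list of token ids. The helper name `fit_to_context` is hypothetical, the 2048 limit is taken from the error message, and the `max_new_tokens=256` default is an assumed generation budget rather than a value from this issue.

```python
def fit_to_context(token_ids, max_positions=2048, max_new_tokens=256):
    """Trim a prompt so that prompt + generated tokens fit in the context window.

    max_positions: assumed context size (2048, per the error message above).
    max_new_tokens: assumed number of tokens the model is allowed to generate.
    """
    if not 0 < max_new_tokens < max_positions:
        raise ValueError("max_new_tokens must be between 1 and max_positions - 1")

    # Number of prompt tokens that still leaves room for generation.
    budget = max_positions - max_new_tokens

    # Keep the most recent tokens; in a QA prompt the question usually comes last.
    if len(token_ids) > budget:
        return list(token_ids[-budget:])
    return list(token_ids)


if __name__ == "__main__":
    prompt = list(range(3000))      # stand-in for a tokenized prompt
    trimmed = fit_to_context(prompt)
    print(len(trimmed))             # 2048 - 256 = 1792
```

In a retrieval pipeline the same effect is usually achieved by lowering the prompt size limit or the number of retrieved chunks in the indexing library's configuration, so that the assembled prompt never exceeds the budget in the first place.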