The output is very bad, total garbage. What am I doing wrong? It is also super slow and requires a huge amount of RAM.

Here is my entire script:

from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b", device_map="auto")

 
input_text = "The benefits of deadlifting\n\n"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids,new_doc=False,top_p=0.7, max_length=1000)
print(tokenizer.decode(outputs[0]))

And the output is total repetition and garbage. I am trying to generate an article based on the topic sentence I provide.

Also, even 28 GB of VRAM is not enough for the 6.7b model. I am testing the CPU runtime on an IPU, and it has been running for more than 2 hours with just the 6.7b model.

The output is below:

The benefits of deadlifting

The benefits of deadlifting are numerous. It is a simple, inexpensive, and effective method of reducing the risk of injury to the shoulder and elbow. It is also a simple and effective method of reducing the risk of injury to the hand.

Shoulder

The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. 
The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. 
The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper

Here is another example. Why all the repetition?

Thanks for reporting, will look into this and get back to you tomorrow

In half-precision mode, 6.7b fits on a 3090. As for the output quality, you need to tweak the generation parameters a little; this blogpost explains quite a bit.
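As a rough sanity check on the memory numbers in this thread, here is a back-of-the-envelope estimate of the weight memory alone. It assumes about 6.7 billion parameters (taken from the model name), 4 bytes per parameter in float32 and 2 bytes in float16. It ignores activations, the attention cache and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights only, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Assumption: "6.7b" means about 6.7e9 parameters.
params_6_7b = 6.7e9

fp32_gb = weight_memory_gb(params_6_7b, 4)  # float32: 4 bytes per parameter
fp16_gb = weight_memory_gb(params_6_7b, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: ~{fp32_gb:.1f} GB")
print(f"float16 weights: ~{fp16_gb:.1f} GB")
```

Under these assumptions the float32 weights alone come to about 26.8 GB, which leaves very little headroom on a 28 GB device, while the float16 weights need about 13.4 GB and fit within the 24 GB of a 3090.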

Here's a snippet of how I use it:

import torch, gc
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 2020

model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", torch_dtype=torch.float16)

input_text = """# Scientific article.
title: Contrastive analysis of models used for DRAM simulation.

# Introduction
"""
input_ids = tokenizer(input_text, return_tensors="pt", padding='max_length').input_ids.to("cuda")

outputs = model.generate(input_ids,
                        max_new_tokens=1000,
                        do_sample=True,
                        temperature=0.7,
                        top_k=25,
                        top_p=0.9,
                        no_repeat_ngram_size=10,
                        early_stopping=True)

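# str.lstrip('<pad>') strips any of the characters <, p, a, d, > rather than
# the literal "<pad>" prefix, so remove the leading pad tokens explicitly.
text = tokenizer.decode(outputs[0])
while text.startswith('<pad>'):
    text = text[len('<pad>'):]
print(text)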

gc.collect()
torch.cuda.empty_cache()

padding='max_length' is the equivalent of new_doc=True; set padding=False to disable it.

@AbstractQbit thank you very much for the answer.

May I ask something regarding this format:

# Scientific article.
title: Contrastive analysis of models used for DRAM simulation.

# Introduction

So does the text generator treat the # character as a special character and do something with it?

What do these two parameters do?
pad_token_id
padding_side

Authors say in the paper that the model was trained on text in markdown format, so giving markdown-ish prompts to the model should probably work best, I guess.
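If it helps, a prompt in that markdown-ish shape can be built with a small helper. This is only a sketch that reproduces the layout of the prompt used earlier in this thread; the function name `build_prompt` and its parameters are made up for illustration, and nothing here is an official prompt format.

```python
def build_prompt(title: str, first_section: str = "Introduction") -> str:
    """Build a markdown-style prompt in the same shape as the one used in this thread."""
    return f"# Scientific article.\ntitle: {title}\n\n# {first_section}\n"

prompt = build_prompt("Contrastive analysis of models used for DRAM simulation.")
print(prompt)
```

Ending the prompt on a section heading leaves the model to continue with the body of that section.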

Padding params I took from here:

galai/galai/model.py, lines 85 to 86 in f6d9b0a

self.tokenizer.enable_padding(direction="left", pad_id=1, pad_type_id=0, pad_token="[PAD]")
self.tokenizer.enable_truncation(max_length=2020, direction="left")
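To illustrate what those two tokenizer settings amount to, here is a minimal pure-Python sketch of left truncation and left padding on a list of token ids. The helper name `left_pad_and_truncate` and the toy ids are made up for illustration; only `pad_id=1` and the choice of the left side come from the galai lines above.

```python
def left_pad_and_truncate(ids, max_length, pad_id=1):
    """Keep the last `max_length` ids, then pad on the left up to `max_length`."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Left truncation: drop tokens from the start, keeping the most recent ones.
    ids = ids[-max_length:]
    # Left padding: fill the front so the real tokens sit at the end,
    # right next to where generation continues.
    return [pad_id] * (max_length - len(ids)) + ids

short_prompt = [42, 43, 44]          # hypothetical token ids
long_prompt = list(range(100, 110))  # ten hypothetical token ids

padded = left_pad_and_truncate(short_prompt, max_length=6)
truncated = left_pad_and_truncate(long_prompt, max_length=6)
print(padded)
print(truncated)
```

With a decoder-only model, padding on the left keeps the prompt adjacent to the newly generated tokens, which is presumably why the left side is used here.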

@AbstractQbit thank you so much for the answers.

About these hyperparameters: have you tested them, or how did you come up with those values?

do_sample=True,
temperature=0.7,
top_k=25,
top_p=0.9,
no_repeat_ngram_size=10,
early_stopping=True

@FurkanGozukara Those are just what I've ended up with after playing around with the model for a bit. There was no real methodology for picking those. They just produced somewhat sensible output, so I've shared them here as a starting point for you. There are no one-size-fits-all parameters, you'll have to experiment yourself to tailor them to your needs.
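Of those settings, `no_repeat_ngram_size` is the one aimed most directly at the verbatim repetition shown earlier in the thread. As a rough sketch of the idea (not the transformers implementation): before choosing the next token, look at the last n-1 generated tokens and ban every token that would complete an n-gram already present in the output. The function name and the toy token ids below are illustrative only.

```python
def banned_next_tokens(generated, n):
    """Tokens that would repeat an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n:
        return set()
    # The next token would be appended to the last n-1 tokens.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    # Scan all existing n-grams; if one starts with `prefix`, its last token is banned.
    for start in range(len(generated) - n + 1):
        ngram = tuple(generated[start:start + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

history = [5, 6, 7, 5, 6]  # hypothetical token ids; (5, 6, 7) already occurred
print(banned_next_tokens(history, n=3))
```

With no_repeat_ngram_size=10, a phrase shorter than ten tokens can still recur, so smaller values suppress repetition more aggressively at the cost of also blocking legitimate repeated phrases.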

As to what they do, please refer to the article I've linked above. I'm not an NLP expert, so I can't explain them any better than HF people.
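For a rough intuition in the meantime, here is a toy pure-Python sketch of what `temperature`, `top_k` and `top_p` do to a next-token distribution. It is a simplified illustration, not the transformers implementation: the four-token vocabulary, the logits and the helper names are all made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix of those whose
    renormalized cumulative probability reaches top_p. Returns {token_id: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept_mass = sum(probs[i] for i in order)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break
    norm = sum(probs[i] for i in nucleus)
    return {i: probs[i] / norm for i in nucleus}

logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
probs = softmax(logits, temperature=0.7)
candidates = top_k_top_p(probs, top_k=3, top_p=0.9)
print(candidates)  # with do_sample=True, the next token is drawn from these
```

Note that in the original post `top_p=0.7` was passed without `do_sample=True`. In transformers, sampling settings such as `top_p` only take effect when sampling is enabled, so that call would have decoded greedily, which may be part of why it fell into a loop.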

@FurkanGozukara I played around with your prompt, and this is what the model came up with.

Title: The benefits of deadlifting

Abstract: The purpose of this study was to determine the effect of deadlifting on the cardiovascular system. 
The study consisted of a group of 13 men and 11 women who were randomly assigned to an experimental group (n = 24) or a control group (n = 24,). 
Subjects in the experimental group performed deadlifting exercises 2 days per week for 6 weeks, while subjects in the control group did not participate in any exercise program. 
The 6-week program consisted of a 3-week progressive phase and a 3-week maintenance phase. 
At the end of each phase, a graded exercise test (GXT) was performed on a treadmill to determine peak oxygen consumption (VO2peak), ventilatory threshold (VT), and heart rate (HR) at the VT. 
At the end of the 6-week program, VO2peak increased by 15.4% (P < 0.05) in the experimental group compared with a 0.9% increase (P > 0.05) in the control group. 
The experimental group demonstrated a 13.9% increase (P < 0.01) in HR at the VT compared with a 2.4% increase (P > 0...</s>

I followed the parameters from @AbstractQbit.
Here is the complete code. I could run it on one Titan RTX (24GB VRAM) but only when using half precision.

from transformers import AutoTokenizer, OPTForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 4020

model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", torch_dtype=torch.float16)

#input_text = "The Transformer architecture [START_REF]"
input_text = "Title: The benefits of deadlifting\n\n"

input_ids = tokenizer(input_text, padding='max_length', return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_new_tokens=1000,
                         do_sample=True,
                         temperature=0.7,
                         top_k=25,
                         top_p=0.9,
                         no_repeat_ngram_size=10,
                         early_stopping=True)

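# str.lstrip('<pad>') strips any of the characters <, p, a, d, > rather than
# the literal "<pad>" prefix, so remove the leading pad tokens explicitly.
text = tokenizer.decode(outputs[0])
while text.startswith('<pad>'):
    text = text[len('<pad>'):]
print(text)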

@AbstractQbit your answer has helped me a lot, many thanks!

Do you (or anyone else) know how to use the new_doc (padding) parameter to continue generating a document after the first prompt? Do I have to use the output of the first prompt as input to the next one, or is it better to use a larger max_new_tokens value?

I'm getting a CUDA error with @legor's and @AbstractQbit's code.

Detailed error output when setting CUDA_LAUNCH_BLOCKING=1:

/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed. /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [10], line 1
----> 1 outputs_ids = model.generate(
2 input_ids, max_new_tokens=1000,
3 do_sample=True, temperature=0.7,
4 top_k=25, top_p=0.9,
5 no_repeat_ngram_size=10,
6 early_stopping=True
7 )

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
24 @functools.wraps(func)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/generation_utils.py:1543, in GenerationMixin.generate(self, inputs, max_length, min_length, do_sample, early_stopping, num_beams, temperature, penalty_alpha, top_k, top_p, typical_p, repetition_penalty, bad_words_ids, force_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, logits_processor, renormalize_logits, stopping_criteria, constraints, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, exponential_decay_length_penalty, suppress_tokens, begin_suppress_tokens, forced_decoder_ids, **model_kwargs)
1535 input_ids, model_kwargs = self._expand_inputs_for_generation(
1536 input_ids,
1537 expand_size=num_return_sequences,
1538 is_encoder_decoder=self.config.is_encoder_decoder,
1539 **model_kwargs,
1540 )
1542 # 12. run sample
-> 1543 return self.sample(
1544 input_ids,
1545 logits_processor=logits_processor,
1546 logits_warper=logits_warper,
1547 stopping_criteria=stopping_criteria,
1548 pad_token_id=pad_token_id,
1549 eos_token_id=eos_token_id,
1550 output_scores=output_scores,
1551 return_dict_in_generate=return_dict_in_generate,
1552 synced_gpus=synced_gpus,
1553 **model_kwargs,
1554 )
1556 elif is_beam_gen_mode:
1557 if num_return_sequences > num_beams:

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/generation_utils.py:2482, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
2479 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
2481 # forward pass to get next token
-> 2482 outputs = self(
2483 **model_inputs,
2484 return_dict=True,
2485 output_attentions=output_attentions,
2486 output_hidden_states=output_hidden_states,
2487 )
2489 if synced_gpus and this_peer_finished:
2490 continue # don't waste resources running the code we don't need

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

File ███/myconda/condaGA/lib/python3.10/site-packages/accelerate/hooks.py:156, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
154 output = old_forward(*args, **kwargs)
155 else:
--> 156 output = old_forward(*args, **kwargs)
157 return module._hf_hook.post_forward(module, output)

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py:929, in OPTForCausalLM.forward(self, input_ids, attention_mask, head_mask, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
926 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
928 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
--> 929 outputs = self.model.decoder(
930 input_ids=input_ids,
931 attention_mask=attention_mask,
932 head_mask=head_mask,
933 past_key_values=past_key_values,
934 inputs_embeds=inputs_embeds,
935 use_cache=use_cache,
936 output_attentions=output_attentions,
937 output_hidden_states=output_hidden_states,
938 return_dict=return_dict,
939 )
941 logits = self.lm_head(outputs[0]).contiguous()
943 loss = None

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

File ███/myconda/condaGA/lib/python3.10/site-packages/accelerate/hooks.py:156, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
154 output = old_forward(*args, **kwargs)
155 else:
--> 156 output = old_forward(*args, **kwargs)
157 return module._hf_hook.post_forward(module, output)

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py:628, in OPTDecoder.forward(self, input_ids, attention_mask, head_mask, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
625 past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
627 if inputs_embeds is None:
--> 628 inputs_embeds = self.embed_tokens(input_ids)
630 # embed positions
631 if attention_mask is None:

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

File ███/myconda/condaGA/lib/python3.10/site-packages/accelerate/hooks.py:156, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
154 output = old_forward(*args, **kwargs)
155 else:
--> 156 output = old_forward(*args, **kwargs)
157 return module._hf_hook.post_forward(module, output)

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/sparse.py:158, in Embedding.forward(self, input)
157 def forward(self, input: Tensor) -> Tensor:
--> 158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
160 self.norm_type, self.scale_grad_by_freq, self.sparse)

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/functional.py:2199, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2193 # Note [embedding_renorm set_grad_enabled]
2194 # XXX: equivalent to
2195 # with torch.no_grad():
2196 # torch.embedding_renorm_
2197 # remove once script supports set_grad_enabled
2198 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 2199 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: CUDA error: device-side assert triggered
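The `srcIndex < srcSelectDimSize` assertion generally means that an index passed to a lookup was outside the table being indexed, and this traceback ends in the token embedding call. One thing that may be worth ruling out, as a guess rather than a confirmed diagnosis, is a token id that is negative or not smaller than the number of rows in the embedding table. The sketch below shows the check on plain Python lists; the table size and the ids are hypothetical values for illustration. On a real model the same comparison could be made between `input_ids.max()` and `model.get_input_embeddings().num_embeddings`.

```python
def out_of_range_ids(token_ids, num_embeddings):
    """Return the token ids that cannot be looked up in a table with `num_embeddings` rows."""
    return [t for t in token_ids if t < 0 or t >= num_embeddings]

# Hypothetical values for illustration only.
num_embeddings = 50000
token_ids = [1, 1, 1, 320, 49999, 50001]

bad = out_of_range_ids(token_ids, num_embeddings)
print(bad)
```

An empty result would suggest the token ids themselves are fine and the cause lies elsewhere.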

@legor thank you so much. I wonder why they released a model without any proper example like yours.

@AbstractQbit your answer has helped me a lot, many thanks!

Do you (or anyone else) know how to use the new_doc (padding) parameter to continue generating a document after the first prompt? Do I have to use the output of the first prompt as input to the next one, or is it better to use a larger max_new_tokens value?

Unfortunately, Hugging Face doesn't support new_doc; I don't know why. I also wonder about your other questions.
