Stop token during Inference
sankethgadadinni opened this issue · 3 comments
Why are the responses cut off in the middle?
You need to update
generation_config.max_new_tokens = 200
to however many new tokens you want the model to generate.
# Reuse and override the model's default generation settings
generation_config = model.generation_config
generation_config.max_new_tokens = 100          # upper bound on generated tokens
generation_config.temperature = 0.5             # lower = more deterministic
generation_config.top_p = 0.7                   # nucleus sampling cutoff
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id  # stop when EOS is produced
I have this config, but it still stops only after generating the full number of tokens. Is there a way to stop at the end of a sentence, like OpenAI does?
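One way to approximate that is a custom stopping criterion. The sketch below is only an illustration, not from this thread: the SentenceEndCriteria class, the list of terminator characters, and the prompt/usage lines are all assumptions, and it relies on the standard StoppingCriteria / StoppingCriteriaList classes from Hugging Face transformers plus the model and tokenizer already loaded above.

# Minimal sketch: stop once the newly generated text ends a sentence.
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class SentenceEndCriteria(StoppingCriteria):
    """Stop when the text generated after the prompt ends with a terminator."""
    def __init__(self, tokenizer, prompt_length, terminators=(".", "!", "?")):
        self.tokenizer = tokenizer
        self.prompt_length = prompt_length  # number of prompt tokens to skip
        self.terminators = terminators

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the tokens produced after the prompt and check the ending.
        new_text = self.tokenizer.decode(
            input_ids[0, self.prompt_length:], skip_special_tokens=True
        )
        return new_text.rstrip().endswith(self.terminators)

# Hypothetical usage with the generation_config defined above:
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# outputs = model.generate(
#     **inputs,
#     generation_config=generation_config,
#     stopping_criteria=StoppingCriteriaList(
#         [SentenceEndCriteria(tokenizer, inputs["input_ids"].shape[1])]
#     ),
# )

Note that generation may still end early at eos_token_id or run up to max_new_tokens; this criterion only adds an extra stop condition on top of those.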
Increase max_new_tokens to something like 400-500 to get longer replies. Falcon-7B can output at most 2k tokens.
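As a rough sketch of that suggestion (the prompt string and device handling here are illustrative, assuming the model, tokenizer, and generation_config from the earlier comment are already set up):

generation_config.max_new_tokens = 500  # allow longer replies, within Falcon-7B's ~2k-token limit
inputs = tokenizer("Explain transfer learning in simple terms.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))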