NisaarAgharia/Indian-LawyerGPT

Stop token during Inference

sankethgadadinni opened this issue · 3 comments

Why are the responses cut off in the middle?

You need to update

generation_config.max_new_tokens = 200

to however many new tokens you want the model to generate:

generation_config = model.generation_config
generation_config.max_new_tokens = 100  # hard cap on tokens generated per call
generation_config.temperature = 0.5     # lower = more deterministic sampling
generation_config.top_p = 0.7           # nucleus sampling cutoff
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id  # stop early when EOS is emitted
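For reference, here is a minimal sketch of how this config might be applied at inference time. The prompt string is a made-up example, and model and tokenizer are assumed to be the fine-tuned Falcon-7B model and tokenizer already loaded in the notebook:

import torch

prompt = "What is Section 420 of the Indian Penal Code?"  # hypothetical example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, generation_config=generation_config)
# The returned ids include the prompt tokens, so this prints prompt + completion
print(tokenizer.decode(outputs[0], skip_special_tokens=True))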

I have this config, but generation still runs until the full token count is used up. Is there a way to stop at the end of a sentence, like the OpenAI API does?

Increase max_new_tokens to something like 400-500 to get longer replies. Falcon-7B can output at most ~2k tokens (its sequence length is 2048, shared between prompt and completion).
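To stop at sentence boundaries rather than at a hard token cap, one option is a custom stopping criterion via the transformers StoppingCriteria API. Here is a minimal sketch, assuming the model, tokenizer, and inputs from above; the SentenceStop class name is made up for illustration:

from transformers import StoppingCriteria, StoppingCriteriaList

class SentenceStop(StoppingCriteria):
    """Halt generation once the decoded text ends with a sentence terminator."""

    def __init__(self, tokenizer, terminators=(".", "!", "?")):
        self.tokenizer = tokenizer
        self.terminators = terminators

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the whole sequence so far and check how it ends
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return text.rstrip().endswith(self.terminators)

outputs = model.generate(
    **inputs,
    max_new_tokens=500,  # generous cap; the criterion usually stops earlier
    stopping_criteria=StoppingCriteriaList([SentenceStop(tokenizer)]),
)

Note that this decodes the full sequence at every step, which is simple but quadratic in output length, and it would fire immediately if the prompt itself ends with a terminator; in practice you would start checking only the newly generated tokens.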