microsoft/kernel-memory

TextGenerationOptions is totally not used

AsakusaRinne opened this issue · 9 comments

TextGenerationOptions is a parameter of ITextGeneration.GenerateTextAsync, but it currently does not appear to be used anywhere.

For API services such as OpenAI ChatGPT, a stop sequence is not that important. For local model inference, however, the model will generate output endlessly without one.
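
For context, here is a minimal, hypothetical sketch of why this matters for a local backend (simplified stand-in types, not the actual Kernel Memory interfaces): without a token cap and stop sequences, the generation loop never terminates.

```csharp
// Hypothetical, simplified sketch (not the Kernel Memory API): a local backend
// needs StopSequences and MaxTokens passed through, otherwise the loop below
// keeps emitting tokens forever.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public sealed record LocalGenerationOptions(int MaxTokens, IReadOnlyList<string> StopSequences);

public static class LocalGenerationSketch
{
    // nextToken stands in for the local model: given the text so far, return the next token.
    public static string Generate(Func<string, string> nextToken, string prompt, LocalGenerationOptions options)
    {
        var output = new StringBuilder();

        for (int i = 0; i < options.MaxTokens; i++) // hard cap on generated tokens
        {
            output.Append(nextToken(prompt + output));

            string text = output.ToString();
            string? hit = options.StopSequences.FirstOrDefault(s => text.Contains(s, StringComparison.Ordinal));
            if (hit is not null)
            {
                // Stop sequence reached: return the text up to (excluding) the stop sequence.
                return text[..text.IndexOf(hit, StringComparison.Ordinal)];
            }
        }

        return output.ToString(); // MaxTokens reached
    }
}
```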

Could you please expose TextGenerationOptions through the AskAsync API so that users can configure these settings themselves? That would help a lot with local LLM inference integration.

If possible, I would also like the token-counting method to be customizable.
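
To illustrate the token-counting request, here is a hypothetical sketch of what a pluggable counter could look like; none of these names come from Kernel Memory, they only show the shape of the feature being asked for.

```csharp
// Hypothetical sketch only: letting callers plug in their own token-counting
// function instead of a hard-coded one. These names are not from Kernel Memory.
using System;

public interface ITokenCounter
{
    int CountTokens(string text);
}

// Naive whitespace-based counter; a real implementation would wrap the model's own tokenizer.
public sealed class WhitespaceTokenCounter : ITokenCounter
{
    public int CountTokens(string text) =>
        text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;
}
```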

Any updates? @dluc I understand that in the early stages of a project there is always a shortage of hands, but please at least let us know whether this will be addressed in the future.

dluc commented

Sorry, we haven't had a chance to look into this yet, but we keep an eye on the list of open issues and will provide an update as soon as possible.

Ok, I'm looking forward to it. Thank you for your work anyway.

dluc commented

I noticed that LLama would generate tokens almost ad infinitum (at some point it throws an exception). SearchClientConfig.AnswerTokens will be passed as TextGenerationOptions.MaxTokens.
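
In other words, the configured AnswerTokens value ends up as TextGenerationOptions.MaxTokens on each generation call. A hedged sketch of that mapping (not the actual SearchClient source; the streaming signature of GenerateTextAsync, the settable MaxTokens property, and the namespace are assumptions that may differ by version):

```csharp
// Hedged sketch of the mapping described above, not the actual SearchClient code.
// Assumes GenerateTextAsync streams tokens and TextGenerationOptions.MaxTokens is settable.
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.KernelMemory.AI; // namespace approximate, may differ by version

public static class AnswerGenerationSketch
{
    public static async Task<string> GenerateAnswerAsync(
        ITextGeneration generator, string prompt, int answerTokens, CancellationToken ct = default)
    {
        var options = new TextGenerationOptions
        {
            MaxTokens = answerTokens // taken from SearchClientConfig.AnswerTokens
        };

        var answer = new StringBuilder();
        await foreach (string token in generator.GenerateTextAsync(prompt, options, ct))
        {
            answer.Append(token);
        }

        return answer.ToString();
    }
}
```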

I'll look into adding the options to the Ask API, so the behavior can be managed more easily.

Thanks a lot! I'm looking forward to it.

Update: @marcominerva added new settings to SearchClientConfig (see #341), allowing Temperature, TopP and other LLM request settings to be configured. These will soon be used by AskAsync; PR coming soon.
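
For illustration, a hedged sketch of how these settings might be supplied. Only Temperature, TopP and AnswerTokens are named in this thread; the WithSearchClientConfig builder method and the exact property shapes are assumptions, so check #341 for the actual surface.

```csharp
// Sketch under assumptions: configuring the LLM request settings mentioned above
// via SearchClientConfig. Builder method and property names may differ by version.
using Microsoft.KernelMemory;

var memory = new KernelMemoryBuilder()
    .WithSearchClientConfig(new SearchClientConfig
    {
        AnswerTokens = 300, // forwarded as TextGenerationOptions.MaxTokens
        Temperature  = 0,   // deterministic answers
        TopP         = 1
    })
    // ... text generation / embedding / storage configuration elided ...
    .Build<MemoryServerless>();

var answer = await memory.AskAsync("What does TextGenerationOptions control?");
```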

Fixed: see #341 and #344

@dluc Thank you, Devis! We'll keep track of the next release and apply it in LLamaSharp.