Need to recover the output word by word

Question

ParisNeo opened this issue 2 years ago · 6 comments

Hi there,
Thank you for this promising binding for GPT-J.

I am developing the GPT4All-ui that supports llamacpp for now and would like to support other backends such as gpt-j.

I can use your backend in my tool but I need the binding to be able to do the following:

1 - Allow me to provide a callback function to the generate function so that I can recover the output word by word as it comes.
2 - The callback function returns a continue boolean. If the boolean is False, it means that the user wants to stop the generation process. If the function returns nothing, you can assume that we want to continue.
3 - My tool has a continuous discussion system, so I will send you everything at once (the conditioning text + all previous discussion parts). So there is no need to add internal conditioning; just take all my text as the prompt and give me the answer word by word.
4 - I also need a function that gives me the count of tokens in a text. I need this to adjust the window of words I send you so that it doesn't exceed the context size without losing the conditioning part.
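
To make points 1 and 2 concrete, here is a minimal sketch of the callback contract being requested. It is not the binding's actual code: `generate_with_callback` is a hypothetical stand-in that walks over a fixed list of tokens instead of running a model, and it only illustrates the rule that an explicit False stops generation while returning nothing continues.

```python
from typing import Callable, Iterable, List, Optional


def generate_with_callback(
    tokens: Iterable[str],
    callback: Optional[Callable[[str], Optional[bool]]] = None,
) -> str:
    """Hypothetical stand-in for a binding's generation loop.

    `tokens` plays the role of the model's output stream. The callback is
    invoked once per token; generation stops only when it returns False.
    A return value of None (no explicit return) means "continue".
    """
    pieces: List[str] = []
    for token in tokens:
        pieces.append(token)
        # Compare with `is False` so that None is not treated as a stop signal.
        if callback is not None and callback(token) is False:
            break
    return "".join(pieces)


seen: List[str] = []


def stop_after_three(token: str) -> Optional[bool]:
    """Example callback: collect tokens and ask to stop after the third."""
    seen.append(token)
    if len(seen) >= 3:
        return False
    return None  # same as returning nothing: keep generating


result = generate_with_callback(["Hello", " ", "world", "!", " Bye"], stop_after_three)
```

In this sketch the token that triggers the stop is still part of the returned text; a real binding could reasonably choose otherwise.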

I think your binding can be a good thing to add to my tool as GPT-J has its own
Here is my tool if you want to check it out:
https://github.com/nomic-ai/gpt4all-ui
you can view my videos that explain how to install it, use it and modify the settings:
https://www.youtube.com/watch?v=6kKv6ESnwMk&list=PLuI3It1_o4U9u_0_SZTVYofUlYKQwIcl_&ab_channel=ParisNeo

The UI has advanced greatly since I made this video. You can look at the latest version on GitHub.

Thanks for making this binding.

Best regards

Answer 1 · 2023-04-20T12:46:28.000Z

Hi, I'm already planning to add a callback function similar to gpt4all-chat. Currently I'm working on automating the build process for different OS and have to do some cleanup, so will look into it after that. I can also add 4) function to get count of tokens.

gpt4all-ui looks interesting. I actually thought of making a simple chat UI for fun using flask/quart and the bindings I created, but yours looks more advanced. Also, looking at webui.sh, wouldn't it be simpler to package the application as a PyPI package so that you can just do pip install gpt4all-ui and something like python -m gpt4all-ui to start the server?

Answer 2 · 2023-04-20T13:44:24.000Z

you are right.
I think I'll do that :)

Answer 3 · 2023-04-20T14:05:29.000Z

As for the binding, I have started building the backend system.
Now there is a folder called backends that contains the list of backends. They all implement the same interface. For now I'm including pyllamacpp and a degraded version of your binding that outputs all the text at once when the generation finishes. That will allow me to test the binding selection tool and verify that everything is ready for when you do the update.

Thank you very much.

Answer 4 · 2023-04-20T17:50:21.000Z

I have integrated your backend as one of the supported backends. It doesn't output the words one by one yet, so users need to wait for the message to be processed entirely before seeing it in the UI.
If you want, you can check it out and report any problems you may encounter.

Thank you for your contribution.
Check out our Discord channel: https://discord.gg/4rR282WJb6

Answer 5 · 2023-04-23T18:35:39.000Z

This is released in the latest version 0.2.0

If a callback function is passed to model.generate(), it will be called once for each generated token. To stop generating more tokens, return False from the callback function.

def callback(token):
    print(token)

model.generate(prompt, callback=callback)

Also added model.num_tokens():

count = model.num_tokens(prompt)
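
For the use case described in the question (keeping the prompt within the context size without dropping the conditioning text), a token counter could be used roughly as follows. This is only a sketch under assumptions: `fit_to_context` and `whitespace_count` are hypothetical helpers, the context size is an arbitrary number, and the whitespace counter merely stands in for something like model.num_tokens so the example runs without a model. Summing per-message counts may also differ slightly from tokenizing the joined prompt.

```python
from typing import Callable, List


def fit_to_context(
    conditioning: str,
    messages: List[str],
    count_tokens: Callable[[str], int],
    context_size: int,
) -> str:
    """Hypothetical helper: always keep the conditioning text, then keep as
    many of the most recent messages as fit in the remaining token budget."""
    budget = context_size - count_tokens(conditioning)
    kept: List[str] = []
    # Walk from the newest message to the oldest and stop at the first
    # message that no longer fits.
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "".join([conditioning] + kept)


def whitespace_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words.
    With the binding, a callable such as model.num_tokens would go here."""
    return len(text.split())


conditioning = "You are helpful.\n"
discussion = [
    "User: hi there\n",
    "Bot: hello friend\n",
    "User: how are you today\n",
]
prompt = fit_to_context(conditioning, discussion, whitespace_count, context_size=12)
```

With these made-up values the oldest message is dropped and the conditioning text plus the two most recent messages are kept.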

Answer 6 · 2023-04-23T19:09:17.000Z

You are awesome