A few suggestions to add
ParisNeo opened this issue · 9 comments
Hi and thanks for your awesome backend.
I was just wondering if you can add those options to your generate method:
repeat_penalty
repeat_last_n
They are standard GPT parameters and your backend is currently missing them. They help penalize the model when it starts repeating itself.
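For reference, the way these options usually work (e.g. in llama.cpp) is to scale down the scores of tokens generated in the last repeat_last_n positions before sampling. A rough Python sketch of the idea (just the concept, not the actual ggml code):

def apply_repeat_penalty(logits, recent_tokens, repeat_penalty=1.1):
    # logits: list of raw scores over the vocabulary
    # recent_tokens: ids of the last repeat_last_n generated tokens
    for token_id in set(recent_tokens):
        # Lowering the score of recently seen tokens makes them less likely
        # to be sampled again, which discourages repetition.
        if logits[token_id] > 0:
            logits[token_id] /= repeat_penalty
        else:
            logits[token_id] *= repeat_penalty
    return logits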
I also have a question. Are generations independent? I mean, if I generate a first answer, does the model remember it or should I send back the whole discussion every time?
My UI uses a full discussion thread, and with the other backend I send the entire discussion every time. It would be nice to be able to choose one behavior or the other, and to clear the chatbot's memory when I start a new discussion.
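To be concrete, my UI basically rebuilds the prompt from the whole thread on every request. A simplified sketch (the function name and prompt format are illustrative, not the actual gpt4all-ui code):

def build_prompt(discussion, new_message):
    # discussion: list of {'role': 'user'|'assistant', 'text': ...} turns
    lines = []
    for turn in discussion:
        prefix = '### Human:' if turn['role'] == 'user' else '### Assistant:'
        lines.append(f"{prefix} {turn['text']}")
    # Append the new question and leave the assistant prefix open for the model.
    lines.append(f'### Human: {new_message}')
    lines.append('### Assistant:')
    return '\n'.join(lines)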
Thanks a lot, these bindings are awesome. I haven't published a release yet, but when I do I'll send you the link if you want to test it.
Hi, the original C++ code from the ggerganov/ggml repo doesn't provide repeat_penalty or repeat_last_n options for the GPT-J model, so it would require major changes to add them. I will look into this over the weekend.
All generations are independent, as the model doesn't remember any of the previous inputs/outputs. Can you please share a link to your code where you send the entire discussion every time? Wouldn't the input get very large over time and exceed the context size?
Hi,
Oh, sorry I forgot. Here is my project:
https://github.com/nomic-ai/gpt4all-ui
And here is a video about how to use it:
https://www.youtube.com/watch?v=M7NFajCyZKs&t=27s&ab_channel=ParisNeo
It is hosted under nomic-ai's GitHub organization, which gives me more visibility.
If you look in pyGPT4ALL/backends, you can find the backend code and see how I use it.
I'm still fixing some bugs in the UI, but I think I'll make your backend the default one if I manage to make it work perfectly. For now the default uses the llama-cpp backend, which supports the original gpt4all model and Vicuna 7B and 13B. I'm testing the outputs from all these models to figure out which one is best to keep as the default, but I'll keep supporting every backend out there, including Hugging Face's transformers.
I'll add links to the backend projects I'm using so that people can check out your work and give you some love.
Thanks for your quick response.
Added repeat_penalty and repeat_last_n options in the latest version 0.2.3:
model.generate(prompt, repeat_penalty=1.1, repeat_last_n=64)
By default repeat_penalty is 1.0, which means it is disabled.
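For completeness, a fuller call might look like this (the import and model path are assumptions based on the project's README; adjust to your setup):

from gpt4allj import Model

model = Model('/path/to/ggml-gpt4all-j.bin')
# Penalize tokens repeated within the last 64 generated tokens.
print(model.generate('AI is going to', repeat_penalty=1.1, repeat_last_n=64))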
I will look into storing the previous context next week.
Thanks man. I am going to support third-party backends to help give more visibility to contributors. I'll show you how to build your own compatible backend so you can add it to my UI. I'll build an example in my GitHub and you can copy it, modify it, and do whatever you want with it. When it is ready, you can add it to the supported backends list via a pull request.
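To give you an idea, a compatible backend would roughly wrap your generate call behind a common interface. A hypothetical sketch (class and method names are illustrative, not the actual gpt4all-ui interface):

class GPTJBackend:
    # Hypothetical adapter exposing these bindings to the UI.
    def __init__(self, model_path, **config):
        from gpt4allj import Model  # assumed import, based on the bindings' README
        self.model = Model(model_path)
        self.config = config

    def generate(self, prompt):
        # The UI would call this for each message and display the returned text.
        return self.model.generate(
            prompt,
            repeat_penalty=self.config.get('repeat_penalty', 1.1),
            repeat_last_n=self.config.get('repeat_last_n', 64),
        )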
Added support for keeping the previous context in the latest version 0.2.5
To keep the previous context ("remember" the previous response), use reset=False:
model.generate('Write code to sort numbers in Python.')
model.generate('Rewrite the code in JavaScript.', reset=False)
# New discussion
model.generate('...', reset=True)
model.generate('...', reset=False)
By default reset is True.
Recently I also added support for LangChain (see README).
Hi there.
Thanks man. I am reorganizing the backends and should definitely support your code in the upcoming version. This week I have a trip and have to finish writing two papers for a conference. Please be patient, as I will have no time to work on this these days.
Hi, I created a new library which supports more models and has more features: https://github.com/marella/ctransformers
If you are using gpt4all-j, I highly recommend migrating to this new library. Currently it supports GPT-2, GPT-J (GPT4All-J), GPT-NeoX (StableLM), Dolly V2, and StarCoder. I will continue to add more models.
It provides a unified interface for all models:
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2')
print(llm('AI is going to'))
It also provides a generator interface:
tokens = llm.tokenize('AI is going to')
for token in llm.generate(tokens):
    print(llm.detokenize(token))
It can be used with models hosted on the Hugging Face Hub:
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
It also has built-in support for LangChain. Please see the README for more details.
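For example, the LangChain integration can be used roughly like this (sketch based on the README; swap in the model repo and type you need):

from langchain.llms import CTransformers

# Load a GGML model from the Hugging Face Hub through the LangChain wrapper.
llm = CTransformers(model='marella/gpt-2-ggml')
print(llm('AI is going to'))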
Thanks a lot. I'll do that this evening.