Chat reply keeps generating after stop and shows the whole sample response
chongy076 opened this issue · 2 comments
Expected Behavior
Pressing stop should halt generation and remove the partial sample from the reply shown to the user.
Current Behavior
Generation continues even after stop is pressed.
Steps to Reproduce
Please provide detailed steps to reproduce the issue.
- Step 1 - load model ggml-vicuna-13b-4bit-rev1.bin
- Step 2 - start the chat
Possible Solution
The code committed about 8 hours ago was working fine and also generated faster; the latest code seems to have removed a lot of the heavy code.
It may be worth taking a look at from pyllamacpp.model import Model for generation. Previously self.cancel_gen=True was used to stop generation, but I noticed the backend keeps running even after cancel is called.
At least on the front end that stopped the hallucinated text from appearing for the user.
The best solution would be to stop inside model generation itself: it produces a large result that is not required and consumes a lot of time.
Unless generation is broken into batches, returning from the callback may not be able to stop the thread inside the model.
for tok in self.model.generate(prompt,
                               n_predict=n_predict,
                               temp=self.config['temp'],
                               top_k=self.config['top_k'],
                               top_p=self.config['top_p'],
                               repeat_penalty=self.config['repeat_penalty'],
                               repeat_last_n=self.config['repeat_last_n'],
                               n_threads=self.config['n_threads'],
                               ):
    if not new_text_callback(tok):
        return
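For illustration, here is a minimal sketch of why a front-end cancel flag stops the display but not the backend computation. The `CancellableStream` class and fake token source are hypothetical, not the project's actual code: the flag is only checked between tokens, so the work that produced the current token has already been spent by the time the cancel is noticed.

```python
# Hypothetical sketch of a cancel flag checked between streamed tokens.
# NOT the project's real code; it only illustrates why setting
# cancel_gen=True stops the front end but not the backend generation.

class CancellableStream:
    def __init__(self):
        self.cancel_gen = False  # set to True from another thread/request

    def run(self, tokens, new_text_callback):
        """Consume tokens until cancelled or the callback returns False."""
        produced = []
        for tok in tokens:
            if self.cancel_gen:
                # The cancel is only noticed *between* tokens; the backend
                # call that produced the next token cannot be interrupted.
                break
            if not new_text_callback(tok):
                break
            produced.append(tok)
        return produced
```

With a real model the loop body would be inside `model.generate(...)`, which is exactly why batching generation (so control returns between batches) is the only way for this kind of flag to take effect.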
Context
Please provide any additional context about the issue.
Screenshots
If applicable, add screenshots to help explain the issue.
The last version does stop every thing if you use the llamacpp backend. you have to update the project so that it installs the latest version of the backend requirements. pip install -r requirements.txt
Hi, I will try again. In the meantime I am looking at pyllamacpp; it has anti_prompts and some other parameters.
Could you check the pull request I made today against the code you updated for the WSGI server? To run smoothly, http_server can serve directly without needing socketio.run.
Thanks
[Here is the test code I tried with anti_prompts]
#from pyllamacpp.model import Model
#model = Model(ggml_model='./models/llama_cpp/ggml-vicuna-13b-4bit-rev1.bin')
#for token in model.generate("hello"):
#    print(token, end='')
from pyllamacpp.model import Model

prompt_context = """ Act as ### Assistant:. ### Assistant: is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision. To do this, ### Assistant: uses a database of information collected from many different sources, including books, journals, online articles, and more. Stop generate after ### Assistant:
Human: Nice to meet you Bob!
Assistant: Welcome! I'm here to assist you with anything you need. What can I do for you today?
"""
prompt_prefix = "### Human:"
prompt_suffix = "### Assistant:"
smodel = './models/llama_cpp/ggml-vicuna-13b-4bit-rev1.bin'
model = Model(ggml_model=smodel, n_ctx=512, prompt_context=prompt_context,
              prompt_prefix=prompt_prefix, prompt_suffix=prompt_suffix,
              anti_prompts=[prompt_prefix])

while True:
    try:
        bStart = False
        prompt = input(prompt_prefix)
        if prompt == '':
            continue
        print(prompt_suffix, end='')
        for tok in model.generate(prompt, n_predict=300):
            if prompt_suffix in tok:
                bStart = True
                print("[Start]")
            if bStart == True and prompt_prefix in tok:
                print("[terminate]")
                break
            print(f"{tok}", end='', flush=True)
        print()
    except KeyboardInterrupt:
        break
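The start/stop marker logic in that loop can also be pulled into a small helper and exercised with a fake token stream, no model required. This is a hypothetical function, just a sketch of the same anti-prompt check, not part of pyllamacpp:

```python
def stream_until_anti_prompt(tokens, start_marker, stop_marker):
    """Collect streamed tokens; once start_marker has been seen,
    stop as soon as a token contains stop_marker (the anti-prompt)."""
    started = False
    out = []
    for tok in tokens:
        if start_marker in tok:
            started = True
        if started and stop_marker in tok:
            break  # anti-prompt reached: the model is starting a new turn
        out.append(tok)
    return "".join(out)
```

Factoring it out this way makes it easy to verify the cut-off behaviour with unit tests before wiring it to the real generator.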
[Result from the model]
Human:list 2 pain killer medicine
Assistant: Here are two common painkiller medicines:
- Aspirin - a nonsteroidal anti-inflammatory drug (NSAID) used to relieve minor aches and pains, such as headaches, menstrual cramps, arthritis, toothaches, and the common cold.
- Acetaminophen (also known as paracetamol) - a pharmacological agent used for the relief of fe
Human:
hahaha, this is called a box ...... :D