deep-diver/LLM-As-Chatbot

bug in chatbot UI

GeorvityLabs opened this issue · 27 comments

When I say "hello", after I send the first sentence the text input box gets blocked by the loading animation.
I'm not able to enter anything!
I attached a screenshot below. Any idea why this issue occurs in the chatbot UI?

Also, can you add a "reset" button to reset the chat?

Screenshot from 2023-03-28 22-50-28

After about five or six questions it bugs out. Any idea why this happens?
Screenshot from 2023-03-28 22-55-12

@deep-diver any idea why these bugs show up in the chatbot UI?

Sorry about that. Probably I need to understand Gradio better!

Will have a look into this case. Thanks for letting me know!

By the way, I am hosting this on 3×A6000 now. Please try it if you are interested:

https://notebooksf.jarvislabs.ai/BuOu_VbEuUHb09VEVHhfnFq4-PMhBRVCcfHBRCOrq7c4O9GI4dIGoidvNf76UsRL/

@deep-diver is the current Colab example in batch mode or streaming mode?
Is there a difference in the code?
If the Colab example is in batch mode, how can it be converted to streaming mode?

The default mode is streaming. If you want batch mode, set --batch_size higher than 1.
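(For anyone wondering what the difference looks like in code: a minimal sketch, assuming a Hugging Face transformers model is already loaded; the function and variable names here are illustrative, not the repo's actual code. Batch mode returns the full decoded output once, while streaming mode yields partial text as tokens arrive, which is what lets the Gradio chatbot update incrementally.)

```python
# Illustrative sketch: batch vs. streaming generation with transformers.
# Assumes `model` and `tokenizer` are already loaded; names are hypothetical.
from threading import Thread
from transformers import TextIteratorStreamer

def generate_batch(model, tokenizer, prompt):
    # Batch mode: return the whole completion at once.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def generate_stream(model, tokenizer, prompt):
    # Streaming mode: yield the growing text as tokens are produced.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True,
                                    skip_special_tokens=True)
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256))
    thread.start()
    partial = ""
    for new_text in streamer:
        partial += new_text
        yield partial  # a Gradio generator handler re-renders on every yield
```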

input_box_issues.mp4

As you can see in the video above, the input box is blocked by the loading animation. Any idea why this happens?

no_outputs_uibug.mp4

This is another UI bug where the context box is also filled with the loading animation. Is there any way to disable the loading animation @deep-diver?
Also, no response is obtained from the model, even after waiting for minutes.

Are you using Colab? That happens. It looks like the connection is not stable in Colab.


No, not Colab. I am using it on my local machine.
Is there any way to fix this, if it is a connection issue?

But I did see the same errors while using Colab as well @deep-diver

And I also sometimes saw the same issues on the Jarvis Labs AI page as well @deep-diver

@deep-diver any idea how to get it functioning properly inside Colab? Were you able to run tests and check whether the results were stable inside Colab?
I think the UI instabilities carry over to when we run on a local machine too; there is some general instability in the chatbot UI.

I just added a cancel button.
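(For reference, a minimal Gradio sketch of how a cancel button can interrupt an in-flight streaming handler via the `cancels` argument on an event listener; the component and function names are illustrative, not necessarily how this repo wires it.)

```python
import gradio as gr

def chat_fn(message, history):
    # Illustrative streaming handler: yields the growing reply.
    history = history + [(message, "")]
    for chunk in fake_token_stream(message):  # hypothetical token source
        history[-1] = (message, history[-1][1] + chunk)
        yield history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    textbox = gr.Textbox()
    cancel_btn = gr.Button("Cancel")

    submit_event = textbox.submit(chat_fn, [textbox, chatbot], chatbot)
    # Clicking Cancel stops the running generation event (requires queueing).
    cancel_btn.click(fn=None, inputs=None, outputs=None, cancels=[submit_event])

demo.queue().launch()
```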

OK, that is great @deep-diver. I will run some tests and check how it functions now.


@deep-diver can you also add a reset button, so that we can reset the chat, similar to how Bing Chat has a reset option?

Sounds good! Will try

@deep-diver any update on the reset button? Similar to ChatGPT, to reset the chat / start a new conversation.

Not yet

Sorry about that. I am currently busy experimenting with the 65B model.

I am thinking about having a history tab instead of reset, like you log in with a Google account.

@deep-diver is there a way to measure the inference speed, like how many tokens/sec on a given GPU for a given model?

You can write your own code to simply count how many tokens are yielded within a certain time window.
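(A rough sketch of what that could look like, assuming a transformers streamer; the setup and names are illustrative, and the count is approximate because each streamed chunk is re-tokenized.)

```python
import time
from threading import Thread
from transformers import TextIteratorStreamer

def measure_tokens_per_sec(model, tokenizer, prompt, max_new_tokens=256):
    """Count generated tokens over wall-clock time for one prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, streamer=streamer,
                                max_new_tokens=max_new_tokens))
    start = time.perf_counter()
    thread.start()
    n_tokens = 0
    for text_chunk in streamer:
        # Approximate: re-tokenize each decoded chunk to count its tokens.
        n_tokens += len(tokenizer(text_chunk, add_special_tokens=False)["input_ids"])
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```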

Just added a reset button.
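(For reference, a minimal Gradio sketch of what a reset button can look like: a click handler that clears the chatbot history and the textbox. Component names are illustrative, not necessarily the repo's; any conversation state kept server-side would need to be cleared in the same handler.)

```python
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    textbox = gr.Textbox()
    reset_btn = gr.Button("Reset")

    # Returning an empty history and an empty string clears both components.
    reset_btn.click(lambda: ([], ""), inputs=None, outputs=[chatbot, textbox])

demo.launch()
```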

@deep-diver which dataset did you use to train this model: chansung/gpt4-alpaca-lora-13b?
Is the dataset available on Hugging Face?

I used the GPT-4-generated dataset introduced in the "Instruction Tuning with GPT-4" paper. You can find the dataset in the official repo.

@deep-diver, I have a GPU with 40GB of VRAM.
If I run LLM-As-Chatbot with Llama 7B and the AlpacaGPT4 LoRA, how many instances can I run in parallel on a single GPU?

I don't want to use any queues; I want to check how many instances I can run on a single GPU at a time, in parallel.

Hope you can shed some light on this.

Also @deep-diver, I'm currently running multiple instances in parallel on a single GPU by creating separate Docker containers for each instance of the chatbot (that way I get a unique Gradio link for each instance). Is there a better way to run multiple LLM-As-Chatbot instances in parallel on a single GPU?

@GeorvityLabs

Sorry, I am not sure about this question. Even if you containerize, I/O blocking should still be there because there is a single GPU globally available. A better solution would be to logically isolate a single GPU as if there were multiple physical GPUs.
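(One way to approximate that logical isolation in PyTorch, assuming each chatbot process should only see a slice of the 40GB card, is to cap each process's memory fraction; the value below is illustrative, and this only partitions memory, so compute is still time-sliced between instances. On hardware that supports it, MIG gives a harder split at the driver level.)

```python
import torch

# Hypothetical per-instance cap: let this process use ~25% of GPU 0's memory,
# e.g. when running four chatbot containers against the same physical card.
# Compute is still shared, so throughput per instance will drop under load.
torch.cuda.set_per_process_memory_fraction(0.25, device=0)
```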

I am going to close this issue for now. Please post anything you are wondering about in the Discussions menu. I think that is a better place :)