huggingface/llm-vscode

[Feature request] Adding a checker to see if a custom endpoint is working properly

remyleone opened this issue · 1 comment

I'm trying to run a model using the following command on my server:

docker run --gpus all --shm-size 1g -p 8080:80 -v /scratch/data:/data -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e HF_HUB_ENABLE_HF_TRANSFER=0 ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id bigcode/starcoder

But once I configure the endpoint in my editor (http://XXXX:8080/generate), I would like a test from the editor that tells me whether or not it can successfully connect.
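In the meantime, a manual check is possible with curl against the generate route documented by text-generation-inference (XXXX stands in for the real host):

curl -s http://XXXX:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "def hello():", "parameters": {"max_new_tokens": 20}}'

A JSON response containing a generated_text field means the server side is fine and the problem is in the editor configuration.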

A check like this could live in the settings or as a dedicated command in the LLM extension, verifying that everything is in place and showing helpful messages when it is not; a sketch of what I mean follows.
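As a rough sketch of the checks such a command could run, with the sort of messages I have in mind (illustrative only, not part of the extension; the endpoint is a placeholder):

#!/usr/bin/env bash
# Illustrative connectivity check for a custom endpoint; not part of llm-vscode.
ENDPOINT="${1:-http://XXXX:8080}"

# Step 1: can we reach the host at all?
if ! curl -s -o /dev/null --connect-timeout 5 "$ENDPOINT"; then
  echo "Cannot reach $ENDPOINT: is the server running and the port open?"
  exit 1
fi

# Step 2: does a minimal generate request return HTTP 200?
STATUS=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$ENDPOINT/generate" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "test", "parameters": {"max_new_tokens": 1}}')

if [ "$STATUS" = "200" ]; then
  echo "Endpoint OK: /generate answered with HTTP 200"
else
  echo "Endpoint answered with HTTP $STATUS: check the model id, token, and server logs"
  exit 1
fi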

As a check on the server side, I'm using nvidia-smi to see whether additional GPU usage happens when a completion is requested.
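For example, polling utilization once per second while triggering a completion from the editor (standard nvidia-smi query flags):

nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1

If utilization.gpu stays flat while a completion is pending, the request most likely never reached the server.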

This issue is stale because it has been open for 30 days with no activity.