Running in instruct mode and loading the model file from a different directory
regstuff opened this issue · 5 comments
Was wondering how I could pass the arguments --instruct and --model to the npm start command.
`PORT=14003 npm start mlock ctx_size 1500 threads 12 instruct model ~/llama_models/wizardLM-7B-GGML/wizardLM-7B.ggml.q5_1.bin`
I get an args error: `instruct is not a valid argument. model is not a valid argument.`
These are valid arguments for llama.cpp to run Alpaca-style models from a directory other than the default model folder.
instruct isn't a valid flag because it's encompassed in the API itself: ChatCompletion will simulate a chat response and Completion will simulate a completion specifically. So it's not a necessary flag (the app using the OpenAI API should already be doing the right "instruct" mode when necessary).
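For example, here's a rough sketch of hitting the chat endpoint directly with curl. It assumes the server from your command above is on port 14003, and that the model path is sent where an app would normally put its OpenAI API key (the same trick as the .env setup in the next point), so treat the header and paths below as placeholders for your setup:

```sh
# Rough sketch only: call the OpenAI-style chat endpoint on the local server.
# Port 14003 matches the PORT=14003 in the command above. The "API key" slot is
# assumed to carry the llama.cpp model path (same idea as the .env tip below).
# Swap /v1/chat/completions for /v1/completions to get plain completion behavior.
curl http://localhost:14003/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ../llama.cpp/models/wizardLM-7B-GGML/wizardLM-7B.ggml.q5_1.bin" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{ "role": "user", "content": "Say hello in five words." }]
  }'
```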
For the model, you want to pass that into the gpt-app instead (like chatbot-ui or auto-gpt), typically in the .env file, so it'd look something like `OPENAI_API_KEY=../llama.cpp/models/wizardLM-7B-GGML/wizardLM-7B.ggml.q5_1.bin`
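So for something like chatbot-ui, the .env would end up looking roughly like this. The exact variable names (and whether the app needs an API host override at all) depend on the gpt-app itself, so OPENAI_API_HOST and the port here are just illustrative:

```sh
# Illustrative .env for a gpt-app such as chatbot-ui (variable names and port are
# assumptions; check the app's own docs for what it actually reads)
OPENAI_API_HOST=http://localhost:14003
OPENAI_API_KEY=../llama.cpp/models/wizardLM-7B-GGML/wizardLM-7B.ggml.q5_1.bin
```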
That would be weird abuse of a variable. It would be much better to have a LOCAL_MODEL_PATH variable, and if no local model path is set, then use OpenAI's API, for example. I would favor trying to use a de facto standard local API such as text-generation-webui's API, rather than trying to reinvent the wheel by running local models directly, though. For one thing, sharing one local API means that multiple tools can use it. For another, there's a LOT of complexity in supporting local acceleration hardware and different model types and so on. Just using a standard local API makes it a lot simpler.
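To sketch what I mean (everything here is hypothetical, not something either project supports today): a single variable would decide whether a tool points at a local API server or falls back to OpenAI's hosted API.

```sh
# Hypothetical sketch of the LOCAL_MODEL_PATH idea. Port 5000 is just an example
# (text-generation-webui's blocking API commonly runs there); nothing below is an
# existing feature of this project.
if [ -n "$LOCAL_MODEL_PATH" ]; then
  export OPENAI_API_HOST="http://localhost:5000"   # talk to the local API server
else
  export OPENAI_API_HOST="https://api.openai.com"  # no local model set: use OpenAI
fi
```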
@keldenl
Sorry, I think I'm missing something. How do I get it to follow the `### Instruction` / `### Response` template for Alpaca and similar models? When I use ChatCompletion, it seems to use a `User:` / `Assistant:` template, which isn't working for wizardLM. The LLM doesn't follow my instructions.
When I use the Completions endpoint and add the Instruction/Response template into the prompt myself, the server seems to hang and no response is generated.
It processes the prompt, and then the `===== RESPONSE =====` line appears, and that's it.
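To be concrete, the kind of raw Completions request I mean looks roughly like this; the port, model path, prompt, and the stop value are just placeholders to show the template, not exactly what I ran:

```sh
# Rough shape of the Completions request I'm describing (placeholders only; the
# stop value is just a guess at how to end generation at the template boundary).
curl http://localhost:14003/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ../llama.cpp/models/wizardLM-7B-GGML/wizardLM-7B.ggml.q5_1.bin" \
  -d '{
    "model": "text-davinci-003",
    "prompt": "### Instruction:\nWrite a haiku about llamas.\n\n### Response:\n",
    "max_tokens": 128,
    "stop": ["### Instruction:"]
  }'
```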
> That would be weird abuse of a variable. It would be much better to have a LOCAL_MODEL_PATH variable, and if no local model path is set, then use OpenAI's API, for example. I would favor trying to use a de facto standard local API such as text-generation-webui's API, rather than trying to reinvent the wheel by running local models directly, though. For one thing, sharing one local API means that multiple tools can use it. For another, there's a LOT of complexity in supporting local acceleration hardware and different model types and so on. Just using a standard local API makes it a lot simpler.
The thing about this is that the end goal for this project is to be able to plug 'n play with any GPT-powered project, and the fewer changes to the code the better (even 0 changes, like with chatbot-ui). LOCAL_MODEL_PATH is something people need to account for (i.e. langchain supporting local models), but this project aims to solve, for all the other GPT apps that exist out there, the question of how we can leverage the work folks have already done but run a local model against it. That's the goal.
> @keldenl
> Sorry, I think I'm missing something. How do I get it to follow the `### Instruction` / `### Response` template for Alpaca and similar models? When I use ChatCompletion, it seems to use a `User:` / `Assistant:` template, which isn't working for wizardLM. The LLM doesn't follow my instructions.
> When I use the Completions endpoint and add the Instruction/Response template into the prompt myself, the server seems to hang and no response is generated. It processes the prompt, and then the `===== RESPONSE =====` line appears, and that's it.
@regstuff it sounds like you might be running into a different issue. Any chance you could post what's showing up on your terminal and what the request is? (Where are you using the server? chatbot-ui?)
Also, I just merged some changes that should give you better error logging, so maybe pull and then post here?