AndraxDev/speak-gpt

Bug/FR: allow disabling some arguments, because the Anthropic and Mistral endpoints don't accept them

Closed this issue · 7 comments

Hi,

Disclaimer

I was writing this issue while you answered #105, so now I'm confused as to whether I should post this one or not.

You said:

Mistral is not supported

and

don't give any warranties that non-OpenAI endpoints will be fully compatible and all subsequent issues will be closed.

But the README says about endpoints:

Other (must be tested by community, don't be shy and provide your feedback)

For that reason, and because I'm not sure whether this would best be solved in openai-kotlin or has more to do with the arguments SpeakGPT sends, I decided to post this issue as feedback, since a fix would avoid hard crashes on some endpoints (at least Mistral and probably Anthropic).

The issue

The Mistral endpoint cannot currently be used: https://api.mistral.ai/v1/. I'm fairly sure that's because there's a mismatch in argument compatibility.

There's a lib called LiteLLM that provides a common Python API to reach many LLM endpoints, and their solution was to add a drop_params argument that drops any argument not supported by the specific endpoint.
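To illustrate, the drop_params idea boils down to filtering the request against a per-provider allowlist before sending it. This is only a sketch of the concept, not LiteLLM's actual implementation, and the parameter sets below are illustrative:

```python
# Sketch of the drop_params idea: silently drop any sampling parameters
# the target endpoint does not accept. Illustrative allowlists only.
SUPPORTED_PARAMS = {
    "openai": {"temperature", "top_p", "logit_bias",
               "presence_penalty", "frequency_penalty"},
    "mistral": {"temperature", "top_p", "safe_prompt"},  # no penalty/logit_bias args
}

def drop_unsupported(provider: str, params: dict) -> dict:
    """Return a copy of `params` with keys the provider rejects removed."""
    allowed = SUPPORTED_PARAMS.get(provider, set())
    return {k: v for k, v in params.items() if k in allowed}

request = {"temperature": 0.7, "presence_penalty": 0.0, "frequency_penalty": 0.0}
print(drop_unsupported("mistral", request))  # {'temperature': 0.7}
```

With such a filter in place, the 422 "extra_forbidden" response below would simply never be triggered, because the offending keys are never serialized into the request body.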

As you can see in their API docs, the Mistral models don't expose logit_bias, presence_penalty, etc., but they do support additional arguments, for example safe_prompt.

I totally understand that SpeakGPT does not provide warranties for supporting any endpoint, but I still decided to provide this feedback. I think a solution would be to allow disabling arguments per endpoint, though I of course totally understand if you think the changes are too involved and unnecessary.

If you don't mind, I'll post my suggestion for what might be the easiest general solution:

  1. Add a list of checkboxes in the "Edit API endpoint" window to selectively disable arguments like logit_bias, temperature, top_p, frequency_penalty and presence_penalty, plus a checkbox to enable or disable vision.
  2. Add an "extra arguments" field, because some endpoints support additional ones; for example Mistral has a safe_prompt argument, and I'm sure there will be many more specificities like that in the future (Anthropic, for example, does not have the penalty arguments).
  3. Optional: automatically grey out the options in the quick settings if the selected endpoint does not support them, and hide the "share image" buttons if vision is disabled.
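Suggestions 1 and 2 could be combined into a single per-endpoint setting: a set of disabled arguments plus a free-form map of extras. Here is a minimal sketch of that idea; all names are hypothetical and this is not SpeakGPT's actual configuration format:

```python
# Hypothetical per-endpoint configuration: checkboxes map to disabled_args,
# the "extra arguments" field maps to extra_args.
from dataclasses import dataclass, field

@dataclass
class EndpointConfig:
    base_url: str
    disabled_args: set = field(default_factory=set)   # e.g. penalties for Mistral
    extra_args: dict = field(default_factory=dict)    # e.g. Mistral's safe_prompt
    vision: bool = True                               # hide "share image" if False

def build_body(cfg: EndpointConfig, body: dict) -> dict:
    """Strip disabled arguments and merge in endpoint-specific extras."""
    cleaned = {k: v for k, v in body.items() if k not in cfg.disabled_args}
    cleaned.update(cfg.extra_args)
    return cleaned

mistral = EndpointConfig(
    base_url="https://api.mistral.ai/v1/",
    disabled_args={"logit_bias", "presence_penalty", "frequency_penalty"},
    extra_args={"safe_prompt": True},
)
print(build_body(mistral, {"temperature": 0.7, "presence_penalty": 0.0}))
# {'temperature': 0.7, 'safe_prompt': True}
```

The quick-settings greying out in suggestion 3 would then just read `cfg.disabled_args` and `cfg.vision` from the same object.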

The full error can be found below:


An error has been occurred during generation. See the error details below:

m2.a
	at w2.d.c(Unknown Source:131)
	at w2.d.a(Unknown Source:13)
	at w2.d.i(Unknown Source:89)
	at w2.c.m(Unknown Source:12)
	at x8.a.g(Unknown Source:5)
	at n9.k0.run(Unknown Source:101)
	at android.os.Handler.handleCallback(Handler.java:959)
	at android.os.Handler.dispatchMessage(Handler.java:100)
	at android.os.Looper.loopOnce(Looper.java:232)
	at android.os.Looper.loop(Looper.java:317)
	at android.app.ActivityThread.main(ActivityThread.java:8532)
	at java.lang.reflect.Method.invoke(Native Method)
	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:552)
	at com.android.internal.os.ExecInit.main(ExecInit.java:50)
	at com.android.internal.os.RuntimeInit.nativeFinishInit(Native Method)
	at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:359)
Caused by: g7.e: Client request(POST https://api.mistral.ai/v1/chat/completions) invalid: 422 Unprocessable Entity. Text: "{"object":"error","message":{"detail":[{"type":"extra_forbidden","loc":["body","presence_penalty"],"msg":"Extra inputs are not permitted","input":0.0,"url":"https://errors.pydantic.dev/2.6/v/extra_forbidden"},{"type":"extra_forbidden","loc":["body","frequency_penalty"],"msg":"Extra inputs are not permitted","input":0.0,"url":"https://errors.pydantic.dev/2.6/v/extra_forbidden"}]},"type":"invalid_request_error","param":null,"code":null}"
	at g7.j.m(Unknown Source:218)
	at g7.j.i(Unknown Source:12)
	at g7.u.b(Unknown Source:116)
	at a7.b.m(Unknown Source:461)
	at x8.a.g(Unknown Source:5)
	at n9.k0.run(Unknown Source:109)
	... 10 more

(Sorry for the many notifications, but I can confirm that using the OpenRouter endpoint and setting the model to mistralai/mistral-large-latest works perfectly fine. That makes me hopeful that better OpenRouter support would be a good way to support many more models without changing the code too much.)

Fixed in SpeakGPT 3.25

Thanks a lot for the many features!

I know Mistral is not officially supported (while OpenRouter now is), but I noticed that I get a very suspicious error message when using the Mistral endpoint directly (not via OpenRouter) that might indicate something deeper:

{"object":"error","message":"Expected last role to be one of: [user, tool] but got system","type":"invalid_request_error","param":null,"code":null}"

I did check, and Mistral does support system prompts (source). So I preferred to report it in case the system prompt is moved somewhere unexpected in some situations.

  • Deleting my system prompt solves the issue and the LLM works normally.
  • I tested several mistral models and got the same result.
  • I tried removing the system prompt, exchanging a few messages, then adding the system prompt back to the chat to see if it would produce a different error. It didn't; I get the same one:
An error has been occurred during generation. See the error details below:

m2.a
	at w2.d.c(Unknown Source:155)
	at w2.d.a(Unknown Source:13)
	at w2.d.i(Unknown Source:89)
	at w2.c.m(Unknown Source:12)
	at x8.a.g(Unknown Source:5)
	at n9.k0.run(Unknown Source:101)
	at android.os.Handler.handleCallback(Handler.java:959)
	at android.os.Handler.dispatchMessage(Handler.java:100)
	at android.os.Looper.loopOnce(Looper.java:232)
	at android.os.Looper.loop(Looper.java:317)
	at android.app.ActivityThread.main(ActivityThread.java:8532)
	at java.lang.reflect.Method.invoke(Native Method)
	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:552)
	at com.android.internal.os.ExecInit.main(ExecInit.java:50)
	at com.android.internal.os.RuntimeInit.nativeFinishInit(Native Method)
	at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:359)
Caused by: g7.e: Client request(POST https://api.mistral.ai/v1/chat/completions) invalid: 400 Bad Request. Text: "{"object":"error","message":"Expected last role to be one of: [user, tool] but got system","type":"invalid_request_error","param":null,"code":null}"
	at g7.j.m(Unknown Source:218)
	at g7.j.i(Unknown Source:12)
	at g7.u.b(Unknown Source:116)
	at a7.b.m(Unknown Source:461)
	at x8.a.g(Unknown Source:5)
	at n9.k0.run(Unknown Source:109)
	... 10 more
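The 400 above suggests the app appends the system message after the last user message. One possible client-side workaround is to reorder the list before sending it to endpoints that enforce role order. A sketch, with the function name being mine rather than anything in SpeakGPT:

```python
def system_first(messages: list[dict]) -> list[dict]:
    """Move system messages to the front, preserving the relative order
    of the remaining user/assistant/tool messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest

chat = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "system", "content": "Answer in French."},  # appended last by the app
]
print([m["role"] for m in system_first(chat)])  # ['system', 'user', 'assistant']
```

After reordering, the last message is again a user (or tool) message, which is what Mistral's validation expects.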

Addendum: I see that in the system prompt window you indicate

Putting a system message at the end of conversation...

I'm pretty sure most LLMs expect the system prompt to appear first:

Typically, a conversation will start with a system message that tells the assistant how to behave

In your defense, I also read this in the OpenAI docs:

Be aware that gpt-3.5-turbo-0301 does not generally pay as much attention to the system message as gpt-4-0314 or gpt-3.5-turbo-0613. Therefore, for gpt-3.5-turbo-0301, we recommend placing important instructions in the user message instead. Some developers have found success in continually moving the system message near the end of the conversation to keep the model's attention from drifting away as conversations get longer.

But that only applies to models that are not very good, to conversations that are quite long, and to models with very short context lengths.

I think the latest models are quite powerful and have big context windows, so it's better to append the system prompt at the end. I specifically ran some experiments on this and noticed that when the conversation is long, a leading system prompt gets ignored. So this will not be fixed.

Typically, a conversation will start with a system message that tells the assistant how to behave

I checked the OpenAI docs just now and didn't find such citations. Maybe in your country OpenAI has outdated docs. The gpt-3.5-turbo-0301 model is deprecated and possibly shut down. A whole year has passed since this recommendation was issued, and during this period a lot of things have changed.

So the SpeakGPT system prompt will stay untouched, and no option to move it from the end to the start of the conversation will be added.

I'm sorry, I gave the wrong link: https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models

That's where both citations come from.

I strongly suggest aligning with what seems like best practices.

  1. This quote from OpenAI:

Typically, a conversation will start with a system message that tells the assistant how to behave

  2. Also from OpenAI:

Be aware that gpt-3.5-turbo-0301 does not generally pay as much attention to the system message as gpt-4-0314 or gpt-3.5-turbo-0613. Therefore, for gpt-3.5-turbo-0301, we recommend placing important instructions in the user message instead. Some developers have found success in continually moving the system message near the end of the conversation to keep the model's attention from drifting away as conversations get longer.

The above quote says that "some developers" (so it's not common practice) found a trick that is only relevant "to keep the model's attention from drifting away as conversations get longer", i.e. only once attention actually starts drifting. In all other cases the recommended approach is to put the system prompt FIRST. So even if your tests showed that appending the system prompt at the end keeps it from being ignored in long conversations, it also means that in, say, a 5-message conversation (NOT a situation where attention is drifting) the assistant replies so far were prompted incorrectly: the LLM was trained expecting the system prompt first, and "not being ignored" is not a good test of whether the prompt is properly followed.

  3. Additionally, of the 8 examples I count in the OpenAI cookbook, absolutely none contain a system prompt that is not placed before the user messages.

  4. Another argument is that Mistral, which, although not officially supported, is a big player in the AI space, even errors out if the system prompt is not first.

  5. Another argument: even Llama expects the system prompt first:

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

  6. To back my claim, I also asked Perplexity AI, and it mostly agrees with me:
    (screenshot)

  7. The Dolphin family of fine-tunes, from someone well known, also puts the system prompt first:
    (screenshot)
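For what it's worth, the Mistral 400 quoted earlier ("Expected last role to be one of: [user, tool] but got system") amounts to a simple server-side rule, which can be mimicked locally (the function name and wording of the check are mine, inferred from the error message):

```python
def check_last_role(messages: list[dict]) -> None:
    """Mimic the server-side validation behind Mistral's 400 response:
    the conversation must end with a user or tool message."""
    last = messages[-1]["role"]
    if last not in ("user", "tool"):
        raise ValueError(
            f"Expected last role to be one of: [user, tool] but got {last}"
        )

check_last_role([{"role": "user", "content": "Hi"}])  # passes silently
try:
    check_last_role([{"role": "user", "content": "Hi"},
                     {"role": "system", "content": "Be brief."}])
except ValueError as e:
    print(e)  # Expected last role to be one of: [user, tool] but got system
```

Running such a check client-side before sending the request would turn the opaque HTTP 400 into an actionable local error.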

I think those are 7 strong arguments, with sources, from OpenAI, Mistral and Meta. If you are still not convinced, I am extremely curious what kind of testing you have done that justifies not following what seem to me like established practices.

And to be clear: I'm not asking for another option, I'm asking for SpeakGPT to prompt the LLMs the way they were trained.

Receive the new update (SpeakGPT 3.26) and close your mouth, please (I'm tired and nervous now because you sent too many notifications). Thanks!