leafo/lua-openai

Proper SSE parsing

Opened this issue · 2 comments

Specs: https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format

As you can see, the specifications are rigidly defined, eliminating the necessity to use LPEG and iterative trial-and-error JSON decoding.

In addition, the current implementation fails on valid input, for instance when the response contains comments (lines beginning with a colon). Some OpenAI-compatible providers send these in practice.

Specifically, below is a sample response from https://openrouter.ai/api/v1/chat/completions:

: OPENROUTER PROCESSING

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" next"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" day"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" wand"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":"ered"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" through"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" streets"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" of"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" city"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" feeling"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" lost"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" and"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" alone"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" had"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" no"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" idea"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" where"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"gen-Hh4bFjJS0U0YgPav7S9tMpfXrNHA","model":"mistralai/mistral-7b-instruct:free","object":"chat.completion.chunk","created":1713306744,"choices":[{"index":0,"delta":{"role":"assistant","content":" was"},"finish_reason":null}]}

: the rest is trimmed, to keep the issue message length in limits (maximum is 65536 characters).

data: [DONE]
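
To make the comment case concrete, here is a minimal Python sketch (not the library's Lua code) of how a single event block could be interpreted under the event stream format linked above. The function name `parse_event_block` is hypothetical, and only the `data` field is handled; other fields such as `event` or `id` are simply skipped.

```python
import re

def parse_event_block(block):
    """Return the joined data payload of one event block, or None if it carries no data."""
    data_lines = []
    # The format allows LF, CR, or CRLF as line terminators.
    for line in re.split(r"\r\n|\n|\r", block):
        if not line or line.startswith(":"):
            continue  # blank line or comment, e.g. ": OPENROUTER PROCESSING"
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)
    # Several data lines in one event are joined with newlines.
    return "\n".join(data_lines) if data_lines else None
```

With this, a block consisting only of `: OPENROUTER PROCESSING` yields `None`, while `data: [DONE]` yields the string `[DONE]`.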

> As you can see, the specifications are rigidly defined, eliminating the necessity to use LPEG and iterative trial-and-error JSON decoding.

The specification has nothing to do with why the parsing is done this way. Output is being streamed from the http client, there is no guarantee that a full message will be put into the buffer as output is written from the server. cjson is not a streaming parser, so repeated parsing is done to emulate a streaming parser.

If you're concerned about performance, a potential optimization would be to start from the full length of the string and shrink by 1, instead of starting from length 1 and increasing to the end, since the correct position is more likely to be near the end of the string than the beginning.
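
As a rough illustration of that idea (a Python sketch using the standard `json` module, not the library's actual Lua/cjson code), the scan could try the longest prefix first and shrink by one character per failed attempt. The function name `longest_json_prefix` is made up for this example.

```python
import json

def longest_json_prefix(buffer):
    """Try prefixes from longest to shortest; return (decoded value, unparsed rest)."""
    for end in range(len(buffer), 0, -1):
        try:
            return json.loads(buffer[:end]), buffer[end:]
        except json.JSONDecodeError:
            continue  # this prefix is not one complete JSON value, shrink by 1
    return None, buffer  # nothing decodable yet, keep buffering
```

When the buffer holds exactly one complete message, the first attempt succeeds; when it holds a complete message followed by a partial one, the loop only has to walk back over the partial tail.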

> Specifically, below is a sample response from https://openrouter.ai/api/v1/chat/completions:

I've only tested this with the OpenAI API, but I have no issue with having comments be ignored if other "compatible" APIs output them.

> Output is being streamed from the http client, there is no guarantee that a full message will be put into the buffer as output is written from the server.

Yes, but if we adhere to the specification, we can still rely on messages being separated by a blank line (two newlines). Instead of trying to simulate streaming JSON parsing, we can simply accumulate the response until '\n\n' is hit, and then JSON-parse the message in one go. This is the method employed in certain other libraries.
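
A minimal sketch of that approach, in Python for brevity rather than the library's Lua: chunks are appended to a buffer as they arrive, and a message is decoded only once its terminating blank line has been seen. The class name `SSEBuffer` and its `feed` method are hypothetical, and the sketch assumes `\n` line endings and treats `[DONE]` as an end marker, as in the sample above.

```python
import json

class SSEBuffer:
    """Accumulates streamed chunks and decodes only complete events."""

    def __init__(self):
        self._buf = ""

    def feed(self, chunk):
        """Add a chunk; return the JSON payloads of all events completed so far."""
        self._buf += chunk
        messages = []
        while "\n\n" in self._buf:
            # Everything before the first blank line is one complete event.
            block, self._buf = self._buf.split("\n\n", 1)
            data_lines = []
            for line in block.split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    data_lines.append(value[1:] if value.startswith(" ") else value)
                # Comment lines (starting with ":") and other fields are ignored.
            payload = "\n".join(data_lines)
            if payload and payload != "[DONE]":
                messages.append(json.loads(payload))
        return messages
```

A partial message simply stays in the buffer until a later chunk completes it, so each JSON payload is decoded once:

```python
buf = SSEBuffer()
buf.feed(': OPENROUTER PROCESSING\n\ndata: {"a"')  # returns []
buf.feed(': 1}\n\ndata: [DONE]\n\n')               # returns [{"a": 1}]
```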

> If you're concerned about performance.

Well, not so much, but since we need to handle the comment case anyway, why not implement everything properly?
I can take on the implementation.