Curl to llama.cpp server fails to parse json if accessed over the network
gordon-quad opened this issue · 10 comments
Using the following config:
llamacpp_url = 'http://10.1.1.23:8123'

require('model').setup({
  chats = {
    zephyr = {
      provider = llamacpp,
      options = {
        url = llamacpp_url
      },
      system = 'You are a helpful assistant',
      create = input_if_selection,
      run = zephyr_fmt.chat
    },
  },
})
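(For completeness, the snippet assumes the usual surrounding definitions from the model.nvim README; the module paths below are my assumption and worth double-checking against the installed version.)

-- Assumed context for the config above; module paths per the model.nvim
-- README - treat them as an assumption, not gospel.
local llamacpp = require('model.providers.llamacpp')
local zephyr_fmt = require('model.format.zephyr')

-- use the chat input only when there is a visual selection
local function input_if_selection(input, context)
  return context.selection and input or ''
end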
Open an Mchat buffer and make the model produce a long enough response:
zephyr
---
{
options = {
url = "http://10.1.1.23:8123"
}
}
---
> You are a helpful assistant
Tell me something I don't know, produce a very long response. I need it to test something.
======
Certainly! Here's a lengthy response to test your capabilities:
Did you know that the world's largest living organism is actually a fungus, and it covers an area of over 2,000 acres in Oregon's Malheur National Forest? Known as the Armillaria ostoyae, this particular species of fungus has been estimated to be at least 2,400 years old. The fungus spreads through a network of underground mycelia that connects neighboring trees and can grow at a rate of up to three feet per day. In fact, the collective weight of this single organism is estimated to be more than 6,000 tons. This fungus has been dubbed the "Humongous Fungus" and has been a source of fascination for scientists and nature enthusiasts alike. It's also a testament to the incredible complexity and interconnectedness that exists in our natural world, reminding us of the importance of preserving these delicate ecosystems for future generations.
======
Keep talking.
After such a prompt, model.nvim fails to parse the returned JSON, stating that it is incomplete. If I call curl directly by hand, it produces correct, i.e. non-truncated, JSON. However, if I replace curl here https://github.com/gsuuon/model.nvim/blob/main/lua/model/util/curl.lua#L86 with a custom script containing

#!/bin/bash
curl "$@"
sleep 1

it succeeds every time.
My suspicion is that neovim's system function calls on_exit prior to calling on_stdout on the last part of curl's output, but I am not familiar enough with neovim's Lua programming to figure out how to test that quickly.
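To illustrate the suspicion, here is a minimal self-contained sketch (my own, not model.nvim's actual code) of how a deferred stdout callback can lose to a direct exit callback:

-- Sketch: if the stdout handler defers its work with vim.schedule() but the
-- exit handler reads state directly, the final chunk can still be sitting
-- in the scheduler queue when the exit handler runs.
local pending = ''

local function on_stdout(chunk)
  vim.schedule(function() -- queued on the main loop, not run immediately
    pending = pending .. chunk
  end)
end

local function on_exit()
  -- runs as soon as the process exits; a still-queued on_stdout callback
  -- has not appended the final chunk yet
  print('pending at exit: ' .. pending)
end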
Neovim version:
NVIM v0.9.5
Build type: Release
LuaJIT 2.1.0-beta3
Compilation: /usr/bin/x86_64-pc-linux-gnu-gcc -Wall -Wextra -pedantic -Wno-unused-parameter -Wstrict-prototypes -std=gnu99 -Wshadow -Wconversion -Wvla -Wdouble-promotion -Wmissing-noreturn -Wmissing-format-attribute -Wmissing-prototypes -fno-common -Wno-unused-result -Wimplicit-fallthrough -fdiagnostics-color=always -fstack-protector-strong -DUNIT_TESTING -DINCLUDE_GENERATED_DECLARATIONS -D_GNU_SOURCE -I/usr/include/luajit-2.1 -I/usr/include -I/usr/include -I/var/tmp/portage/app-editors/neovim-0.9.5/work/neovim-0.9.5_build/src/nvim/auto -I/var/tmp/portage/app-editors/neovim-0.9.5/work/neovim-0.9.5_build/include -I/var/tmp/portage/app-editors/neovim-0.9.5/work/neovim-0.9.5_build/cmake.config -I/var/tmp/portage/app-editors/neovim-0.9.5/work/neovim-0.9.5/src -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/usr/include
system vimrc file: "/etc/vim/sysinit.vim"
fall-back for $VIM: "/usr/share/nvim"
Thanks for the report! Just to clarify, it only happens over the network (so running locally always works) and only with long responses? Do you know roughly how long the response needs to be to hit the issue? Does this happen with any of the other providers?
My pitiful laptop is unable to run any model locally, but I can try later to run nvim on my ML rig just to be sure. It does happen only with longer responses: the first response in my example finished generating successfully, but the response to "Keep talking." fails repeatedly. I also haven't tried any other providers, since I have no intention of using any cloud solution and llama.cpp is the only local one I have.
I also forgot to include my neovim version; I've updated the description.
It is hard to judge how long the response needs to be, but the failure seems to be related to the very last chunk of data, the one with summary stats from the llama.cpp server. That chunk includes the prompt context for the request, so I think the initial context length indirectly influences it.
For example, such Mchat buffer also triggers the issue:
zephyr
---
{
options = {
url = "http://10.1.1.23:8123"
}
}
---
> You are a helpful assistant
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Keep talking.
And this is what I get in the message buffer, along with the parse error:
'{"content":"","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"models/zephyr-7b-beta.Q8_0.gguf","n_ctx":32768,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":4294967295,"stop":[],"stream":true,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"models/zephyr-7b-beta.Q8_0.gguf","prompt":"<|system|>\\nYou are a helpful assistant</s>\\n\\n<|user|>\\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\nKeep talking.</s>\\n\\n<|assistant|>\\n","slot_id":0,"stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":11733.463,"predicted_n":154,"predicted_per_second":13.124854955438135,"predicted_per_to'
Which amounts to exactly 4096 bytes - that cannot be a coincidence.
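Regardless of the exact buffer size, the output is a byte stream, so the final SSE message can arrive split across two reads. A minimal sketch (mine, not model.nvim's code) of why each chunk has to be accumulated until a complete message parses:

-- Sketch: buffer raw chunks and only hand off complete JSON.
-- `handle` is a hypothetical callback for a fully parsed message.
local pending = ''

local function on_chunk(raw)
  pending = pending .. raw
  local ok, decoded = pcall(vim.json.decode, pending)
  if ok then
    pending = ''
    handle(decoded)
  end
  -- if decoding failed, the message is still incomplete; wait for more
end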
I'm not able to repro this locally - let me know how it goes when you try neovim on your ML server. Maybe we're just missing a curl argument for that case, if it's cutting off at exactly 4kb.
I can confirm that when the llama.cpp server is running locally, the issue is not reproducible. I used the same neovim version with the whole config copied over to the machine that is capable of running my models.
It is also not reproducible if the machines are on the same physical network, but it is reproducible if they talk over the internet, so it might be latency-related.
Can it potentially be related to the fact that on_stdout is scheduled and so can potentially be called after on_exit?
https://github.com/gsuuon/model.nvim/blob/main/lua/model/util/system.lua#L64
Excuse the crudeness of debugging with util.notify - I have zero experience with Lua and with nvim plugin development, trying my best here. What I did is just add a few traces:
--- a/lua/model/util/sse.lua
+++ b/lua/model/util/sse.lua
@@ -1,4 +1,5 @@
 local curl = require('model.util.curl')
+local util = require('model.util')

 local M = {}

@@ -64,6 +65,7 @@ function M.client(handlers)
       -- partial of multiple sse message
       pending = pending .. raw
+      util.notify('trace1: ' .. string.sub(raw, -100):gsub('\r', ''):gsub('\n', ''))

       if is_sse then
         consume_sse_messages()
@@ -76,6 +78,7 @@ function M.client(handlers)
     on_error = handlers.on_error,
     on_exit = function()
       if #pending > 0 then
+        util.notify('trace2: ' .. string.sub(pending, -100):gsub('\r', ''):gsub('\n', ''))
         if is_sse then
           handlers.on_message(M.parse_message(pending), pending)
         else
And from vim's message buffer it is clearly evident that either on_exit is called before the last on_stdout, or there is some race condition happening and the pending = pending .. raw change has not propagated by the time on_exit executes.
[skipped]
trace1: data: {"content":" particularly","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":" proud","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":" of","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":" yourself","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"?","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: data: {"content":"","multimodal":false,"slot_id":0,"stop":false}
trace1: 49,"prompt_n":410,"prompt_per_second":208.4446520982496,"prompt_per_token_ms":4.797436585365854},"to
trace1: kens_cached":704,"tokens_evaluated":410,"tokens_predicted":295,"truncated":false}
trace2: 49,"prompt_n":410,"prompt_per_second":208.4446520982496,"prompt_per_token_ms":4.797436585365854},"to
@gordon-quad I took a shot in the dark based on your diagnosis - it definitely looks like there could've been a race condition. Let me know if these updates fix the issue.
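For reference, a sketch of one way such a race can be closed (my reading of the diagnosis, not necessarily the literal commit): route on_exit through vim.schedule() as well, so it is queued on the main loop behind every previously scheduled on_stdout callback:

-- Sketch of the fix shape, assuming on_stdout work is dispatched via
-- vim.schedule(); scheduling on_exit the same way preserves FIFO order.
local pending = ''

local function on_stdout(chunk)
  vim.schedule(function()
    pending = pending .. chunk
  end)
end

local function on_exit()
  vim.schedule(function()
    -- every stdout callback scheduled before exit has run by now
    flush(pending) -- `flush` is a stand-in for the final message handling
  end)
end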
It totally did, thanks!