keldenl/gpt-llama.cpp

no response message with Readable Stream: CLOSED

lzbeefnoodle opened this issue · 2 comments

Would really appreciate it if anyone can shed light here. The request seems to be received, but no response ever comes back, and the server console shows Readable Stream: CLOSED.

[test installation terminal]
D:\github\gpt-llama.cpp>sh test-installation.sh
--GPT-LLAMA.CPP TEST INSTALLATION SCRIPT LAUNCHED--
PLEASE MAKE SURE THAT A LOCAL GPT-LLAMA.CPP SERVER IS STARTED. OPEN A SEPARATE TERMINAL WINDOW START IT.\n
What port is your server running on? (press enter for default 443 port): 8000
Please drag and drop the location of your Llama-based Model (.bin) here and press enter: ../llama.cpp/models/ggml-alpaca-7b-q4.bin

<if I terminate the server, I get the message below>

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   266    0     0    0   266      0      0 --:--:--  0:30:05 --:--:--     0
curl: (56) Recv failure: Connection was reset

--RESPONSE--

--RESULTS--
Curl command was successful!
To use any app with gpt-llama.cpp, please provide the following as the OPENAI_API_KEY:
../llama.cpp/models/ggml-alpaca-7b-q4.bin
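For anyone trying to reproduce: the test script essentially boils down to a curl call like the sketch below (my rough reconstruction, not the script verbatim; it assumes the OpenAI-style /v1/chat/completions payload, with the model path passed as the Bearer token as described above):

# Rough equivalent of what test-installation.sh sends (reconstruction, not verbatim).
# The model path doubles as the API key, per the script output above.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ../llama.cpp/models/ggml-alpaca-7b-q4.bin" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "How are you doing today?"}
    ]
  }'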

[server console]
D:\github\gpt-llama.cpp>set PORT=8000
D:\github\gpt-llama.cpp>npm start
...

gpt-llama.cpp@0.2.6 start
node index.js
...
REQUEST RECEIVED
PROCESSING NEXT REQUEST FOR /v1/chat/completions
LLAMA.CPP DETECTED

===== CHAT COMPLETION REQUEST =====

ALPACA MODEL DETECTED. LOADING ALPACA ENGINE...
{}

===== LLAMA.CPP SPAWNED =====
..\llama.cpp\main -m ..\llama.cpp\models\ggml-alpaca-7b-q4.bin --temp 0.7 --n_predict 1000 --top_p 0.1 --top_k 40 -c 2048 --seed -1 --repeat_penalty 1.1764705882352942 --reverse-prompt user: --reverse-prompt
user --reverse-prompt system: --reverse-prompt
system --reverse-prompt

--reverse-prompt ## --reverse-prompt

--reverse-prompt ### -i -p Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

## Instruction

Complete the following chat conversation between the user and the assistant. system messages should be strictly followed as additional instructions.

## Inputs

system: You are a helpful assistant.
user: How are you?
assistant: Hi, how may I help you today?
system: You are ChatGPT, a helpful assistant developed by OpenAI.
user: How are you doing today?

## Response

===== REQUEST =====
user: How are you doing today?
===== PROCESSING PROMPT... =====
===== PROCESSING PROMPT... =====
===== PROCESSING PROMPT... =====
......
===== PROCESSING PROMPT... =====

===== STDERR =====
stderr Readable Stream: CLOSED
done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling parameters: temp = 0.700000, top_k = 40, top_p = 0.100000, repeat_last_n = 64, repeat_penalty = 1.176471

[end of text]

main: mem per token = 14368644 bytes
main: load time = 1305.05 ms
main: sample time = 42.33 ms
main: predict time = 63007.42 ms / 277.57 ms per token
main: total time = 65483.43 ms

Readable Stream: CLOSED

PROCESS COMPLETE
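A quick way to rule out gpt-llama.cpp itself is to invoke llama.cpp directly with the same model; a minimal sketch reusing the flags from the spawned command above (untested here, adjust paths to your checkout):

# Run llama.cpp directly to confirm the model loads and generates on its own
..\llama.cpp\main -m ..\llama.cpp\models\ggml-alpaca-7b-q4.bin -p "Hello" --n_predict 16

If this hangs or produces nothing, the problem is the model file rather than the server.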

BTW, I got more error messages when testing via the API docs page (see the curl sketches after the list):

  1. GET
    TypeError: Failed to execute 'fetch' on 'Window': Request with GET/HEAD method cannot have body.

  2. Embedding
    Failed to fetch.
    Possible Reasons:
    CORS
    Network Failure
    URL scheme must be "http" or "https" for CORS request.
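Both of these look like browser-side fetch issues on the docs page rather than server failures: fetch refuses to attach a body to a GET request, and the CORS/URL-scheme message typically means the docs page was opened via file:// or pointed at a non-http URL. Untested sketches of the equivalent terminal calls (the /v1/models route is my assumption about what the GET example hits; the embeddings route is implied by the error above):

# 1. A GET without a request body sidesteps the fetch GET/HEAD error
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer ../llama.cpp/models/ggml-alpaca-7b-q4.bin"

# 2. Embeddings as a POST over plain http:// (not file://)
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ../llama.cpp/models/ggml-alpaca-7b-q4.bin" \
  -d '{"input": "How are you doing today?"}'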

Turns out it was related to the model files; I needed to download and quantize the models properly.
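For anyone landing here with the same symptom, the fix is the standard llama.cpp convert-and-quantize workflow; a sketch following the llama.cpp README of that era (script and binary names vary between llama.cpp versions, so treat this as illustrative):

# Convert the original weights to ggml f16, then quantize to q4_0
# (paths and 7B layout per the llama.cpp README; adjust for your checkout)
python3 convert-pth-to-ggml.py models/7B/ 1
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2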