Deployment to K8s only reports RPC errors trying to connect
DavidARivkin opened this issue · 7 comments
LocalAI version:
localai:latest
Environment, CPU architecture, OS, and Version:
Okteto Kubernetes on GKE
Describe the bug
When running any of the curl commands from the examples, the following error is reported repeatedly in the log, and curl does not return until it times out.
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37513: connect: connection refused"
This error repeats over and over and even continues after you quit curl (Ctrl-C).
The port number changes every reported error.
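Why the port might change: the DEBUG log later in this thread ("GRPC Service for bert-MiniLM-L6-v2q4_0.bin will be running at: '127.0.0.1:43361'") suggests that each model load starts a separate backend process on its own localhost port. A common way to pick such a port is to bind to port 0 and let the kernel assign a free ephemeral port. The Python sketch below only illustrates that general technique; it is an assumption that LocalAI does something equivalent, and this is not its code.

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused ephemeral TCP port by binding to port 0.

    Illustrative only: assumes the backend port is chosen this way.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 means "let the kernel choose"
        return s.getsockname()[1]  # the port the kernel actually assigned


if __name__ == "__main__":
    # Successive calls usually return different ports, which would explain
    # why every logged "connection refused" line shows a new port number.
    print([pick_free_port() for _ in range(3)])
```

Note that the socket is closed before the port is handed to another process, so there is always a short window in which nothing is listening on that port yet.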
To Reproduce
Deploy LocalAI to Okteto or any K8s cluster using the default Helm chart.
Use curl like this: curl https://local-ai-localai.cloud.okteto.net/v1/completions -H "Content-Type: application/json" -d '{ "model": "", "prompt": "A long time ago in a galaxy far, far away", "temperature": 0.7 }'
Expected behavior
I would expect curl to return a valid JSON response rather than hang until it times out, and I would not expect these errors on the pod.
Logs
local-ai-677497c7f9-qzpzb[pod-event]Successfully assigned localai/local-ai-677497c7f9-qzpzb to gke-cloud-dev-3-8749baa3-snj0
local-ai-677497c7f9-qzpzb[pod-event]Pulling image "busybox"
local-ai-677497c7f9-qzpzb[pod-event]Successfully pulled image "busybox" in 161.620421ms (161.646474ms including waiting)
local-ai-677497c7f9-qzpzb[pod-event]Created container download-model
local-ai-677497c7f9-qzpzb[pod-event]Started container download-model
local-ai-677497c7f9-qzpzb [download-model] Downloading pytorch_model
local-ai-677497c7f9-qzpzb [download-model] Connecting to huggingface.co (18.172.134.24:443)
local-ai-677497c7f9-qzpzb [download-model] wget: note: TLS certificate validation not implemented
local-ai-677497c7f9-qzpzb [download-model] saving to '/models/pytorch_model'
local-ai-677497c7f9-qzpzb [download-model] pytorch_model 100% |********************************| 75824 0:00:00 ETA
local-ai-677497c7f9-qzpzb [download-model] '/models/pytorch_model' saved
local-ai-677497c7f9-qzpzb [download-model] Download completed.
local-ai-677497c7f9-qzpzb[pod-event]Pulling image "quay.io/go-skynet/local-ai:latest"
local-ai-677497c7f9-qzpzb[pod-event]Successfully pulled image "quay.io/go-skynet/local-ai:latest" in 272.326357ms (272.344416ms including waiting)
local-ai-677497c7f9-qzpzb[pod-event]Created container local-ai
local-ai-677497c7f9-qzpzb[pod-event]Started container local-ai
local-ai-677497c7f9-qzpzb [local-ai] @@@@@
local-ai-677497c7f9-qzpzb [local-ai] Skipping rebuild
local-ai-677497c7f9-qzpzb [local-ai] @@@@@
local-ai-677497c7f9-qzpzb [local-ai] If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
local-ai-677497c7f9-qzpzb [local-ai] If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
local-ai-677497c7f9-qzpzb [local-ai] CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
local-ai-677497c7f9-qzpzb [local-ai] see the documentation at: https://localai.io/basics/build/index.html
local-ai-677497c7f9-qzpzb [local-ai] Note: See also #288
local-ai-677497c7f9-qzpzb [local-ai] @@@@@
local-ai-677497c7f9-qzpzb [local-ai] CPU info:
local-ai-677497c7f9-qzpzb [local-ai] model name : Intel(R) Xeon(R) CPU @ 2.20GHz
local-ai-677497c7f9-qzpzb [local-ai] flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
local-ai-677497c7f9-qzpzb [local-ai] CPU: AVX found OK
local-ai-677497c7f9-qzpzb [local-ai] CPU: AVX2 found OK
local-ai-677497c7f9-qzpzb [local-ai] CPU: no AVX512 found
local-ai-677497c7f9-qzpzb [local-ai] @@@@@
local-ai-677497c7f9-qzpzb [local-ai] 10:13AM INF Starting LocalAI using 4 threads, with models path: /models
local-ai-677497c7f9-qzpzb [local-ai] 10:13AM INF LocalAI version: v1.40.0 (6ef7ea2)
local-ai-677497c7f9-qzpzb [local-ai]
local-ai-677497c7f9-qzpzb [local-ai] ┌───────────────────────────────────────────────────┐
local-ai-677497c7f9-qzpzb [local-ai] │ Fiber v2.50.0 │
local-ai-677497c7f9-qzpzb [local-ai] │ http://127.0.0.1:8080/ │
local-ai-677497c7f9-qzpzb [local-ai] │ (bound on host 0.0.0.0 and port 8080) │
local-ai-677497c7f9-qzpzb [local-ai] │ │
local-ai-677497c7f9-qzpzb [local-ai] │ Handlers ............ 73 Processes ........... 1 │
local-ai-677497c7f9-qzpzb [local-ai] │ Prefork ....... Disabled PID ................ 14 │
local-ai-677497c7f9-qzpzb [local-ai] └───────────────────────────────────────────────────┘
local-ai-677497c7f9-qzpzb [local-ai]
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37409: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45435: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38269: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38821: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44161: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37931: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32991: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39363: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45439: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37665: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37659: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34629: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42527: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44433: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41345: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46551: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46161: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43875: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35013: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45791: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43513: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44759: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42137: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33535: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46495: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35091: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35841: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45573: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35061: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35547: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42835: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46757: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35015: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33193: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34557: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33811: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41561: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38009: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43791: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37309: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38995: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46749: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44729: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46277: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35875: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43163: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43523: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43833: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43769: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37513: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39265: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38455: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43853: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45705: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40979: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41295: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36323: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35425: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34885: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43077: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34759: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32957: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40279: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45735: connect: connection refused"
local-ai-677497c7f9-qzpzb [local-ai] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34531: connect: connection refused"
Additional context
Exactly the same error here on EKS.
Same error when running locally on CPU:
localai-api-1 | CPU info:
localai-api-1 | model name : AMD A8-3870 APU with Radeon(tm) HD Graphics
localai-api-1 | flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall arat npt lbrv svm_lock nrip_save pausefilter
localai-api-1 | CPU: no AVX found
localai-api-1 | CPU: no AVX2 found
localai-api-1 | CPU: no AVX512 found
localai-api-1 | @@@@@
localai-api-1 | 8:29PM INF Starting LocalAI using 6 threads, with models path: /models
localai-api-1 | 8:29PM INF LocalAI version: v1.40.0 (6ef7ea2)
localai-api-1 |
localai-api-1 | ┌───────────────────────────────────────────────────┐
localai-api-1 | │ Fiber v2.50.0 │
localai-api-1 | │ http://127.0.0.1:8080 │
localai-api-1 | │ (bound on host 0.0.0.0 and port 8080) │
localai-api-1 | │ │
localai-api-1 | │ Handlers ............ 73 Processes ........... 1 │
localai-api-1 | │ Prefork ....... Disabled PID ................ 14 │
localai-api-1 | └───────────────────────────────────────────────────┘
localai-api-1 |
localai-api-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41535: connect: connection refused"
localai-api-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32815: connect: connection refused"
@jischebeck / @DavidARivkin No error seen; you're missing a model from your request. See the how-tos for more info.
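Since the reproduction above sends `"model": ""`, here is a minimal Python sketch of the same `/v1/completions` request that refuses to go out without a model name. The base URL `http://localhost:8080` and the model name `my-model` are hypothetical placeholders; substitute the service URL and a model that is actually configured in your LocalAI instance.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST to /v1/completions, refusing an empty model name."""
    if not model.strip():
        # An empty "model" is what the maintainer pointed to in this thread.
        raise ValueError("model must name a model configured in LocalAI")
    body = json.dumps({"model": model, "prompt": prompt,
                       "temperature": temperature}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "my-model" is a hypothetical name; use one from your /models directory.
    req = build_completion_request("http://localhost:8080", "my-model",
                                   "A long time ago in a galaxy far, far away")
    print(req.full_url, req.data.decode("utf-8"))
    # To actually send it (requires a running LocalAI):
    # with urllib.request.urlopen(req, timeout=60) as resp:
    #     print(json.load(resp))
```

Failing fast on the client side turns a silent hang into an immediate, readable error.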
⚠️ ⚠️ ⚠️ ⚠️ ⚠️
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler); please beware that I might hallucinate sometimes!
But I can also be funny or helpful 😸, and I can generally provide good tips or point to places to look in the documentation or the code, based on what you wrote in the issue.
Don't engage in conversation with me;
I don't support replying (yet)!
⚠️ ⚠️ ⚠️ ⚠️ ⚠️
ERROR: The prompt size exceeds the context window size and cannot be processed.
Sources:
My guess is that it's just a connection attempt that fires too early, before the backend is listening, and fails. From what I've observed so far, it has nothing to do with K8s on any provider, and functionally nothing is wrong.
If you set the environment variable DEBUG=true for the localai container, the logs look like this:
8:04PM DBG Loading model bert-embeddings from bert-MiniLM-L6-v2q4_0.bin
8:04PM DBG Loading model in memory from file: /models/bert-MiniLM-L6-v2q4_0.bin
8:04PM DBG Loading GRPC Model bert-embeddings: {...}
8:04PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
8:04PM DBG GRPC Service for bert-MiniLM-L6-v2q4_0.bin will be running at: '127.0.0.1:43361'
8:04PM DBG GRPC Service state dir: /tmp/go-processmanager2466349905
8:04PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43361: connect: connection refused"
8:04PM DBG GRPC(bert-MiniLM-L6-v2q4_0.bin-127.0.0.1:43361): stderr 2023/12/04 20:04:22 gRPC Server listening at 127.0.0.1:43361
8:04PM DBG GRPC Service Ready
8:04PM DBG GRPC: Loading model with options: {...}
...
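As you can see, the connection error is sandwiched between "GRPC Service Started" and "GRPC Service Ready": the backend process has been launched, but its gRPC server is not yet listening ("gRPC Server listening at 127.0.0.1:43361" only appears after the error).
@dionysius Thanks for debugging this. I get the same error CONSISTENTLY. Is there a way to fix this?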
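The race described above (dialing the backend's port before its server calls `listen`) can be illustrated with a generic readiness check that simply retries a TCP connect until it succeeds. This Python sketch shows that pattern with plain sockets; it is not LocalAI's code, and whether LocalAI retries internally is not stated in this thread.

```python
import socket
import threading
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0,
                  interval: float = 0.1) -> bool:
    """Poll until a TCP connect to host:port succeeds or timeout expires.

    A refused connection before the server is listening is expected here
    and is retried quietly instead of being reported as an error.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # ConnectionRefusedError, timeouts, etc.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Simulate a backend that is bound to a port but only starts
    # listening 0.5 s later, like "Service Started" vs "Service Ready".
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    timer = threading.Timer(0.5, srv.listen)
    timer.start()
    print(wait_for_port("127.0.0.1", port, timeout=5))
    timer.join()
    srv.close()
```

Early attempts against the simulated backend are refused, exactly like the logged errors, and a later attempt succeeds once it is listening.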
@tianzhicdev You do not have a model set up; that is what is causing that error. It just means you don't have a model configured! <3
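To check whether any model is set up at all, you can ask the server which models it knows about. This Python sketch assumes LocalAI exposes the OpenAI-compatible `GET /v1/models` endpoint returning a body shaped like `{"object": "list", "data": [{"id": ...}, ...]}`; treat both the URL and the response shape as assumptions to verify against your version. The model id `my-model` in the sample is hypothetical.

```python
import json
import urllib.request


def model_ids(models_json: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    payload = json.loads(models_json)
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def fetch_model_ids(base_url: str, timeout: float = 10.0) -> list[str]:
    """GET <base_url>/v1/models and return the ids (needs a live server)."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline example with a hypothetical response body.
    sample = '{"object": "list", "data": [{"id": "my-model", "object": "model"}]}'
    print(model_ids(sample))
```

An empty list would suggest that no model is configured, which matches the situation described above; a non-empty list gives you valid values for the `"model"` field of a completion request.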