tsurdilo/my-temporal-dockercompose

Nginx upstream timed out (110: Connection timed out) while reading response header from upstream


I am getting the error below in the nginx (temporal-nginx) logs:

temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:24:48 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue HTTP/2.0" 200 5 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 2023/05/03 14:24:53 [error] 29#29: *7440 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.23.0.3, server: , request: "POST /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue HTTP/2.0", upstream: "grpc://172.23.0.11:7237", host: "temporal-nginx:7233"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:24:53 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue HTTP/2.0" 204 0 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:24:55 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue HTTP/2.0" 200 5 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 2023/05/03 14:24:57 [error] 29#29: *7440 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.23.0.3, server: , request: "POST /temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue HTTP/2.0", upstream: "grpc://172.23.0.11:7237", host: "temporal-nginx:7233"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:24:57 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue HTTP/2.0" 204 0 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:25:06 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue HTTP/2.0" 200 5 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 2023/05/03 14:25:09 [error] 29#29: *7440 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.23.0.3, server: , request: "POST /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue HTTP/2.0", upstream: "grpc://172.23.0.11:7237", host: "temporal-nginx:7233"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:25:09 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue HTTP/2.0" 204 0 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:25:33 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue HTTP/2.0" 200 5 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:25:40 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue HTTP/2.0" 200 5 "-" "grpc-java-netty/1.53.0"
temporal-nginx          | 172.23.0.3 - - [03/May/2023:14:25:44 +0000] "POST /temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue HTTP/2.0" 200 5 "-" "grpc-java-netty/1.53.0"

In Spring Boot I am getting io.grpc.StatusRuntimeException (UNKNOWN: HTTP status code 204), as described here:
https://community.temporal.io/t/spring-boot-connected-to-temporal-sever-getting-io-grpc-statusruntimeexception-unknown-http-status-code-204/8122

Errors in the temporal-matching logs:

temporal-system","lifecycle":"Started","logging-call-at":"taskQueueManager.go:292"}
temporal-matching       | {"level":"error","ts":"2023-05-03T14:43:35.777Z","msg":"unable to add metrics baggage to gRPC trailer","error":"rpc error: code = Internal desc = transport: SendHeader called multiple times","logging-call-at":"grpc.go:143","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:143\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.36.4/interceptor.go:341\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:137\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1155\ngo.temporal.io/server/api/matchingservice/v1._MatchingService_PollWorkflowTaskQueue_Handler\n\t/home/builder/temporal/api/matchingservice/v1/service.pb.go:379\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1345\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1722\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:966"}
temporal-matching       | {"level":"info","ts":"2023-05-03T14:47:45.912Z","msg":"none","service":"matching","component":"matching-engine","wf-task-queue-name":"/_sys/temporal-sys-batcher-taskqueue/1","wf-task-queue-type":"Workflow","wf-namespace":"temporal-system","lifecycle":"Started","logging-call-at":"taskQueueManager.go:292"}
temporal-matching       | {"level":"info","ts":"2023-05-03T14:48:18.204Z","msg":"none","service":"matching","component":"matching-engine","wf-task-queue-name":"/_sys/default-worker-tq/1","wf-task-queue-type":"Activity","wf-namespace":"temporal-system","lifecycle":"Stopped","logging-call-at":"taskQueueManager.go:321"}
temporal-matching       | {"level":"info","ts":"2023-05-03T14

Errors in the temporal-history logs:

temporal-history        | {"level":"error","ts":"2023-05-02T19:07:41.884Z","msg":"Fail to process task","shard-id":4,"address":"172.23.0.7:7234","component":"timer-queue-processor","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-history-scanner","wf-run-id":"92580ba9-5263-4426-a64f-bc5ab95b6a0a","queue-task-id":4196136,"queue-task-visibility-timestamp":"2023-04-20T00:00:00.896Z","queue-task-type":"WorkflowBackoffTimer","queue-task":{"NamespaceID":"32049b68-7872-4094-8e63-d0dd59896a83","WorkflowID":"temporal-sys-history-scanner","RunID":"92580ba9-5263-4426-a64f-bc5ab95b6a0a","VisibilityTimestamp":"2023-04-20T00:00:00.896857Z","TaskID":4196136,"Version":0,"WorkflowBackoffType":2},"wf-history-event-id":1,"error":"shard status unknown","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/builder/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/builder/temporal/service/history/queues/executable.go:321\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:232\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:175\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:199\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:176\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:241\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:217"}
temporal-history        | {"level":"error","ts":"2023-05-02T19:07:41.885Z","msg":"Persistent fetch operation Failure","shard-id":4,"address":"172.23.0.7:7234","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-tq-scanner","wf-run-id":"60e195ac-b37d-4525-adc1-44b8c8643401","store-operation":"get-wf-execution","error":"shard status unknown","logging-call-at":"transaction_impl.go:465","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/service/history/workflow.getWorkflowExecution\n\t/home/builder/temporal/service/history/workflow/transaction_impl.go:465\ngo.temporal.io/server/service/history/workflow.(*ContextImpl).LoadMutableState\n\t/home/builder/temporal/service/history/workflow/context.go:271\ngo.temporal.io/server/service/history.LoadMutableStateForTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:144\ngo.temporal.io/server/service/history.loadMutableStateForTimerTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:125\ngo.temporal.io/server/service/history.(*timerQueueTaskExecutorBase).executeDeleteHistoryEventTask\n\t/home/builder/temporal/service/history/timerQueueTaskExecutorBase.go:110\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).Execute\n\t/home/builder/temporal/service/history/timerQueueActiveTaskExecutor.go:128\ngo.temporal.io/server/service/history/queues.(*executorWrapper).Execute\n\t/home/builder/temporal/service/history/queues/executor_wrapper.go:67\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/home/builder/temporal/service/history/queues/executable.go:211\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:231\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:175\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:199\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:176\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:241\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:217"}
temporal-history        | {"level":"error","ts":"2023-05-02T19:07:41.885Z","msg":"Fail to process task","shard-id":4,"address":"172.23.0.7:7234","component":"timer-queue-processor","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-tq-scanner","wf-run-id":"60e195ac-b37d-4525-adc1-44b8c8643401","queue-task-id":4195942,"queue-task-visibility-timestamp":"2023-04-25T00:01:10.599Z","queue-task-type":"DeleteHistoryEvent","queue-task":{"NamespaceID":"32049b68-7872-4094-8e63-d0dd59896a83","WorkflowID":"temporal-sys-tq-scanner","RunID":"60e195ac-b37d-4525-adc1-44b8c8643401","VisibilityTimestamp":"2023-04-25T00:01:10.599786Z","TaskID":4195942,"Version":0,"BranchToken":"CiQ2MGUxOTVhYy1iMzdkLTQ1MjUtYWRjMS00NGI4Yzg2NDM0MDESJDcxMThlNzg0LTE5MDItNDgzOS1hODE5LTQzZDVhZGZhYWQ0ZQ==","WorkflowDataAlreadyArchived":false,"ProcessStage":0},"wf-history-event-id":1,"error":"shard status unknown","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/builder/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/builder/temporal/service/history/queues/executable.go:321\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:232\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:175\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:199\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:176\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:241\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:217"}
temporal-history        | {"level":"error","ts":"2023-05-02T19:07:41.885Z","msg":"Persistent fetch operation Failure","shard-id":4,"address":"172.23.0.7:7234","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-tq-scanner","wf-run-id":"60e195ac-b37d-4525-adc1-44b8c8643401","store-operation":"get-wf-execution","error":"shard status unknown","logging-call-at":"transaction_impl.go:465","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/service/history/workflow.getWorkflowExecution\n\t/home/builder/temporal/service/history/workflow/transaction_impl.go:465\ngo.temporal.io/server/service/history/workflow.(*ContextImpl).LoadMutableState\n\t/home/builder/temporal/service/history/workflow/context.go:271\ngo.temporal.io/server/service/history.LoadMutableStateForTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:144\ngo.temporal.io/server/service/history.loadMutableStateForTimerTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:125\ngo.temporal.io/server/service/history.(*timerQueueTaskExecutorBase).executeDeleteHistoryEventTask\n\t/home/builder/temporal/service/history/timerQueueTaskExecutorBase.go:110\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).Execute\n\t/home/builder/temporal/service/history/timerQueueActiveTaskExecutor.go:128\ngo.temporal.io/server/service/history/queues.(*executorWrapper).Execute\n\t/home/builder/temporal/service/history/queues/executor_wrapper.go:67\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/home/builder/temporal/service/history/queues/executable.go:211\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:231\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:175\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:199\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:176\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:241\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:217"}
temporal-history        | {"level":"error","ts":"2023-05-02T19:07:41.885Z","msg":"Fail to process task","shard-id":4,"address":"172.23.0.7:7234","component":"timer-queue-processor","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-tq-scanner","wf-run-id":"60e195ac-b37d-4525-adc1-44b8c8643401","queue-task-id":4195942,"queue-task-visibility-timestamp":"2023-04-25T00:01:10.599Z","queue-task-type":"DeleteHistoryEvent","queue-task":{"NamespaceID":"32049b68-7872-4094-8e63-d0dd59896a83","WorkflowID":"temporal-sys-tq-scanner","RunID":"60e195ac-b37d-4525-adc1-44b8c8643401","VisibilityTimestamp":"2023-04-25T00:01:10.599786Z","TaskID":4195942,"Version":0,"BranchToken":"CiQ2MGUxOTVhYy1iMzdkLTQ1MjUtYWRjMS00NGI4Yzg2NDM0MDESJDcxMThlNzg0LTE5MDItNDgzOS1hODE5LTQzZDVhZGZhYWQ0ZQ==","WorkflowDataAlreadyArchived":false,"ProcessStage":0},"wf-history-event-id":1,"error":"shard status unknown","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/builder/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/builder/temporal/service/history/queues/executable.go:321\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:232\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:175\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:199\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:176\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:241\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:217"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:41.913Z","msg":"Range updated for shardID","shard-id":4,"address":"172.23.0.7:7234","shard-range-id":7,"previous-shard-range-id":6,"number":6291463,"next-number":7340032,"logging-call-at":"context_impl.go:1188"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:41.913Z","msg":"Acquired shard","shard-id":4,"address":"172.23.0.7:7234","logging-call-at":"context_impl.go:1912"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:43.119Z","msg":"matching client encountered error","service":"history","error":"Not enough hosts to serve the request","service-error-type":"serviceerror.Unavailable","logging-call-at":"metric_client.go:218"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:43.119Z","msg":"Current reachable members","service":"history","component":"service-resolver","service":"matching","addresses":["172.23.0.9:7235"],"logging-call-at":"rpServiceResolver.go:284"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:45.585Z","msg":"Current reachable members","service":"history","component":"service-resolver","service":"frontend","addresses":["172.23.0.11:7237"],"logging-call-at":"rpServiceResolver.go:284"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:45.985Z","msg":"Current reachable members","service":"history","component":"service-resolver","service":"frontend","addresses":["172.23.0.10:7236","172.23.0.11:7237"],"logging-call-at":"rpServiceResolver.go:284"}
temporal-history        | {"level":"info","ts":"2023-05-02T19:07:49.922Z","msg":"Current reachable members","service":"history","component":"service-resolver","service":"worker","addresses":["172.23.0.13:7239"],"logging-call-at":"rpServiceResolver.go:284"}

With HAProxy I am getting the error below in Spring Boot:

temporal io io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP status code 504

I was able to fix the Spring Boot error by increasing the client/server timeouts in HAProxy to 75s. As far as I understand, the Temporal SDK pollers hold PollWorkflowTaskQueue/PollActivityTaskQueue requests open for roughly 60-70 seconds (long polling), so any proxy timeout of 60s or less cuts those polls off.

global
   log stdout format raw local0
   maxconn 50000

defaults
   timeout connect 10000ms
   timeout client 75s
   timeout server 75s
   timeout http-request 75s
   mode http
   maxconn 3000

frontend stats
   bind *:8404
   http-request use-service prometheus-exporter if { path /metrics }
   stats enable
   stats uri /stats
   stats refresh 10s

frontend www
   mode http
   bind :7233 proto h2
   default_backend servers

backend servers
   mode http
   balance roundrobin
   server f1 temporal-frontend:7237 proto h2
   server f2 temporal-frontend2:7236 proto h2

Specifically, I increased the timeouts from:

   timeout client 60000ms
   timeout server 60000ms

to

   timeout client 75s
   timeout server 75s

I think we can set a similar timeout property in nginx to solve this issue too.
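Most likely the relevant directives are grpc_read_timeout and grpc_send_timeout, which both default to 60s, i.e. shorter than the long-poll requests above. A rough sketch of the nginx.conf change I have in mind (the upstream names/ports here are assumptions based on the compose setup in this repo, adjust to the actual temporal-nginx config):

   upstream temporal_frontends {
       # assumed frontend addresses, matching the HAProxy backend above
       server temporal-frontend:7237;
       server temporal-frontend2:7236;
   }

   server {
       listen 7233 http2;

       location / {
           grpc_pass grpc://temporal_frontends;

           # defaults are 60s; raise above the ~60-70s Temporal long-poll duration
           grpc_read_timeout 75s;
           grpc_send_timeout 75s;
       }
   }

With that, a poll that sits idle for the full long-poll window should no longer be cut off with the 110 timeouts / 204 responses shown in the logs.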
Asked here https://community.temporal.io/t/spring-boot-connected-to-temporal-sever-getting-io-grpc-statusruntimeexception-unknown-http-status-code-204/8122