magneticio/vamp

deployment status failed issue

bench87 opened this issue · 23 comments

Even though my container's health is good and it is accepting health-check requests, the Vamp UI shows the container status as failed.
(screenshots attached)

Can you share Vamp logs and blueprints? Any indication of when this happened, what triggered it, and how to reproduce?

@olafmol I am investigating how to reproduce it

harmw commented

What are you using as endpoints for Marathon and Mesos? This issue looks very familiar :)

harmw commented

As far as logging is concerned, I get a healthy gateway:

19:41:15.665 | INFO  | i.v.o.g.GatewaySynchronizationActor      | r/govamp/gateway-synchronization-actor-9 | Gateway event: blaze-sparrow-service/gateway - deployed

Followed by lots of these:

19:50:22.280 | INFO  | i.v.c.marathon.MarathonDriverActor       | amp/user/govamp/marathon-driver-actor-23 | Deploying JObject(List((container,JObject(List((docker,JObject(List((parameters,JArray(List(JObject(List((key,JString(user)), (value,JString(498:498)))), JObject(List((key,JString(log-opt)), (value,JString(tag=docker.blaze-sparrow-service)))), JObject(List((key,JString(log-driver)), (value,JString(fluentd)))), JObject(List((key,JString(label)), (value,JString(role=sparrow)))), JObject(List((key,JString(label)), (value,JString(team=Checkout)))), JObject(List((key,JString(log-opt)), (value,JString(labels=team))))))), (image,JString(wehkamp/blaze-sparrow-service:27-414d5b3)), (portMappings,JArray(List(JObject(List((containerPort,JInt(5000)), (hostPort,JInt(0)), (protocol,JString(tcp))))))), (privileged,JBool(true)), (network,JString(BRIDGE))))), (type,JString(DOCKER))))), (healthChecks,JArray(List(JObject(List((timeoutSeconds,JInt(10)), (path,JString(/status)), (portIndex,JInt(0)), (gracePeriodSeconds,JInt(30)), (maxConsecutiveFailures,JInt(3)), (intervalSeconds,JInt(60)), (protocol,JString(HTTP))))))), (labels,JObject(List((io.vamp.deployment,JString(blaze-sparrow-service)), (io.vamp.cluster,JString(blaze-sparrow-service)), (io.vamp.service,JString(blaze-sparrow-service:27-414d5b3))))), (uris,JArray(List(JString(file:///etc/.dockercfg)))), (id,JString(/govamp/deployment-blaze-sparrow-service-service-36f623682f06ebaca1dddd55b8b0066e4bd9d926)), (instances,JInt(1)), (cpus,JDouble(0.1)), (mem,JInt(500)), (env,JObject(List((LOG_JSON_TCP_HOST,JString(logging.blaze:5170)), (SERVICE_NAME,JString(blaze-sparrow-service)), (SERVICE_TAGS,JString(vamp-managed))))), (constraints,JArray(List()))))
19:50:32.280 | INFO  | i.v.c.marathon.MarathonDriverActor       | amp/user/govamp/marathon-driver-actor-23 | marathon update service: blaze-sparrow-service / wehkamp/blaze-sparrow-service:27-414d5b3
19:50:42.279 | INFO  | i.v.c.marathon.MarathonDriverActor       | amp/user/govamp/marathon-driver-actor-23 | marathon update service: blaze-sparrow-service / wehkamp/blaze-sparrow-service:27-414d5b3
19:50:52.273 | INFO  | i.v.c.marathon.MarathonDriverActor       | amp/user/govamp/marathon-driver-actor-23 | marathon update service: blaze-sparrow-service / wehkamp/blaze-sparrow-service:27-414d5b3
19:51:02.273 | INFO  | i.v.c.marathon.MarathonDriverActor       | amp/user/govamp/marathon-driver-actor-23 | marathon update service: blaze-sparrow-service / wehkamp/blaze-sparrow-service:27-414d5b3
19:51:12.276 | INFO  | i.v.c.marathon.MarathonDriverActor       | amp/user/govamp/marathon-driver-actor-23 | marathon update service: blaze-sparrow-service / wehkamp/blaze-sparrow-service:27-414d5b3

And just the following final line regarding the container:

19:51:22.416 | ERROR | io.vamp.common.notification.Notification |                                          | Deployment service error for deployment 'blaze-sparrow-service' and service 'blaze-sparrow-service:27-414d5b3'.

@bench87 any luck reproducing?

harmw commented

Am I right in my understanding that the health workflow is responsible for communication between Vamp and the orchestration engine (in our scenario Marathon/Mesos), and as such is the component that manages this (e.g. marking things healthy/unhealthy)?

Hm, it looks like some event with type "synchronization" is key here.

The health workflow runs the "health" Node.js script (stored in the health breed artifact); it only publishes a "health" metric type to the event API so the UI can display it. The health workflow thus doesn't influence anything in the scheduler engines; it's read-only.

It reproduces with another app, and then yet another.

My event log is below:
event | elasticsearch_pulse:ERROR info | HttpClientException

11:10:50.269 | INFO  | io.vamp.pulse.ElasticsearchPulseActor    | mp/user/vamp/elasticsearch-pulse-actor-2 | Percolator successfully removed for 'stream://user/StreamSupervisor-2/flow-800851-1-actorRefSource'.
11:10:50.269 | INFO  | io.vamp.pulse.ElasticsearchPulseActor    | mp/user/vamp/elasticsearch-pulse-actor-2 | Percolator 'stream://user/StreamSupervisor-2/flow-800851-1-actorRefSource' has been registered for tags ''.
11:10:50.270 | ERROR | io.vamp.common.http.HttpClient           |                                          | rsp [HttpMethod(POST) http://192.168.190.56:9200/vamp-pulse-3e33bd3325992a28f16bce1253215b858c919245/_search] - unexpected status code: 400
11:10:50.271 | ERROR | io.vamp.common.notification.Notification |                                          | Pulse response error.
11:10:50.272 | ERROR | io.vamp.common.notification.Notification |                                          | {"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":1,"col":22}],"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":1,"col":22},"status":400}
io.vamp.common.http.HttpClientException: {"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":1,"col":22}],"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":1,"col":22},"status":400}
	at io.vamp.common.http.HttpClient.$anonfun$httpWithEntity$9(HttpClient.scala:157)
	at scala.util.Success.$anonfun$map$1(Try.scala:251)
	at scala.util.Success.map(Try.scala:209)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:287)
	at akka.http.scaladsl.util.FastFuture$FulfilledFuture.transform(FastFuture.scala:84)
	at scala.concurrent.Future.map(Future.scala:287)
	at scala.concurrent.Future.map$(Future.scala:287)
	at akka.http.scaladsl.util.FastFuture$FulfilledFuture.map(FastFuture.scala:77)
	at io.vamp.common.http.HttpClient.$anonfun$httpWithEntity$3(HttpClient.scala:153)
	at akka.stream.impl.fusing.MapAsync$$anon$24.onPush(Ops.scala:1169)
	at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:747)
	at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:649)
	at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:471)
	at akka.stream.impl.fusing.GraphInterpreterShell.receive(ActorGraphInterpreter.scala:423)
	at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:603)
	at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:618)
	at akka.actor.Actor.aroundReceive(Actor.scala:496)
	at akka.actor.Actor.aroundReceive$(Actor.scala:494)
	at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:529)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
11:10:50.272 | ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to /events could not be handled normally: {"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":1,"col":22}],"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":1,"col":22},"status":400}
11:14:25.464 | INFO  | io.vamp.common.akka.LogPublisherHub$     |                                          | Stopping log publisher: Actor[akka://vamp/user/StreamSupervisor-2/flow-796724-1-actorRefSource#821577879]
11:14:25.464 | INFO  | io.vamp.http_api.ws.WebSocketActor       | kka://vamp/user/vamp/web-socket-actor-25 | WebSocket session closed [5d6eed80-d9c8-44d4-9b60-15cf462c28ac]
11:14:25.464 | INFO  | io.vamp.pulse.ElasticsearchPulseActor    | mp/user/vamp/elasticsearch-pulse-actor-2 | Percolator successfully removed for 'stream://user/StreamSupervisor-2/flow-796724-1-actorRefSource'.
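The HttpClientException above is quite specific: `no [query] registered for [filtered]` is the 400 that Elasticsearch 5.x returns when a client still sends the old `filtered` query, which was deprecated in ES 2.x and removed in 5.0. The client-side fix is a mechanical rewrite to `bool` with `filter`. A sketch of that rewrite (the query bodies are placeholders; the actual query Vamp sends is not shown in these logs):

```python
# Pre-5.0 form, rejected by ES 5.x with "no [query] registered for [filtered]":
old_query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"type": "health"}},
        }
    }
}

def to_bool_query(filtered_body):
    """Mechanically rewrite a removed 'filtered' query into the
    'bool' form that ES 5.x accepts (must + filter)."""
    return {
        "bool": {
            "must": filtered_body.get("query", {"match_all": {}}),
            "filter": filtered_body.get("filter", []),
        }
    }

new_query = {"query": to_bool_query(old_query["query"]["filtered"])}
```

This matches the suspicion voiced later in the thread that the workflow script and ES 5.5 are incompatible.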

@olafmol
here is the blueprint of my newly issued app:

name: nginx-blueprint
kind: blueprint
metadata: {}
gateways: {}
clusters:
  nginx:
    metadata: {}
    services:
    - breed:
        name: nginx-a:1.0.0
        kind: breed
        metadata: {}
        deployable:
          definition: nginx:1.13.9
        ports:
          webport: 80/http
        environment_variables: {}
        constants: {}
        arguments: []
        dependencies: {}
      environment_variables: {}
      scale:
        cpu: 0.1
        memory: 32.00MB
        instances: 1
      arguments:
      - privileged: 'true'
      health_checks:
      - path: /
        port: webport
        initial_delay: 10s
        timeout: 5s
        interval: 10s
        failures: 10
        protocol: HTTP
      dialects: {}
      health:
        staged: 0
        running: 1
        healthy: 1
        unhealthy: 0
    - breed:
        name: nginx-b:1.0.0
        kind: breed
        metadata: {}
        deployable:
          definition: nginx:1.13.9
        ports:
          webport: 80/http
        environment_variables: {}
        constants: {}
        arguments: []
        dependencies: {}
      environment_variables: {}
      scale:
        cpu: 0.1
        memory: 32.00MB
        instances: 1
      arguments:
      - privileged: 'true'
      health_checks:
      - path: /
        port: webport
        initial_delay: 10s
        timeout: 5s
        interval: 10s
        failures: 10
        protocol: HTTP
      dialects: {}
      health:
        staged: 0
        running: 1
        healthy: 1
        unhealthy: 0
    gateways:
      webport:
        sticky: null
        virtual_hosts:
        - webport.nginx.nginx-a.vamp
        - vamp.nginx.dev.toss.bz
        routes:
          nginx-b:1.0.0:
            lookup_name: 34e2933d9f1189fc3a301cfcc01e19d956b8dec7
            weight: 48%
            balance: default
            condition:
              condition: hdr(X-Real-IP) -m ip 192.168.201.122
            condition_strength: 100%
            rewrites: []
          nginx-a:1.0.0:
            lookup_name: 44a00980d450a4e9f026dafef48d83bd13f19dde
            weight: 52%
            balance: default
            condition:
              condition: hdr(X-Real-IP) -m ip 192.168.201.121
            condition_strength: 100%
            rewrites: []
    dialects: {}
environment_variables: {}
dialects: {}
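For what it's worth, the blueprint's `health_checks` entries appear to map field-for-field onto the Marathon `healthChecks` JSON visible in harmw's driver logs above (initial_delay → gracePeriodSeconds, interval → intervalSeconds, timeout → timeoutSeconds, failures → maxConsecutiveFailures). A minimal Python sketch of that mapping, useful for checking what Marathon should receive (the helper is illustrative, not Vamp's actual code):

```python
def parse_seconds(value):
    """Turn a Vamp duration like '10s' into an integer number of seconds."""
    return int(value.rstrip("s"))

def to_marathon_health_check(hc, port_index=0):
    """Translate a blueprint health_check into a Marathon healthCheck entry
    (field pairing inferred from the MarathonDriverActor logs in this thread)."""
    return {
        "protocol": hc["protocol"],
        "path": hc["path"],
        "portIndex": port_index,
        "gracePeriodSeconds": parse_seconds(hc["initial_delay"]),
        "intervalSeconds": parse_seconds(hc["interval"]),
        "timeoutSeconds": parse_seconds(hc["timeout"]),
        "maxConsecutiveFailures": hc["failures"],
    }

# The nginx-a health check from the blueprint above:
blueprint_hc = {
    "path": "/", "port": "webport", "initial_delay": "10s",
    "timeout": "5s", "interval": "10s", "failures": 10, "protocol": "HTTP",
}
print(to_marathon_health_check(blueprint_hc))
```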

Hi @bench87 what version of Elasticsearch are you running and using?

I think the health workflow is dead...

@olafmol my Elasticsearch is 5.5

@bench87 it seems there is some incompatibility between the workflow script and ES 5.5. It should be compatible with ES 5.x, though; we're investigating. In the meantime you could use ES 2.x (we provide a container: https://vamp.io/documentation/installation/v0.9.5/dcos/#step-1-install-elasticsearch)

@olafmol even with ES 2.x, the issue still reproduces..

Ok. Can you check the status of ES? Is everything "green", and does it have enough storage space to store indexes?

(screenshots of ES status attached)
I think this is not related to ES. The health checks seem to work fine in the background, but the UI doesn't show the right data.

Below are logs from the health workflow:
2018/03/13 12:08:32.923979 run.go:124: WORKFLOW - health: [["deployments:nginx-a","clusters:nginx","service","services:nginx-b:1.0.0","health"]] - 1
2018/03/13 12:08:32.923999 run.go:124: WORKFLOW - API PUT /events {"tags":["deployments:nginx-a","clusters:nginx","service","services:nginx-b:1.0.0","health"],"type":"health"}
2018/03/13 12:08:32.938849 run.go:124: WORKFLOW - health: [["gateways:nginx-a/nginx/webport","route","routes:nginx-a/nginx/nginx-b:1.0.0/webport","health"]] - 1
2018/03/13 12:08:32.938870 run.go:124: WORKFLOW - API PUT /events {"tags":["gateways:nginx-a/nginx/webport","route","routes:nginx-a/nginx/nginx-b:1.0.0/webport","health"],"type":"health"}
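The PUT bodies in these workflow log lines can be reconstructed exactly; a minimal Python sketch of the service-health event payload, with the tag layout copied from the log lines above and nothing else assumed:

```python
import json

def make_health_event(deployment, cluster, service):
    """Build the service-health event body that the health workflow
    PUTs to /events (tag layout copied from the workflow logs)."""
    return {
        "tags": [
            f"deployments:{deployment}",
            f"clusters:{cluster}",
            "service",
            f"services:{service}",
            "health",
        ],
        "type": "health",
    }

# Reproduces the 12:08:32 PUT body from the log above:
print(json.dumps(make_health_event("nginx-a", "nginx", "nginx-b:1.0.0")))
```

So the workflow is still emitting well-formed events; the question is whether they make it through the event API into ES.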

Can you check the following:

  • what do the logs of the health workflow container say?
  • what happens when you change the scale of the service through the Vamp UI?

tailLog.txt
This is the JSON log of the last 10000 lines from the health workflow container. After I captured this log, no new log lines appeared; I guess the health workflow hung.
I couldn't change the scale of the service because Vamp didn't respond to UI requests, so I wiped Vamp out and started it again.
Next time I hit this situation, I will try changing the scale.

@olafmol the Vamp bootstrap container process has huge IO wait...

Vamp is getting slower with huge IO wait. I run 14 apps with Vamp.

Hi @bench87 can you give us some more details? We're trying to reproduce, but without success. Thanks!

@olafmol sure, but that is not this issue, hm... I will reproduce the IO issue and then create a new issue

Maybe this log will help you reproduce this issue:

02:19:44.665 | ERROR | io.vamp.common.notification.Notification |                                          | Ask timed out on [Actor[akka://vamp/user/vamp/gateway-driver-actor-3#-642274261]] after [5000 ms]. Sender[Actor[akka://vamp/user/vamp/gateway-synchronization-actor-12#429980743]] sent message of type "io.vamp.gateway_driver.GatewayDriverActor$Pull$".
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://vamp/user/vamp/gateway-driver-actor-3#-642274261]] after [5000 ms]. Sender[Actor[akka://vamp/user/vamp/gateway-synchronization-actor-12#429980743]] sent message of type "io.vamp.gateway_driver.GatewayDriverActor$Pull$".
	at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:604)
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:864)
	at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)
	at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:862)
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
	at java.lang.Thread.run(Thread.java:745)
�[36m02:20:15.451�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".
�[0;39m�[36m02:20:15.491�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".
�[0;39m�[36m02:20:16.115�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".
�[0;39m�[36m02:20:15.460�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".
�[0;39m�[36m02:20:16.115�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".
�[0;39m�[36m02:20:15.471�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".
�[0;39m�[36m02:20:15.461�[0;39m �[1;31m| ERROR | io.vamp.http_api.HttpApiRoute            |                                          | Request to http://10.20.0.100:8080/vamp/api/v1/events could not be handled normally: Ask timed out on [Actor[akka://vamp/user/vamp/elasticsearch-pulse-actor-2#1184752261]] after [10000 ms]. Sender[null] sent message of type "io.vamp.pulse.PulseActor$Publish".

Closing this due to lack of reproducible steps. Please re-open or create a new issue if more info becomes available.