aws-greengrass/aws-greengrass-nucleus

(Nucleus): Nucleus shuts down when no deployment available

Closed this issue · 3 comments

Describe the bug
When Nucleus runs in a Docker deployment, it exists after a while due to no deployment.

To Reproduce
Run Nucleus in Docker container and provide no deployment.

Expected behavior
We expect Nucleus container to keep running even if there is no deployment available.

Actual behavior
Nucleus exists (SIGTERM)

Environment

  • OS: Amazon Linux 2
  • JDK version: OpenJDK Runtime Environment Corretto-11.0.17.8.1 (build 11.0.17+8-LTS)
  • Nucleus version: 2.9.1

Right before Nucleus container stops

2022-11-21T17:27:05.314Z [INFO] (pool-2-thread-14) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-finished. Nothing done. {serviceName=main, currentState=STARTING}
2022-11-21T17:27:05.319Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=FINISHED, newState=STOPPING}
2022-11-21T17:27:05.327Z [INFO] (pool-2-thread-14) com.aws.greengrass.lifecyclemanager.GenericExternalService: Shutdown initiated. {serviceName=main, currentState=STOPPING}
2022-11-21T17:27:05.329Z [INFO] (pool-2-thread-14) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-shutdown. {serviceName=main, currentState=STOPPING}
2022-11-21T17:27:05.326Z [INFO] (aws.greengrass.Nucleus-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=aws.greengrass.Nucleus, currentState=STOPPING, newState=FINISHED}
2022-11-21T17:27:05.331Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=STOPPING, newState=FINISHED}
2022-11-21T17:27:05.405Z [INFO] (main) com.aws.greengrass.status.FleetStatusService: fss-status-update-published. Status update published to FSS. {trigger=NUCLEUS_LAUNCH, serviceName=FleetStatusService, currentState=RUNNING}
Launched Nucleus successfully.
2022-11-21T17:27:15.673Z [INFO] (Thread-3) com.aws.greengrass.mqttclient.AwsIotMqttClient: Connecting to AWS IoT Core. {clientId=swx-pi-gateway-f533bacad8cc075a4fc4a80188b7274f}
2022-11-21T17:27:15.676Z [INFO] (Thread-3) com.aws.greengrass.mqttclient.AwsIotMqttClient: Connection purposefully interrupted. {clientId=swx-pi-gateway-f533bacad8cc075a4fc4a80188b7274f}
2022-11-21T17:27:16.660Z [INFO] (Thread-3) com.aws.greengrass.mqttclient.AwsIotMqttClient: Successfully connected to AWS IoT Core. {clientId=swx-pi-gateway-f533bacad8cc075a4fc4a80188b7274f, sessionPresent=false}
2022-11-21T17:27:18.097Z [INFO] (Thread-3) com.aws.greengrass.deployment.IotJobsHelper: No deployment job found. {ThingName=swx-pi-gateway-f533bacad8cc075a4fc4a80188b7274f}
++ echo Received SIGTERM
Received SIGTERM
++ sigterm_received=1
++ kill -TERM 24
++ wait 24
2022-11-21T17:27:20.335Z [WARN] (Thread-1) com.aws.greengrass.lifecyclemanager.Kernel: Shutting down Nucleus due to external signal. {}
2022-11-21T17:27:20.339Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: system-shutdown. {main=services.main:FINISHED}
2022-11-21T17:27:20.357Z [INFO] (pool-2-thread-7) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-close. Service is now closing. {serviceName=main, currentState=FINISHED}
2022-11-21T17:27:20.361Z [INFO] (pool-2-thread-12) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-close. Service is now closing. {serviceName=aws.greengrass.Nucleus, currentState=FINISHED}
2022-11-21T17:27:20.363Z [INFO] (pool-2-thread-6) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-close. Service is now closing. {serviceName=UpdateSystemPolicyService, currentState=RUNNING}
2022-11-21T17:27:20.364Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: Waiting for services to shutdown. {}
2022-11-21T17:27:20.361Z [INFO] (pool-2-thread-14) com.aws.greengrass.status.FleetStatusService: service-close. Service is now closing. {serviceName=FleetStatusService, currentState=RUNNING}
2022-11-21T17:27:20.367Z [INFO] (pool-2-thread-15) com.aws.greengrass.telemetry.TelemetryAgent: service-close. Service is now closing. {serviceName=TelemetryAgent, currentState=RUNNING}
2022-11-21T17:27:20.363Z [INFO] (pool-2-thread-11) com.aws.greengrass.deployment.DeploymentService: service-close. Service is now closing. {serviceName=DeploymentService, currentState=RUNNING}
2022-11-21T17:27:20.387Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: service-set-state. {serviceName=DeploymentService, currentState=RUNNING, newState=STOPPING}
2022-11-21T17:27:20.388Z [INFO] (FleetStatusService-lifecycle) com.aws.greengrass.status.FleetStatusService: service-set-state. {serviceName=FleetStatusService, currentState=RUNNING, newState=STOPPING}
2022-11-21T17:27:20.392Z [INFO] (UpdateSystemPolicyService-lifecycle) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-set-state. {serviceName=UpdateSystemPolicyService, currentState=RUNNING, newState=STOPPING}
2022-11-21T17:27:20.395Z [INFO] (TelemetryAgent-lifecycle) com.aws.greengrass.telemetry.TelemetryAgent: service-set-state. {serviceName=TelemetryAgent, currentState=RUNNING, newState=STOPPING}
2022-11-21T17:27:20.398Z [INFO] (pool-2-thread-2) com.aws.greengrass.telemetry.TelemetryAgent: Cancelling all telemetry scheduled tasks.. {serviceName=TelemetryAgent, currentState=STOPPING}
2022-11-21T17:27:20.400Z [INFO] (UpdateSystemPolicyService-lifecycle) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: Stopping backingTask start. {serviceName=UpdateSystemPolicyService, currentState=STOPPING}
2022-11-21T17:27:20.402Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: Stopping backingTask start. {serviceName=DeploymentService, currentState=STOPPING}
2022-11-21T17:27:20.405Z [WARN] (pool-2-thread-8) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-run-interrupted. Service interrupted while running startup. {serviceName=UpdateSystemPolicyService, currentState=STOPPING}
2022-11-21T17:27:20.409Z [INFO] (FleetStatusService-lifecycle) com.aws.greengrass.status.FleetStatusService: service-set-state. {serviceName=FleetStatusService, currentState=STOPPING, newState=FINISHED}
2022-11-21T17:27:20.408Z [WARN] (pool-2-thread-13) com.aws.greengrass.deployment.DeploymentService: service-run-interrupted. Service interrupted while running startup. {serviceName=DeploymentService, currentState=STOPPING}
2022-11-21T17:27:20.400Z [INFO] (TelemetryAgent-lifecycle) com.aws.greengrass.telemetry.TelemetryAgent: service-set-state. {serviceName=TelemetryAgent, currentState=STOPPING, newState=FINISHED}
2022-11-21T17:27:20.412Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: service-set-state. {serviceName=DeploymentService, currentState=STOPPING, newState=FINISHED}
2022-11-21T17:27:20.420Z [INFO] (UpdateSystemPolicyService-lifecycle) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-set-state. {serviceName=UpdateSystemPolicyService, currentState=STOPPING, newState=FINISHED}
2022-11-21T17:27:20.483Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.Kernel: effective-config-dump-complete. {file=/greengrass/v2/config/effectiveConfig.yaml}
2022-11-21T17:27:20.492Z [WARN] (pool-2-thread-10) com.aws.greengrass.mqttclient.MqttClient: Shutting down spooler task. {}
2022-11-21T17:27:20.503Z [INFO] (Serialized listener processor) com.aws.greengrass.lifecyclemanager.KernelLifecycle: executor-service-shutdown-initiated. {}
2022-11-21T17:27:20.507Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: Waiting for executors to shutdown. {}
2022-11-21T17:27:20.509Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: executor-service-shutdown-complete. {executor-terminated=true, scheduled-executor-terminated=true}
2022-11-21T17:27:20.514Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: context-shutdown-initiated. {}
2022-11-21T17:27:20.533Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: context-shutdown-complete. {}
++ echo Killed child PID
Killed child PID
+ kernel_exit_code=143
+ echo 'Nucleus exit at code: 143'
Nucleus exit at code: 143
+ case ${kernel_exit_code} in
+ echo 'Nucleus exited 143. Retrying 1 times'
Nucleus exited 143. Retrying 1 times
+ j=2
+ '[' 2 -le 3 ']'
+ '[' 1 -eq 0 ']'
+ '[' 1 -eq 0 ']'
+ exit 143

The only error on AWS IoT Logs we can see are these:

{
  "_index": ".ds-logs-awsiotlogsv2-2022.07.06-000001",
  "_id": "zytXm4QBkmoHrQnfm-ZX",
  "_version": 1,
  "_score": 1,
  "_source": {
    "agent": {
      "name": "AWS IoT Logs"
    },
    "log": {
      "file": {
        "path": "8fe0640d-a847-4340-bca7-7601bbb0cba3_836403829680_0"
      },
      "logger": "AWSIotLogsV2"
    },
    "network": {
      "protocol": "MQTT"
    },
    "cloud": {
      "provider": "aws",
      "service": {
        "name": "aws iot"
      },
      "region": "us-east-1",
      "account": {
        "id": "836403829680"
      }
    },
    "ecs": {
      "version": "8.3.0"
    },
    "swx": {
      "mqtt": {
        "topic_name": "$aws/things/<THING_NAME_REDACTED>/shadow/name/AWSManagedGreengrassV2Deployment/get"
      }
    },
    "log.level": "error",
    "event": {
      "ingested": "2022-11-21T17:58:34.216Z",
      "created": "2022-11-21T17:58:34.216Z",
      "action": "GetThingShadow",
      "id": "f7e2677b-936d-4610-aa70-d52f4e882b49",
      "outcome": "failure"
    },
    "message": "No shadow exists with name: '<THING_NAME_REDACTED>~AWSManagedGreengrassV2Deployment'",
    "@timestamp": "2022-11-21T17:58:34.216Z",
    "swx.mqtt.device_shadow_name": "<THING_NAME_REDACTED>"
  },
  "fields": {
    "cloud.region": [
      "us-east-1"
    ],
    "message": [
      "No shadow exists with name: '<THING_NAME_REDACTED>~AWSManagedGreengrassV2Deployment'"
    ],
    "log.logger": [
      "AWSIotLogsV2"
    ],
    "swx.mqtt.topic_name": [
      "$aws/things/<THING_NAME_REDACTED>/shadow/name/AWSManagedGreengrassV2Deployment/get"
    ],
    "cloud.provider": [
      "aws"
    ],
    "event.ingested": [
      "2022-11-21T17:58:34.216Z"
    ],
    "event.action": [
      "GetThingShadow"
    ],
    "@timestamp": [
      "2022-11-21T17:58:34.216Z"
    ],
    "cloud.service.name": [
      "aws iot"
    ],
    "network.protocol": [
      "MQTT"
    ],
    "cloud.account.id": [
      "<REDACTED>"
    ],
    "ecs.version": [
      "8.3.0"
    ],
    "log.file.path": [
      "8fe0640d-a847-4340-bca7-7601bbb0cba3_836403829680_0"
    ],
    "event.created": [
      "2022-11-21T17:58:34.216Z"
    ],
    "log.level": [
      "error"
    ],
    "agent.name": [
      "AWS IoT Logs"
    ],
    "swx.mqtt.device_shadow_name": [
      "<THING_NAME_REDACTED>"
    ],
    "event.id": [
      "f7e2677b-936d-4610-aa70-d52f4e882b49"
    ],
    "event.outcome": [
      "failure"
    ]
  }
}

Greengrass does not exit when there are no deployments.

echo Received SIGTERM and Shutting down Nucleus due to external signal proves that Greengrass was asked to shutdown by you or Docker.

What is the exact command that you are using to run the Docker container? And what are you doing after starting the container?

Okay, just figured this out. We had a healthcheck in docker-compose in Greengrass V1:

    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8000/2016-11-01/health/workers"]
      interval: 5s
      timeout: 10s
      retries: 5
    stop_grace_period: 20s

This seem not supported anymore in Greengrass V2. Is there any way to health check Greengrass V2 Nucleus in Docker?
Thanks

That makes sense. Glad that you found it. There is no specific health check for Greengrass V2.