aws-greengrass/aws-greengrass-nucleus

(nucleus): greengrass system service cannot launch

Closed this issue · 3 comments

Describe the bug
The greengrass system service stuck in an activating state and kept exiting.

To Reproduce

  1. Create a deployment that has only one component (zipped artifact 20MB ) and that component only has Install script to run.
  2. After the deployment is sent to an on-premise device but before it starts to execute bootstraps, I ran the following commands sudo systemctl restart greengrass.service and sudo systemctl stop greengrass.service 3 times.
  3. The greengrass system service stuck in an activating state.
burro@burro-8-883:~$ sudo systemctl status greengrass.service
● greengrass.service - Greengrass Core
     Loaded: loaded (/etc/systemd/system/greengrass.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Tue 2022-10-18 18:25:27 EDT; 1s ago
    Process: 13146 ExecStart=/bin/sh /greengrass/v2/alts/current/distro/bin/loader (code=exited, status=127)
   Main PID: 13146 (code=exited, status=127)
  1. After rebooting, the greengrass system service still cannot launch.
  2. In the end, I need to follow the instrruction to uninstall the greengrass service, delete the greengrass IOT device/core device on the cloud, then reinstall the greengrass service back.

Expected behavior
The greengrass service should be alive after rebooting the computer

Actual behavior
The greengrass system service is stuck in an activating state

Environment

  • OS: Ubuntu 20.04
  • JDK version: 8
  • Nucleus version: probably 2.5.x ( not sure )

Additional context
The log I was looking into what happened to the greengrass service

burro@burro-8-883:~$ sudo tail -50 /greengrass/v2/logs/greengrass.log
2022-10-18T22:05:08.770Z [INFO] (TelemetryAgent-lifecycle) com.aws.greengrass.telemetry.TelemetryAgent: service-set-state. {serviceName=TelemetryAgent, currentState=STOPPING, newState=FINISHED}
2022-10-18T22:05:09.631Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: Stopping backingTask start. {serviceName=DeploymentService, currentState=STOPPING}
2022-10-18T22:05:09.631Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: service-set-state. {serviceName=DeploymentService, currentState=STOPPING, newState=FINISHED}
2022-10-18T22:05:09.631Z [WARN] (pool-2-thread-14) com.aws.greengrass.deployment.DeploymentService: service-run-interrupted. Service interrupted while running startup. {serviceName=DeploymentService, currentState=STOPPING}
2022-10-18T22:05:09.640Z [INFO] (Serialized listener processor) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-config-change. Requesting restart for component. {configNode=services.burro-configurator.lifecycle.Bootstrap.Script, serviceName=burro-configurator, currentState=FINISHED}
2022-10-18T22:05:09.640Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.DeploymentDirectoryManager: Persist configuration snapshot. {file=/greengrass/v2/deployments/50d5cf45-d057-4c58-8292-9b5e5fcbe796/target_config.tlog}
2022-10-18T22:05:09.643Z [INFO] (Serialized listener processor) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-config-change. Requesting reinstallation for component. {configNode=services.burro-configurator.version, serviceName=burro-configurator, currentState=FINISHED}
2022-10-18T22:05:09.646Z [INFO] (Serialized listener processor) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-config-change. Requesting restart for component. {configNode=services.main.lifecycle, serviceName=main, currentState=FINISHED}
2022-10-18T22:05:09.663Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.bootstrap.BootstrapManager: Saving bootstrap task list to file. {filePath=/greengrass/v2/deployments/50d5cf45-d057-4c58-8292-9b5e5fcbe796/bootstrap_task.json}
2022-10-18T22:05:09.667Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.bootstrap.BootstrapManager: Bootstrap task list is saved to file. {filePath=/greengrass/v2/deployments/50d5cf45-d057-4c58-8292-9b5e5fcbe796/bootstrap_task.json}
2022-10-18T22:05:09.667Z [INFO] (pool-2-thread-4) com.aws.greengrass.lifecyclemanager.KernelAlternatives: Setting up launch directory for new Nucleus. {}
2022-10-18T22:05:09.669Z [INFO] (pool-2-thread-4) com.aws.greengrass.lifecyclemanager.KernelAlternatives: Finished setup of launch directory for new Nucleus. {}
2022-10-18T22:05:09.669Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.bootstrap.BootstrapManager: Execute component bootstrap step. {componentName=burro-configurator}
2022-10-18T22:05:27.085Z [WARN] (Thread-1) com.aws.greengrass.lifecyclemanager.Kernel: Shutting down Nucleus due to external signal. {}
2022-10-18T22:05:27.085Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: system-shutdown. {main=services.main:FINISHED}
2022-10-18T22:05:27.087Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.bootstrap.BootstrapManager: Saving bootstrap task list to file. {filePath=/greengrass/v2/deployments/50d5cf45-d057-4c58-8292-9b5e5fcbe796/bootstrap_task.json}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.bootstrap.BootstrapManager: Bootstrap task list is saved to file. {filePath=/greengrass/v2/deployments/50d5cf45-d057-4c58-8292-9b5e5fcbe796/bootstrap_task.json}
2022-10-18T22:05:27.097Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.Kernel: effective-config-dump-complete. {file=/greengrass/v2/config/effectiveConfig.yaml}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-20) com.aws.greengrass.status.FleetStatusService: service-close. Service is now closing. {serviceName=FleetStatusService, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-9) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-close. Service is now closing. {serviceName=UpdateSystemPolicyService, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-14) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-close. Service is now closing. {serviceName=aws.greengrass.Nucleus, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-16) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-close. Service is now closing. {serviceName=main, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-2) com.aws.greengrass.deployment.DeploymentService: service-close. Service is now closing. {serviceName=DeploymentService, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: Waiting for services to shutdown. {}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-9) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-close. Service is now closing. {serviceName=burro-configurator, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-14) com.aws.greengrass.telemetry.TelemetryAgent: service-close. Service is now closing. {serviceName=TelemetryAgent, currentState=FINISHED}
2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.activator.DeploymentActivator: merge-config. Rolling back failed deployment. {deploymentId=50d5cf45-d057-4c58-8292-9b5e5fcbe796}
com.aws.greengrass.deployment.exceptions.ServiceUpdateException: Fail to execute bootstrap step for burro-configurator, exit code: 143
	at com.aws.greengrass.deployment.bootstrap.BootstrapManager.executeAllBootstrapTasksSequentially(BootstrapManager.java:435)
	at com.aws.greengrass.deployment.activator.KernelUpdateActivator.activate(KernelUpdateActivator.java:84)
	at com.aws.greengrass.deployment.DeploymentConfigMerger.updateActionForDeployment(DeploymentConfigMerger.java:150)
	at com.aws.greengrass.deployment.DeploymentConfigMerger.lambda$mergeInNewConfig$0(DeploymentConfigMerger.java:102)
	at com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService.runUpdateActions(UpdateSystemPolicyService.java:95)
	at com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService.lambda$startup$0(UpdateSystemPolicyService.java:165)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

2022-10-18T22:05:27.097Z [INFO] (pool-2-thread-4) com.aws.greengrass.deployment.DeploymentDirectoryManager: Persist deployment metadata. {file=/greengrass/v2/deployments/50d5cf45-d057-4c58-8292-9b5e5fcbe796/deployment_metadata.json, deploymentId=50d5cf45-d057-4c58-8292-9b5e5fcbe796}
2022-10-18T22:05:27.098Z [WARN] (FleetStatusService-lifecycle) com.aws.greengrass.status.FleetStatusService: service-state-transition-interrupted. Service lifecycle thread interrupted. Thread will exit now. {serviceName=FleetStatusService, currentState=FINISHED}
2022-10-18T22:05:27.098Z [WARN] (pool-2-thread-17) com.aws.greengrass.mqttclient.MqttClient: Shutting down spooler task. {}
2022-10-18T22:05:27.098Z [INFO] (pool-2-thread-5) com.aws.greengrass.deployment.DeploymentService: deployment-task-execution. Cancelled deployment merge future due to interrupt, update may not get cancelled if it is already being applied. {deploymentId=50d5cf45-d057-4c58-8292-9b5e5fcbe796, serviceName=DeploymentService, currentState=FINISHED}
2022-10-18T22:05:27.099Z [INFO] (Serialized listener processor) com.aws.greengrass.lifecyclemanager.KernelLifecycle: executor-service-shutdown-initiated. {}
2022-10-18T22:05:27.099Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: Waiting for executors to shutdown. {}
2022-10-18T22:05:27.099Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: executor-service-shutdown-complete. {executor-terminated=false, scheduled-executor-terminated=true}
2022-10-18T22:05:27.099Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: context-shutdown-initiated. {}
2022-10-18T22:05:27.102Z [INFO] (Thread-3) com.aws.greengrass.mqttclient.AwsIotMqttClient: Connection interrupted. {clientId=burro-8-883}
2022-10-18T22:05:27.102Z [INFO] (Thread-1) com.aws.greengrass.lifecyclemanager.KernelLifecycle: context-shutdown-complete. {}

I then checked why the ExecStart script kept failing, but the /greengrass/v2/alts/current symlink is gone

burro@burro-8-883:~$ /bin/sh /greengrass/v2/alts/current/distro/bin/loader
/bin/sh: 0: Can't open /greengrass/v2/alts/current/distro/bin/loader
burro@burro-8-883:~$ sudo su
root@burro-8-883:/home/burro# /bin/sh /greengrass/v2/alts/
50d5cf45-d057-4c58-8292-9b5e5fcbe796/ fe7fed58-cc49-4b27-8c96-51dd1cef1fef/ 
broken/                               old/    

The impact of the bug is that I need to manually remove the greengrass system service, delete cloud agents and install the greengrass system service back.

E.g. what is the impact of the bug?

Hello @DeboBurro,

Could you share the full greengrass log?

From what you shared, I can see the deployment was executing a bootstrap step when it receives external shutdown signal. Because of that, deployment would fail after Nucleus restart.

With more log I can understand better what you mean by greengrass system service stuck in an activating state.

Hi @junfuchen99 , I don't have the log anymore since I already reinstalled the greengrass service entirely.
The Nucleus was trying to activate but failed to activate immediately, I think it's because the /greengrass/v2/alts/current symbolic link is removed somehow during the shutdown process.
We can reopen this issue next time I encounter again.

Thanks, please reach out to us if you run into this issue again.