aws-greengrass/aws-greengrass-nucleus

provisioning: multiple improvements

Closed this issue · 2 comments

Feature Description
I suggest multiple improvements to provisioning:

  1. Unify automated and fleet provisioning with respect to synchronous/asynchronous behaviour.
    Currently, with automated provisioning, GreengrassSetup.main() returns only when provisioning has finished. That is not the case for fleet provisioning (via plugin): execution leaves main() immediately after initialization, and Nucleus starts its internal services once fleet provisioning succeeds.
    With that behaviour it is hard to get a) the status of provisioning and b) an event when provisioning is done (successfully or not).

  2. Make provisioning fail-safe.
    Provisioning can be interrupted by network issues or power failures. Best practice is to run provisioning step by step, like a transaction, and not to repeat steps that have already succeeded.
    In all cases we should also provide a clear and unambiguous provisioning status to the caller, for example to external scripts. That also means provisioning of any type should be separated from running Nucleus as a service, because the service runs indefinitely.
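
To illustrate the status reporting in point 2, here is a minimal sketch of a provisioning run that is separate from the long-running service and ends with an exit code an external script can check. Everything in it is hypothetical and not part of the Nucleus API: the `ProvisioningExit` class, the `Status` values, the exit code numbers, and the assumption that an `IOException` marks a retryable (network) failure.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class ProvisioningExit {
    /** Hypothetical outcome values; the exit codes are an assumption, not a Nucleus convention. */
    enum Status {
        SUCCESS(0), RETRYABLE_FAILURE(1), FATAL_FAILURE(2);

        final int exitCode;

        Status(int exitCode) {
            this.exitCode = exitCode;
        }
    }

    /** Runs the provisioning task to completion and maps its outcome to a Status. */
    static Status runProvisioning(Callable<Boolean> task) {
        try {
            return task.call() ? Status.SUCCESS : Status.FATAL_FAILURE;
        } catch (IOException e) {
            // Assumption: I/O errors (e.g. network down) are worth retrying later.
            return Status.RETRYABLE_FAILURE;
        } catch (Exception e) {
            return Status.FATAL_FAILURE;
        }
    }

    public static void main(String[] args) {
        // Placeholder task; a real one would call the actual provisioning code.
        Status status = runProvisioning(() -> true);
        System.out.println("provisioning status: " + status);
        System.exit(status.exitCode);
    }
}
```

With something like this, a wrapper script could start Nucleus as a service only after the provisioning process has exited with code 0.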

Use Case
This is needed to make provisioning by Nucleus stable.

Proposed Solution
I have no solution yet.

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

These should already be supported by implementing the provisioning plugin interface. What isn't supported by the provisioning plugin?
https://docs.aws.amazon.com/greengrass/v2/developerguide/develop-custom-provisioning-plugins.html

Hello Michael!

The issue is not related to the DeviceIdentityInterface interface.

Currently we observe different behaviour between 'automated' provisioning and provisioning via the fleet provisioning plugin.
Automated provisioning runs synchronously: when execution returns from GreengrassSetup.main(), provisioning is done and we can check whether it finished successfully.
With fleet provisioning, execution returns from GreengrassSetup.main() immediately after the Nucleus internals have been initialized. In that case we have to add extra synchronization objects to wait for provisioning to complete.
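
To show what kind of synchronization object I mean, here is a minimal sketch built on `CompletableFuture`. The `ProvisioningWaiter` class and its callback names are hypothetical; Nucleus does not expose such hooks today, which is the point of this issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ProvisioningWaiter {
    // Completed by whichever code path finishes provisioning (hypothetical hook).
    private final CompletableFuture<Boolean> done = new CompletableFuture<>();

    /** Called by the provisioning code when it finishes normally. */
    void onProvisioningFinished(boolean success) {
        done.complete(success);
    }

    /** Called by the provisioning code when it fails with an error. */
    void onProvisioningFailed(Throwable cause) {
        done.completeExceptionally(cause);
    }

    /** Blocks until provisioning completes or the timeout expires; true only on success. */
    boolean await(long timeout, TimeUnit unit) {
        try {
            return done.get(timeout, unit);
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

If both the automated and the plugin paths completed the same future, a caller could block on `await(...)` after GreengrassSetup.main() returns and get the same behaviour for both provisioning types.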

Also, in both cases the result of provisioning is not reported to the top-level methods.

This is about how the existing provisioning methods are used inside Nucleus.

The second point is about the fail-safe aspects of provisioning.
A provisioning procedure usually consists of several steps, such as:

  1. request to generate new keys
  2. request to create an IoT thing with the keys
  3. check connectivity to AWS IoT Core after thing creation

The whole procedure can be interrupted in the middle of a step or between steps by software or hardware failures, connectivity issues, power loss, or cloud-side issues.
It would be nice to restart provisioning from the last step that did not complete.
For example, some clients run Lambda functions on the IoT Core side that implement a whitelist of hardware IDs allowed to provision and that also prevent the same device from being provisioned a second time.
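
A minimal sketch of restarting from the last incomplete step could look like the code below. It records each finished step in a checkpoint file and skips recorded steps on the next run. The `ResumableProvisioning` class, the `Step` interface, the step names and the checkpoint file format are all hypothetical and only illustrate the idea; nothing here is existing Nucleus code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResumableProvisioning {
    /** One provisioning step; hypothetical, not a Nucleus interface. */
    interface Step {
        void run() throws Exception;
    }

    private final Path checkpoint;
    // LinkedHashMap keeps the steps in registration order.
    private final Map<String, Step> steps = new LinkedHashMap<>();

    ResumableProvisioning(Path checkpoint) {
        this.checkpoint = checkpoint;
    }

    ResumableProvisioning step(String name, Step step) {
        steps.put(name, step);
        return this;
    }

    /** Runs pending steps in order; returns the name of the failed step, or null if all are done. */
    String run() {
        try {
            List<String> completed = Files.exists(checkpoint)
                    ? Files.readAllLines(checkpoint, StandardCharsets.UTF_8)
                    : new ArrayList<String>();
            for (Map.Entry<String, Step> entry : steps.entrySet()) {
                String name = entry.getKey();
                if (completed.contains(name)) {
                    continue; // finished in a previous run, do not repeat
                }
                try {
                    entry.getValue().run();
                } catch (Exception e) {
                    return name; // stop here; the next run resumes from this step
                }
                // Record the step durably before moving on to the next one.
                Files.write(checkpoint,
                        (name + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch stores only step names. A real implementation would also have to persist each step's results (for example the generated keys), and a crash between finishing a step and writing the checkpoint would still repeat that step, so steps would need to be idempotent or detect that their work was already done.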