wttech/gradle-aem-plugin

Intermittent provisioning error

Closed this issue · 10 comments

* What went wrong:
Execution failed for task ':env:aem:instanceProvision'.
> Cannot perform provision step 'enableCrxDe' on LocalInstance(name='dev-author', httpUrl='http://localhost:4502')! Cause: Cannot save repository node '/var/gap/provision/step/enableCrxDe' on LocalInstance(name='dev-author', httpUrl='http://localhost:4502'). Cause: Failed request to POST http://localhost:4502/var/gap/provision/step/enableCrxDe HTTP/1.1! Cause: Repository error. Unexpected response from LocalInstance(name='dev-author', httpUrl='http://localhost:4502'): HTTP/1.1 500 Server Error
  {"referer":"","path":"/var/gap/provision/step/enableCrxDe","parentLocation":"/var/gap/provision/step","location":"/var/gap/provision/step/enableCrxDe","status.message":"org.apache.sling.api.resource.PersistenceException: Unable to commit changes to session.","title":"Error while processing /var/gap/provision/step/enableCrxDe","status.code":500,"error":{"class":"org.apache.sling.api.resource.PersistenceException","message":"Unable to commit changes to session."},"changes":[]}

Once again, a similar error (screenshot attached).

Another error, this time during deployment:

[2022-06-02T13:59:44.696Z] Execution failed for task ':env:local:aem:instanceDeploy'.
[2022-06-02T13:59:44.696Z] > Cannot read properties of node '/etc/packages/com.acme/acme-web-frontend-package-4.26.0-SNAPSHOT.zip/jcr:content/vlt:definition' on LocalInstance(name='dev-author', httpUrl='http://localhost:4502'). Cause: Failed request to GET http://localhost:4502/etc/packages/com.acme/acme-web-frontend-package-4.26.0-SNAPSHOT.zip/jcr:content/vlt:definition.json HTTP/1.1! Cause: Repository error. Unexpected response from LocalInstance(name='dev-author', httpUrl='http://localhost:4502'): HTTP/1.1 503 Service Unavailable
[2022-06-02T13:59:44.696Z]   <html>
[2022-06-02T13:59:44.696Z]   <head>
[2022-06-02T13:59:44.696Z]   <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
[2022-06-02T13:59:44.696Z]   <title>Error 503 AuthenticationSupport service missing. Cannot authenticate request.</title>
[2022-06-02T13:59:44.696Z]   </head>
[2022-06-02T13:59:44.696Z]   <body><h2>HT

Both reading node properties and checking node existence could use a retry mechanism to avoid such problems.
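A minimal sketch of what such a retry wrapper could look like, assuming transient failures (such as the 500 and 503 responses above) are surfaced as a dedicated exception type. `TransientRepositoryException` and `retrying` are hypothetical names used for illustration only; they are not part of the plugin's API.

```kotlin
// Hypothetical exception marking a failure that is worth retrying (e.g. HTTP 503).
class TransientRepositoryException(message: String) : RuntimeException(message)

// Hypothetical helper: runs `action` up to `attempts` times,
// pausing between attempts, and rethrows the last transient failure.
fun <T> retrying(
    attempts: Int,
    delayMillis: (Int) -> Long = { n -> n * 1000L },
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    action: () -> T
): T {
    require(attempts >= 1) { "attempts must be at least 1" }
    var last: TransientRepositoryException? = null
    for (attempt in 1..attempts) {
        try {
            return action()
        } catch (e: TransientRepositoryException) {
            last = e
            // Do not sleep after the final attempt.
            if (attempt < attempts) sleep(delayMillis(attempt))
        }
    }
    throw last!!
}
```

Only the transient exception type is retried, so genuine errors (for example a malformed request) would still fail immediately.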

The initial issue seems to be fixed: instance deploy now checks instance stability first.
However, the node existence check definitely still needs a retry mechanism; that part needs to be improved.

Current error, while recovering a backup onto a running instance:
Execution failed for task ':env:instanceUpgrade'.
> Cannot check repository node existence: /var/gap/package/deploy/c11484cb on LocalInstance(name='local-publish', httpUrl='http://localhost:4503'). Cause: Failed request to HEAD http://localhost:4503/var/gap/package/deploy/c11484cb.json HTTP/1.1! Cause: Unexpected status code '503' while checking node '/var/gap/package/deploy/c11484cb' existence on LocalInstance(name='local-publish', httpUrl='http://localhost:4503')!

This line needs to be protected with a retry mechanism. The error is triggered by the deployAvoidance feature, which can be disabled for now to avoid it.
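One way the existence check could tolerate a temporarily unavailable instance is sketched below. It assumes that 200 means the node exists, 404 means it does not, and anything else (such as 503) is inconclusive and worth another attempt. `nodeExists` and `headStatus` are hypothetical names, not the plugin's actual code.

```kotlin
// Hypothetical sketch: `headStatus` performs the HEAD request and returns the HTTP status code.
fun nodeExists(maxAttempts: Int, pauseMillis: Long = 1000L, headStatus: () -> Int): Boolean {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var lastStatus = -1
    for (attempt in 1..maxAttempts) {
        when (val status = headStatus()) {
            200 -> return true   // assumed: node is present
            404 -> return false  // assumed: node is absent
            else -> {
                // Inconclusive (e.g. 503 while the instance is restarting): wait and retry.
                lastStatus = status
                if (attempt < maxAttempts) Thread.sleep(pauseMillis)
            }
        }
    }
    throw IllegalStateException(
        "Node existence still unknown after $maxAttempts attempts (last status $lastStatus)"
    )
}
```

The point of the design is that a 503 is never interpreted as "node absent", which would otherwise defeat the purpose of deployAvoidance by redeploying a package that is already installed.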

For the initial error, how about increasing the default actionRetry from 1 to 3?

override val actionRetry = common.retry { afterSquaredSecond(aem.prop.long("instance.provision.step.actionRetry") ?: 1L) }
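To get a feel for what the change would cost in waiting time, here is a rough sketch of the delay schedule. It assumes, from the name `afterSquaredSecond` alone, that the i-th retry waits i × i seconds; the actual behaviour should be confirmed in the plugin source. `squaredSecondDelays` is a hypothetical helper.

```kotlin
// Assumed semantics (inferred from the name only): the i-th retry waits i * i seconds.
// Returns the delays in milliseconds for the given number of retries.
fun squaredSecondDelays(retries: Long): List<Long> =
    (1..retries).map { it * it * 1000L }
```

Under that assumption, one retry waits 1 second in total, while three retries wait 1 + 4 + 9 = 14 seconds in the worst case, which may be enough for a short-lived repository or authentication hiccup to pass.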

Setting instance.packageManager.deployAvoidance=false is the temporary workaround for the Cersei project.

Closing as probably fixed; reopen if needed.