Alfresco/acs-deployment

failed to deploy "alfresco/alfresco-content-services" helmchart with community_values

JuliusDigel opened this issue · 3 comments

Hey!
I'm trying to deploy Alfresco via the Helm chart to a locally managed Kubernetes instance. I tried my own k3s cluster as well as the suggested installation route via Docker Desktop: https://github.com/Alfresco/acs-deployment/blob/master/docs/helm/docker-desktop-deployment.md.

In both installation routines I get these errors:

  1. Error: INSTALLATION FAILED: execution error at (alfresco-content-services/charts/alfresco-search/templates/secret-solr-jtoolopt.yml:12:51): You need to provide a shared secret for Solr/repo authentication , see https://github.com/Alfresco/acs-deployment/tree/master/docs/helm
    I think this one is just outdated documentation, since it is easily fixed by adding "--set global.tracking.sharedsecret=$(openssl rand -hex 24)".

  2. coalesce.go:223: warning: destination for postgresql.persistence.storageClass is a table. Ignoring non-table value () Could it be that the value "null" in the following part of the Helm chart is outdated?

persistence:
    # -- set the storageClass to use for dynamic provisioning.
    # setting it to null means "default storageClass".
    storageClass: null
    # -- provide an existing persistent volume claim name to persist SQL data
    # Make sure the root folder has the appropriate permissions/ownership set.
    existingClaim: null
    subPath: "alfresco-content-services/database-data"
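
For the first error above, the same setting can also be kept in a values file instead of a "--set" flag. This is only a minimal sketch: the key path global.tracking.sharedsecret comes from the command quoted above, while the file name and the placeholder value are hypothetical.

```yaml
# shared-secret-values.yaml (hypothetical file name)
# Values-file equivalent of: --set global.tracking.sharedsecret=<value>
global:
  tracking:
    # Replace with a freshly generated secret, for example the output of
    # `openssl rand -hex 24`. Avoid committing a real secret to version control.
    sharedsecret: "REPLACE_WITH_GENERATED_SECRET"
```

The file would then be passed to helm with an additional "-f shared-secret-values.yaml" next to the community values file.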

When deploying, the installation proceeds, but the activemq and cs-repository containers get stuck, and eventually the whole Helm installation fails with this error message: Error: INSTALLATION FAILED: release acs failed, and has been uninstalled due to atomic being set: timed out waiting for the condition


The last log-entries of the activemq container are the following:

2023-02-14 11:40:14,231 main ERROR Null object returned for RollingRandomAccessFile in Appenders.
2023-02-14 11:40:14,233 main ERROR Unable to locate appender "RollingFile" for logger config "root"
2023-02-14 11:40:14,234 main ERROR Unable to locate appender "AuditLog" for logger config "org.apache.activemq.audit"
Loading message broker from: xbean:activemq.xml 
INFO | Using Persistence Adapter: KahaDBPersistenceAdapter[/opt/activemq/data/kahadb]
INFO | Database /opt/activemq/data/kahadb/lock is locked by another server. This broker is now in slave mode waiting a lock to be acquired 

These are the full logs of the two pods which are not working properly:

acs-activemq-7db859c48-s7jzg_activemq.log

acs-alfresco-cs-repository-6d65bb74d4-s6z2t_alfresco-content-services.log

Thanks for the help!

gionn commented

Hello!

Error: INSTALLATION FAILED: execution error at (alfresco-content-services/charts/alfresco-search/templates/secret-solr-jtoolopt.yml:12:51): You need to provide a shared secret for Solr/repo authentication , see https://github.com/Alfresco/acs-deployment/tree/master/docs/helm

Indeed, this is a small doc issue and easy to fix.

  1. coalesce.go:223: warning: destination for postgresql.persistence.storageClass is a table. Ignoring non-table value () Could it be that the value "null" in the following part of the Helm chart is outdated?

I am not sure what triggers this, but it should be just a harmless warning.

acs-activemq-7db859c48-s7jzg_activemq.log

acs-alfresco-cs-repository-6d65bb74d4-s6z2t_alfresco-content-services.log

I cannot replicate your issue, but the logs show permission issues on storage. Can you share more details of your cluster setup?

I am testing this on two systems:

  1. Docker Desktop installation on Windows 10 Pro.
    EDIT: I just noticed that the problems I'm facing with my K3s cluster seem to be unrelated to the ones I'm facing with Docker Desktop, so please ignore the Docker Desktop issues.

  2. K3s cluster with 3 nodes and Longhorn as the storage provider. The PVC gets created with these values:

State: attached
Health: healthy
Ready for workload: Ready
Conditions: restore, scheduled
Frontend: Block Device
Attached Node & Endpoint: k3s1, /dev/longhorn/pvc-e8195585-a6ac-493f-96b4-ba06757543e7
Size: 20 Gi
Actual Size: 268 Mi
Data Locality: disabled
Access Mode: ReadWriteOnce
Engine Image: longhornio/longhorn-engine:v1.3.2
Created: 2 minutes ago
Encrypted: False
Node Tags:
Disk Tags:
Last Backup:
Last Backup At:
Replicas Auto Balance: ignored
Instance Manager: instance-manager-e-69a9a340
Namespace: alfresco
PVC Name: alfresco-volume-claim
PV Name: pvc-e8195585-a6ac-493f-96b4-ba06757543e7
PV Status: Bound
Revision Counter Disabled: False
Pod Name: acs-activemq-7db859c48-wqtpx
Pod Status: Running
Workload Name: acs-activemq-7db859c48
Workload Type: ReplicaSet
Pod Name: acs-alfresco-cs-repository-6d65bb74d4-42c9b
Pod Status: Running
Workload Name: acs-alfresco-cs-repository-6d65bb74d4
Workload Type: ReplicaSet

Is it normal that the ActiveMQ and the AlfrescoCS pods both share the same PVC?

Is it also possible that my deployment is failing because of Longhorn, since ReadWriteMany (RWX) is required for the Alfresco PVCs? Would I need to create an NFS provisioner? I checked your AWS installation routine and it seems you are using EFS instead of EBS.

Thank you very much for the help!

Okay, so the trick was separating the PVCs for each pod as described here and creating the existing PVCs manually in Longhorn (which I then assigned to the Alfresco pods). Now ActiveMQ as well as the repository each have their own existing PVC, and the deployment works very nicely.
Thanks for the help!
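
For anyone landing here later, the fix described above (one pre-created claim per pod) might look roughly like the manifests below. This is only a sketch under assumptions: the claim names are hypothetical, the storage class name "longhorn" is assumed to be the Longhorn default, and the sizes simply reuse the 20 Gi from the volume details earlier in the thread. The namespace "alfresco" and the ReadWriteOnce access mode are taken from those same volume details.

```yaml
# Claim used only by the ActiveMQ pod (name is hypothetical)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: activemq-claim
  namespace: alfresco
spec:
  accessModes:
    - ReadWriteOnce          # a single pod mounts this volume
  storageClassName: longhorn # assumed Longhorn storage class name
  resources:
    requests:
      storage: 20Gi
---
# Claim used only by the repository pod (name is hypothetical)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repository-claim
  namespace: alfresco
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 20Gi
```

Each claim then has to be referenced from the chart through the corresponding existingClaim setting. The exact key paths depend on the chart version, so they are not shown here.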