Alfresco/acs-deployment

Alfresco Engine Container High Throughput

marcokonsultex opened this issue.

Hello everybody.

Currently I run Alfresco using the docker-compose.yml deployment, with the following versions:

Docker Image versions

ALFRESCO_CE_TAG=7.2.0
SEARCH_CE_TAG=2.0.3
SHARE_TAG=7.2.0
ACA_TAG=2.9.0
TRANSFORM_ENGINE_TAG=2.5.7
ACTIVEMQ_TAG=5.16.1

PostgreSQL 13.3 runs on a separate server. db_pool_max is set to 300 in docker-compose.yml, and the corresponding limit in the Postgres server's postgresql.conf is 400.
The Alfresco server runs Ubuntu 22.04 with 98 GB of RAM and 24 cores; the Postgres server also runs Ubuntu 22.04, with 26 GB of RAM and 16 cores.

A few times a day, sporadically, the alfresco container reaches 1600% CPU (as reported by docker stats). The container's memory limit in docker-compose.yml is 68 GB, and even so it reaches these CPU peaks.
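A note on reading the docker stats figure: docker stats reports CPU relative to a single core, so 100% is one fully busy core and, assuming no CPU limit is set on the container, the ceiling on this host is 2400%. 1600% therefore means roughly 16 of the 24 cores were busy. The 68 GB memory limit has no bearing on that, since a memory limit does not cap CPU. A minimal Python helper for the conversion (the function name is hypothetical):

```python
def docker_cpu_to_cores(cpu_percent: float, host_cores: int) -> tuple[float, float]:
    """Convert a `docker stats` CPU % into (busy cores, share of the host).

    docker stats measures CPU relative to one core: 100% is one fully busy
    core, so the ceiling is 100% times the number of host CPUs.
    """
    busy_cores = cpu_percent / 100.0
    return busy_cores, busy_cores / host_cores


if __name__ == "__main__":
    cores, share = docker_cpu_to_cores(1600, 24)
    # Prints: 16 of 24 cores busy (67% of the host)
    print(f"{cores:.0f} of 24 cores busy ({share:.0%} of the host)")
```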

At these moments PostgreSQL starts timing out the active processes, and the entire environment (PostgreSQL and Alfresco) has to be restarted.

We have already asked the development teams to stop using CMIS and use only the REST API to upload, search, update and delete nodes in Alfresco. Alfresco receives thousands of GET/PUT requests, and all applications connect to Alfresco with the same single user.
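Since the applications now go through the REST API, folder listings are one place where a very large folder shows up. The public v1 REST API lists a node's children with GET /alfresco/api/-default-/public/alfresco/versions/1/nodes/{nodeId}/children, pages the result with the skipCount and maxItems parameters, and reports list.pagination.hasMoreItems in the response. Below is a minimal Python sketch that walks a folder page by page; make_fetch, iter_children, the page size, base URL and credentials are illustrative assumptions. Paging bounds the size of each response, but it does not by itself fix a folder that simply has too many children.

```python
import base64
import json
import urllib.parse
import urllib.request

# Public v1 REST API path for listing a node's children. node_id is the node
# UUID (or an alias such as -root-), not the numeric database id.
CHILDREN_PATH = "/alfresco/api/-default-/public/alfresco/versions/1/nodes/{node_id}/children"


def make_fetch(user, password):
    """Hypothetical helper: return a callable doing an authenticated GET -> JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(url):
        req = urllib.request.Request(
            url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return fetch


def iter_children(base_url, node_id, fetch, page_size=100):
    """Yield child entries one page at a time instead of requesting them all at once."""
    skip = 0
    while True:
        query = urllib.parse.urlencode({"skipCount": skip, "maxItems": page_size})
        url = base_url.rstrip("/") + CHILDREN_PATH.format(node_id=node_id) + "?" + query
        page = fetch(url)["list"]
        for item in page["entries"]:
            yield item["entry"]
        count = page["pagination"]["count"]
        # Stop when the server reports nothing more (or returns an empty page).
        if not page["pagination"]["hasMoreItems"] or count == 0:
            break
        skip += count
```

Usage, with a placeholder host and credentials: for entry in iter_children("http://localhost:8080", "-root-", make_fetch("admin", "admin")): print(entry["name"]). Passing fetch in as a parameter keeps the paging logic independent of the HTTP client.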

In PostgreSQL we are monitoring one query in particular, which runs as follows:
select
    assoc.id as id,
    parentNode.id as parentNodeId,
    parentNode.version as parentNodeVersion,
    parentStore.protocol as parentNodeProtocol,
    parentStore.identifier as parentNodeIdentifier,
    parentNode.uuid as parentNodeUuid,
    childNode.id as childNodeId,
    childNode.version as childNodeVersion,
    childStore.protocol as childNodeProtocol,
    childStore.identifier as childNodeIdentifier,
    childNode.uuid as childNodeUuid,
    assoc.type_qname_id as type_qname_id,
    assoc.child_node_name_crc as child_node_name_crc,
    assoc.child_node_name as child_node_name,
    assoc.qname_ns_id as qname_ns_id,
    assoc.qname_localname as qname_localname,
    assoc.is_primary as is_primary,
    assoc.assoc_index as assoc_index
from
    alf_child_assoc assoc
    join alf_node parentNode on (parentNode.id = assoc.parent_node_id)
    join alf_store parentStore on (parentStore.id = parentNode.store_id)
    join alf_node childNode on (childNode.id = assoc.child_node_id)
    left join alf_store childStore on (childStore.id = childNode.store_id)
where
    parentNode.id = 988

Essentially every PostgreSQL process is running this query, and it returns thousands of documents at a time (sometimes millions). The query stays stuck for a long time with many deadlocks, and when the processes start to turn red, the following error appears:
ERROR: relation "alf_bootstrap_lock" does not exist at character 15
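To check whether a handful of parents (such as node 988 in the query above) hold most of the child associations, the rows of alf_child_assoc can be counted per parent_node_id. The sketch below wraps that aggregate query in Python and, purely to exercise it, runs it with sqlite3 against a toy in-memory alf_child_assoc table with made-up rows; top_parents and the data are hypothetical. Apart from the sqlite3 `?` placeholder (psycopg would use %s), the SQL is plain and should also work on PostgreSQL, though on a large repository it scans the whole table and is better run off-peak.

```python
import sqlite3

# Count child associations per parent node, largest fan-out first.
# The `?` is sqlite3's parameter style; adjust it for other drivers.
FANOUT_SQL = """
SELECT parent_node_id, COUNT(*) AS child_count
FROM alf_child_assoc
GROUP BY parent_node_id
ORDER BY child_count DESC
LIMIT ?
"""


def top_parents(conn, limit=20):
    """Return (parent_node_id, child_count) rows, largest fan-out first."""
    return conn.execute(FANOUT_SQL, (limit,)).fetchall()


if __name__ == "__main__":
    # Toy table with hypothetical data; the real alf_child_assoc has many more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alf_child_assoc "
        "(id INTEGER PRIMARY KEY, parent_node_id INTEGER, child_node_id INTEGER)"
    )
    rows = [(988, n) for n in range(1000, 1500)] + [(42, n) for n in range(2000, 2010)]
    conn.executemany(
        "INSERT INTO alf_child_assoc (parent_node_id, child_node_id) VALUES (?, ?)", rows
    )
    for parent, count in top_parents(conn, limit=5):
        print(parent, count)
```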

Has anyone come across this kind of scenario? If you need more data, I will be glad to provide it.

Thanks

Hi @marcokonsultex,

At first sight, this query looks like the regular query that fetches the children of a parent node, alfresco.node.select.children (again... at first sight). If such queries return thousands or even millions of nodes, then you have a problem: it is well known that having too many children under a single parent results in poor performance whenever anyone runs "browsing" activities.
That said, this repo is focused on deployment issues rather than performance or general product troubleshooting.
If you have an Enterprise subscription you can of course raise a ticket with Alfresco support; otherwise I would rather raise this topic on the Alfresco Hub. Feel free to cross-reference your post on the Hub here so that someone (maybe me) can take a look at it when we have time.

Thanks, I will open a ticket on the Alfresco Hub, since we are on Community.