codeclou/docker-atlassian-confluence-data-center

How to increase heap space of nodes?

ataraxie opened this issue · 16 comments

Hi and first of all thank you for this amazing piece of work! I hope it's ok that I created an issue for my question.

I have been trying to increase the -Xmx and -Xms args for Confluence nodes without success. Is there a simple way to increase the heap space?

Things I've tried:

  • set env var START_CONFLUENCE_JAVA_OPTS. This gets overridden by CATALINA_OPTS
  • create my own Docker image and some sed -i in CMD directive. Clustering doesn't work anymore (new nodes are not recognized in Confluence Admin)
  • other messing around

hi,

Should be possible via docker-entrypoint: https://github.com/codeclou/docker-atlassian-confluence-data-center/blob/confluencenode-6.15.3/docker-entrypoint.sh

The defaults -Xms1024m -Xmx1024m -XX:+UseG1GC are set somewhere around line 44; there you could add yourself a custom line to patch setenv.sh accordingly.

Secondly, you must give the Docker daemon itself enough memory.
There is also the special --shm-size option on docker run that could be of use.
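A minimal sketch of that setenv.sh patch, runnable locally against a stand-in file (the path and the default heap values are assumptions taken from the entrypoint linked above; verify them in your image first):

```shell
#!/bin/bash
# Sketch only: demonstrate the sed patch against a stand-in setenv.sh.
# In the real entrypoint you would point SETENV at the file inside the
# container, e.g. under /confluence/.../bin/setenv.sh.
SETENV="${SETENV:-./setenv.sh}"

# Stand-in for the shipped setenv.sh with the assumed default options:
printf 'CATALINA_OPTS="-Xms1024m -Xmx1024m -XX:+UseG1GC ${CATALINA_OPTS}"\n' > "$SETENV"

# Bump the heap from 1 GB to 2 GB min / 4 GB max:
sed -i 's/-Xms1024m -Xmx1024m/-Xms2048m -Xmx4096m/' "$SETENV"

# Show the patched line to confirm the substitution took effect:
cat "$SETENV"
```

The same one-line sed, placed in the entrypoint before Confluence starts, is all the "patch" amounts to.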

cheers

Hi @clouless - I wanted to thank you for your reply. I had a follow-up question but it's totally fine if you don't reply because I don't want to steal much more of your time.

I added the options to docker-entrypoint.sh with sed -i. I'm not a Docker pro. What I did is I then created a new image and pushed it to DockerHub as fgrund/docker-atlassian-confluence-data-center. Then, I edited the manage...sh script to use my image instead of yours.

I was just wondering if there's an easier way? This way felt like quite some pain.

P.S. I also created a Selenium procedure that does the full Confluence setup after starting the Docker container. I'm just pressing a button and all the clicking that you describe in your manual is done automatically. I'm happy to share the code if you're interested.

Hi @ataraxie,

Short: Yes :)

You can always use existing docker images and simply extend from them.
Or, even better, simply override the docker-entrypoint in your docker run ...
statements.

So you would only have to do:

(1) Make a local copy of the manage...sh thingy
(2) Put a copy of the docker entrypoint e.g. to /Users/clouless/entry/jiranode-entrypoint.sh
(3) Patch the /Users/clouless/entry/jiranode-entrypoint.sh to your needs.
(4) Patch the manage...sh so that docker run statements override the entrypoint:

    chmod +x /Users/clouless/entry/jiranode-entrypoint.sh

    docker run \
        --rm \
        --name confluence-cluster-${CONFLUENCE_VERSION_DOT_FREE}-node${1} \
        --net=confluence-cluster-${CONFLUENCE_VERSION_DOT_FREE} \
        --net-alias=confluence-cluster-${CONFLUENCE_VERSION_DOT_FREE}-node${1} \
        -v /Users/clouless/entry/:/entry \
        --entrypoint="/entry/jiranode-entrypoint.sh" \
        --env NODE_NUMBER=${1} \
        -v confluence-shared-home-${CONFLUENCE_VERSION_DOT_FREE}:/confluence-shared-home \
        -d codeclou/docker-atlassian-confluence-data-center:confluencenode-${CONFLUENCE_VERSION}

What did I do here? Basically two things:

(a) with -v /Users/clouless/entry/:/entry I "injected" the local directory /Users/clouless/entry into the container under /entry so that your patched docker-entrypoint.sh is accessible from inside the container
(b) with --entrypoint="/entry/jiranode-entrypoint.sh" you overwrite the default entrypoint and your patched one will be used.


You can test this with a simpler example:

mkdir -p /Users/clouless/entry

touch /Users/clouless/entry/simple-entry.sh
chmod +x /Users/clouless/entry/simple-entry.sh
vim /Users/clouless/entry/simple-entry.sh

Put this stuff in the file

#!/bin/bash

set -e

umask u+rxw,g+rwx,o-rwx

echo "It'se me docker entrypoint ey"

exec "$@"

Now you can run

docker run -i -t \
     -v /Users/clouless/entry/:/entry \
     --entrypoint="/entry/simple-entry.sh" \
     ubuntu:18.04 \
     bash

It will look like so: the entrypoint's echo line prints first, then you land in the bash shell.

So basically, if you write the entrypoint right, with exec "$@" at the end, it runs BEFORE the actual command (in our case bash). More precisely: the entrypoint receives the command as its arguments, does its own work first, and then hands control over to the command with exec "$@".


So TL;DR:

  • Create your own entrypoint shell script
  • mount a volume where the entrypoint shell script sits
  • use the --entrypoint to override the default entrypoint.

And for the RAM question before, you could simply add this to your entrypoint

sed -i 's/-Xms1024m -Xmx1024m/-Xms2048m -Xmx4096m/' /confluence/atlassian-confluence-latest/bin/setenv.sh

Hope that helps you out a little :)

Wow, that's awesome!! The sed command is almost exactly what I have. I just didn't get that I can run the existing image with a different entrypoint. Thanks for the lecture! I had actually already created the entrypoint script and used the --entrypoint flag to docker run. But I was missing the mount! Thank you! I love Docker.

Do you think it would make sense generally to automate the setup procedure after the container has started? i.e. so no clicking is needed and after docker run the user logs in as admin/admin and sees a start space?

nice to hear :)

Yeah, the clicking stuff is annoying. On the JIRA version of the scripts at least the database.xml file exists and saves some steps. I did not want to invest any more time into this specific topic,
since it would always only work for one exact version. If Jira changes stuff (which happens a lot) you will have more to do to keep things working.

What you could do is:
(1) Patch the docker run ... and mount a volume for /jira-home to make it persistent
(2) Click through the wizard to set it fully up.
(3) Backup (tar/zip) the home dir

Same for the database. Back it up somehow after full installation.

Then you could patch your manage script to "restore" the jira-home and database before starting up the node. Basically that would work.
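A rough, runnable sketch of that snapshot/restore idea (all paths are illustrative; the real home directory depends on your docker run mounts):

```shell
#!/bin/bash
# Sketch: snapshot a (hypothetical) jira-home directory after the
# one-time wizard click-through, then restore it before later runs.
JIRA_HOME=./jira-home
SNAPSHOT=./jira-home-snapshot.tar.gz

# Stand-in for a fully set-up home dir:
mkdir -p "$JIRA_HOME"
echo "pretend-config" > "$JIRA_HOME/dbconfig-placeholder.xml"

# (3) Backup (tar/zip) the home dir:
tar -czf "$SNAPSHOT" -C "$JIRA_HOME" .

# "Restore" before starting the node on the next run:
rm -rf "$JIRA_HOME"
mkdir -p "$JIRA_HOME"
tar -xzf "$SNAPSHOT" -C "$JIRA_HOME"
ls "$JIRA_HOME"
```

The database would need the equivalent treatment with its own dump/restore tooling; this sketch only covers the home dir.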

good luck :)

Misunderstanding. I already have a Selenium script that automates all the setup procedure. My question was just if you thought this would be interesting for the general public from your point of view. I'm not really sure if a virtual/Selenium user clicking through the setup is the best approach but it definitely works.

ok nice :)
Thanks for the offer, but this is what I mean by stuff that can break easily.
If Jira changes the HTML of the login screen or any other forms (which happens often), then this will break. So it is good for you if you have time to maintain it, but I think it is out of scope for this project. I want to keep things as simple as possible :)

Totally right I see the point. Having the procedure you describe with the backup would be the better solution anyway, because it wouldn't require populating the DB any time the container is started. Maybe I'll start messing around a bit at some point.

Thanks again for all your effort and OVER.

P.S. Just read your about page (https://codeclou.io/about/). Sounds awesome. I'm also German and working for a Munich-based company but currently living in Vancouver/Canada. We used your container for the data center performance testing procedure for which we developed our own framework (https://github.com/scandio/e4-framework). It's currently still in flux, but your container was a significant part that made our lives a lot easier! We mentioned it very positively in our test reports that we submitted to Atlassian :)

cool :) Canada is very nice :)

Thanks for the mentions :D I am currently in the middle of Data Center Performance Testing of my Jira App. And the Confluence Apps follow. I will have a look at your framework :)

Could you maybe tell me how you created the "large dataset test-data" in Confluence? For Jira I use the https://marketplace.atlassian.com/apps/1210725/data-generator-for-jira?hosting=server&tab=overview app. But for Confluence there does not seem to be such a thing :(

Have you used the "official" tools?

JPT for JIRA:

I do not want to invest time into the wrong tool :D If you have any tips that would be great :D


I had a look at your framework, it looks very similar to JPT :) Kotlin for the win :D

Haha, now we are talking. Funny you're in the same process!

There is in fact a good data generator for Confluence! Don't use the one you mentioned. It lives here: https://bitbucket.org/goodsoftwareco/good-confluence-data-generator/src/master/

You simply build it and install it. Its purpose is exactly to populate Confluence with a dataset as big as Atlassian wants it to be. Be careful though, it's BIG! i.e. you will certainly have to give your nodes more heap space (hence this issue). And it takes really long. It's implemented nicely though: it's a job that runs immediately after plugin install and works in batches, checking in your instance whether the requirements for the current dataset are met. This way you can reinstall the plugin if something goes wrong and it will just continue where it left off. Now that I think about it: it would in fact be awesome to set up the Docker image so the dataset is already in the DB/ConfluenceHomeDir. We actually ran the population for each of our test runs (after our Selenium script had done the start config).

We started using E3 (https://bitbucket.org/atlassian/elastic-experiment-executor) for our apps. But honestly... I can't express it in public. Let's just say it has some unmet requirements. Unlike with JPT, you can't do Selenium with it (at least not as it is right now). You have only REST calls as a tool, and we didn't find a way to isolate our app actions with REST calls. We needed users running a browser for testing. That's why we implemented our own framework. It's app-independent and we intend to use it for our Jira and Bitbucket apps as well. It avoids what I dislike most about the other frameworks: it doesn't take all control away from you and doesn't hide what's going on! I have also tried JPT for testing our Jira app. I had a feeling it's pretty good, but I just can't deal with the fact that everything is hidden away from me. I feel like I have no power to understand what's going on with these frameworks.

You can see that docs are missing in the E4 repo. It's super fresh. The gist of it is:

  • Define a test package for your app containing selenium scenarios (we have some library for Confluence that I'd say is pretty good right now)
  • Implement virtual users that represent app users
  • Distribute the scenarios with weights onto the virtual users
  • In the actions that virtual users run, you can define what (sub-)action specifically you want to measure with this action
  • Tell the framework where your app is running (e.g. we ran it with your Docker container on an AWS EC2 xlarge instance)
  • Start arbitrarily many E4 "workers" anywhere. These will have REST endpoints open listening for the actions. Essentially this is one endpoint /prepare and one endpoint /start.
  • Make the local E4 client know where your workers are. For now we ran just one worker on one AWS xlarge instance.
  • A worker is also a Docker container (extends some JDK image) running our E4 app
  • Each virtual user becomes one Java thread, i.e. in the case of 250 users on one worker only, the worker would spin up 250 threads. If you have multiple workers the users will be distributed.
  • Tell the local E4 client (the same e4 app without the --worker-only flag) where your workers run (i.e. base urls)
  • Tell the local E4 client what test package you want to run on these workers, for how long, with how many concurrent users
  • START. In the end each worker will produce a SQLite DB file containing a table with the measurements for each action.

I just realized that I wrote quite a lot. But it also helps me retrospect what we did. I'm happy to share the first document with test results that we submitted by mail.

Cool, thanks for the tip. I had a lot of bugs with the Data Generator for Jira, and I just found, hidden deep in the code, existing datasets for Jira (but for MySQL). So for Jira I will use this one:


I am currently writing scripts to run my docker images inside my bare-metal Kubernetes cluster ^^
And all is going well, just this test-data and "snapshots" between tests are a little annoying.


I will definitely check your framework out in more detail.


Currently I have invested a lot of time understanding (code digging) JPT, and I was able
to write my own Scenarios and Actions for "pages" and "rest" stuff. But it was hard to understand what exactly this whole thing is doing ^^. I now fully understand the "memory" classes and their crazy process :D And I have implemented my own memories (to remember ids of stuff and randomly pick ids for tests).


I don't get why they have different frameworks for Confluence, Jira and Bitbucket. I already dislike the Python-driven Confluence one, but we will see ^^

thanks a lot for your infos :)

Just fyi: created a small presentation on E4
https://slides.com/fgrund/e4/

You are too fast. Link updated.

Looks nice, but the command boxes are a little too small on my Mac.

Very nice :)

Dugh. Thanks. Browsers...
Should be fixed (I hope).

now it's fine :) Ok, gotta make some dinner now. cu