xorpaul/g10k

"deploy module" appears to update the entire environment anyway?

tmack0 opened this issue · 4 comments

I am doing initial testing to switch from r10k to g10k because of performance issues that appeared once we split our monolithic Puppet repo into repo-per-module, with more than 200 modules. g10k shows a very dramatic improvement for full environment deploys (from ~6m to ~30s), but not when updating a single module.

With r10k, the command 'r10k deploy module <module>' updates the cache of the module repo, scans the Puppetfiles of all environments for the given module, and updates the module in each environment where the change is relevant (i.e. the environment tracks latest, or the updated branch, etc.).
With g10k, I try 'g10k deploy module <module>' and I see it scan and update all cached module repos across all environments, equivalent to a 'g10k deploy environment -puppetfile'. The update time is ~28s, the same as the full -puppetfile update, while the r10k run completes the module deploy in ~2.3s. Did I miss something?
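
To make the r10k behaviour described above concrete, here is a minimal Python sketch of the "scan every environment's Puppetfile for the given module" step. It is not r10k's actual implementation; the function names, the regular expression, and the simplified matching of 'mod' lines are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List

# Matches Puppetfile declarations such as:  mod 'test_r10k', :git => '...'
# (simplified; a real Puppetfile is Ruby and can be more complex than this)
MOD_LINE = re.compile(r"""^\s*mod\s+['"]([^'"]+)['"]""")


def declares_module(puppetfile: Path, module: str) -> bool:
    """Return True if the Puppetfile has a `mod` line for `module`."""
    for line in puppetfile.read_text().splitlines():
        match = MOD_LINE.match(line)
        # Accept both 'name' and 'author/name' style declarations.
        if match and match.group(1).split("/")[-1] == module:
            return True
    return False


def environments_to_update(env_root: Path, module: str) -> List[str]:
    """List environments under env_root whose Puppetfile declares `module`."""
    return sorted(
        pf.parent.name
        for pf in env_root.glob("*/Puppetfile")
        if declares_module(pf, module)
    )
```

Calling environments_to_update(Path("/etc/puppet/environments"), "test_r10k") would return only the environments that need touching, which is why a single-module deploy can be much cheaper than a full run.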

The bulk of our updates are on modules in testing environments, making this a problem for switching fully to g10k.

More specific stats/commands:
Our "control_repo" contains the Puppetfile, so its branches are used by r10k to deploy Puppet environments. All other modules are included in the Puppetfile, usually with ':branch => :control_branch' and ':default_branch => "production"'.

# time sudo -u puppet g10k -verbose -config /etc/g10k.yaml deploy module test_r10k
2021/01/22 00:24:02 Executing ssh-agent bash -c 'ssh-add ; git --git-dir /var/cache/g10k/environments/test_control_repo.git remote update --prune' took 0.39529s
2021/01/22 00:24:02 Executing git --git-dir /var/cache/g10k/environments/test_control_repo.git branch took 0.00223s
2021/01/22 00:24:02 Executing git --git-dir /var/cache/g10k/environments/test_control_repo.git rev-parse --verify 'Feature_Branch2^{object}' took 0.00959s
...
many more "rev-parse --verify branchname^{object}" and "ls-tree" lines for the environments and "remote update --prune" lines in all the module cache dirs, followed by more "rev-parse --verify some_branch^{object}" lines for all (environment) branches of all modules in the module cache dirs...
... 
Synced /etc/g10k.yaml with 13814 git repositories and 0 Forge modules in 31.4s with git (3.2s sync, I/O 21.7s) and Forge (0.0s query+download, I/O 0.0s) using 50 resolve and 20 extract workers

real	0m31.690s
user	0m34.756s
sys	0m3.236s

Where r10k took less than 3s:

# time sudo -u puppet /usr/local/bin/r10k -v -c /etc/r10k.yaml deploy module test_r10k
INFO	 -> Using Puppetfile '/etc/puppet/environments/feature_branch1/Puppetfile'
INFO	 -> Using Puppetfile '/etc/puppet/environments/other_branch/Puppetfile'
INFO	 -> Using Puppetfile '/etc/puppet/environments/test_canary/Puppetfile'
INFO	 -> Using Puppetfile '/etc/puppet/environments/test_update/Puppetfile'
...
INFO	 -> Deploying module /etc/puppet/environments/feature_branch1/modules/test_r10k
INFO	 -> Deploying module /etc/puppet/environments/other_branch/modules/test_r10k
INFO	 -> Deploying module /etc/puppet/environments/production/modules/test_r10k
INFO	 -> Deploying module /etc/puppet/environments/test_canary/modules/test_r10k
INFO	 -> Deploying module /etc/puppet/environments/test_update/modules/test_r10k

real	0m2.294s
user	0m0.772s
sys	0m0.444s

Digging into the -verbose output of g10k, it's doing the "rev-parse --verify" on all environment branches of all modules once, and on the "production" branch once per environment (we currently have 51 environment branches, and it ran it 51x on the production branch of each module).

This seems to be doing far more work than is necessary.
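
For anyone who wants to check for the same repetition in their own setup, the following Python sketch tallies the "Executing ... took Ns" lines from the -verbose output. It assumes the output has been saved to a file first and that the lines look like the ones pasted above; the function names are made up for this example.

```python
import re
from collections import Counter, defaultdict

# Matches lines such as:
# 2021/01/22 00:24:02 Executing git --git-dir ... branch took 0.00223s
EXEC_LINE = re.compile(r"Executing (?P<cmd>.+) took (?P<secs>[0-9.]+)s\s*$")


def tally_commands(lines):
    """Count how often each command ran and sum the time it took."""
    counts = Counter()
    seconds = defaultdict(float)
    for line in lines:
        match = EXEC_LINE.search(line)
        if match:
            counts[match.group("cmd")] += 1
            seconds[match.group("cmd")] += float(match.group("secs"))
    return counts, dict(seconds)


def repeated(counts, minimum=2):
    """Return (command, count) pairs that ran at least `minimum` times."""
    return [(cmd, n) for cmd, n in counts.most_common() if n >= minimum]
```

With the log in a file (say g10k.log, a hypothetical name), counts, seconds = tally_commands(open("g10k.log")) followed by repeated(counts, minimum=51) would list the commands that ran at least once per environment.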

Hmm, yes. To be honest I forgot that I had added the deploy module to g10k 😏

I mostly work with large Puppetfiles and multiple module developers, so I almost exclusively update the whole control repository.
But you are right, the current deploy module <module_name> behaviour does unnecessary work.

Just out of interest: how long does a simple time sudo -u puppet g10k -verbose -config /etc/g10k.yaml sync take for you, compared with trying to update a single module? Roughly the same (30 seconds)?

Roughly the same (a bit longer, since it had several commits to deal with since I ran it yesterday). From the verbose output it appears to be doing a full run as well.

I've had another look at this and found out that you're using the wrong syntax for updating a single module.

The correct syntax for g10k is:

g10k -verbose -config /etc/g10k.yaml -module test_r10k

Looks like it skips every parameter after the unknown deploy parameter 🤷

I still have to add a test for this (#183), but you can test if the following version does what you want:
g10k-linux-amd64-debug.zip