jgraichen/salt-tower

When used with salt-ssh, tower runs twice and the final pillar data is wrong

cr1st1p opened this issue · 3 comments

Describe the bug
When I use salt-ssh to run a state, I end up with wrong data: tower runs twice and, due to the merge strategy for arrays (I think), I get doubled data. If I just run salt-ssh pillar.items, I get the correct data.

To Reproduce
Steps to reproduce the behavior:

  1. Salt master / ext pillar configuration
    nothing out of the ordinary
  2. Broken component e.g. tower ext_pillar or yamlet renderer
    salt-ssh and/or tower (salt-ssh is known for behaving wrong sometimes)
  3. Demonstrating pillar data files, tower.sls, etc.
  4. Minion configuration (grains, os, ...)
  5. Pillar output
    can't give the output.

Expected behavior
I expect the pillar data to be correct (tower running only once) when running an actual state via salt-ssh.

Environment information:

  • Salt version: 3002.2
  • Python version: 3.9
  • Operating system: Manjaro (Arch based)

Additional context
I added a stack trace print to ext_pillar() in tower/pillar/..., before the computation is done, and it looks like tower is run twice:

  1. Probably before doing the actual ssh, to create the pillar cache:
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/__init__.py", line 1079, in run
    stdout, retcode = self.run_wfunc()
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/__init__.py", line 1171, in run_wfunc
    pillar_data = pillar.compile_pillar()
  File "/usr/lib/python3.9/site-packages/salt/pillar/__init__.py", line 1187, in compile_pillar
  2. At the start of the state run, to ... build the pillar:
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/__init__.py", line 1079, in run
    stdout, retcode = self.run_wfunc()
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/__init__.py", line 1248, in run_wfunc
    result = self.wfuncs[self.fun](*self.args, **self.kwargs)
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/wrapper/state.py", line 174, in sls
    st_ = salt.client.ssh.state.SSHHighState(
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/state.py", line 82, in __init__
    self.state = SSHState(opts, pillar, wrapper)
  File "/usr/lib/python3.9/site-packages/salt/client/ssh/state.py", line 44, in __init__
    super(SSHState, self).__init__(opts, pillar)
  File "/usr/lib/python3.9/site-packages/salt/state.py", line 760, in __init__
    self.opts["pillar"] = self._gather_pillar()
  File "/usr/lib/python3.9/site-packages/salt/state.py", line 825, in _gather_pillar
    return pillar.compile_pillar()
  File "/usr/lib/python3.9/site-packages/salt/pillar/__init__.py", line 1187, in compile_pillar
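
For anyone who wants to reproduce the traces above, this is roughly the kind of debug instrumentation I mean. It is only a sketch: the function body is a stand-in, not salt-tower's real ext_pillar(), and the empty return value is purely illustrative.

```python
import traceback


def ext_pillar(minion_id, pillar, *args, **_kwargs):
    # Debug-only stand-in (assumption): the real function computes tower data here.
    # Show who called us and what pillar data was passed in.
    print(f"ext_pillar({minion_id!r}) called with pillar keys {sorted(pillar)}")
    print("".join(traceback.format_stack()))
    return {}


# Calling it directly prints the current call stack once per invocation.
ext_pillar("minion1", {"role": "web"})
```

If the function shows up twice in the output of a single salt-ssh state run, comparing the printed pillar keys of both calls shows whether the second call already received the data from the first.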

Thanks for reporting! Can you check whether the pillar argument to ext_pillar contains the pillar data from the first run when it is called the second time?

def ext_pillar(minion_id, pillar, *args, **_kwargs):
                          # ^ This one

External pillars are called with the "normal" pillar data, and salt-tower merges all new pillar data into it. If salt or salt-ssh calls ext_pillar() twice, the second time with the data from the previous run, arrays will be merged again. If salt/salt-ssh does this, you should see the first run's pillar data being passed to ext_pillar() on the second run.

If that is the case, it is probably an issue with salt/salt-ssh, but I would try to look for a workaround.
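
To illustrate the suspected effect, here is a minimal sketch. It assumes a merge that concatenates lists; the merge() helper, TOWER_DATA, and the sample keys are made up for the example and are not salt-tower's actual code or merge strategies.

```python
import copy


def merge(base, update):
    """Recursively merge update into base; lists are concatenated (assumed strategy)."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge(base[key], value)
        elif isinstance(value, list) and isinstance(base.get(key), list):
            base[key] = base[key] + value
        else:
            base[key] = copy.deepcopy(value)
    return base


# Hypothetical data that the tower files would contribute.
TOWER_DATA = {"users": ["alice", "bob"]}


def ext_pillar(minion_id, pillar, *args, **_kwargs):
    # Start from the pillar passed in and merge the tower data on top.
    return merge(copy.deepcopy(pillar), TOWER_DATA)


first = ext_pillar("minion", {"role": "web"})
second = ext_pillar("minion", first)  # second call receives the first result

print(first["users"])   # ['alice', 'bob']
print(second["users"])  # ['alice', 'bob', 'alice', 'bob']
```

The first call is fine; the duplication only appears when the output of one call is fed back in as the input of the next.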

The first time, the argument contains the pillar from non-tower files.
The second time, it contains all the pillar data computed the first time (i.e. also from tower).
I added a workaround in tower.sls: everything in there is guarded by a check for a pillar value that would be set the first time tower runs.
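
The actual guard lives in tower.sls, but the idea can be sketched in Python. Everything here is hypothetical: the marker key name, the guarded_ext_pillar() function, and the list-appending merge are assumptions made for illustration only.

```python
import copy

# Hypothetical marker key; any pillar value set on the first run would do.
MARKER = "_tower_applied"


def guarded_ext_pillar(minion_id, pillar, tower_data):
    # If the marker is already present, tower has run before: do not merge again.
    if pillar.get(MARKER):
        return copy.deepcopy(pillar)

    result = copy.deepcopy(pillar)
    for key, value in tower_data.items():
        if isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = result[key] + value  # assumed list strategy: append
        else:
            result[key] = copy.deepcopy(value)
    result[MARKER] = True  # set on the first run, checked on later runs
    return result


tower_data = {"users": ["alice"]}
first = guarded_ext_pillar("minion", {"users": ["root"]}, tower_data)
second = guarded_ext_pillar("minion", first, tower_data)

print(first["users"])   # ['root', 'alice']
print(second["users"])  # ['root', 'alice']
```

The cost of this approach is an extra marker key in the final pillar data.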

But then I bumped into other issues with salt-ssh (saltstack-formulas/openssh-formula#201).
Salt is so buggy that I wonder if I should switch to ansible, but I see it also has >1k issues reported ... :-(

I do feel for you. They all have bugs, and in my opinion salt suffers a bit from doing so many different things, but I have also seen them respond to and fix bugs very well when these are reported with good details.

Personally, I do use salt because I use the client-server architecture for more things, such as getting ACME certificates or executing remote checks. I have some projects with ansible and bundlewrap too, they all have different pros and cons. It surely is good to try them all and learn a bit about their differences. There isn't a clear winner.

If you want to open (or have already opened) an issue at saltstack/salt, please link it here; they might be able to fix the root cause upstream. I will see if I can find some time to devise a workaround too.