capistrano/sshkit

Commands and command output seem to get different UUID values

lazyatom opened this issue · 3 comments

In the logs for our deployments, I often see output like this:

  INFO [01e05be1] Running /usr/bin/env git remote update --prune as user@hostname
 DEBUG [01e05be1] Command: cd /data/repo && ( export GIT_ASKPASS="/bin/echo" GIT_SSH="/tmp/git-ssh-app-env-username.sh" ; /usr/bin/env git remote update --prune )
 DEBUG [b183598d] 	Fetching origin
 DEBUG [b183598d] 	From github.com:organisation/app-repo
 x [deleted]         (none)     ->some_branch
 DEBUG [b183598d] 	 x [deleted]         (none)     -> some_other_branch
 DEBUG [b183598d] 	remote: Enumerating objects: 42, done.
remote: Counting objects:   2% (1/42)
remote: Counting objects:   4% (2/42)
remote: Counting objects:   7% (3/42) 
remote: Counting objects:   9% (4/42)
...
 DEBUG [b183598d] 	Unpacking objects:  95% (40/42)   
 DEBUG [b183598d] 	Unpacking objects:  97% (41/42)   
 DEBUG [b183598d] 	Unpacking objects: 100% (42/42) 
  INFO [b183598d] Finished in 4.171 seconds with exit status 0 (successful).

Ignoring the specific command, what is causing me some problems are the different command UUIDs that are appearing, so in this instance:

  • 01e05be1 for the top-level command ("Running" / "Command" lines)
  • b183598d for the output from the command (output and "Finished" lines)

The user-level problem here is that having different UUIDs makes it much harder for me to trace issues in deployments with multiple servers.

This seems to happen regardless of whether or not the deploy is to a single server or multiple servers (i.e. I see it in both types of Capistrano stage), and isn't specific to git commands (i.e. I see it in bundle installs and pretty much every other command that isn't trivial).

I've tried to replicate this in the simplest possible case so I can figure out why it's happening, but that's the weird part: I can't. Given this task:

task :dummy_task do
  on roles([:solo, :app]) do
    within release_path do
      execute("/usr/bin/env ruby -v")
    end
  end
end

Here's the logging I see:

  INFO [1433932f] Running /usr/bin/env ruby -v as user@host
 DEBUG [1433932f] Command: /usr/bin/env ruby -v
 DEBUG [1433932f] 	ruby 2.4.5p335 (2018-10-18 revision 65137) [x86_64-linux-gnu]
  INFO [1433932f] Finished in 0.183 seconds with exit status 0 (successful).

... the same UUID for both the "Running" / "Command" lines, and for the output from the command. This seems to be the case regardless of the command that I put into the execute call. So unfortunately I don't seem to be able to generate a minimal replicable case ☹️

Do you have any idea why this might be happening? Is this expected behaviour? If I can at least get a minimal test case together, I'd be very happy to explore a fix.


Environment details:

sshkit (1.17.0)
capistrano (3.11.0)
capistrano-bundler (1.3.0)
capistrano-faster-assets (1.1.0)
capistrano-maintenance (1.2.0)
capistrano-rails (1.4.0)
airbrussh (1.3.0)

This strikes me as a legitimate bug. IIRC the way that SSHKit determines the UUID is lazily generating a random hash when Command#uuid is first called. There could be a race condition there. Or if the Command object is duped at a point before uuid has been called, that could lead to two copies of the command with different UUIDs.

I haven't had a chance to look at the code yet to confirm all this. But if my hunch is right, perhaps the solution would be to calculate the UUID eagerly during Command#initialize?

I'll try that and let you know if it changes the logging during one of the real deploys. Thanks!

That does seem to fix things, which is great. The only spot I can see which dups the command is the with_redaction method, which is only called in the backend's execute_command method:

def execute_command(cmd)
  output.log_command_start(cmd.with_redaction) # will use a dup'd command
  cmd.started = true
  exit_status = nil
  with_ssh do |ssh|
    ssh.open_channel do |chan|
      chan.request_pty if Netssh.config.pty
      chan.exec cmd.to_command do |_ch, _success|
        chan.on_data do |ch, data|
          cmd.on_stdout(ch, data)
          output.log_command_data(cmd, :stdout, data) # uses original command
        end
# etc...

So that seems like a very likely culprit. I'll work up a pull request with the fix.