facebook/taste-tester

taste-tester needs an option for customizing errors when ssh fails

nishchintraina opened this issue · 9 comments

The SSH Util code deals with building a vanilla ssh command for logging into a host, where the user gets to customize the command options using hooks:

jumps = TasteTester::Config.jumps ?
"-J #{TasteTester::Config.jumps}" : ''
"#{TasteTester::Config.ssh_command} #{jumps} -T -o BatchMode=yes " +
'-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ' +
"-o ConnectTimeout=#{TasteTester::Config.ssh_connect_timeout}"

The building blocks of this vanilla ssh command could be generated out of a "config specific" command which is more user-friendly for the context for which the config is made. Example custom config:

ssh_cmd_prefix =  Mixlib::ShellOut.new("ssh_cmd_prefix generator command").run_command.stdout.chomp
ssh_command ssh_cmd_prefix

When a taste-tester test run fails because of ssh authentication failing, it spews out the vanilla command directly:

error = <<~ERRORMESSAGE
SSH returned error while connecting to #{TasteTester::Config.user}@#{@host}
The host might be broken or your SSH access is not working properly
Try doing
#{ssh_base_cmd} -v #{ssh_target}
to see if ssh connection is good.
If ssh works, add '-v' key to taste-tester to see the list of commands it's
trying to execute, and try to run them manually on destination host
ERRORMESSAGE

This might be vastly different from the actual custom command ("ssh_cmd_prefix generator command") the owner of the config might want to display during the failure so the developer running taste-tester has a command to run directly to validate the ssh connection first.

There is no place in taste-tester code to customize this behavior. An idea here was to try this in post_test hook but that doesn't fire when there is an ssh failure.

Hence what we need here is some sort of mechanism, hook or just a simple config variable - which can be configured to display the appropriate "ssh_cmd_prefix generator command".

I could see 1) a hook for customizing the ssh error messaging specifically or 2) a more generic failure hook to be called on any failure which could be used in this case. Not sure which is better or if there's a better 3rd option.

OK so let's add some context here for everyone who's not on the FB Cloud team :)

You can, due to the config being Ruby, generate the ssh_command in the config dynamically:

ssh_command Mixlib::ShellOut.new('/usr/bin/generate_ssh_command').run_command.stdout

We can generic-ify this problem a bit... there's another way someone could do this. Due to Ruby's ability to monkeypatch, and TT's many hooks, one could have an early hook that monkeypatches the ssh_command config to be different than the config, ala:

# the command we passed in, in the config, is actually a generator. Run it, and jam the results
# back into the config
from_config = TasteTester::Config.ssh_command
TasteTester::Config.ssh_command = Mixlib::ShellOut.new('from_config').run_command.stdout

In either case, the resulting error message will be the generated command not the generator command, which is really great - TT is showing you what actually got run, so I think TT is correct here. However the request, as I understand it, is "well, the user probably might need to run the generator which may give more context or customize it for their keys or... whatever, the case is I just want to show the user the generator command".

I will argue adamantly here that the command-actually-run must be printed out, but it seems valid to me to want some way to give the user the generator command as well.

So, to be clear here, there's no bug, but we're down a rabbit hole of monkeypatching and we'd like some extra hooks to make that rabbit hole a bit more comfortable. Sounds reasonable to me.

I see three possibilities. First two, same as KC:

  1. Hook for customizing errors. This feels very bad to me, you'll have to work pretty hard to convince me this is the right path.
  2. Some sort of hook that runs on any error and could print out additional information. This seems OK to me. You'd have to duplicate your logic a bit though - have the generator in the config and in your error hook.

There's also a third option:

  1. We could add a ssh_command_generator config which does this for you and then can update the error to something like:
    error = <<~ERRORMESSAGE 
  SSH returned error while connecting to #{TasteTester::Config.user}@#{@host} 
  The host might be broken or your SSH access is not working properly. Here's some
  commands to try to reproduce and debug the issue.

  Try doing 
 
      #{ssh_base_cmd} -v #{ssh_target} 
 
  to see if ssh connection is good. 
    ERRORMESSAGE

    if TasteTester::Config.ssh_command_generator
      error += <<~ERRORMESSAGE

The above command was generated, and it may be useful to run the generator directly instead:

      #{TasteTester::Config.ssh_command_generator}
      ERRORMESSAGE
    end
    error += <<~ERRORMESSAGE

If ssh works, add '-v' key to taste-tester to see the list of commands it's 
trying to execute, and try to run them manually on destination host 
    ERRORMESSAGE

I'm fine with 2 or 3. 2 is less work, though 3 is probably a bit cleaner. However if you go down the path of 3, you'll want to be careful (and probably add some unit tests around the ssh code first). I combined the copy-pasta ssh code a lot recently, but it's still quite brittle.

Cool. My thought pricess was aliging around 2 and 3 as well. It might be worth doing both 2 and 3 because:

  1. Having 2 gives us the ability of doing some other error handling/cleanup - for ex kill an loaded ssh-agent
  2. Having 3 means that we can save the generator command in the config itself, simplifies the logic a lot and also it fires for all taste-tester commands which need ssh auth (untlest, keeptesting) so it's more comprehensive.

Will submit a pull request for this. :)

@nishchintraina sounds great! I have two requests, however:

  1. Make those two separate pull requests. The error hook will be easier, I'd start with that.
  2. When you do the ssh_command_generator PR, pretty please add some tests that pass with the existing code, and then modify the code and the tests, to ensure we're not breaking stuff. It'll make it easier for you to write the code, and WAY easier for us to review the code. If need some input on what those tests should look like myself or @NaomiReeves can definitely give you some good pointers.

Thanks and I'm looking forward to the PRs!

Pull request for 2: #168

@jaymzh I have the PR for 2. done, can you or @NaomiReeves point me to what tests you mean? I don't notice any unit-test framework in the repo. taste-tester/test dir doesn't look like it. An example would be perfect!

Sorry I am a ruby/chef n00b. :)

The "tests" in the /test directory are meant for some manual tests, you can ignore those. Instead look at the tests in between_meals (which taste-tester uses) as an example (but your new tests will need to be in taste-tester):
https://github.com/facebook/between-meals/tree/master/spec

Off-hand, tests that should be there:

  • Building base ssh command
    • Has all options from config
    • Has all extra options expected
    • Handles user properly
    • Handles base64 encoding properly in both sudo and non-sudo modes
  • Building tunnel ssh command
    • Has tunnel options
    • Has options from config
    • Has any extra options expected
    • Handles user properly

Oh! And since there's not a lot of specs in BM, fb_storage has a TON of specs that are pure rspec and not chefspec, which are all great examples:

https://github.com/facebook/chef-cookbooks/tree/master/cookbooks/fb_storage/spec

I believe this was resolved with #168, #169, #170, #171, and #172