Can't launch clusters in availability zones that aren't in your current knife[:region]
redbeard opened this issue · 6 comments
This simply causes an opaque fog error such as:
ERROR: The AMI ID 'ami-2af9741a' does not exist (Fog::Compute::AWS::NotFound)
It tracks down to line #125 in lib/cluster_chef/discovery.rb:
def self.fog_connection
  @fog_connection ||= Fog::Compute.new({
    :provider              => 'AWS',
    :aws_access_key_id     => Chef::Config[:knife][:aws_access_key_id],
    :aws_secret_access_key => Chef::Config[:knife][:aws_secret_access_key],
    :region                => Chef::Config[:knife][:region]
  })
end
This happens because AWS doesn't let you launch through the API into availability zones outside the region the connection is for.
IMHO a good resolution would be:
- Ability to launch in any availability zone, or
- A clearer error message "the availability zone you've chosen is incompatible with your region".
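The second suggestion could be sketched roughly as below. This is only an illustration: `region_for_az` and `check_az_matches_region!` are hypothetical names, not part of cluster_chef, and the sketch assumes AZ names are a region name plus a single trailing letter (as in us-east-1d).

```ruby
# Hypothetical helper: derive the region from an AZ name by dropping the
# trailing zone letter ("us-east-1d" -> "us-east-1"). Assumes the usual
# "<region><letter>" naming.
def region_for_az(az)
  az.sub(/[a-z]\z/, '')
end

# Hypothetical check: fail with a readable message before any API call
# is made if the AZ does not belong to the configured region.
def check_az_matches_region!(az, region)
  az_region = region_for_az(az)
  return if az_region == region
  raise ArgumentError,
        "Availability zone '#{az}' is in region '#{az_region}', " \
        "but knife[:region] is '#{region}'"
end
```

Calling `check_az_matches_region!('us-west-1a', 'us-east-1')` would raise an ArgumentError naming both the zone and the region, instead of surfacing the fog NotFound error for the AMI.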
Do you know if it's OK if I just don't refer to Chef::Config[:knife][:region]?
I don't know if that was our invention or something from the knife-ec2
plugin.
Here are the options:
1. You must always specify an AZ. The region is taken from the AZ; Chef::Config[:knife][:region] is ignored.
2. You must specify either an AZ or a region. If you specify an AZ it always wins over the region, with no warning.
3. You must specify exactly one of AZ or region. It refuses to launch if both are given.
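For concreteness, the second option might look something like this. It is a sketch only: `resolve_region` is a hypothetical name, and it assumes the region can be read off an AZ name by dropping its trailing letter.

```ruby
# Hypothetical sketch of option #2: an AZ, if given, silently wins over
# the region; otherwise the region is used; giving neither is an error.
def resolve_region(az: nil, region: nil)
  # Assumes AZ names look like "<region><letter>", e.g. "us-east-1d".
  return az.sub(/[a-z]\z/, '') if az
  return region if region
  raise ArgumentError, 'specify either an availability zone or a region'
end
```

With this, `resolve_region(az: 'us-west-1a', region: 'us-east-1')` yields the AZ's region rather than the configured one, which is the "no warning" behaviour described in the option.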
I lean towards #2.
However the de facto state may be #1, as the defaults may inject an AZ; if
people agree that #2 is correct I'll back that out.
flip
On Thu, Dec 22, 2011 at 9:30 PM, Tal Rotbart <reply@reply.github.com> wrote:
https://github.com/infochimps/cluster_chef/issues/91
I lean towards 2 as well, but from my experiments the defaults do inject a region (us-east-1).
A nice way to make it more transparent is to put some info in the step("creating cloud server")
line. Maybe the provider (ec2, etc.) and key provider-specific info - for ec2, the AZ, bits and ebs/instance.
Good, let's go with #2; and remove from the defaults "us-east-1d". I think the rest of the cloud.rb defaults are reasonable.
While we're on defaults, I'd value any feedback on whether the ones in volume.rb are sound. The most controversial one I think is specifying 'xfs' in the volumes...
request for an announcement is a good one.
something I've thought about is putting in a "pause" -- launching wouldn't
require a "Yes" confirmation, but would wait for 1 second after printing
several newlines + a declaration of intent (should make sure that if debug
mode is enabled the banner stays visible). The existing --yes flag would
override a pause confirmation same as it does a query confirmation.
Thoughts?
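A rough sketch of that pause, to make the idea concrete. `announce_and_pause` and its parameters are hypothetical names, not anything in cluster_chef or knife; the `yes:` argument stands in for the existing --yes flag.

```ruby
require 'stringio'

# Hypothetical sketch: print a declaration of intent surrounded by blank
# lines, then wait briefly so the operator can hit Ctrl-C. Passing
# yes: true (standing in for --yes) skips the wait.
# Returns true if it paused, false if the pause was skipped.
def announce_and_pause(banner, yes: false, delay: 1, out: $stdout)
  out.puts "\n\n#{banner}\n\n"
  return false if yes
  sleep(delay)
  true
end
```

The banner is printed in both cases, so the declaration stays visible even when --yes suppresses the wait.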
I don't have any opinion (yet) regarding it-- just about to get to the part of configuring my cluster with volumes (mostly for static, non-HDFS data)
Is there any point in specifying the filesystem if the volume is snapshot based?
mostly no... here are two ways it kinda is still relevant:
- We may add an explicitly-separate 'xfsgrow' cookbook that would run xfs.growfs on a volume early in the launch cycle (this lets you use 'create at launch' but specify a volume size larger than the snapshot, and have it adapt).
- As more "aspect-based" helpers come on line, these declarations become integration test points; so specifying the fs in the volumes statement becomes a spec.
The strong argument for being opinionated here: I don't like the idea of having people type out mount_options => 'defaults,nouuid,noatime' -- this is risk-prone if omitted or mis-specified. I've left the door open for a person to override the default defaults in their config file (by diddling that constant).
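As an illustration of the opinionated-default idea (not the actual volume.rb code): `DEFAULT_MOUNT_OPTIONS` and `volume_settings` are hypothetical names, and only the 'xfs' and 'defaults,nouuid,noatime' values come from the discussion above.

```ruby
# Hypothetical constant holding the opinionated default mount options.
DEFAULT_MOUNT_OPTIONS = 'defaults,nouuid,noatime'

# Hypothetical helper: start from the opinionated defaults and let an
# explicit volume declaration override any of them.
def volume_settings(overrides = {})
  { :fstype => 'xfs', :mount_options => DEFAULT_MOUNT_OPTIONS }.merge(overrides)
end
```

A volume that says nothing gets xfs with the default mount options; one that passes `:mount_options => 'defaults'` gets exactly that, so nobody has to type the full string unless they want something different.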
We just had internal reason to examine this issue closer. It turns out several things are more deeply tied to region than previously realized, making this difficult to tackle without changes to the structure of homebases (inclusive-or the way they interact with the AWS APIs).
Removing the region declaration (and its associated deletion) doesn't seem to materially affect the ability to launch. Switching to an AZ outside of the region does, which leads to the conclusion that underlying calls are relying on that knife variable being set. Wrapping those calls in something that sets and then unsets the knife variable should let us isolate them; I'm hopeful that once they are isolated, we will find better ways to make those calls that don't rely on that shim.
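One possible shape for such a shim, as a sketch only. `with_knife_region` is a hypothetical name, and `KNIFE_CONFIG` is a plain Hash standing in for Chef::Config[:knife] so the example runs without Chef loaded.

```ruby
# Stand-in for Chef::Config[:knife]; an assumption made so the sketch is
# self-contained.
KNIFE_CONFIG = { :region => 'us-east-1' }

# Hypothetical shim: set the region for the duration of the block, then
# restore whatever was there before (or remove the key if it was unset),
# even if the block raises.
def with_knife_region(region, config = KNIFE_CONFIG)
  had_key  = config.key?(:region)
  previous = config[:region]
  config[:region] = region
  yield
ensure
  if had_key
    config[:region] = previous
  else
    config.delete(:region)
  end
end
```

Usage would be along the lines of `with_knife_region('us-west-1') { launch_servers }`, where `launch_servers` is a placeholder for whatever call depends on the variable. Since the fog_connection shown earlier is memoized with `||=`, it would presumably also need to be rebuilt inside the block for the new region to take effect.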
Caveat
This doesn't address how to multiplex (or force cross-region identity) for things like AWS key pairs, which comes along with this overall problem. That will almost certainly force enough breaking changes to warrant a major version bump, with all the pain that comes with that. For now, the workaround is likely to be ( holds nose ) branching the credentials repository by region, and throwing errors if there's a region/AZ mismatch.
There's also the issue of AMIs per region: the EC2 tools provided by Amazon can't migrate EBS-backed images, and the best-looking third-party tool runs to completion, but the resulting image is inaccessible. The obvious and easiest solution is to burn an image in each expected region; the more correct solution is to move away from image-based deployment entirely, so we can use stock AMIs (etc.) wherever we choose.