cncf/demo

Azure support

Opened this issue · 26 comments

hh commented

@Zilman do you have some Azure creds I could use?
Cheers @hh

@hh I don't have credits for Azure on hand, but I can try to get us some. Meanwhile, I believe you get $200 for free at first.

@dankohn I can't, only ever used my personal free account (I think), not an admin on the main one - at least that's what it says when I go to the "Active Directory".

hh commented

We had to migrate to a pay-as-you-go subscription, as the free-tier limit on cores is too low for quick iterations on dev work. Once everything is functioning, we will refactor for smaller instances so that the free tier works.

hh commented

We seem to have triggered a block on our API access from NZ.
(az login and API access seem to hit a firewall somewhere)

We moved to an IP in the states for now.

Is there someone at Microsoft that might be interested in supporting / unblocking us as we move along?

I'm hopeful @brendandburns might know someone who could send CNCF a few Azure credits.

Can you send me the subscription ID (bburns [at] microsoft [dot] com) and I'll see what we can do on this side.

--brendan

hh commented

@brendandburns done and thanks!

hh commented

We started getting kubelet panics when we started using --cloud-provider=azure

Took a while to come across: kubernetes/kubernetes#42576

Now we're off to generate --cloud-config=azure.json

-- Logs begin at Mon 2017-03-20 19:46:01 UTC, end at Mon 2017-03-20 20:18:32 UTC. --                                                                                                                                                                                                                   
Mar 20 19:48:56 etcd-master1 systemd[1]: Starting kubelet.service...                                                                                                                                                                                                                                   
Mar 20 19:48:56 etcd-master1 systemd[1]: Started kubelet.service.                                                                                                                                                                                                                                      
Mar 20 19:48:56 etcd-master1 kubelet-wrapper[2257]: + exec /usr/bin/rkt run --volume dns,kind=host,source=/etc/resolv.conf --mount volume=dns,target=/etc/resolv.conf --volume rkt,kind=host,source=/opt/bin/host-rkt --mount volume=rkt,target=/usr/bin/rkt --volume                                  
Mar 20 19:48:59 etcd-master1 kubelet-wrapper[2257]: pubkey: prefix: "quay.io/coreos/hyperkube"                                                                                                                                                                                                         
Mar 20 19:48:59 etcd-master1 kubelet-wrapper[2257]: key: "https://quay.io/aci-signing-key"                                                                                                                                                                                                             
...
Mar 20 19:48:59 etcd-master1 kubelet-wrapper[2257]: Downloading signature:  473 B/473 B                                                                                                                                                                                                                
Mar 20 19:49:00 etcd-master1 kubelet-wrapper[2257]: Downloading ACI:  0 B/237 MB                                                                                                                                                                                                                       
...
Mar 20 19:49:07 etcd-master1 kubelet-wrapper[2257]: Downloading ACI:  237 MB/237 MB                                                                                                                                                                                                                    
Mar 20 19:49:37 etcd-master1 kubelet-wrapper[2257]: image: signature verified:                                                                                                                                                                                                                         
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: panic: runtime error: invalid memory address or nil pointer dereference [recovered]                                                                                                                                                                
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                                                                    
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: [signal 0xb code=0x1 addr=0x20 pc=0xa32559]                                                                                                                                                                                                        
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: goroutine 1 [running]:                                                                                                                                                                                                                             
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: panic(0x448ae60, 0xc820030060)                                                                                                                                                                                                                     
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/runtime/panic.go:481 +0x3e6                                                                                                                                                                                              
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: io/ioutil.readAll.func1(0xc820acca40)                                                                                                                                                                                                              
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/io/ioutil/ioutil.go:30 +0x11e                                                                                                                                                                                            
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: panic(0x448ae60, 0xc820030060)                                                                                                                                                                                                                     
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/runtime/panic.go:443 +0x4e9                                                                                                                                                                                              
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: bytes.(*Buffer).ReadFrom(0xc820acc998, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                                                    
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/bytes/buffer.go:176 +0x239                                                                                                                                                                                               
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: io/ioutil.readAll(0x0, 0x0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                                                        
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/io/ioutil/ioutil.go:33 +0x156                                                                                                                                                                                            
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: io/ioutil.ReadAll(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                                                               
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/io/ioutil/ioutil.go:42 +0x51                                                                                                                                                                                             
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: k8s.io/kubernetes/pkg/cloudprovider/providers/azure.NewCloud(0x0, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                         
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure.go:74 +0x81                                                                                                                  
Mar
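For anyone reading the trace above: it ends in azure.NewCloud handing a nil reader (all-zero arguments) to ioutil.ReadAll, which is what you would expect when no --cloud-config file is passed. Below is a minimal sketch of the kind of guard that would turn that panic into an error. readCloudConfig is a hypothetical helper for illustration, not the upstream function, and the error text is made up.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// readCloudConfig is a hypothetical helper (not the upstream NewCloud).
// It rejects a nil reader instead of passing it on to io.ReadAll, where
// calling Read on a nil interface would panic as in the trace above.
func readCloudConfig(r io.Reader) ([]byte, error) {
	if r == nil {
		return nil, errors.New("azure: no cloud config supplied; pass --cloud-config=azure.json")
	}
	return io.ReadAll(r)
}

func main() {
	// No config file given: we get an error back, not a panic.
	if _, err := readCloudConfig(nil); err != nil {
		fmt.Println("error:", err)
	}

	// With a reader, the raw bytes come back for later parsing.
	data, err := readCloudConfig(strings.NewReader(`{"cloud":"AzurePublicCloud"}`))
	fmt.Println(string(data), err)
}
```

The sketch uses io.ReadAll, which needs Go 1.16 or later; the kubelet in the trace predates that and calls ioutil.ReadAll.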

Oh boy, on AWS all the stuff it wants in azure.json was simply inferred by Kubernetes (you'd just tag those resources with the cluster name and pass just that to it). This makes it much more convoluted.

This is an unfortunate discrepancy. As far as I know Azure also has the concept of tags.

hh commented

Looks like we need to populate this manually:

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure.go#L38

// Config holds the configuration parsed from the --cloud-config flag
// All fields are required unless otherwise specified
type Config struct {
	// The cloud environment identifier. Takes values from https://github.com/Azure/go-autorest/blob/ec5f4903f77ed9927ac95b19ab8e44ada64c1356/autorest/azure/environments.go#L13
	Cloud string `json:"cloud" yaml:"cloud"`
	// The AAD Tenant ID for the Subscription that the cluster is deployed in
	TenantID string `json:"tenantId" yaml:"tenantId"`
	// The ID of the Azure Subscription that the cluster is deployed in
	SubscriptionID string `json:"subscriptionId" yaml:"subscriptionId"`
	// The name of the resource group that the cluster is deployed in
	ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
	// The location of the resource group that the cluster is deployed in
	Location string `json:"location" yaml:"location"`
	// The name of the VNet that the cluster is deployed in
	VnetName string `json:"vnetName" yaml:"vnetName"`
	// The name of the subnet that the cluster is deployed in
	SubnetName string `json:"subnetName" yaml:"subnetName"`
	// The name of the security group attached to the cluster's subnet
	SecurityGroupName string `json:"securityGroupName" yaml:"securityGroupName"`
	// (Optional in 1.6) The name of the route table attached to the subnet that the cluster is deployed in
	RouteTableName string `json:"routeTableName" yaml:"routeTableName"`
	// (Optional) The name of the availability set that should be used as the load balancer backend
	// If this is set, the Azure cloudprovider will only add nodes from that availability set to the load
	// balancer backend pool. If this is not set, and multiple agent pools (availability sets) are used, then
	// the cloudprovider will try to add all nodes to a single backend pool which is forbidden.
	// In other words, if you use multiple agent pools (availability sets), you MUST set this field.
	PrimaryAvailabilitySetName string `json:"primaryAvailabilitySetName" yaml:"primaryAvailabilitySetName"`

	// The ClientID for an AAD application with RBAC access to talk to Azure RM APIs
	AADClientID string `json:"aadClientId" yaml:"aadClientId"`
	// The ClientSecret for an AAD application with RBAC access to talk to Azure RM APIs
	AADClientSecret string `json:"aadClientSecret" yaml:"aadClientSecret"`
}
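
A rough sketch of generating azure.json from the struct above, in case it helps. The JSON keys are copied from the struct tags quoted above; every value is a placeholder, and the cloud name "AzurePublicCloud" is my assumption about the go-autorest environment name. renderCloudConfig and parseCloudConfig are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the JSON keys of the struct quoted above (yaml tags dropped).
type Config struct {
	Cloud                      string `json:"cloud"`
	TenantID                   string `json:"tenantId"`
	SubscriptionID             string `json:"subscriptionId"`
	ResourceGroup              string `json:"resourceGroup"`
	Location                   string `json:"location"`
	VnetName                   string `json:"vnetName"`
	SubnetName                 string `json:"subnetName"`
	SecurityGroupName          string `json:"securityGroupName"`
	RouteTableName             string `json:"routeTableName"`
	PrimaryAvailabilitySetName string `json:"primaryAvailabilitySetName"`
	AADClientID                string `json:"aadClientId"`
	AADClientSecret            string `json:"aadClientSecret"`
}

// renderCloudConfig returns the indented JSON to write to azure.json.
func renderCloudConfig(c Config) (string, error) {
	out, err := json.MarshalIndent(c, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// parseCloudConfig reads the JSON back, which is handy for a round-trip check.
func parseCloudConfig(s string) (Config, error) {
	var c Config
	err := json.Unmarshal([]byte(s), &c)
	return c, err
}

func main() {
	// Every value below is a placeholder, not a real ID, name, or secret.
	cfg := Config{
		Cloud:                      "AzurePublicCloud", // assumed environment name
		TenantID:                   "<tenant-id>",
		SubscriptionID:             "<subscription-id>",
		ResourceGroup:              "<resource-group>",
		Location:                   "<location>",
		VnetName:                   "<vnet-name>",
		SubnetName:                 "<subnet-name>",
		SecurityGroupName:          "<security-group-name>",
		RouteTableName:             "<route-table-name>",
		PrimaryAvailabilitySetName: "<availability-set-name>",
		AADClientID:                "<aad-client-id>",
		AADClientSecret:            "<aad-client-secret>",
	}
	out, err := renderCloudConfig(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Since the file carries the AAD client secret, it probably should be written with restrictive permissions and kept out of version control.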
hh commented

Yea, would be nice to have this for Azure:

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/tags.go#L54

	// ClusterID is our cluster identifier: we tag AWS resources with this value,
	// and thus we can run two independent clusters in the same VPC or subnets.
	// This gives us similar functionality to GCE projects.
	ClusterID string
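
To make the tag-based idea concrete, here is a small sketch of how a provider could infer which resources belong to a cluster from a single ClusterID. It is not the AWS provider's code: the Resource type, the resourcesForCluster function, and the tag key are all illustrative assumptions.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for any taggable cloud resource.
type Resource struct {
	Name string
	Tags map[string]string
}

// clusterTagKey is an illustrative tag name chosen for this sketch.
const clusterTagKey = "KubernetesCluster"

// resourcesForCluster keeps only the resources tagged with clusterID.
// An empty clusterID matches nothing, so untagged resources are never claimed.
func resourcesForCluster(all []Resource, clusterID string) []Resource {
	if clusterID == "" {
		return nil
	}
	var owned []Resource
	for _, r := range all {
		// Reading from a nil map yields "", so untagged resources are skipped.
		if r.Tags[clusterTagKey] == clusterID {
			owned = append(owned, r)
		}
	}
	return owned
}

func main() {
	all := []Resource{
		{Name: "subnet-a", Tags: map[string]string{clusterTagKey: "demo"}},
		{Name: "subnet-b", Tags: map[string]string{clusterTagKey: "other"}},
		{Name: "subnet-c"}, // untagged
	}
	for _, r := range resourcesForCluster(all, "demo") {
		fmt.Println(r.Name)
	}
}
```

With something like this, the only value passed in by hand is the cluster name; the subnet, security group, and so on are discovered rather than listed in a config file.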

Yes, you have no choice -- and that is not how the other providers are implemented:
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L398

Could be an interesting thing to suggest upstream.

Edit: Jinx. :)

I haven't read the azure provider, do you think it would be messy to add this to it?

@hh Please see https://github.com/Azure/acs-engine

I think you will find it a much more pleasant way to turn up an Azure kubernetes cluster.

I'll work on the subscription stuff.

hh commented

We have been looking pretty heavily at the individual config generation parts at https://github.com/Azure/acs-engine/tree/master/parts

They've been useful in integrating the acs-engine approach.

hh commented

Just starting multiple build/deploys at once on Azure, had to increase default core quota from 10 to 100. Should be able to do at least ten concurrent builds soon.

RE: Azure relying on hostname == nodeName, this is due to lack of metadata service, so we make assumptions: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure_instances.go#L98

hh commented

Great to meet you in Berlin @colemickens !

We really are cutting ourselves on some of the bleeding edges of kubernetes + azure + terraform on this one. :)

Let me know if you want some help in creating those upstream issues for azure / kubernetes we talked about here: https://github.com/cncf/demo/blob/fd21acc7655e849a8cdda9faf0c547fa2916a0dc/azure/readme.org#notable-issues

I opened issues for the terraform azurerm_dns_srv_record list and for azure-sdk-for-go NetworkInterfaceDnsSettings.dnsServers resolution.

We'll look into refactoring this sometime soon after another cloud or two and likely add a second Azure approach using acs-engine provisioning of the kubernetes cluster.

  • Azure dns_zones do not provide IPs - Please create a new issue on this repository so that I can get the appropriate folks into a thread with you. Otherwise I fear if I just take this internal I'm going to wind up being a middle-man between you and our folks.

  • Azure CNAME records don’t resolve correctly - Same as the previous one. Need a bit more detail on this too - an exact scenario they can repro.

  • Terraform azurerm_dns_srv_records do not support multiple dynamic entries - I think this is the one you opened on Terraform already, if not, please do.

  • Azure Cluster-Autoscale Virtual Machine Scale Sets are not yet supported by kubernetes - We can't do anything about this until the platform grows support for disk attachment to nodes. I've been asking for it for a long time. Unfortunately, no ETA to share now. There are already upstream issue(s) for it.

  • Starting kubelet without --cloud-config=azure.json results in a panic - Please file a bug upstream, should be a quick fix...

  • When using --cloud-provider=azure not only must you use --cloud-config=azure.json, it seems you have to provide all the optional settings as well. Failure to do so results in a panic. - There are no optional settings in that file, I'd roll it into the same issue for the above bullet point.
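
On the last two bullets, a sketch of what a pre-flight check on azure.json could look like, so that missing keys surface as a readable list instead of a panic. This is not the upstream code. missingKeys is a hypothetical helper, and the sketch assumes that only the keys the struct comments do not mark as optional are required; the thread suggests that in practice the remaining keys may be needed as well.

```go
package main

import "fmt"

// Config holds the keys this sketch treats as required. The two keys the
// struct comments mark as optional (routeTableName,
// primaryAvailabilitySetName) are left out here as an assumption.
type Config struct {
	Cloud             string
	TenantID          string
	SubscriptionID    string
	ResourceGroup     string
	Location          string
	VnetName          string
	SubnetName        string
	SecurityGroupName string
	AADClientID       string
	AADClientSecret   string
}

// missingKeys returns the JSON key names whose values are empty.
func missingKeys(c Config) []string {
	required := []struct{ key, value string }{
		{"cloud", c.Cloud},
		{"tenantId", c.TenantID},
		{"subscriptionId", c.SubscriptionID},
		{"resourceGroup", c.ResourceGroup},
		{"location", c.Location},
		{"vnetName", c.VnetName},
		{"subnetName", c.SubnetName},
		{"securityGroupName", c.SecurityGroupName},
		{"aadClientId", c.AADClientID},
		{"aadClientSecret", c.AADClientSecret},
	}
	var missing []string
	for _, f := range required {
		if f.value == "" {
			missing = append(missing, f.key)
		}
	}
	return missing
}

func main() {
	// Placeholder values; only two keys are filled in on purpose.
	cfg := Config{Cloud: "<cloud>", TenantID: "<tenant-id>"}
	if missing := missingKeys(cfg); len(missing) > 0 {
		fmt.Println("azure.json is missing keys:", missing)
	}
}
```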

Finally, I think you might have mentioned another issue where the Azure DNS nameservers weren't forwarding requests? Again, if you can file an Issue against cncf/demo and tag me in it, I will share all three with the right folks internally and get the discussion going.

@hh, I've got an internal mail drafted, just waiting for cncf/demo issues for the other one (or two) issues listed in the previous post. Then I can start getting the right folks to chime in. Thanks!

hh commented

Seeing that private DNS zones are not yet supported, I suspect that's why CNAME record resolution is broken.

hh commented

I've been spinning up AWS and Azure side by side, and I can confirm that the reason we needed a workaround for CNAME was that Azure does not yet provide support for private zones.

Our workaround was to use multiple A records.

We have a working Azure deploy for now, even if we are abusing the public DNS service records. :)

@colemickens I just ran into the panic-when-optional-config-keys-missing issue. Did you file an issue for that? Can't find it.