Support out-of-process and out-of-tree cloud providers
errordeveloper opened this issue · 119 comments
Feature Description:
Support out-of-tree and out-of-process cloud providers, a.k.a pluggable cloud providers.
Feature Progress:
In order to complete this feature, cloud provider dependencies need to be moved out of the following Kubernetes binaries, then docs and tests need to be added. The links to the right of each binary denote the PRs that led to the completion of the sub-feature.
- Kube-controller-manager -
- kubernetes/kubernetes#34273,
- kubernetes/kubernetes#36785,
- kubernetes/kubernetes#39394,
- kubernetes/kubernetes#41856,
- kubernetes/kubernetes#42604,
- kubernetes/kubernetes#43777
- Kubelet
- Docs
- Tests
e2e Tests - Incomplete
The cloud-specific functionality of the above features needs to be moved into a new binary called cloud-controller-manager that supports a plugin architecture.
Primary Contact: @wlan0
Responsible SIG: @k8s-mirror-cluster-lifecycle-feature-re
Design Proposal Link: kubernetes/community#128
Reviewers:
@luxas
@roberthbailey
@thockin
Approver:
@thockin
Feature Target:
Alpha: 1.7
Beta: 1.8
Stable: 1.10
Here's an updated status report for this feature, please let me know if anything needs clarification:
Beta (starting v1.11)
- The common interface used by cloud providers has been well tested and support will not be dropped, though implementation details may change. Any methods that are deprecated should follow the Kubernetes Deprecation Policy.
- The cloud controller manager has been tested by various cloud providers and is considered safe to use for out-of-tree providers. Features to be deprecated that are part of the cloud controller manager (controllers, component flags, etc) will follow the Kubernetes Deprecation Policy.
- The cloud controller manager does not run in any cluster by default. It must be explicitly turned on and added like any other control plane component. Instructions for setup may slightly vary per cloud provider. More details here.
Reasoning for Graduation
There were a few things on our TODO list that we wanted to get done before graduating to beta, such as collecting E2E tests from all providers and improving out-of-tree storage. However, many of these initiatives require collaboration from external parties, which was delaying progress on this effort. In addition, there was uncertainty since we do not develop some of the components we rely on; a good example is whether CSI would be able to meet demands for out-of-tree storage on par with in-tree storage support. Though in hindsight we have more confidence in CSI, prior to its beta release it was unclear if it would meet our requirements. With this context in mind, we decided to graduate to beta because:
- blocking out-of-tree cloud providers from going beta meant that fewer in-tree providers would adopt this feature.
- some goals (like E2E tests from cloud providers) require a significant amount of collaboration and may unnecessarily block progress for many releases.
- features that are lacking from the cloud controller manager (mainly storage) would be handled by future projects from other SIGs (e.g. CSI by SIG Storage).
Goals for GA (targeted for v1.13/v1.14)
- Frequently collect E2E tests results from all in-tree & out-of-tree cloud providers kubernetes/community#2224
- Cloud Provider Documentation includes:
- "Getting Started" documentation - outlines the necessary steps required to stand up a Kubernetes cluster.
- Documentation outlining all cloud provider features such as LoadBalancers, Volumes, etc. There should be docs providing a high-level overview and docs that dig into sufficient details on how each feature works under the hood.
- Docs should also be centralized in an automated fashion where documentation from all cloud providers are placed into a central location (ideally https://kubernetes.io/docs/home/).
- A well-documented plan exists for how to migrate a cluster from using an in-tree cloud provider to an out-of-tree cloud provider; this only applies to AWS, Azure, GCP, OpenStack, and VMware.
- All current cloud providers have implemented an out-of-tree solution, deprecation of in-tree code is preferred but not a requirement.
Benefits:
- Easier configuration for providers like Azure that require a "cloud config" flag on kubelet/KCM. This file could instead be made a Secret (or ConfigMap + Secret). Makes bootstrapping easier and would eliminate the need for kubeadm to have special functionality for handling the cloudprovider flags (see the sketch after this list).
- Selective enablement. Some people want to run their own overlay network, but still want auto-provisioned L4 load balancers. There's no way to do that today.
- Moves more things out of core Kubernetes repo/project, and enables faster turn-around for shipping new cloudproviders or iterating/testing changes.
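To illustrate the first benefit above: rather than threading a cloud-config flag through kubelet/KCM, the provider configuration could be projected from a Secret (or ConfigMap) into the controller's pod and read as a plain file. The path and flow below are illustrative assumptions only, not an existing Kubernetes mechanism.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical path: a Secret (or ConfigMap) mounted into the pod would
	// surface the provider's cloud config as a plain file, so no component
	// flag plumbing is needed to hand it to KCM / a cloud controller manager.
	const cloudConfigPath = "/etc/kubernetes/cloud/cloud.conf"

	data, err := os.ReadFile(cloudConfigPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "no cloud config mounted:", err)
		os.Exit(1)
	}

	// A real provider would parse this (many in-tree providers use INI-style
	// configs); here we only show that the config arrives as file contents.
	fmt.Printf("read %d bytes of cloud config from %s\n", len(data), cloudConfigPath)
}
```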
Just a note, kubelet uses cloudprovider too, in addition to KCM.
cc @kubernetes/sig-cluster-lifecycle @kubernetes/sig-network @kubernetes/sig-storage @kubernetes/sig-aws @kubernetes/sig-openstack
I endorse this idea in general. I think the built-in cloud provider logic has served its purpose and it's time to modularize. I think there are a number of facets to this that we have to work out, including but not limited to:
- CloudProvider and all the APIs therein
- Volume drivers and provisioner support
- Cluster turnup support
I think it would be worthwhile to start building a doc that details these and explores options for ejecting each one. I don't think there's anything here that hasn't been considered at SOME point. Once we get that written down, we can craft a roadmap...
Sounds useful. @errordeveloper, do we have any mailing lists or GitHub discussions on this question that we can refer to?
@idvoretskyi not yet, this is probably very much on the radar of @kubernetes/sig-cluster-lifecycle.
I don't know that anyone is working on speccing this. It touches on a few SIGs, but it is not exactly any of them.
@thockin you are right. Maybe we should form sig-cloud?
so. many. sigs. I don't think we need a SIG for this. I doubt it is going to garner much resistance. There are just a lot of details to hammer out. Being on the radar for lifecycle is fine. The hardest part here is balancing the desire for modularity with the need for simplicity. That's what I want to see explored :)
@errordeveloper no need for yet another SIG (SIG-Cloud sounds like an abstract, umbrella one). I agree with @thockin - the primary SIG has to be @kubernetes/sig-cluster-lifecycle; meanwhile, on behalf of @kubernetes/sig-openstack, I'm going to track this item.
Hope other cloud SIGs will be involved in the process as well.
@justinsb and I have discussed this on Slack, and it looks like we may be able to get closer to similar user-facing value by exposing flags via component config. It also turns out --configure-cloud-routes is already there. It doesn't look like this should involve moving code as such.
I think there is additional value to moving code out to add-ons: it will enable further cloud providers to be added without enlarging the core of Kubernetes.
Example: kubernetes/kubernetes#32419
Ah, but it also looks like someone is working on this: kubernetes/kubernetes#32419 (comment).
That does not read as someone working on it, to me. This is a big problem with a lot of facets, and it needs a capital-O Owner.
@thockin In reference to kubernetes/kubernetes#32419, Rancher would be up for being a guinea pig for this. @wlan0 will be working on this and if the scope is massive we will see if we can pull in more resources. I want to see if I understand the approach you were proposing in kubernetes/kubernetes#32419 and see if we are on the same page.
What we would do is implement the existing cloudprovider.Interface with a new cloud provider called "external". Ideally we wouldn't change the existing Interface, but if we hit some oddities it might make sense to modify it. This new external implementation will not delegate via a plugin model but instead through k8s resources and expect one to write controllers. Upfront it seems like we would need some new resources like CloudProviderLoadBalancer, Instance, Zone, Cluster, Route. A new cloud provider would need to be a controller that interacted with these resources.
That all seems pretty straightforward to me. Now the weird part is volume plugins. While it's not part of the CloudProvider interface, there seems to be a back-channel relationship between volume plugins and cloud providers. To decouple those I'd have to spend a bit more time researching.
@thockin Is this the basic approach you were thinking?
I replied to @wlan0, but for the record...
simpler.
My "external" suggestion was more about designating that we are not using a
built-in and any controller loops that use CloudProvider should be
disabled. "" may be just as viable.
Once the built-in controllers are nullified, we run a cloud-specific
controller manager. I propose that the starting point LITERALLY be a fork
of the kube-controller-manager code. But instead of linking in 8
CloudProviders and switching on a flag, just link one. Simplify and
streamline.
One possible result is a library pkg that accepts a type CloudProvider interface
. In doing this, I am sure you will find things that need
restructuring or that are significantly harder this way, and that is when
we should discuss design.
I would suggest leaving volumes for last :)
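To make the "fork the KCM, link exactly one provider" idea concrete, here is a minimal hypothetical sketch. The CloudProvider interface and runCloudControllers function below are local stand-ins for Kubernetes' real cloudprovider.Interface and shared controller library, not the actual code.

```go
// Hypothetical sketch only: a per-cloud controller manager that compiles in
// exactly one provider instead of linking all providers and switching on a
// flag.
package main

import (
	"fmt"
	"os"
)

// CloudProvider stands in for the interface a shared library package could accept.
type CloudProvider interface {
	ProviderName() string
	// Accessors for LoadBalancer, Instances, Zones, Clusters and Routes
	// implementations would be declared here, as in the in-tree interface.
}

// exampleCloud is the single (made-up) provider linked into this binary.
type exampleCloud struct{}

func (exampleCloud) ProviderName() string { return "example" }

// runCloudControllers stands in for the library both KCM and per-cloud
// binaries could share: it would start the node, route and service
// controller loops against whichever provider it is handed.
func runCloudControllers(cloud CloudProvider) error {
	fmt.Printf("starting cloud controller loops for provider %q\n", cloud.ProviderName())
	return nil
}

func main() {
	if err := runCloudControllers(exampleCloud{}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```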
Why is one linked in at all? What is the difference between --cloud-provider=external and simply not specifying the --cloud-provider at all? Then the Service/Routes are established with standalone addon controllers?
Or maybe we're on the same page? And then you're proposing a generic implementation of these standalone controllers (for now?) that can take the existing CloudProvider interface to preserve existing functionality?
Why is one linked in at all? What is the difference between --cloud-provider=external and simply not specifying the --cloud-provider at all? Then the Service/Routes are established with standalone addon controllers?
Without inspecting, I don't know if "" disables the controllers today, so I didn't want to break compat during the transition. That's all. If "" works, that is simpler.
Or maybe we're on the same page? And then you're proposing a generic implementation of these standalone controllers (for now?) that can take the existing CloudProvider interface to preserve existing functionality?
I think same page. As a starting point, we would decompose the single {kube-controller-manager (KCM) + 8 CloudProviders} into 8 * {KCM + 1 CloudProvider}. At that point, each cloud-controller could diverge if they want to, or we could keep maintaining the cloud controller manager as a library.
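A rough sketch of the gating being discussed, under the assumption that "" and "external" end up being treated the same way. This is a hypothetical helper, not the real kube-controller-manager code.

```go
// Hypothetical helper illustrating the gating discussed above: the built-in
// cloud controller loops only run when a concrete in-tree provider is set,
// while "" and "external" both defer to an out-of-tree cloud-controller-manager.
package main

import "fmt"

func shouldRunBuiltinCloudControllers(cloudProviderFlag string) bool {
	switch cloudProviderFlag {
	case "", "external":
		return false
	default:
		return true
	}
}

func main() {
	for _, v := range []string{"", "external", "gce"} {
		fmt.Printf("--cloud-provider=%q -> run built-in cloud loops: %v\n",
			v, shouldRunBuiltinCloudControllers(v))
	}
}
```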
So the controller manager embeds certain control loops. Some of these loops are cloud provider specific:
- nodeController
- volumeController
- routeController
- serviceController
but most are provider agnostic:
- replicationController
- endpointController
- resourcequotacontroller
- namespacecontroller
- deploymentController
etc
I wonder if it would make sense to split the controller into 2 parts: a base controller (k8s code base) and a provider-specific controller (external repo, deployed by the user by choice). This way it would be more similar to the ingress controller path, with only one slight difference: the controller loops should be maintained as a library, as all the providers will share them. Only the implementation - attach/detachDisk/etc. - will be provider specific. To make it backwards compatible, we can disable initializing cloud-provider-specific controllers in the current controller-manager code if the provider is passed as empty on Kubernetes start.
Or maybe I'm just stating what you've already meant by "keep maintaining the cloud controller manager as a library" @thockin
I think we are saying the same thing. kube-controller-manager will still exist after this, but it will eventually get rid of all the cloud-specific stuff. All the cloud stuff will move to per-cloud controller binaries.
I would leave volumes for VERY LAST :)
Hi @thockin Is this the one you mentioned to me in the hallway conversation in Barcelona? Just making sure!
Is this planned for v1.6 or what's the plan?
@wlan @ibuildthecloud @alena1108
@dims yeah, this is the one.
Who can provide the actual status of the feature? /cc @errordeveloper @thockin
proposal for this change - kubernetes/kubernetes#37037
Proposal now at kubernetes/community#128
Renaming issue to (hopefully) make the main goal clearer
@mikedanese does this feature target 1.6 (alpha, beta, stable)? Is @wlan0 a responsible person for the feature or someone else?
@thockin thank you.
I'm also reviewing and am maybe going to pair-program with @wlan0 in the future if needed.
I think we can set stage/alpha on this, because we're aiming for alpha in v1.6
@luxas thank you.
@idvoretskyi I agree with what @thockin and @luxas said.
@luxas @thockin please, provide us with the release notes and documentation PR or link at https://docs.google.com/spreadsheets/d/1nspIeRVNjAQHRslHQD1-6gPv99OcYZLMezrBe3Pfhhg/edit#gid=0
Also, please, select the valid checkpoints at the Progress Tracker.
Pingity ping. Is there a Docs PR for this? Is one needed?
@jaredbhatti I'm just adding the docs as I'm typing now. Expect a PR today.
@wlan0 The docs PR is still missing. Launch is tomorrow. Status?
@devin-donnelly I made the docs PR on that day - kubernetes/website#2900
@wlan0 The docs PR is still missing. Launch is tomorrow. Status?
@luxas said this was the repo to make the PR, and @thockin reviewed it. I'm pretty sure this is the right repo.
@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune @wlan0 @bgrant0607 I've got some available bandwidth and would like to help out with this effort. Is there a current status reflected somewhere?
@rrati thanks for the offer! We could use a hand in moving the existing cloud providers to the new format.
The only other part that's left is the kubelet, and I have the PR ready. I'm waiting for a dependent PR (kubernetes/kubernetes#43777) to get merged before I make another.
If that doesn't get merged by today, I'm just going to make the kubelet PR.
@wlan0 What is the new format for the cloud providers? Is that documented somewhere? Is the new format similar to what is seen in #32419?
@rrati - First, these two new methods need to be implemented for the various cloud providers - https://github.com/kubernetes/kubernetes/pull/42604/files
Then, we need to create separate repositories to contain the external cloud-controller-manager for each of the cloud providers. For example, a kubernetes/aws-cloud-controller-manager repo will have the cloudprovider code for AWS. It has to be built similar to this - https://github.com/rancher/rancher-cloud-controller-manager
@rrati There is actually one more step - The Persistent Volume Label admission controller needs to be moved to the cloud-controller-manager.
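For anyone wondering what such an external repo roughly contains, here is a hedged sketch of the factory-registration pattern. The registry, Interface, and "rancher" provider below are local stand-ins for the real cloudprovider package (which exposes a RegisterCloudProvider-style hook), so treat the names and signatures as illustrative.

```go
// Hypothetical sketch of how an out-of-tree repository could wire its provider
// into a cloud-controller-manager build; names and signatures are illustrative,
// not the real API.
package main

import (
	"fmt"
	"io"
	"os"
)

// Interface is a stand-in for cloudprovider.Interface.
type Interface interface {
	ProviderName() string
}

// Factory mirrors the factory pattern: build a provider from an optional
// --cloud-config reader.
type Factory func(config io.Reader) (Interface, error)

var providers = map[string]Factory{}

// RegisterCloudProvider lets a provider package register itself, typically
// from an init() func, so the CCM binary only needs to import the package.
func RegisterCloudProvider(name string, f Factory) { providers[name] = f }

// rancherCloud is a made-up provider implementation for illustration.
type rancherCloud struct{}

func (rancherCloud) ProviderName() string { return "rancher" }

func init() {
	RegisterCloudProvider("rancher", func(config io.Reader) (Interface, error) {
		return rancherCloud{}, nil
	})
}

func main() {
	factory, ok := providers["rancher"]
	if !ok {
		fmt.Fprintln(os.Stderr, "provider not registered")
		os.Exit(1)
	}
	cloud, err := factory(nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("cloud-controller-manager starting with provider %q\n", cloud.ProviderName())
}
```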
Cool. Just find me on the kubernetes slack. We can discuss further steps there. My username is wlan0.
Sidhartha gave a demo of out of tree cloud providers on a recent community hangout. Video link here: https://youtu.be/5ihrOO5mTIA?t=2416
I've also put together an example cloud provider to integrate keepalived load balancers - might be useful for someone as a reference in future: https://github.com/munnerz/keepalived-cloud-provider
@wlan0 could you comment here the current status using the https://github.com/kubernetes/features/blob/master/ISSUE_TEMPLATE.md and I'll update the first comment above so the features team has this in tracking...
@luxas Sure! Here you go -
Feature Description:
Support out-of-tree and out-of-process cloud providers, a.k.a pluggable cloud providers.
Feature Progress:
In order to complete this feature, cloud provider dependencies need to be moved out of the following Kubernetes binaries, then docs and tests need to be added. The links to the right of each binary denote the PRs that led to the completion of the sub-feature.
- Kube-controller-manager -
- kubernetes/kubernetes#34273,
- kubernetes/kubernetes#36785,
- kubernetes/kubernetes#39394,
- kubernetes/kubernetes#41856,
- kubernetes/kubernetes#42604,
- kubernetes/kubernetes#43777
- Kubelet
- Docs
- Tests
e2e Tests - Incomplete
The cloud-specific functionality of the above features needs to be moved into a new binary called cloud-controller-manager that supports a plugin architecture.
Primary Contact: @wlan0
Responsible SIG: Sig-Cluster-Ops
Design Proposal Link: kubernetes/community#128
Reviewers:
@luxas
@roberthbailey
@thockin
Approver:
@thockin
Feature Target:
Alpha: 1.7
Beta: 1.8
Stable: 1.10
@wlan0 @luxas @thockin please, provide us with the design proposal link and docs PR link (and update the features tracking spreadsheet with it).
cc @kubernetes/sig-cluster-lifecycle-feature-requests
@thockin @wlan0 @roberthbailey Are we gonna try to make this beta in v1.8?
@rrati is almost done with initializers for volume lifecycle mgmt. He'll confirm about the status of that work.
We need to add support for automatically pushing the cloud-controller-manager image to gcr.io on releases.
I have the code for kops to start the external cloud controller manager with me. I still have to add tests though. This will facilitate e2e tests.
At what stage can we call it beta?
Initial pass of using initializers to control setting labels with the cloud-controller-manager #44680
Works, although there really should be a way to disable the admission controller plugin when using the cloud-controller-manager imo.
@luxas I'm condensing the discussion we had about the requirements for beta. Please feel free to suggest any additions.
Things needed for beta
- Lots of docs (tutorials, blogs, walkthroughs etc)
- Testing with various clouds (dogfooding)
- Automatically build and create an image for cloud-controller-manager (CCM) on gcr.io
- Automatically run e2e tests for CCM
- bugfix for kubelet certificate ip addresses (discussed here - kubernetes/kubernetes#47152 (comment))
Hoping to get a working version of DigitalOcean CCM if it'll help move this to beta at all. As discussed earlier in this thread, getting volumes to work with CCM is a bit tricky since it requires some work in the kubelet (CSI coming soon?) and it is kept for last. Is anyone currently doing any work on that or are we just waiting for CSI to be a thing?
@andrewsykim We are waiting for CSI to be a thing
I spawned kubernetes/kubernetes#48690 to continue discussing this there... kubernetes/features should be pretty low volume.
@idvoretskyi Added this to the v1.8 milestone. Let me know if the feature description needs to be updated.
We're hoping to get this to beta.
We have made great progress on this effort in the v1.8 cycle, but due to the vast amount of tasks yet to be done and dependencies on other efforts, this has to stay in alpha for one or more cycles yet.
Relevant video discussion: https://youtu.be/XwhxuGoK0Ok
On the good side, we're also bootstrapping a dedicated WG for this effort instead of having just a few people syncing every now and then rather than on a regular interval.
@wlan0 is also gonna update the proposal we wrote earlier with more relevant information on new complexity this has to deal with.
@luxas Here's the latest update on the feature progress
Feature Description:
Support out-of-tree and out-of-process cloud providers, a.k.a pluggable cloud providers.
Feature Progress:
We have completed all of the major refactoring required to support this feature and have a working CCM.
@rrati merged the persistent volume refactoring changes recently, and that was the last major change made - kubernetes/kubernetes#44680
Kubernetes Core
In the Kubernetes core repository, the parts left now are bug fixes, e2e tests and minor additions to the core to enable every use case of the cloud integration with k8s to be possible using the CCM.
In summary for the core
- Bug Fixes (details below)
- Additions (details below)
- E2E tests
Features/Deprecations
Note: -> indicates the PR for the issue above
kubernetes/kubernetes#50926
-> kubernetes/kubernetes#51318
-> kubernetes/kubernetes#51528
-> kubernetes/kubernetes#51318
kubernetes/kubernetes#51406
kubernetes/kubernetes#50986
kubernetes/kubernetes#51409
kubernetes/kubernetes#44975
Bug Fixes
kubernetes/kubernetes#49202
kubernetes/kubernetes#51124
kubernetes/kubernetes#51629
kubernetes/kubernetes#51761
kubernetes/kubernetes#50289
kubernetes/kubernetes#50422
CCM Plugins Progress
In terms of plugins, we still need every cloud provider to implement their CCM using the new model. I'm aware of most of the efforts by cloud providers to create CCMs for their clouds, but I'm not entirely sure of the status of these efforts. I know GCE has made a PR for this - kubernetes/kubernetes#50811
I would like to use this issue comment as a medium to ask the members representing the cloudproviders ( @andrewsykim @prydie @justinsb @jdumars @FengyunPan @cheftako @alena1108 @BaluDontu ) to provide an update on the status of their CCM efforts in a comment below. Please add a link to your implementation if it's available for others to take a look at. I'll retroactively add the status and link into this comment.
- GCE -
- AWS -
- Azure -
- OpenStack -
- Rackspace -
- Oracle -
- Digital Ocean - https://github.com/digitalocean/digitalocean-cloud-controller-manager
- Rancher -
- CloudStack -
- Ovirt -
- Photon -
- AliBaba -
- Vsphere
Primary Contact: @wlan0
Responsible SIG: Sig-Cluster-Ops
Design Proposal Link: kubernetes/community#128
Reviewers:
@luxas
@roberthbailey
@thockin
Approver:
@thockin
Feature Target:
Alpha: 1.7
Beta: 1.9
Stable: 1.10
Great work on this effort everyone!
@wlan0 please add https://github.com/digitalocean/digitalocean-cloud-controller-manager for DigitalOcean. It's currently v1.7 compatible and will be v1.8 compatible shortly after v1.8 release.
Thanks for all your great work here!
@spacexnice and myself (@Crazykev) are working on the Aliyun (AKA Alibaba Cloud) specific cloud provider. It's already in our v1.7.2 release, and will be updated with upstream after v1.8 is released.
we're also bootstrapping a dedicated WG for this effort instead of having just a few people syncing every now and then rather than on a regular interval.
@luxas @wlan0 Is this WG already bootstrapped, or are there any regular meetings? We'd like to join the discussion and see what we can do here.
Everybody that is interested in this effort; join https://groups.google.com/forum/#!forum/kubernetes-wg-cloud-provider and we'll take it from there
What part of this is being delivered in 1.8? Sorry if that's here somewhere.
@jdumars This feature has been greatly improved in v1.8 generally, but not enough to meet the beta criteria. Docs PR is here kubernetes/website#5400
Thanks for all the great work!
Based on your work, Alibaba Cloud now officially supports an out-of-tree cloudprovider, which is also open sourced; see AliyunContainerService/kubernetes.
One more thing: is there a central repository to hold all of these out-of-tree providers? @luxas cc @Crazykev @denverdino
@errordeveloper Please indicate in the 1.9 feature tracking board whether this feature needs documentation. If yes, please open a PR and add a link to the tracking spreadsheet. Thanks in advance!
@errordeveloper Bump for docs
/cc @idvoretskyi
@zacharysarah @idvoretskyi I don't own this any more, but @luxas and @wlan0 would have a good idea of the progress here.
@luxas We have adequate docs about the concepts, how to create a CCM for your own cloud, and how to run the CCM, with examples.
/kind feature
@wlan0 @jagosan @luxas -- I checked the docs links ^^. If this feature is moving to beta with 1.10, looks as though the DaemonSet example needs to be updated, and also Administration. I can help, but please let me know feature status. Thanks!
/cc @idvoretskyi
@Bradamant3 We are not moving to beta in 1.10. We're still working out some issues and e2e tests haven't been added yet.
We are not moving to beta in 1.10. We're still working out some issues and e2e tests haven't been added yet.
FYI @nickchase ^
@wlan0 @luxas @thockin
Any plans for this in 1.11?
If so, can you please ensure the feature is up-to-date with the appropriate:
- Description
- Milestone
- Assignee(s)
- Labels:
stage/{alpha,beta,stable}
sig/*
kind/feature
cc @idvoretskyi
We're still discussing whether we want to push this feature to beta in v1.11. We'll have a better idea after tomorrow's WG meeting.
@andrewsykim please be sure to update the feature issue with the relevant details (mentioned above), as soon as you all make a decision tomorrow.
Feature freeze is today (EOD PDT).
A few questions from my observations:
- Release
  - How are the separated cloud controller managers to be released? Do we propose a single process, or should each be on its own?
  - Who will be responsible for tagging the commit and uploading release packages? Does each SIG own its release process?
- Doc location
  - There would be a docs dir in every single provider: https://github.com/kubernetes/community/blob/master/wg-cloud-provider/cloud-provider-requirements.md
    What's the relationship with the kubernetes website (https://github.com/kubernetes/website/blob/master/docs/concepts/cluster-administration/cloud-providers.md)? Should the documents be periodically synced?
- Migration doc
  - Users switching to standalone cloud controller manager mode will find that the cloudprovider-based volume plugins won't work. There should be instructions on turning on external-cloud-volume-plugin, adding provider-id, and so on.
- E2E baseline
  - When running e2e tests against a cloud provider, which Kubernetes release should be picked: the latest version or the latest tagged version on the same major release branch?
    Note there are two Kubernetes versions to choose:
    - The main release, providing hyperkube, which is deployed to the cluster
    - The test base, providing the e2e test cases, which is run locally
    This won't be a problem when testing the k8s main repo, since it would always pick the current build.
  - In the long term, when out-of-tree providers become stable (removed from main), will the kubernetes main repo still run e2e tests with specific cloud providers? If so, which cloud provider version will it pick?
How are the separated cloud controller managers to be released? Do we propose a single process, or should each be on its own?
It would be up to each provider to maintain their own schedule. Ideally it will be in sync with Kubernetes releases, but that is not a strict requirement.
Who will be responsible for tagging the commit and uploading release packages? Does each SIG own its release process?
Each provider should have a set of OWNERS that will be responsible for that. In-tree providers that transition into out-of-tree providers will likely adopt the current SIG OWNERS.
Users switching to standalone cloud controller manager mode will find that the cloudprovider-based volume plugins won't work. There should be instructions on turning on external-cloud-volume-plugin, adding provider-id, and so on.
Agreed; as we go from beta -> GA, docs will become increasingly important. However, we have a tested mechanism in place to provide external-cloud-volume-plugin as you mentioned, which we think is enough to move forward for now.
When running e2e tests against a cloud provider, which Kubernetes release should be picked: the latest version or the latest tagged version on the same major release branch?
In the long term, when out-of-tree providers become stable (removed from main), will the kubernetes main repo still run e2e tests with specific cloud providers? If so, which cloud provider version will it pick?
Yes, we're still hashing this out, but we don't think it will change the API/interface we use for this feature; it's definitely something that will be worked on for the GA release. We do have a process in place for E2E testing cloud providers, with OpenStack being an early adopter there.
Happy to answer any questions in further detail either here or in the weekly WG meetings. Overall we think there's a lot of work left for this feature, but the feature set/API we provide to make this work has been well tested and is functional enough for a beta release. The biggest problem we'll face going forward is pushing in-tree SIGs/providers to adopt this, but we don't think adoption from in-tree providers is a strict requirement for beta (though it most likely is for GA).
@wlan0 @andrewsykim please fill out the appropriate line item of the 1.11 feature tracking spreadsheet and open a placeholder docs PR against the release-1.11 branch by 5/25/2018 (tomorrow as I write this) if new docs or docs changes are needed and a relevant PR has not yet been opened.
Preliminary GA milestones
- E2E tests reported to test grid by all providers
- Automated Docs update - working with docs team to automatically fetch docs from provider repos and into kubernetes/website
- Production ready migration plan for in-tree providers to migrate to out-of-tree successfully
- Have a better story for integrating external clouds with the TLS bootstrapping feature - right now there's a dependency loop since TLS bootstrapping relies on the address types set by the cloud-provider-aware kubelet.
Looks like we still need some docs to get this feature ready for release @wlan0 @andrewsykim
Could I please get some help with that? If there's anything I can do to assist, please let me know.
At a minimum we're looking to have a placeholder PR on the kubernetes/website repo. The process is fairly straightforward: checkout release-1.11 branch, make a placeholder commit, push it to your fork, and raise a PR between it and the release-1.11 branch, with /hold status.
THANKS SO MUCH!!!!!
This one actually seems to have merged docs -- just the spreadsheet needs to be filled in. Can you do that, @zparnold ?
@MistyHacks Sure can and sure will!
@wlan0 @andrewsykim -- We're doing one more sweep of the 1.11 Features tracking spreadsheet.
Would you mind filling in any incomplete / blank fields for this feature's line item?
FYI we're realizing that this features issue is lacking so we will have a more detailed update soon on what we're expecting out of beta (v1.11) and GA (tentatively targeted for v1.13)
Here's an updated status report for this feature, please let me know if anything needs clarification:
Beta (starting v1.11)
- The common interface used by cloud providers has been well tested and support will not be dropped, though implementation details may change. Any methods that are deprecated should follow the Kubernetes Deprecation Policy.
- The cloud controller manager has been tested by various cloud providers and is considered safe to use for out-of-tree providers. Features to be deprecated that are part of the cloud controller manager (controllers, component flags, etc) will follow the Kubernetes Deprecation Policy.
- The cloud controller manager does not run in any cluster by default. It must be explicitly turned on and added like any other control plane component. Instructions for setup may slightly vary per cloud provider. More details here.
Reasoning for Graduation
There were a few things on our TODO list that we wanted to get done before graduating to beta, such as collecting E2E tests from all providers and improving out-of-tree storage. However, many of these initiatives require collaboration from external parties, which was delaying progress on this effort. In addition, there was uncertainty since we do not develop some of the components we rely on; a good example is whether CSI would be able to meet demands for out-of-tree storage on par with in-tree storage support. Though in hindsight we have more confidence in CSI, prior to its beta release it was unclear if it would meet our requirements. With this context in mind, we decided to graduate to beta because:
- blocking out-of-tree cloud providers from going beta meant that fewer in-tree providers would adopt this feature.
- some goals (like E2E tests from cloud providers) require a significant amount of collaboration and may unnecessarily block progress for many releases.
- features that are lacking from the cloud controller manager (mainly storage) would be handled by future projects from other SIGs (e.g. CSI by SIG Storage).
Goals for GA (targeted for v1.13/v1.14)
- Frequently collect E2E tests results from all in-tree & out-of-tree cloud providers kubernetes/community#2224
- Cloud Provider Documentation includes:
- "Getting Started" documentation - outlines the necessary steps required to stand up a Kubernetes cluster.
- Documentation outlining all cloud provider features such as LoadBalancers, Volumes, etc. There should be docs providing a high-level overview and docs that dig into sufficient details on how each feature works under the hood.
- Docs should also be centralized in an automated fashion where documentation from all cloud providers are placed into a central location (ideally https://kubernetes.io/docs/home/).
- A well-documented plan exists for how to migrate a cluster from using an in-tree cloud provider to an out-of-tree cloud provider; this only applies to AWS, Azure, GCP, OpenStack, and VMware.
- All current cloud providers have implemented an out-of-tree solution, deprecation of in-tree code is preferred but not a requirement.
@justaugustus would you mind updating the features description with our progress above?
cc @kubernetes/features-maintainers
/sig cloud-provider
Thanks for keeping this up-to-date, @andrewsykim!
Should we do something, @justaugustus, to get this marked as tracked/yes?