operator-framework/operator-sdk

Support hybrid operators

hasbro17 opened this issue · 18 comments

EDIT(@estroz): this issue originally addressed a bug where users could create apis, controllers, and k8s deepcopy code in non-Go projects. Since significant discussion related to hybrid operators is present here, I changed the title and am keeping this issue open to further track hybrid operator discussion. #672 fixed the original bug.


Original issue text:

The following SDK commands are only relevant to a Go type project and should not run in another project type like Ansible (or Helm in the future).

  • operator-sdk add api ...
  • operator-sdk add controller ...
  • operator-sdk generate k8s

We need to consider the use case of transitioning from a pure Ansible type project to a hybrid or Go type project before we restrict these commands from running in an ansible type project.

I was thinking that the steps for transition are:

  1. Make sure the project is in the GOPATH.
  2. Drop cmd/manager/main.go on disk (this will need to be the correct main.go, though maybe there is enough overlap between Helm and Ansible here that we can make this really easy and need only 1 file?).
  3. Add or change the Dockerfile to incorporate the binary.
  4. Add the Go dependencies.

All of these are steps I could document today, and we would have a transition path whether they happen before or after operator-sdk add api is called. Maybe it makes sense to require these steps before you can call those commands? I am torn here because I could see myself creating the API and then doing the main file bits etc., but on the other hand sometimes taking away options just leads to fewer bugs.

I think either way the above commands must be run inside the GOPATH, is that correct?

  1. drop cmd/manager/main.go on disk (this will need to be the correct main.go though maybe there is enough overlap between helm and ansible here that we can make this really easy and only 1 file?).

@shawn-hurley I think the only differences would be loading the watches.yaml file and calling Add() with the correct controller. If those aren't the only differences, I would think we could get there.

Also, do we need to consider transitions between or hybrid combinations of arbitrary types (e.g. Ansible to Helm, or hybrid Ansible/Helm)?

I think this discussion raises the question of what the strategy for the CLI design should be. The current design holds up well when there's only one operator type, but now that we'll have three, each adding new subcommands and/or implementing existing subcommands differently, some issues arise:

  • As a user, how do I know which subcommands make sense for my project?
  • If we support hybrid operators, what should the SDK do when more than one of the operator types is present in the project and a subcommand is executed that has different implementations for different operator types? Maybe we detect that multiple types exist and fail telling the user that --type is required for hybrid operators?
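A minimal Go sketch of the "detect multiple types and fail unless --type is given" idea from the second bullet. The marker paths, type names, and the `detectTypes` and `resolveType` helpers are hypothetical and exist only for illustration; they are not the SDK's actual detection logic.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// markers maps an operator type to a path whose presence suggests that type.
// These paths are assumptions made for this sketch only.
var markers = []struct{ typ, path string }{
	{"go", "cmd/manager/main.go"},
	{"ansible", "roles"},
	{"helm", "helm-charts"},
}

// detectTypes returns every operator type whose marker path exists.
// The exists callback keeps the function independent of the real filesystem.
func detectTypes(exists func(path string) bool) []string {
	var found []string
	for _, m := range markers {
		if exists(m.path) {
			found = append(found, m.typ)
		}
	}
	return found
}

// resolveType picks the type a subcommand should act on.
// For hybrid projects (more than one detected type) it insists on --type.
func resolveType(detected []string, flagType string) (string, error) {
	if flagType != "" {
		for _, t := range detected {
			if t == flagType {
				return t, nil
			}
		}
		return "", fmt.Errorf("--type=%s does not match detected types %v", flagType, detected)
	}
	switch len(detected) {
	case 0:
		return "", errors.New("no operator type detected")
	case 1:
		return detected[0], nil
	default:
		return "", fmt.Errorf("hybrid project %v: --type is required", detected)
	}
}

func main() {
	onDisk := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	typ, err := resolveType(detectTypes(onDisk), "")
	fmt.Println(typ, err)
}
```

Passing the existence check in as a function is a design choice for testability; a real implementation could just as well read the type from a project config file.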

Is it possible to invert the precedence of operator type and subcommand in the CLI so that the operator type drives what subcommands are available and makes the subcommand execution explicit?

A combination of good transition/hybrid documentation and strict checks/plenty of warnings on CLI usage should be enough.

Instead of thinking of all the ways a user can combine operator types, we should be opinionated about how they should be combined, and to what extent combination is allowable, if at all. For example, what parts of an Ansible project can be customized with Go, and what cannot?

If we want to allow hybrid projects, we need to reconsider how OperatorType is used since type checks are currently exclusive. I suggest we implement a bitfield.
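As a rough illustration of the bitfield suggestion, the sketch below shows how non-exclusive type checks could look. The constant and method names are hypothetical and are not taken from the SDK's code.

```go
package main

import (
	"fmt"
	"strings"
)

// OperatorType is a bitfield, so one project can carry several types at once.
// All identifiers here are illustrative assumptions, not the SDK's own.
type OperatorType uint8

const (
	OperatorTypeGo OperatorType = 1 << iota
	OperatorTypeAnsible
	OperatorTypeHelm
)

// Has reports whether every bit in want is set in t.
func (t OperatorType) Has(want OperatorType) bool {
	return want != 0 && t&want == want
}

// IsHybrid reports whether more than one type bit is set.
func (t OperatorType) IsHybrid() bool {
	return t&(t-1) != 0
}

// String lists the set types, joined with "+".
func (t OperatorType) String() string {
	var names []string
	if t.Has(OperatorTypeGo) {
		names = append(names, "go")
	}
	if t.Has(OperatorTypeAnsible) {
		names = append(names, "ansible")
	}
	if t.Has(OperatorTypeHelm) {
		names = append(names, "helm")
	}
	if len(names) == 0 {
		return "unknown"
	}
	return strings.Join(names, "+")
}

func main() {
	project := OperatorTypeAnsible | OperatorTypeGo
	fmt.Println(project, project.Has(OperatorTypeGo), project.IsHybrid())
}
```

With this shape, a Go-only command such as operator-sdk add api could check `Has(OperatorTypeGo)` instead of comparing against a single exclusive type.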

ref: #860
With #887 and #897 the operator-sdk now has a migrate command to transition Helm and Ansible operator projects to hybrid projects.

@joelanford and @mhrivnak I don't think we have docs or extra sections in the ansible and helm user-guides showing an example of migrating to a hybrid project. We have the CLI reference but I think an example would be beneficial.

Is there anything else that we would need with regards to hybrid operators before we close out this issue? We can follow up on the docs in separate issues.

On the ansible side (haven't looked closely at the helm side, but might be similar) we have a working migrate command that might be useful. But we have not put any thought yet into specifically how we'll enable users to add their own logic. We don't have a good story yet for: "ok, I have a main.go file. What now?"

What we do have lays the groundwork for improving the experience, and it was important to do in concert with the other refactoring around where and how we're building the ansible/helm operator binary and base image. But it might not yet be useful enough to make noise about, which is why I haven't rushed to write docs about it.

The next step would be to spend a little time designing what we want to expose in main.go that would be useful to a user who wants to add some Go code to their Ansible or Helm operator. How that relates to this GitHub issue depends on how you prefer to track it. I'm happy if you want to close this as done and follow up with a separate design effort, or if you'd rather keep this open until it's fully baked.

Agreed. We should spend some time on the overall user story around hybrid operators.
I'll keep this issue open until we have something more concrete on that front.

Hi all. I'm an experienced programmer and also experienced with Kubernetes, but operators and golang are both new to me.

I'm in the process of developing a hybrid operator. I'm doing this by generating a Helm operator and then using the migrate command as I want the operator to depend on Helm templates for installs/uninstalls but write my own logic for anything outside of that.

I can keep a running log of things I run into (or have run into) in this process and then provide some feedback in this ticket if you'd like? I think I'm mostly just running into small things here and there that you probably wouldn't even notice, or that seem obvious to work around, because you are experienced with developing the operator-sdk and with golang itself, if that makes sense?

For one of the small problems I'm currently running into, I'll probably end up submitting a patch just so I can keep moving forward.

@devnulled This is great timing actually. Yesterday, I opened #1186 for discussion about how to prepare the Helm and Ansible operator code for a v1.0.0 release of the SDK. The primary discussion point for that will be how we support hybrid operators.

We will likely be unexporting some of our existing packages/types/functions so that we have more flexibility to implement new features without dealing with backwards compatibility guarantees. However, we want to make sure we leave enough of the operator internals exposed to make hybrid operators useful.

I'd definitely be interested in understanding more about your hybrid operator and the issues you've been running into. Definitely don't hesitate to comment here with your issues. Even if they seem minor to you, it's likely others have run into similar things. The more we know about how people are using hybrid operators, the better we'll be able to support them.

I think whatever problems I was having were somehow environment related, or maybe my dep cache was corrupted, not sure. I eventually ended up clearing out dep, updating the operator-sdk again, regenerating a project, and then converting it to a hybrid one again, and it worked just fine. I wanted to make sure I wasn't doing something wrong before commenting here again, and I'm glad I did. So the issues I had were self-inflicted.

I can give you some feedback on how I conceptually plan to use them if that's helpful? I know you are looking more for input on what the API itself should be, but I'm just not that far along with building one yet.

Here's my own use case for why I prefer (at least conceptually) to create hybrid operators based on Helm operators:

As I see it, Helm seems to be the de facto package manager for Kubernetes at this point. Along those lines, I'd rather depend on community-developed Helm packages for software when possible and be able to contribute back to them as well (which I've done a couple of times) vs building the YAML by hand in a standard operator. I like that Helm basically handles all of the templating of YAML and has its own tools for debugging charts/YAML.

When you build an operator by hand with operator SDK, you end up with a whole bunch of statically compiled code that essentially represents configuration. In my opinion, that is a pretty hard thing to try and debug, and something that probably changes quite a bit until a given piece of software has matured. Also, when the Kubernetes API is updated, it seems like it would be a big pain point to have to update all that static code. I'm just especially not a fan of having what are essentially configuration templates as generated and statically compiled code.

Depending on Helm templates allows you to still have the option to deploy a service manually or via scripts if needed, rather than having to debug an operator in real time if you have some sort of fire in production. You can also depend on other Helm charts; an example would be Kafka depending on Zookeeper. I mentioned it before, but you have the facilities of Helm available to debug the charts/YAML. That allows you to leave your operator to focus on functionality that is specific to your service.

So to me, an operator that uses a Helm chart to manage config/versioning/installs of services, and is left mostly to build on top of that to monitor/fix/tune/maintain and scale up/down said service, is a big win.

I'm in the process of building an operator around automatically managing a database platform to deal with the problems that running one in the cloud presents, i.e., monitoring for outages or performance problems and taking action.

The main use case I'll be tackling first is to automate the work required to fix pods in statefulsets if a node gets terminated and replaced with another one (the pod won't get rescheduled by k8s because of the way it works with statefulsets). Today that means manually killing the pod in the statefulset when a node dies, doing something with its PVC, starting up a new pod, running any process needed to move data again, dealing with mounting local storage, joining the cluster again, etc.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

lilic commented

/remove-lifecycle stale

Still relevant.

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

/lifecycle frozen

@devnulled your comment (#670 (comment)) really resonates. I know it's an old post, but I'm wondering whether you pursued your hybrid approach and might have some more insight.

@huizengaJoe I'm no longer working at the company where I was working on that solution, and it's been some time since I was there. It's something I will likely end up digging back into within the next year or so, I think.

Hybrid operators are currently supported by the SDK. The current hybrid model scaffolds Go and Helm APIs together: https://github.com/operator-framework/helm-operator-plugins/tree/main. Closing this issue for now. If any follow-up is needed, we can pick it up later.