latchset/kdcproxy

Does kdcproxy perform layer 7 health checks against the KDC backend?

t57root opened this issue · 6 comments

As the title says: if kdcproxy performed health checks, it could act as an HA/fault-tolerance mechanism at the application layer, which I think is what Kerberos currently lacks (I think Kerberos only has a replica mechanism, but lacks load-balancing/HA capability).

Careful - Kerberos the protocol != krb5, Heimdal, or whatever other Kerberos implementation. For instance, krb5 implements failover if multiple KDCs are defined. And since DNS resolution can be used for KDC lookups, it's possible to balance at that layer. It's also possible to provide a locator plugin (like sssd's) to krb5 for controlling resolution however you want.
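The "try the next KDC" idea behind that failover can be pictured with a small sketch. This is not krb5's actual code: the function name `first_reachable_kdc`, the candidate list, and the use of a plain TCP connect as the reachability test are all assumptions made for illustration.

```python
import socket


def first_reachable_kdc(kdcs, timeout=2.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    Hypothetical helper: it only mimics trying KDCs in order and does not
    speak the Kerberos protocol.
    """
    for host, port in kdcs:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            continue  # this candidate is unreachable, move on to the next
    return None
```

Calling it with a list such as `[("kdc1.example.com", 88), ("kdc2.example.com", 88)]` (made-up host names) returns the first entry that answers, or `None` if none do.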

Beyond that, I think your question is misguided, or perhaps I don't understand. Kerberos implementations are not responsible for the network health, and do not even provide a "layer" in the OSI model (they provide authentication and encryption services to other protocols, but do not define a wire transport). A functioning transport is required for Kerberos operation.

Also be aware that the OSI model doesn't map onto what we actually do in the real world. (For instance, TCP occupies layers 4, 5, and 6, while applications can bypass 4 and 5 but not 6 by using UDP, etc., etc.)

@frozencemetery Thanks for the reply. As far as I can tell from the krb5 docs, krb5 seems to provide only a master-slave architecture. If the master instance is down, the operator has to manually promote a slave and change the DNS record to point to the newly promoted instance. An LVS server can only detect layer 4 faults, not layer 7 ones (e.g., a KDC backend that can't reach its LDAP database). For HTTP, nginx is a layer 7 load balancer that can detect layer 7 malfunctions.

The "locator plugin" you mentioned (I think it refers to the Server location interface) seems like a good way to implement client-side load balancing or fault tolerance. Thanks very much; I think I'll go this way.

By "layer 7 health check", I mean a health check that understands the application protocol: one that verifies not only that the target host/port is reachable, but also that the remote program/application protocol is fully working. Like the expression here.

Wow, they're really trying to sell a product.

I don't think kdcproxy is the right place for KDC health checks - that should go into some monitoring solution (nagios is what I've used, but it's been a while). Health checks could include: checking that the KDC responds on 88 and 464, and checking that kinit with a test principal works. How you tell whether it's "working" depends on what you care about, and also what you expect to go wrong (as in any other service).
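A monitoring probe along those lines might look like the sketch below. It is a minimal illustration, not a Nagios plugin or anything shipped with kdcproxy. The function names, the test principal, and the keytab path are hypothetical, and it assumes the KDC also listens on TCP and that an MIT-style `kinit` supporting `-k -t` is installed. Note that `kinit` picks its KDC from the system's Kerberos configuration, so that check exercises whichever KDC the client resolves, not necessarily the host probed by the port checks.

```python
import os
import socket
import subprocess
import tempfile


def tcp_port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def kinit_works(principal, keytab, kinit_cmd="kinit", timeout=10.0):
    """True if kinit obtains a ticket for the principal using the keytab.

    A throwaway credential cache is used so the caller's tickets are untouched.
    """
    with tempfile.TemporaryDirectory() as tmp:
        env = dict(os.environ, KRB5CCNAME="FILE:" + os.path.join(tmp, "cc"))
        try:
            result = subprocess.run(
                [kinit_cmd, "-k", "-t", keytab, principal],
                env=env, capture_output=True, timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            return False  # kinit missing, not executable, or hung
        return result.returncode == 0


def kdc_healthy(host, principal, keytab):
    # Assumption: TCP is enabled on the KDC; UDP-only setups need another probe.
    return (tcp_port_open(host, 88) and tcp_port_open(host, 464)
            and kinit_works(principal, keytab))
```

What counts as "healthy" here is deliberately narrow; a real check would be shaped by the failures you actually expect.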

(Closing since I think your question has been answered.)

@frozencemetery Thanks for the explanation. I have another Kerberos-related question that has nothing to do with kdcproxy, so I'm not sure whether this is the proper place to ask. Is there any way to rotate passwords automatically for service principals? Assume I already have an agent installed on every machine in the Kerberos realm that can help do the rotation work.

I ask because the services behind these principals can't be restarted frequently, so their long-term passwords stay unchanged for a very long time.

A complex, random long-term password reduces the risk of cracking, but consider the case where an attacker gains access to the password once (for example, from a source code management system): they could then access the system forever. Regular rotation would defend against this scenario.

Thanks in advance!

You probably want the "kerberos" mailing list: http://web.mit.edu/kerberos/mail-lists.html Nonetheless, I have a standard reply to all rotation questions, which is this:

Keytab rotation is ugly. I recommend not doing it if you can avoid it largely because one of two things will happen:

  • All clients who have credentials against the old keytab will see messy, inexplicable authentication failures.
  • If you try to get around that by keeping the old entry in the keytab (i.e., multiple kvnos present for the principal), you haven't actually accomplished anything.

So there's a serious trade-off between any security benefit that might accrue and the burden of cleaning up afterward.
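The two outcomes can be illustrated with a toy model. Nothing here is a real keytab API: `ToyKeytab` and its methods are invented for illustration, and a string comparison stands in for "the service can decrypt a ticket encrypted under this key".

```python
class ToyKeytab:
    """Toy model mapping kvno -> key (real keytabs hold keys per enctype)."""

    def __init__(self):
        self.keys = {}

    def add(self, kvno, key):
        self.keys[kvno] = key

    def remove(self, kvno):
        self.keys.pop(kvno, None)

    def accepts(self, kvno, key):
        # Stand-in for successfully decrypting a ticket made with this key.
        return self.keys.get(kvno) == key


keytab = ToyKeytab()
keytab.add(1, "old-key")

# Rotation that keeps the old entry: existing tickets keep working,
# but so does anything produced with the old key.
keytab.add(2, "new-key")
old_still_valid = keytab.accepts(1, "old-key")

# Rotation that drops the old entry: the old key is now useless,
# and clients holding tickets issued under kvno 1 start failing.
keytab.remove(1)
old_after_removal = keytab.accepts(1, "old-key")
```

In this model `old_still_valid` is true and `old_after_removal` is false, which mirrors the two bullet points above.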

Since compromising the keytab requires compromising the service itself, I don't see a vector in which rotating them is helpful (unless you're worried about the strength of the underlying cryptography, and if you're worried about AES-256, I'm not sure there's much anyone can do to help).

@frozencemetery I see. Thanks for the kind help!