confluentinc/schema-registry

Reasons to set `host.name`?


http://confluent.io/docs/current/schema-registry/docs/config.html

host.name -- The host name advertised in Zookeeper

        Type: string
        Default: thor
        Importance: low

Given that the importance of this setting is low, what would be a reason or use case to change the default value (and why the default of thor)? Which component would be interested in this value in ZK? (e.g. I looked at the Camus integration with schema-registry but it has its own schema.registry.url setting).

I found this snippet (a bit hidden) in http://confluent.io/docs/current/schema-registry/docs/deployment.html#schemaregistry-mirroring:

host.name Hostname to publish to ZooKeeper for clients to use. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, it will use the value returned from java.net.InetAddress.getCanonicalHostName().

  • Type: string
  • Default: host.name
  • Importance: high

I suppose the description should be expanded to give more context? The basic reason is that the hostname you get by trying to look it up dynamically isn't always the one you want to use, and the process needs to advertise a hostname that other hosts can actually reach. We've set the importance low because in most cases you will not need to adjust it, although given that it can be an issue for IaaS environments, it's possible we should bump it up to medium.

As for the confusing default -- the default value is actually dynamic, using InetAddress.getLocalHost().getCanonicalHostName(). I happened to be the one that generated it, on a host with hostname thor because that is a short, easy to type hostname that I find convenient when ssh'ing into it :) I would just edit it to say localhost, but that a) isn't accurate and b) would get overwritten the next time we generate the docs. We may need another way in our config classes to specify a special case for defaults where we can just enter a textual description.
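To illustrate the dynamic default described above, here is a minimal sketch of how such a lookup might work. The class name `DefaultHostName`, the `resolve` helper, and the `localhost` fallback are assumptions for illustration only; this is not the actual schema-registry code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class DefaultHostName {
    /**
     * Returns the explicitly configured host name if there is one,
     * otherwise the canonical host name of the local machine.
     */
    static String resolve(String configured) {
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        try {
            // Same call chain mentioned in the comment above.
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            // Assumption: fall back to "localhost" purely so the sketch always returns something.
            return "localhost";
        }
    }

    public static void main(String[] args) {
        // Prints whatever this machine resolves to (e.g. "thor" on the doc-generation host).
        System.out.println(resolve(null));
        // An explicit setting always wins over the dynamic lookup.
        System.out.println(resolve("registry-1.example.com"));
    }
}
```

Because the result depends on the machine running the code, any generated documentation that prints it will show the build host's name rather than a fixed default.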

Ah, I see. :-)

I think this means that host.name (here) is similar to Kafka's host.name parameter, the latter of which is described as:

host.name: Hostname of broker. If this is set, it will only bind to this address. If this is not set [default], it will bind to all interfaces, and publish one to ZK.
http://kafka.apache.org/documentation.html

Kafka also has advertised.host.name:

advertised.host.name: If this is set [default: it is not set] this is the hostname that will be given out to producers, consumers, and other brokers to connect to.

Since Schema Registry Mirroring says about host.name "Hostname to publish to ZooKeeper for clients to use", does that mean the current host.name option in schema-registry is a combination of Kafka's host.name and advertised.host.name?

What about something like this for HOST_DOC:

Hostname of the schema registry. If this is set, the schema registry process will only bind to this address. If this is not set (default), it will dynamically determine the hostname to bind to via java.net.InetAddress.getLocalHost().getCanonicalHostName(). One scenario where you may need to explicitly set this parameter is in IaaS environments. The value of host.name will also be advertised in ZooKeeper, where it is used for e.g. leader election of schema registry instances.

FWIW the host.name section in Schema Registry Mirroring would possibly need to be in sync with HOST_DOC (the current snippet in this page is IMHO a bit misleading: it talks about "to which the broker binds" although schema-registry does not run any broker; the default value of host.name is listed as host.name).

Hmm, this is unfortunately confusing. Right now, all rest-utils gives you control over for the listening socket is the port. MetricsSelectChannelConnector doesn't set the hostname. So host.name in schema-registry (and kafka-rest) is more like advertised.host.name. The docs in kafka-rest have this:

The host name used to generate absolute URLs in responses. If empty, the default canonical hostname is used

which is application-specific, but perhaps a bit clearer about the fact that the intent is to control the hostname used anywhere we need to provide some other host a hostname they will be able to reach us at.

So right now, the second half of your suggestion would make sense. I think if we wanted to have the first half, we'd have to patch rest-utils to provide that functionality, then introduce the two different options just as Kafka has. That said, I'm not sure how frequently people actually use Kafka's host.name setting and whether it makes sense to support the same in schema-registry and kafka-rest.

That said, I'm not sure how frequently people actually use Kafka's host.name setting and whether it makes sense to support the same in schema-registry and kafka-rest.

I can only speak for us but we do use Kafka's host.name for brokers, which are multi-NIC machines. Our reason for setting host.name is that not setting the parameter (= default) will cause Kafka to listen on 0.0.0.0:<port> (which is ok) but Kafka will only register one of the broker's many hostnames/IPs with ZK, and the one picked may be the wrong one (this is the bad part). We also used advertised.host.name for a short time in the past, but the reason has escaped my memory.
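A quick way to see why the dynamic pick can be wrong on a multi-NIC machine is to list every address the host has. The sketch below uses only the standard `java.net.NetworkInterface` API; the class name `CandidateAddresses` is hypothetical, and it says nothing about which address Kafka or schema-registry would actually choose.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class CandidateAddresses {
    /** Lists "interface -> address" for every address on every interface that is up. */
    static List<String> list() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces found
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    result.add(nic.getName() + " -> " + addr.getHostAddress());
                }
            }
        } catch (SocketException e) {
            // Interfaces could not be queried; return what was collected so far.
        }
        return result;
    }

    public static void main(String[] args) {
        // On a multi-NIC host this prints several candidates, only one of which gets advertised.
        list().forEach(System.out::println);
    }
}
```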

If I were to summarize I think there are three things that users may or may not need to control:

  1. The hostname/IP on which the process listens.
  2. The hostname that gets registered with ZK (AFAIK in Kafka this is coupled with (1) with no separate control).
  3. The hostname used to generate e.g. absolute URLs in responses (as is the case in kafka-rest).

I think the reasons, if any, for decoupling some of the three in schema-registry would be similar to the reasons behind decoupling them for a Kafka broker (via host.name and advertised.host.name).
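As a rough sketch of what decoupling the three items could look like, the snippet below reads three separate settings with fallbacks. The property keys `listen.address`, `advertised.host.name`, and `url.host.name`, and the class `HostSettings`, are hypothetical names chosen for illustration; none of them are existing schema-registry options.

```java
import java.util.Properties;

final class HostSettings {
    final String bindAddress;   // (1) where the process listens
    final String zkHostName;    // (2) what gets registered in ZooKeeper
    final String urlHostName;   // (3) what appears in absolute URLs

    private HostSettings(String bindAddress, String zkHostName, String urlHostName) {
        this.bindAddress = bindAddress;
        this.zkHostName = zkHostName;
        this.urlHostName = urlHostName;
    }

    /**
     * Hypothetical resolution order: each setting can be given on its own,
     * and unset values fall back to a sensible default.
     */
    static HostSettings from(Properties props, String canonicalHostName) {
        // Unset: listen on all interfaces.
        String bind = props.getProperty("listen.address", "0.0.0.0");
        // Unset: advertise the dynamically determined canonical name.
        String zk = props.getProperty("advertised.host.name", canonicalHostName);
        // Unset: generate URLs with the same name that is advertised in ZooKeeper.
        String url = props.getProperty("url.host.name", zk);
        return new HostSettings(bind, zk, url);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("advertised.host.name", "registry-1.example.com");
        HostSettings s = from(props, "thor");
        System.out.println(s.bindAddress + " / " + s.zkHostName + " / " + s.urlHostName);
    }
}
```

The fallback chain is one possible design; it keeps today's single-setting behaviour when only the advertised name is given.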

Not sure whether this feedback is of any help!

Hmm, this is unfortunately confusing. Right now, all rest-utils gives you control over for the listening socket is the port. MetricsSelectChannelConnector doesn't set the hostname. So host.name in schema-registry (and kafka-rest) is more like advertised.host.name.

Ah, so you mean schema-registry will always listen on all interfaces (0.0.0.0) -- with only the port being configurable -- and the only effect of host.name is to control which hostname -- out of possibly many for a machine -- will be advertised in ZK and handed out to clients in absolute URLs?

Right, that's the current state. If we can figure out a path to making all of those things configurable separately, that would be great.
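The current state can be summarized in a small sketch: the listening address is always the wildcard address with a configurable port, while host.name only affects the name handed to others. The class `CurrentBehaviorSketch`, its method names, the `http` scheme, and the example port are assumptions for illustration; this is not the rest-utils implementation.

```java
import java.net.InetSocketAddress;

final class CurrentBehaviorSketch {
    /** Only the port is configurable; the address is the wildcard (all interfaces). */
    static InetSocketAddress listenAddress(int port) {
        return new InetSocketAddress(port);
    }

    /** host.name is used only to build the address that other hosts are told to use. */
    static String advertisedUrl(String hostName, int port) {
        return "http://" + hostName + ":" + port;
    }

    public static void main(String[] args) {
        int port = 8081; // example port
        InetSocketAddress listen = listenAddress(port);
        System.out.println("wildcard bind: " + listen.getAddress().isAnyLocalAddress());
        System.out.println("advertised:    " + advertisedUrl("registry-1.example.com", port));
    }
}
```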