Reasons to set `host.name`?
http://confluent.io/docs/current/schema-registry/docs/config.html
host.name -- The host name advertised in Zookeeper
Type: string
Default: thor
Importance: low
Given that the importance of this setting is low, what would be a reason or use case to change the default value (and why the default of `thor`)? Which component would be interested in this value in ZK? (e.g. I looked at the Camus integration with schema-registry but it has its own `schema.registry.url` setting).
I found this snippet (a bit hidden) in http://confluent.io/docs/current/schema-registry/docs/deployment.html#schemaregistry-mirroring:
host.name
Hostname to publish to ZooKeeper for clients to use. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, it will use the value returned from `java.net.InetAddress.getCanonicalHostName()`.
- Type: string
- Default: host.name
- Importance: high
I suppose the description should be expanded to give more context? The basic reason is that the hostname you get by trying to look it up dynamically isn't always the one you want to use, and the process needs to . We've set the importance low because in most/many cases, you will not need to adjust it, although given that it can be an issue for IaaS environments, it's possible we should bump it up to medium.
As for the confusing default -- the default value is actually dynamic, using `InetAddress.getLocalHost().getCanonicalHostName()`. I happened to be the one that generated it, on a host with hostname `thor`, because that is a short, easy to type hostname that I find convenient when ssh'ing into it :) I would just edit it to say `localhost`, but that a) isn't accurate and b) would get overwritten the next time we generate the docs. We may need another way in our config classes to specify a special case for defaults where we can just enter a textual description.
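To illustrate why the documented default ends up being whatever the doc-generating machine is called, here is a minimal sketch of a "configured value, else dynamic lookup" default. The class and method names (`HostNameDefault`, `resolveHostName`) are made up for this example and are not taken from the schema-registry code; only the `InetAddress` calls are the real JDK API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostNameDefault {
    // Hypothetical helper: prefer an explicitly configured host.name,
    // otherwise fall back to the dynamic lookup mentioned above.
    static String resolveHostName(String configured) {
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        try {
            // The result depends on the machine running this code,
            // e.g. "thor" on a host that happens to be named thor.
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            // Assumption for this sketch: treat an unresolvable local host as fatal.
            throw new IllegalStateException("cannot determine local hostname", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("configured: " + resolveHostName("registry.example.com"));
        System.out.println("dynamic:    " + resolveHostName(null));
    }
}
```

The second line of output differs from machine to machine, which is exactly why a generated doc page cannot show a meaningful fixed default.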
Ah, I see. :-)
I think this means that `host.name` (here) is similar to Kafka's `host.name` parameter, the latter of which is described as:
> `host.name`: Hostname of broker. If this is set, it will only bind to this address. If this is not set [default], it will bind to all interfaces, and publish one to ZK.
http://kafka.apache.org/documentation.html
Kafka also has `advertised.host.name`:
> `advertised.host.name`: If this is set [default: it is not set] this is the hostname that will be given out to producers, consumers, and other brokers to connect to.
Since Schema Registry Mirroring says about `host.name` "Hostname to publish to ZooKeeper for clients to use", does that mean the current `host.name` option in schema-registry is a combination of Kafka's `host.name` and `advertised.host.name`?
What about something like this for `HOST_DOC`:
> Hostname of the schema registry. If this is set, the schema registry process will only bind to this address. If this is not set (default), it will dynamically determine the hostname to bind to via `java.net.InetAddress.getLocalHost().getCanonicalHostName()`. One scenario where you may need to explicitly set this parameter is in IaaS environments. The value of `host.name` will also be advertised in ZooKeeper, where it is used for e.g. leader election of schema registry instances.
FWIW the `host.name` section in Schema Registry Mirroring would possibly need to be in sync with `HOST_DOC` (the current snippet in this page is IMHO a bit misleading: it talks about "to which the broker binds" although schema-registry does not run any broker; the default value of `host.name` is listed as `host.name`).
Hmm, this is unfortunately confusing. Right now, the only thing rest-utils lets you control for the listening socket is the port. `MetricsSelectChannelConnector` doesn't set the hostname. So `host.name` in schema-registry (and kafka-rest) is more like `advertised.host.name`. The docs in kafka-rest have this:
> The host name used to generate absolute URLs in responses. If empty, the default canonical hostname is used
which is application-specific, but perhaps a bit clearer about the fact that the intent is to control the hostname used anywhere we need to provide some other host a hostname they will be able to reach us at.
So right now, the second half of your suggestion would make sense. I think if we wanted to have the first half, we'd have to patch rest-utils to provide that functionality, then introduce the two different options just as Kafka has. That said, I'm not sure how frequently people actually use Kafka's `host.name` setting and whether it makes sense to support the same in schema-registry and kafka-rest.
> That said, I'm not sure how frequently people actually use Kafka's host.name setting and whether it makes sense to support the same in schema-registry and kafka-rest.
I can only speak for us but we do use Kafka's `host.name` for brokers, which are multi-NIC machines. Our reason for setting `host.name` is that not setting the parameter (= default) will cause Kafka to listen on `0.0.0.0:<port>` (which is ok) but Kafka will only register one of the broker's many hostnames/IPs with ZK, and the one picked may be the wrong one (this is the bad part). We also used `advertised.host.name` for a short time in the past, but the reason has escaped my memory.
If I were to summarize I think there are three things that users may or may not need to control:
1. The hostname/IP on which the process listens.
2. The hostname that gets registered with ZK (AFAIK in Kafka this is coupled with (1) with no separate control).
3. The hostname used to generate e.g. absolute URLs in responses (as is the case in kafka-rest).
I think the reasons, if any, for decoupling some of the three in schema-registry would be similar to the reasons behind decoupling them for a Kafka broker (via `host.name` and `advertised.host.name`).
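To make the three items above concrete, here is a rough sketch of what fully decoupled settings could look like. Everything in it is hypothetical: the class name, the field names, and especially the fallback order (URL host falls back to the registered host, which falls back to a detected hostname) are assumptions for illustration, not how schema-registry, kafka-rest, or Kafka behave.

```java
final class HostSettings {
    private final String bindAddress;     // (1) where the process listens; null = not set
    private final String registeredHost;  // (2) what gets registered with ZK; null = not set
    private final String urlHost;         // (3) host used in absolute URLs; null = not set

    HostSettings(String bindAddress, String registeredHost, String urlHost) {
        this.bindAddress = bindAddress;
        this.registeredHost = registeredHost;
        this.urlHost = urlHost;
    }

    // Assumed default: listen on all interfaces when nothing is configured.
    String effectiveBindAddress() {
        return bindAddress != null ? bindAddress : "0.0.0.0";
    }

    // Assumed default: fall back to a dynamically detected hostname.
    String effectiveRegisteredHost(String detectedHost) {
        return registeredHost != null ? registeredHost : detectedHost;
    }

    // Assumed default: reuse whatever is registered with ZK.
    String effectiveUrlHost(String detectedHost) {
        return urlHost != null ? urlHost : effectiveRegisteredHost(detectedHost);
    }

    public static void main(String[] args) {
        // A multi-NIC machine that only overrides the registered hostname.
        HostSettings s = new HostSettings(null, "nic2.example.com", null);
        System.out.println(s.effectiveBindAddress());
        System.out.println(s.effectiveRegisteredHost("nic1.example.com"));
        System.out.println(s.effectiveUrlHost("nic1.example.com"));
    }
}
```

With separate knobs like these, the multi-NIC case described above could keep listening on all interfaces while still pinning the hostname that other hosts see.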
Not sure whether this feedback is of any help!
> Hmm, this is unfortunately confusing. Right now, all rest-utils gives you control over for the listening socket is the port. MetricsSelectChannelConnector, doesn't set the hostname. So host.name in schema-registry (and kafka-rest) is more like `advertised.host.name`.
Ah, so you mean schema-registry will always listen on all interfaces (`0.0.0.0`) -- with only the port being configurable -- and the only effect of `host.name` is to control which hostname -- out of possibly many for a machine -- will be advertised in ZK and handed out to clients in absolute URLs?
Right, that's the current state. If we can figure out a path to making all of those things configurable separately, that would be great.
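For anyone landing here later, the current state can be pictured with plain JDK sockets. This is only an illustration and does not use rest-utils or Jetty; the class name, the `advertisedUrl` helper, and the example hostname are made up. The point is that the configured hostname never reaches the listening socket and only shows up in what is handed to other hosts.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class ListenVsAdvertise {
    // Hypothetical helper: the only place the configured hostname is used.
    static String advertisedUrl(String hostName, int port) {
        return "http://" + hostName + ":" + port;
    }

    public static void main(String[] args) throws IOException {
        String hostName = "registry.example.com"; // stands in for a host.name value
        try (ServerSocket server = new ServerSocket()) {
            // InetSocketAddress(int) means "wildcard address, given port";
            // port 0 asks the OS for any free port.
            server.bind(new InetSocketAddress(0));
            int port = server.getLocalPort();
            System.out.println("bound to wildcard: "
                    + server.getInetAddress().isAnyLocalAddress());
            System.out.println("advertised as:     " + advertisedUrl(hostName, port));
        }
    }
}
```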