odpi/egeria-docs

Not obvious in the docs how to use local.server.id kafka configuration parameter

davidradl opened this issue · 5 comments

I see in https://egeria-project.org/connectors/resource/kafka-open-metadata-topic-connector/?h=local.server.id#configuration

there is a line "local.server.id": "{{consumerId}}", which the page describes as a unique consumer identifier in {{consumerId}}.

I see an example of it at https://egeria-project.org/features/cohort-operation/overview/?h=local.server.id#configuration-commands
and it looks like a UUID.

I also see https://egeria-project.org/connectors/integration/open-lineage-event-receiver-integration-connector/?h=virtualconnection#configuration where it says "local.server.id": "{{localServerId}}", and asks for the integration daemon's server id in {{localServerId}}.

The last case implies that I should be supplying "the integration daemon's server id" as this value.

This field looks like a Kafka configuration property. I cannot see it documented at https://kafka.apache.org/documentation/ and I cannot find any relevant hits on Google.

I am looking for guidance on how to set this configuration parameter and what it means if I don't.

The last change on the code in this area was made in odpi/egeria#6639 from the issue raised at #484.

The fix was in 3.11, and there's an entry in the release notes https://egeria-project.org/release-notes/3-11/?h=local.server.id explaining:

  • If no consumer group id is set then the current behaviour continues -- we use the server id as default, both for the default event bus config, as well as during the setup of embedded connections (default)
  • If local.server.id is specified in the default event bus config, this is used for all additional connections (intent of old behaviour)
  • If group.id is specified in the consumer properties, this is used instead (higher priority) (more standard approach)
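The precedence in the three points above can be sketched roughly as follows. This is only an illustration in Python, not Egeria's actual (Java) code: the function name `resolve_group_id` and its parameters are hypothetical, and it assumes that an empty or missing value counts as "not set".

```python
from typing import Mapping


def resolve_group_id(
    server_id: str,
    event_bus_properties: Mapping[str, object],
    consumer_properties: Mapping[str, object],
) -> str:
    """Pick a Kafka consumer group id using the precedence from the 3.11 notes.

    Hypothetical helper: the names and signature are assumptions for
    illustration only.
    """
    # Highest priority: an explicit group.id in the consumer properties.
    explicit = consumer_properties.get("group.id")
    if explicit:
        return str(explicit)

    # Next: local.server.id from the default event bus configuration.
    local_id = event_bus_properties.get("local.server.id")
    if local_id:
        return str(local_id)

    # Default: fall back to the id of the hosting server.
    return server_id


if __name__ == "__main__":
    print(resolve_group_id("server-id", {}, {}))
    print(resolve_group_id("server-id", {"local.server.id": "bus-id"}, {}))
    print(resolve_group_id("server-id", {"local.server.id": "bus-id"}, {"group.id": "my-group"}))
```

Whatever value wins is what would end up being used as the consumer group.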

However, I missed out updating the core docs. In the issue I see what happened - I transferred the issue over to docs, but it got auto-closed when I merged the code PR as I'd used 'Fixes #xxxx'

So there remains a task to update the docs which we can do here.

This is all about setting the consumer group id for Kafka, and how we derive that value from other values we have in Egeria. Over and above its use in Kafka, we may want to elaborate on that value more generally. More on how it's used is at https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html -- and the parameter eventually sent to Kafka is 'group.id'. I'm not sure exactly what is permitted, but it's effectively opaque and just used for matching.

Does the above explanation make sense, at least for Kafka?

If no group.id is specified then the latest code throws a NullPointerException. See odpi/egeria#7135.

I am unsure what value I should put here. The Kafka topic name or id, or the Egeria server name seem good candidates. I will only have 1 consumer group. I am thinking of using the Egeria server name of the integration connector.
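For illustration, here is a rough sketch of what supplying the server name explicitly might look like. It assumes the configuration properties layout suggested in this thread: a top-level "local.server.id" plus a nested map of consumer properties that can carry "group.id". The key name "consumer", the helper `build_configuration_properties` and the server name "my-integration-daemon" are assumptions for the example, not confirmed Egeria settings.

```python
import json


def build_configuration_properties(server_name: str) -> dict:
    """Return a hypothetical configurationProperties fragment.

    The layout is an assumption based on this thread, not a confirmed
    Egeria schema.
    """
    return {
        # Per the 3.11 release notes quoted above, this value is used for
        # the consumer group when no group.id is given.
        "local.server.id": server_name,
        # Assumed key name for the Kafka consumer properties.
        "consumer": {
            # Standard Kafka consumer setting; optional here, but it takes
            # priority over local.server.id when present.
            "group.id": server_name,
        },
    }


if __name__ == "__main__":
    props = build_configuration_properties("my-integration-daemon")
    print(json.dumps(props, indent=2))
```

With a single consumer group, setting both values to the same server name should keep the behaviour unambiguous whichever one takes effect.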

If not set, the value should get set automatically (based on the hosting server). So in many cases it's opaque.

But we do need to expand the explanation of a) how it gets set and b) how it gets used.

The NPE occurs when the local server id is null. I think this only happens after manual removal, since the code that writes the configuration for the connector uses a default if the user does not provide one. I suspect the code prior to the 3.11 change would also NPE, just at a different location, since there would still be no consumer group to set. Or perhaps the consumer simply didn't get a group id, which could lead to bad behaviour (possibly worse).
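One way to picture a friendlier failure than an NPE is a fail-fast check before the consumer is created. The sketch below is Python for illustration only and is not how Egeria (Java) handles it today. The exception `MissingConsumerGroupError`, the function `require_group_id` and the "consumer" key name are all hypothetical.

```python
from typing import Mapping


class MissingConsumerGroupError(ValueError):
    """Raised when no consumer group id can be determined (hypothetical)."""


def require_group_id(configuration_properties: Mapping[str, object]) -> str:
    """Return the consumer group id, or fail with a descriptive error."""
    # Assumed layout: consumer properties nested under a "consumer" key.
    consumer = configuration_properties.get("consumer")
    group_id = consumer.get("group.id") if isinstance(consumer, Mapping) else None

    # Fall back to local.server.id when no explicit group.id is present.
    group_id = group_id or configuration_properties.get("local.server.id")

    if not group_id:
        # Explain the problem up front rather than failing later on a null.
        raise MissingConsumerGroupError(
            "Neither consumer group.id nor local.server.id is set; "
            "cannot choose a Kafka consumer group."
        )
    return str(group_id)


if __name__ == "__main__":
    print(require_group_id({"local.server.id": "server-id"}))
    try:
        require_group_id({})
    except MissingConsumerGroupError as error:
        print(f"configuration error: {error}")
```

A check like this would turn the manual-removal case into a clear configuration error.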