Make User-Agent to be filled correctly
chernser opened this issue · 4 comments
Describe the bug
The ClickHouse Java Client is used by many connectors and products that integrate with ClickHouse. When an application using the client contacts a server, the client sends a User-Agent header that carries important information. Part of this information is used for analytics and decision-making.
The current implementation may not be strict enough here, because we see strings like:
metabase/1.3.3/unknown (Linux/6.1.0-12-amd64; OpenJDK 64-Bit Server VM/Temurin-11.0.22+7; HttpURLConnection; rv:unknown)
There is also no guarantee that an integration used the expected JDBC driver. It is even harder to match an integration version with the expected Java client version and all its core components (for example, whether Apache HttpClient or HttpURLConnection was used).
Expected behaviour
Whatever integration is used and whatever User-Agent header is passed, the header should include information from the Java client.
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent for reference.
According to the standard (https://www.rfc-editor.org/rfc/rfc9110#field.user-agent), the format of the header is:
User-Agent = product *( RWS ( product / comment ) )
where product is:
product = token ["/" product-version]
product-version = token
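To make the grammar concrete, here is a minimal sketch of a check for a single product entry. The class and method names (UserAgentSyntax, isValidProduct) are hypothetical and are not part of the Java client; the token character set is the tchar definition from RFC 9110 section 5.6.2. Under this grammar a token cannot contain "/" or a space, so a string such as metabase/1.3.3/unknown is not a single valid product.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the ClickHouse Java client.
public class UserAgentSyntax {
    // token = 1*tchar, with tchar as listed in RFC 9110 section 5.6.2.
    private static final String TOKEN = "[!#$%&'*+\\-.^_`|~0-9A-Za-z]+";

    // product = token ["/" product-version], product-version = token
    private static final Pattern PRODUCT = Pattern.compile(TOKEN + "(/" + TOKEN + ")?");

    /** Returns true if the whole string is exactly one RFC 9110 product. */
    public static boolean isValidProduct(String value) {
        return value != null && PRODUCT.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidProduct("ClickHouse-JavaClient/0.6.3")); // true
        System.out.println(isValidProduct("metabase/1.3.3/unknown"));      // false: second "/" is not allowed
        System.out.println(isValidProduct("Windows 11/10.0"));             // false: space is not a tchar
    }
}
```

Values with spaces or extra slashes, such as OS and JVM descriptions, fit only inside a parenthesized comment, which is where the examples in this issue already place them.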
Today's value looks like this:
ClickHouse-JavaClient/unknown (Windows 11/10.0; OpenJDK 64-Bit Server VM/JBR-17.0.9+8-1166.2-nomod; Apache-HttpClient/5.2.1; rv:unknown)
This is long, but fine. However, we need to be sure that:
- If someone sets a custom User-Agent header value, the Java client identifier is still added to the User-Agent header.
- If someone sets a client name, the Java client identifier is still added to the User-Agent header.
The JavaClient identifier should always be present. The version unknown has no value for us.
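The requirements above can be sketched as follows. This is an illustration only, not the client's actual code: UserAgentSketch and build are hypothetical names, and leaving out the version when it is null or unknown is an assumption of this sketch (the grammar makes product-version optional). The real fix would be to make the version resolvable.

```java
// Hypothetical sketch, not the ClickHouse Java client implementation.
public class UserAgentSketch {
    // Product token as it appears in the examples in this issue.
    public static final String CLIENT_PRODUCT = "ClickHouse-JavaClient";

    /**
     * @param custom        value set by the integration (custom header or client name), may be null
     * @param clientVersion Java client version, may be null or "unknown"
     * @param details       comment parts such as JVM and HTTP library, may be empty
     */
    public static String build(String custom, String clientVersion, String... details) {
        String prefix = custom == null ? "" : custom.trim();
        StringBuilder sb = new StringBuilder(prefix);

        // Append the client product unless the integration already included it.
        if (!prefix.contains(CLIENT_PRODUCT)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(CLIENT_PRODUCT);
            String version = clientVersion == null ? "" : clientVersion.trim();
            // Assumption of this sketch: omit the version instead of sending "unknown".
            if (!version.isEmpty() && !"unknown".equals(version)) {
                sb.append('/').append(version);
            }
            if (details.length > 0) {
                sb.append(" (").append(String.join("; ", details)).append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("MyCustomClient", "0.6.3",
                "OpenJDK 64-Bit Server VM/Corretto-17.0.6.10.1", "HttpURLConnection"));
        System.out.println(build(null, "unknown", "HttpURLConnection"));
    }
}
```

The custom value is kept as a prefix rather than a replacement, so an integration can still identify itself while the Java client product stays visible to analytics.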
Code references:
com.clickhouse.client.http.ClickHouseHttpConnection#getUserAgent - it overrides User-Agent if com.clickhouse.client.config.ClickHouseClientOption#CLIENT_NAME is set. This is incorrect. There is a com.clickhouse.client.http.config.ClickHouseHttpOption#SEND_HTTP_CLIENT_ID option that sets the Referer header, which is the correct header to use to identify a particular client. Again, User-Agent tells the class of software used for communication but doesn't tell the exact application instance.
com.clickhouse.client.http.ClickHouseHttpConnection#createDefaultHeaders - is another place where the client name may override the User-Agent header. This is not correct.
com.clickhouse.client.config.ClickHouseClientOption#buildUserAgent - has another twist because it takes PRODUCT_NAME from Envs (PRODUCT_NAME.getEffectiveDefaultValue()).
Optional tasks:
- Remove the rv: part because it duplicates the product version from the main part.
- Remove the os part because it looks useless for Java people.
- Shorten the Java info because only the version makes sense for statistics.
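As a rough illustration of the last optional task, the sketch below reduces the JVM details to the Java version alone. The names JavaInfoSketch, shortJavaInfo and comment are hypothetical, and the Java/ prefix is an assumption chosen for this example rather than an agreed format; java.version is a standard system property.

```java
// Hypothetical sketch of a shortened comment part: no rv:, no OS, Java version only.
public class JavaInfoSketch {
    /** Returns a short entry such as "Java/" followed by the running JVM's version. */
    public static String shortJavaInfo() {
        // "java.version" is always defined by the JVM.
        return "Java/" + System.getProperty("java.version");
    }

    /** Builds the parenthesized comment from the short Java info and the HTTP library name. */
    public static String comment(String httpLibrary) {
        return "(" + shortJavaInfo() + "; " + httpLibrary + ")";
    }

    public static void main(String[] args) {
        // The output depends on the JVM that runs this code.
        System.out.println(comment("HttpURLConnection"));
    }
}
```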
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent for reference.
Does the language client spec do the trick?
Updated format: MyCustomClient ClickHouse-JavaClient/0.6.3 (OpenJDK 64-Bit Server VM/Corretto-17.0.6.10.1; HttpURLConnection)