influxdata/influxdb-observability

jaeger-influxdb: error in grpc server

lollo25 opened this issue · 5 comments

Hello @jacobmarble ,
I have this error running the image jacobmarble/jaeger-influxdb:latest
{"level":"error","ts":1693322424.6020224,"caller":"jaeger-influxdb/main.go:110","msg":"gRPC interceptor","error":"I/O: SqlState: \u0000\u0000\u0000\u0000\u0000, msg: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"","stacktrace":"main.run.func1\n\t/project/jaeger-influxdb/cmd/jaeger-influxdb/main.go:110\ngithub.com/jaegertracing/jaeger/proto-gen/storage_v1._SpanReaderPlugin_GetServices_Handler\n\t/go/pkg/mod/github.com/jaegertracing/jaeger@v1.47.0/proto-gen/storage_v1/storage.pb.go:1487\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:1337\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:1714\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:959"}
I have tried several versions but it doesn't work.
My docker compose is
jaeger-query:
  image: jaegertracing/jaeger-query:1.45
  stop_grace_period: 10s
  ports:
  - "16686:16686" # web UI
  depends_on:
  - jaeger-influxdb
  environment:
    LOG_LEVEL: info
    SPAN_STORAGE_TYPE: grpc-plugin
    GRPC_STORAGE_SERVER: jaeger-influxdb:17270
    GRPC_STORAGE_CONNECTION_TIMEOUT: 30s
    QUERY_HTTP_SERVER_HOST_PORT: :16686
    ADMIN_HTTP_HOST_PORT: :16687
    ADMIN_HTTP_TLS_ENABLED: false
    QUERY_GRPC_TLS_ENABLED: false
    QUERY_UI_CONFIG: /jaeger-ui-config.json
  volumes:
  - ./jaeger-ui-config.json:/jaeger-ui-config.json:ro

jaeger-influxdb:
  image: jacobmarble/jaeger-influxdb:latest
  ports:
  - "17270:17270"
  environment:
    LOG_LEVEL: debug
    LISTEN_ADDR: :17270
    INFLUXDB_TIMEOUT: 30s
    INFLUXDB_ADDR: influxdb:8086
    INFLUXDB_TLS_DISABLE: true
    INFLUXDB_TOKEN: influxuser-token
    INFLUXDB_ORG: obs
    INFLUXDB_BUCKET: telegraf

When does this error occur? When you click "Find Traces"? If so, does it still happen if you set "Limit Results" to something small like 3?

Pulling apart the error message from its JSON encoding,

error message:

I/O: SqlState: \u0000\u0000\u0000\u0000\u0000, msg: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: http2: frame too large"

stack trace:

main.run.func1
	/project/jaeger-influxdb/cmd/jaeger-influxdb/main.go:110
github.com/jaegertracing/jaeger/proto-gen/storage_v1._SpanReaderPlugin_GetServices_Handler
	/go/pkg/mod/github.com/jaegertracing/jaeger@v1.47.0/proto-gen/storage_v1/storage.pb.go:1487
google.golang.org/grpc.(*Server).processUnaryRPC
	/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:1337
google.golang.org/grpc.(*Server).handleStream
	/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:1714
google.golang.org/grpc.(*Server).serveStreams.func1.1
	/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:959

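This looks like the InfluxDB Jaeger plugin gRPC handler fails to execute a SQL query because a "frame is too large". That frame error comes from the Go HTTP/2 library, indicating that a received frame declares a length larger than the maximum the client accepts. The HTTP/2 frame length field is 24 bits wide, so no frame can exceed roughly 16MB (2^24 - 1 bytes), and the protocol's default maximum frame size is 16KB.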
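
To make the frame-size check concrete, here is a minimal sketch of how a 9-byte HTTP/2 frame header is decoded. The function names and the sample byte strings are illustrative only, and the 16,384-byte limit is the protocol default rather than a value read from this project's configuration. The second sample shows one well-known way this error can arise, namely a client reading a plain HTTP/1.1 reply as if it were an HTTP/2 frame. That is offered as an assumption, not as a confirmed diagnosis of this report.

```python
# Protocol constants from the HTTP/2 specification.
DEFAULT_MAX_FRAME_SIZE = 1 << 14          # 16,384 bytes, initial SETTINGS_MAX_FRAME_SIZE
LARGEST_MAX_FRAME_SIZE = (1 << 24) - 1    # 16,777,215 bytes, the most a 24-bit length can express


def parse_frame_header(header: bytes):
    """Decode a 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    length = int.from_bytes(header[0:3], "big")                    # 24-bit payload length
    frame_type = header[3]                                         # 8-bit frame type
    flags = header[4]                                              # 8-bit flags
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF    # drop the reserved bit
    return length, frame_type, flags, stream_id


def is_too_large(header: bytes, max_frame_size: int = DEFAULT_MAX_FRAME_SIZE) -> bool:
    """Return True if the declared payload length exceeds the accepted maximum."""
    length, _, _, _ = parse_frame_header(header)
    return length > max_frame_size


# An empty SETTINGS frame (type 0x04), which is how a real HTTP/2 server preface begins.
settings_frame = bytes([0, 0, 0, 0x04, 0, 0, 0, 0, 0])

# Hypothetical sample: the first 9 bytes of a plain HTTP/1.1 response.
http1_reply = b"HTTP/1.1 400 Bad Request\r\n"[:9]

print(parse_frame_header(settings_frame))   # declared length 0
print(parse_frame_header(http1_reply)[0])   # the ASCII bytes "HTT" read as a length: 4740180
print(is_too_large(http1_reply))            # True
```

In this sketch the ASCII bytes "HTT" decode to a declared length of 4,740,180 bytes, far above the 16,384-byte default, even though no large payload was ever sent. If something similar is happening here, raising message size limits would not change the behavior.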

The ADBC FlightSQL client doesn't expose this limit directly. It does have an option for an adjacent value, adbc.flight.sql.client_option.with_max_msg_size, which sets the FlightSQL gRPC client's "maximum message size in bytes the client can receive" and "maximum message size in bytes the client can send". The default limit is 16MB.

Try this, tell me if the behavior changes:

  • Increase the maximum FlightSQL gRPC message size by running jaeger-influxdb with the flag --influxdb-query-metadata "adbc.flight.sql.client_option.with_max_msg_size=100000000". This raises the message size limit to 100MB.
  • Search with a lower result limit, as I mentioned at the beginning of this comment.

First of all, sorry for the bad format of my first message.

I have tried running the system with an empty InfluxDB, so the result size is zero, and I have also tried setting command: ["--influxdb-query-metadata=adbc.flight.sql.client_option.with_max_msg_size=100000000"] in the docker compose, but nothing changes. I get the error as soon as I open the Jaeger query UI, before clicking 'Find Traces'.

I attach a screenshot of the UI in case it helps:
[screenshot]
and my compose:

version: '3'
services:
  # TODO: try when jaeger-influxdb has been fixed
  jaeger-query:
    image: jaegertracing/jaeger-query:latest
    stop_grace_period: 10s
    ports:
    - "16686:16686" # web UI
    depends_on:
    - jaeger-influxdb
    environment:
      LOG_LEVEL: info
      SPAN_STORAGE_TYPE: grpc-plugin
      GRPC_STORAGE_SERVER: jaeger-influxdb:17270
      GRPC_STORAGE_CONNECTION_TIMEOUT: 30s
      QUERY_HTTP_SERVER_HOST_PORT: :16686
      ADMIN_HTTP_HOST_PORT: :16687
      ADMIN_HTTP_TLS_ENABLED: false
      QUERY_GRPC_TLS_ENABLED: false
      QUERY_UI_CONFIG: /jaeger-ui-config.json
    volumes:
    - ./jaeger-ui-config.json:/jaeger-ui-config.json:ro
  jaeger-influxdb:
    image: jacobmarble/jaeger-influxdb:latest
    command: ["--influxdb-query-metadata=adbc.flight.sql.client_option.with_max_msg_size=50000000"]
    environment:
      LOG_LEVEL: debug
      LISTEN_ADDR: :17270
      INFLUXDB_TIMEOUT: 30s
      INFLUXDB_ADDR: influxdb:8086
      INFLUXDB_TLS_DISABLE: true
      INFLUXDB_TOKEN: influxuser-token
      INFLUXDB_ORG: obs
      INFLUXDB_BUCKET: telegraf 
      # INFLUXDB_QUERY_METADATA: "adbc.flight.sql.client_option.with_max_msg_size=100000000"
    depends_on:
      - influxdb
  influxdb:
    image: influxdb:latest
    restart: always
    environment:
      DOCKER_INFLUXDB_INIT_MODE: setup
      DOCKER_INFLUXDB_INIT_USERNAME: influxuser
      DOCKER_INFLUXDB_INIT_PASSWORD: influxpassword
      DOCKER_INFLUXDB_INIT_ORG: obs
      DOCKER_INFLUXDB_INIT_BUCKET: telegraf
      DOCKER_INFLUXDB_INIT_RETENTION: 1w
      DOCKER_INFLUXDB_INIT_ADMIN_TOKEN: influxuser-token
    ports:
      - '8086:8086'

What version of InfluxDB are you using?

INFLUXDB_VERSION=2.7.1 and INFLUX_CLI_VERSION=2.7.3

Ah. InfluxDB 2.7 doesn't have a SQL query engine, so I'm surprised the gRPC client fails with "frame too large". It should fail earlier than that.

Everything in this repository assumes InfluxDB 3.0. Currently, the best way to use InfluxDB 3.0 is with the Serverless product.

The reasons for this requirement:

  • InfluxDB 3.0 can handle data with much higher cardinality - traces are high-cardinality data
  • InfluxDB 3.0 has a SQL query engine, which this project depends on