rmosolgo/graphql-ruby

Multiple tracers in one schema are ignored

brendo opened this issue ยท 15 comments

Describe the bug

This bug almost appears to be a duplicate of #4542 but this time around it's with the ActiveSupport and OpenTelemetry tracers.

module Graph
  class Schema < GraphQL::Schema
    ...
    
    trace_with GraphQL::Tracing::ActiveSupportNotificationsTrace  end
end

This schema, with the default OpenTelemetry setup, sees the OpenTelemetry tracer completely ignored as spans are not included in traces yet metrics are emitted.

If I mix and match the old tracer API, I'm able to get both working:

module Graph
  class Schema < GraphQL::Schema
    ...

    tracer GraphQL::Tracing::ActiveSupportNotificationsTracing
  end
end

Versions

graphql: 2.3.10
opentelemetry-instrumentation-graphql: 0.28.2
rails: 7.1.3.4

Steps to reproduce

The test case demonstrates the opposite to the above due to how our app is built, but I think it's still sufficient to show the bug.

# frozen_string_literal: true

require "bundler/inline"

gemfile do
  gem "graphql", "2.3.10"
  gem 'opentelemetry-api'
  gem 'opentelemetry-common'
  gem 'opentelemetry-instrumentation-graphql'
  gem 'opentelemetry-sdk'
  gem 'rails'
end

module MultipleTracer
  class Thing < GraphQL::Schema::Object
    field :str, String

    def str; "blah"; end
  end

  class Named < GraphQL::Schema::Union
    possible_types Thing
    def self.resolve_type(obj, ctx)
      Thing
    end
  end

  class Query < GraphQL::Schema::Object
    include GraphQL::Types::Relay::HasNodeField

    field :int, Integer, null: false

    def int
      1
    end

    field :thing, Thing
    def thing; :thing; end

    field :named, Named, resolver_method: :thing
  end

  class TestSchema < GraphQL::Schema
    query(Query)

    trace_with(GraphQL::Tracing::ActiveSupportNotificationsTrace)
  end
end

# Comment from here.... 
ENV['OTEL_TRACES_EXPORTER'] = 'console'
OpenTelemetry::SDK.configure do |c|
  c.use_all('OpenTelemetry::Instrumentation::GraphQL' => { schemas: [MultipleTracer::TestSchema] })
end
# ...to here to demonstrate bug

ActiveSupport::Notifications.monotonic_subscribe("execute_query.graphql") do |name, start, finish, _message_id, data|
  puts "#{name} handled"
  puts "#{data.inspect}"
end

query = "{ int thing { str } named { ... on Thing { str } } }"
MultipleTracer::TestSchema.execute(query)

The console will be messy, but if you search for execute_query.graphql handled it will be absent. If these lines are commented out, and the command is re-run, you'll see the ActiveSupport handled messages.

ENV['OTEL_TRACES_EXPORTER'] = 'console'
OpenTelemetry::SDK.configure do |c|
  c.use_all('OpenTelemetry::Instrumentation::GraphQL' => { schemas: [MultipleTracer::TestSchema] })
end
$ ruby inline.rb
execute_query.graphql handled
{:query=>query ...}

Expected behavior

Both tracers should be able to emit information about the GraphQL operations. So with the above example there should be a wall of trace info, as well as the two littles lines added by the ActiveSupport subscriber.

Actual behavior

The span information is emitted, but nothing from the ActiveSupport subscriber.

Hey, thanks for the detailed report and sorry for the trouble! Yes, these should definitely work well together.

I think the problem is that these methods in the OpenTelementry trace module are missing super calls (which would call other trace modules): https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/35ddf66f9c42a228298ef1c9f333abb65d6ddc45/instrumentation/graphql/lib/opentelemetry/instrumentation/graphql/tracers/graphql_trace.rb#L91-L110

I noticed that other methods there did include super calls, and that your test case was on a top-level instrumentation hook, which made me suspect this. I can take a look soon at adding those, or feel free to add them yourself if you get time.

Nope, it's more complicated than that ... I've got a failing test over at #5045, I have to massage the inheritance code to make it work ๐Ÿ˜…

Well, maybe it's both. Looking at your example, I realized its different than the one I encountered in the OpenTelementry test suite. My initial suggestion, adding super calls, actually did make your script pass for me. So I opened a PR with those changes: open-telemetry/opentelemetry-ruby-contrib#1090

However, the tests aren't passing because of the problem identified in #5045 ... Nevermind, I think the weirdness is in the test suite there, because of how schema classes are reset and re-instrumented for each test.

Thanks so much for the speedy response! I'll try to test that branch of open-telemetry to confirm it's working today :)

I retested today with the newly released opentelemetry-instrumentation-graphql (0.28.4) and unfortunately I'm still able to reproduce the bug in my application, but not the test case above. Will have to dig and see what's the difference ๐Ÿคž

Do you pass schemas: [ ... ] in your app's installation config? If not, I suspect wonkiness there... (Without schemas: [ ... ], the OpenTelemetry stuff is added to GraphQL::Schema, but that might not be handled well through inheritance, which I stumbed into in #5045)

Do you pass schemas: [ ... ] in your app's installation config?

We don't actually. Nice pickup. Let me find out how I can set that in our service (the configuration of OpenTelemetry is done via another gem by default).

๐Ÿ‘‹ did you find any clues about this? I'd love to get it working properly...

Hey! Sorry, I thought I posted back. We experimented with setting the schemas via the OTEL_RUBY_INSTRUMENTATION_GRAPHQL_CONFIG_OPTS variable but this didn't quite work because the tracer installers expect the option to be an instance of the schema, not a string:

https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/f999e702430a55bc9f0fdf7b4748dc2f9b6341b8/instrumentation/graphql/lib/opentelemetry/instrumentation/graphql/instrumentation.rb#L92-L96

I attempted some hacking to use const_get/constantize but this failed with some autoloading errors (schema hadn't been discovered yet in the boot process). I haven't had a chance to get back into looking at alternative options yet and at the moment are "making it work" by mixing the deprecated and new trace APIs.

Ok, thanks for the update. I tinkered with the original script above to try to get it to fail, but I couldn't...

I also reopened #5045 in case that's the cause, and I pushed a fix for the issue spotted there. Could you try your app with that patch?

Oops, that issue autoclosed this one. It might fix it -- if you get a chance to try out master now that it's been merged, let me know what you find.

Oops, that issue autoclosed this one. It might fix it -- if you get a chance to try out master now that it's been merged, let me know what you find.

Sorry, a retest confirms the bug. Switching from tracer GraphQL::Tracing::ActiveSupportNotificationsTracing, silence_deprecation_warning: true to trace_with GraphQL::Tracing::ActiveSupportNotificationsTrace causes the trace to lose the Otel spans.

I don't know if this helps at all, but here's a screenshot from a debug which shows both modules appear to be added:

Screenshot 2024-08-20 at 11 37 06โ€ฏAM

I'm missing a lot of context, but is the problem that both modules (OpenTelemetry::Instrumentation::GraphQL::Tracers::GraphQLTrace and GraphQL::Tracing::ActiveSupportNotificationsTrace) define the same method names and because the AS:N is added last, it overrides the OTel module?

eg. analyze_query:

Ahhh yes! I hadn't looked at the AS::N trace yet, but I guess that's the issue. It doesn't call super, either. I was able to mix up the script to demonstrate that bug:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  gem "graphql", "2.3.10" # path: "./"
  gem 'opentelemetry-api'
  gem 'opentelemetry-common'
  gem 'opentelemetry-instrumentation-graphql'
  gem 'opentelemetry-sdk'
  gem 'rails'
end

ENV['OTEL_TRACES_EXPORTER'] = 'console'
OpenTelemetry::SDK.configure do |c|
  c.use('OpenTelemetry::Instrumentation::GraphQL')
end

module MultipleTracer
  class Thing < GraphQL::Schema::Object
    field :str, String

    def str; "blah"; end
  end

  class Named < GraphQL::Schema::Union
    possible_types Thing
    def self.resolve_type(obj, ctx)
      Thing
    end
  end

  class Query < GraphQL::Schema::Object
    include GraphQL::Types::Relay::HasNodeField

    field :int, Integer, null: false

    def int
      1
    end

    field :thing, Thing
    def thing; :thing; end

    field :named, Named, resolver_method: :thing
  end

  class TestSchema < GraphQL::Schema
    query(Query)
    trace_with(GraphQL::Tracing::ActiveSupportNotificationsTrace)
  end
end


ActiveSupport::Notifications.monotonic_subscribe("execute_query.graphql") do |name, start, finish, _message_id, data|
  puts "#{name} handled"
  puts "#{data.inspect}"
  # raise "Horray, it worked"
end

query = "{ int thing { str } named { ... on Thing { str } } }"
MultipleTracer::TestSchema.execute(query)

I'll work up a patch for it.

If you get a chance to try master again, please let me know what you find ๐Ÿ˜… Thanks again for your help tracking this down.

That did it, thank you! Really appreciate the back and forth and patience to get this one solved :)