DataDog/dd-trace-rb

`IOError` during tracing when running `PG::Connection#exec` in a thread

tdeo commented

Current behaviour

Hello,

We're getting the following IOError when running queries in threads through the parallel gem:

IOError: stream closed in another thread (IOError)
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:27:in `exec'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:27:in `block in exec'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:142:in `block in trace'
  from ddtrace (1.23.3) lib/datadog/tracing/trace_operation.rb:198:in `block in measure'
  from ddtrace (1.23.3) lib/datadog/tracing/span_operation.rb:150:in `measure'
  from ddtrace (1.23.3) lib/datadog/tracing/trace_operation.rb:198:in `measure'
  from ddtrace (1.23.3) lib/datadog/tracing/tracer.rb:385:in `start_span'
  from ddtrace (1.23.3) lib/datadog/tracing/tracer.rb:159:in `block in trace'
  from ddtrace (1.23.3) lib/datadog/tracing/context.rb:45:in `activate!'
  from ddtrace (1.23.3) lib/datadog/tracing/tracer.rb:158:in `trace'
  from ddtrace (1.23.3) lib/datadog/tracing.rb:18:in `trace'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:105:in `trace'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:26:in `exec'
  from packs/devtools/db/app/services/devtools/fork_database_helper.rb:217:in `block in analyze'
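
For context, Ruby itself raises an IOError with this message when one thread closes an IO object while another thread is still blocked on it, so the message is not specific to pg or ddtrace. Below is a minimal sketch using a plain pipe, meant only to illustrate the generic error. The method name blocked_read_error is made up for this example, and whether this is what actually happens inside PG::Connection#exec here is not established.

```ruby
# Illustration only: trigger the generic IOError with a plain pipe.
# No pg, parallel, or ddtrace involved; blocked_read_error is a hypothetical name.
def blocked_read_error
  reader_io, writer_io = IO.pipe

  reader = Thread.new do
    begin
      reader_io.read # blocks: nothing is written and the write end stays open
      nil
    rescue IOError => e
      e # hand the exception back as the thread's value
    end
  end

  # Wait (at most ~2s) until the reader thread is actually blocked in read.
  deadline = Time.now + 2
  sleep 0.01 until reader.status == "sleep" || Time.now > deadline

  reader_io.close # closing from this thread interrupts the blocked reader
  reader.value
ensure
  writer_io.close if writer_io && !writer_io.closed?
end

error = blocked_read_error
puts "#{error.class}: #{error.message}" if error
```

On recent Ruby versions this typically prints the same "stream closed in another thread" text as in the trace above, which suggests looking for something that closes the connection's socket while exec is still waiting on it.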

My usage:

# packs/devtools/db/app/services/devtools/fork_database_helper.rb

def analyze(tables_to_analyze)
  Parallel.each(tables_to_analyze, in_threads: 4) do |table|
    conn = PG::Connection.new(connection_string)
    conn.exec("ANALYZE #{table}") # This is line 217
  ensure
    conn.finish if conn && !conn.finished?
  end
end

Expected behaviour

No error is thrown

Steps to reproduce

Running the code above

Environment

  • datadog version: ddtrace (1.23.3), libdatadog (7.0.0.1.0)
  • Configuration block (Datadog.configure ...):
  Datadog.configure do |c|
    c.diagnostics.debug if ENV['DD_DEBUG'].present?
    c.tracing.partial_flush.enabled = %w[t true 1].include?(ENV['DD_PARTIAL_FLUSH'])

    c.profiling.enabled = true
    c.profiling.advanced.force_enable_gc_profiling = true

    c.env = Rails.env.to_s
    c.service = Rails.application.class.module_parent.name.downcase
    c.version = ENV['SOURCE_VERSION']

    c.tracing.instrument :rails, request_queuing: :exclude_request
    c.tracing.instrument :sidekiq, tag_args: true, service_name: 'sidekiq'
    if ENV['REDIS_URL'].present?
      c.tracing.instrument :redis, describes: { url: ENV['REDIS_URL'] }, service_name: 'redis'
    end
    if ENV['REDIS_CACHE_URL'].present?
      c.tracing.instrument :redis, describes: { url: ENV['REDIS_CACHE_URL'] }, service_name: 'redis-cache'
    end
    c.tracing.instrument :http, split_by_domain: true
    c.tracing.instrument :httpclient, split_by_domain: true
    c.tracing.instrument :stripe
    c.tracing.instrument :aws
    c.tracing.instrument :rake
    c.tracing.instrument :pg, comment_propagation: 'full'

    # Track read-replica separately from primary
    c.tracing.instrument :active_record, describes: :primary_replica, service_name: 'jeancaisse-postgres-replica'

    # Enable DataDog runtime metrics (currently in beta phase)
    c.runtime_metrics.enabled = true

    # Add tags to all traces (documentation: https://datadoghq.dev/dd-trace-rb/#environment-and-tags)
    # The repository URL is hardcoded because it's not expected to change frequently.
    # If it does change, it will break the source code integration for DataDog, which can be easily fixed.
    c.tags = {
      'git.repository_url' => 'https://github.com/pennylane-hq/jeancaisse',
      'git.commit.sha' => ENV['SOURCE_VERSION'],
    }
  end
  • Ruby version: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [aarch64-linux]
  • Operating system: aarch64 GNU/Linux running in a Docker image originating from ruby:3.3.4-slim-bookworm
  • Relevant library versions: parallel (1.26.3)

Hey @tdeo! Thanks for the report, and for the patience with our slow answer >_>

Yesterday I set aside some time to try to reproduce this and... I wasn't successful.

It sounds like you may be able to trigger this on your side. If you're still up for helping us debug and fix this (if not -- that's ok! We did take a bunch of time to get back to you), can I ask you to check:

  • Whether you can still trigger this issue with c.profiling.enabled = false
  • Whether you can still trigger this issue without c.tracing.instrument :pg, comment_propagation: 'full'

While we don't currently know of issues with either of those two, I think they're the most likely culprits, so checking them should help pin this down.
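
One way to run the two experiments above without editing the initializer each time is to drive them from environment variables. This is only a sketch: REPRO_PROFILING, REPRO_PG_TRACING, and the repro_settings helper are made-up names for illustration, not Datadog settings. It also treats "without the :pg line" as two separate cases, since the request could mean either dropping only the comment_propagation option or dropping the pg integration entirely.

```ruby
# Hypothetical helper: derive the two experiment settings from env vars.
# REPRO_PROFILING and REPRO_PG_TRACING are made-up names, not Datadog settings.
def repro_settings(env = ENV)
  {
    profiling: env.fetch("REPRO_PROFILING", "on") != "off",
    pg_tracing: env.fetch("REPRO_PG_TRACING", "full"), # "full", "plain", or "off"
  }
end

# Apply only when the gem is loaded, so this file also runs standalone.
if defined?(Datadog) && Datadog.respond_to?(:configure)
  settings = repro_settings
  Datadog.configure do |c|
    c.profiling.enabled = settings[:profiling]

    case settings[:pg_tracing]
    when "full"  then c.tracing.instrument :pg, comment_propagation: 'full'
    when "plain" then c.tracing.instrument :pg # integration defaults
    end
    # "off": the pg integration is not instrumented at all
  end
end
```

Changing one variable per run (for example REPRO_PROFILING=off first, then REPRO_PG_TRACING=plain or off) keeps the rest of the configuration identical between runs, which makes it easier to tell which setting matters.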