jjatria/perl-opentelemetry-exporter-otlp

OTLP exporter doesn't exit (or send traces?!)

Closed this issue · 9 comments

abh commented

follow-up to jjatria/perl-opentelemetry-sdk#5 :-)

#!/usr/bin/env perl
use v5.38.0;
use strict;
use warnings;
use Log::Any::Adapter ('Stderr');
use OpenTelemetry::SDK;

my $logger   = OpenTelemetry->logger;
my $provider = OpenTelemetry->tracer_provider;
my $tracer   = $provider->tracer(name => 'web', version => '1.0');

$tracer->in_span(
    outer => sub ($span, $context) {
        $logger->debug("outer");
        $tracer->in_span(
            inner => sub ($span, $context) {
                $logger->debug("inner");
            }
        );
    }
);

warn "done";
exit;

Running this with export OTEL_TRACES_EXPORTER=console the traces are output to the console and the program exits.

Running with either no OTEL variables or with the ones specified below I don't get any connections to localhost port 4318 and the program doesn't seem to exit.

% env | grep OTEL
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_PROTOCOL=http/json
OTEL_BSP_EXPORT_TIMEOUT=4000
OTEL_BSP_SCHEDULE_DELAY=1000
OTEL_SERVICE_NAME=perl-test

the program doesn't exit. I also don't see any connections to port 4318 happening (looking with tcpdump or listening with nc instead of otelcol).

% perl bin/tracing
outer
inner
done at bin/tracing line 23.
on_read_ready loop at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 996 during global destruction.
on_read_ready loop at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 996 during global destruction.
on_read_ready loop at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 996 during global destruction.
on_read_ready loop at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 996 during global destruction.

If my test program isn't correct, then a simpler example program in the documentation would be helpful. :-)

abh commented

I tried adding some debug in OTLP.pm and best I can tell the export method doesn't get called.

debug.patch

Adding the warnings (or $logger->debug() calls) made the output change to repeating some other warning infinitely (or at least for a long time):

on_read_ready loop at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 996 during global destruction.
Use of uninitialized value in bitwise and (&) at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 998 during global destruction.
Use of uninitialized value in bitwise and (&) at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 999 during global destruction.
on_read_ready loop at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 996 during global destruction.
Use of uninitialized value in bitwise and (&) at /Users/ask/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/IO/Async/Stream.pm line 998 during global destruction.

I checked that all the dependencies of this module are up to date.

You are correct: export is never getting called in the OTLP exporter, because that uses the batch span processor, which will collect spans to be exported and send them once a batch is ready (to minimise network churn). If you are not setting it anywhere, it will use its default batch size of 512, which means it won't actually do anything until you've called end on 512 spans. You can verify this by generating more spans or reducing the batch size (the easiest way is to set the OTEL_BSP_MAX_EXPORT_BATCH_SIZE environment variable).

Alternatively, you can tell the exporter that it needs to flush any unexported spans. The modified snippet in jjatria/perl-opentelemetry-sdk#5 ended with a call to both force_flush and shutdown on the global tracer provider. force_flush will tell the exporter that it needs to process any spans in its internal queue (which is part of what you want). shutdown will do the same (it implies force_flush), but it will also stop the batch processor, which will allow your program to terminate (this is why the snippet in jjatria/perl-opentelemetry-sdk#5 could exit).

Note that the expectation is that you'd call these methods on the global tracer provider (rather than directly on the exporter) so that the signal propagates properly: first to the provider, then to the processors, and from there to any exporters, etc.

The reason you can see the console exporter generating traces is because that one uses the simple span processor which does not have an internal queue and just exports any span it sees in the same process.

a simpler example program in the documentation would be helpful.

Definitely! Now that the API seems to have settled a little, we are in need of some realistic examples.

Also note that the OpenTelemetry->logger method will likely be removed in upcoming releases, since it's really only useful for the logging done by the OpenTelemetry libraries, and it will collide with the changes that will be needed to add support for the OpenTelemetry logging API.

You can replace that with the regular use Log::Any '$logger'; or however you prefer to use it.

abh commented

Ah, thank you! I misunderstood the other issue and expected that the system would recognize that the process is shutting down and then do the flushing automatically.

I'd tried to set the environment variables to make sure things would be flushed within a few seconds (rather than 30?), but I must have misunderstood the configuration options.

Removing logger makes sense; I was adding it to make sure that it was configured so anything the OT libraries did woulds be logged in a place I could see it.

In case it's helpful to others, this was the last version of my test program:

#!/usr/bin/env perl

use strict;
use warnings;
use experimental 'signatures';

BEGIN {
    #$ENV{OTEL_EXPORTER_OTLP_ENDPOINT} = 'http://otelcol-ntp.ntppipeline.svc:4318';
    #$ENV{OTEL_EXPORTER_OTLP_ENDPOINT} = 'http://100.90.46.106:4318';

    $ENV{OTEL_SERVICE_NAME} = "perl-test";

    # console just to give a little more output
    $ENV{OTEL_TRACES_EXPORTER} = 'console,otlp';
}

# These might also help give an idea of what is happening behind the scenes
use Log::Any::Adapter 'Stderr';
use Metrics::Any::Adapter 'Stderr';

# Since we 'use' these, they call `import` at compile time, which is
# why the ENV variables are set inside a BEGIN block. They could
# instead be set in your actual environment
use OpenTelemetry;
use OpenTelemetry::SDK;
use OpenTelemetry::Context;
use OpenTelemetry::Trace;
use Syntax::Keyword::Dynamically;

# Obtain the current default tracer provider
my $provider = OpenTelemetry->tracer_provider;

# Create a tracer
my $tracer = $provider->tracer(name => 'perl-tracer', version => '1.0');
my $span   = $tracer->create_span(name => "test span");
dynamically OpenTelemetry::Context->current = OpenTelemetry::Trace->context_with_span($span);

foo();    # create a new span in foo that is a child of the span created above

# Record spans
$tracer->in_span(
    outer => sub ($span, $context) {
        $span->set_name("outer-span");
        warn "outer";

        $tracer->in_span(
            inner => sub ($span, $context) {
                warn "inner";
            }
        );
    }
);

warn "done";

$span->end;

$provider->force_flush(5)->get;    # shorter timeout because impatience

warn "flushed";

$provider->shutdown->get;

warn "shutdown";

sub foo {
    my $span = $tracer->create_span(name => "foo");
    dynamically OpenTelemetry::Context->current = OpenTelemetry::Trace->context_with_span($span);
    $span->end;
}
tempo

Glad to see it worked! I'll close this issue in that case, but feel free to re-open if you think there's more to it.

Before that, very brifely:

flushed within a few seconds (rather than 30?)

I guess you meant OTEL_BSP_EXPORT_TIMEOUT? Re-reading that now, the description is a little ambiguous. That variable controls the amount of time the batch span processor will wait for the exporter to complete. There's no configuration available to make it maybe flush every certain number of seconds.

I'll see if that can be re-phrased to make it less confusing. Thanks for trying things out!

abh commented

ah, yeah -- that was unclear about the timeout. I thought it was equivalent to the OTEL_EXPORTER_OTLP_TIMEOUT setting in the Go OTLP exporter.

For my use case that setting makes more sense than trying to guess a reasonable number of spans to wait for batching.

I might be misreading things, but I think that variable also controls the timeout of the export request for the OTLP exporter: https://github.com/open-telemetry/opentelemetry-go/pull/1758/files#diff-50b034ca999a80755270486ef32e682b583a3e10d0686b49f64e0a80d1a24ac4R73-R77

That variable (and its meaning) is part of the OTel spec, so if Go and Perl interpret it differently, one of us has a bug,

abh commented

yeah, you are right! I was thinking of the "batchtimeout" in the batch processor ... ("timeout" is a little overloaded!) -- they default it to 5 seconds, so batches are sent every 5 seconds (or more often if the "buffer" is full, I think).

https://github.com/open-telemetry/opentelemetry-go/blob/204be6157f13594eb92a91914aaa7a7561bfd08a/sdk/trace/batch_span_processor.go#L246C24-L246C24

Ah, gotcha. This seems to be a Go-specific setting, in which case they can go crazy :)