GitLab CI jobs tracing
Petromar88 opened this issue · 6 comments
Hi everyone, I'm trying to trace GitLab CI pipeline jobs using otel-cli, but I'm having trouble with unexpected spans being traced.
When CI jobs run, I can correctly see each span being traced on Cloud Trace with the expected context propagation.
Inside each job I'm running one Earthly command that causes a lot of other operations to be performed.
The problem is that all of these operations are traced within different unrelated spans.
I've tried many different solutions to force those spans to inherit the current context (traceparent) but none worked.
So for example, running the following pipeline:
where the start monitoring and the 2. test jobs aren't doing anything other than creating a span with a name and the 1. lint job is declared this way:
lint:
  stage: build
  script:
    - otel-cli span --name ${CI_JOB_NAME_SLUG} --tp-export --tp-carrier ${OTEL_ENV_FILE}
    - earthly +lint --ENV=${TASK_ENV}
  artifacts:
    paths:
      - "${OTEL_ENV_FILE}"
results in the following on Cloud Trace
All the spans following the one in the first screenshot are generated by buildkit operations performed by the job 1. lint without any explicit otel-cli command.
I've already tried using otel-cli exec, sourcing the exported traceparent declaration before running earthly, and sending a span with explicit --start and --end dates after running the earthly command. On top of that I've also tried those solutions with different span kinds, but nothing worked.
Could you please help me understand how those spans are being created and how to make all of them inherit the traceparent from previous spans in the same job?
Thanks in advance!
@tobert can you please help me with this one? Was I clear enough in explaining the problem and the context?
Many thanks
Hello, apologies for the delay. I started a new job this week.
It looks like the traceparent isn't being passed to otel-cli. Are you setting the TRACEPARENT envvar or writing out a file with the traceparent in it?
One way to see what's going on is to replace your otel-cli span with otel-cli status, which will do the same thing but dump a bunch of data in JSON to look through and see what otel-cli is doing internally. If you like, please send me a gist and I'll take a look.
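Before digging through that output, a quick sanity check is whether TRACEPARENT is set and well-formed in the job's shell at all. The following is a minimal sketch, not part of otel-cli: the helper name is_valid_traceparent is hypothetical, and it only checks the W3C Trace Context shape (version, 32-hex trace ID, 16-hex span ID, 2-hex flags) and rejects all-zero IDs.

```shell
# Hypothetical helper: succeeds if $1 looks like a W3C traceparent value,
# e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
is_valid_traceparent() {
  tp="$1"
  # Shape check: version-traceid-spanid-flags, all lowercase hex.
  printf '%s\n' "$tp" \
    | grep -Eq '^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$' || return 1
  trace_id=$(printf '%s' "$tp" | cut -d- -f2)
  span_id=$(printf '%s' "$tp" | cut -d- -f3)
  # All-zero trace or span IDs are invalid per the W3C spec.
  [ "$trace_id" != "00000000000000000000000000000000" ] || return 1
  [ "$span_id" != "0000000000000000" ] || return 1
}

# Possible use at the top of a job script (left commented out here):
#   is_valid_traceparent "${TRACEPARENT:-}" \
#     || echo "TRACEPARENT is missing or malformed" >&2
```

If the check fails inside the job, the parent context never reached that shell, which narrows the problem down to propagation rather than to otel-cli itself.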
No problem at all, thanks for your answer!
I managed to create this gist with a simplified version of our CI and Earthfile, just to give you an overview of what I'm executing at the moment.
Unfortunately we have a lot more that is included in both the .gitlab-ci.yml and the Earthfile from private repositories so it would be hard to provide a fully functional gist.
Anyway, the problem can be summarized like this: everything inside the Docker container started by Earthly is being traced, even if it's not contained in a dedicated span. Any other command (an echo, for example) is not traced, which is what I would expect.
These are two screenshots from Cloud Trace after running the jobs I shared in the gist:
I've also tried replacing otel-cli span with otel-cli status like you suggested, but then everything was traced with a different trace ID.
I suspect what's missing is using 1 of 2 approaches:
1.) since you're setting --tp-carrier you could use volume mounts to share the same file into docker containers. In this approach, the carrier file is what transmits the traceparent across invocations of otel-cli. You can either have otel-cli use it directly, or in shell you can source the carrier file (with --tp-export enabled) and it will set the environment variable.
2.) you could also set the TRACEPARENT envvar somewhere before calling into your tools, and make sure it's propagated into Docker, e.g. docker run -e TRACEPARENT="${TRACEPARENT}". This is the other way to communicate traceparent to otel-cli runs.
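The two approaches can be combined in a few lines of shell. This is a rough sketch under assumptions: load_traceparent is a hypothetical helper, and the carrier file is assumed to contain a shell-sourceable line of the form export TRACEPARENT=..., which is what --tp-export is meant to produce.

```shell
# Hypothetical helper: source a carrier file written with --tp-export and
# make sure TRACEPARENT ends up exported for child processes.
load_traceparent() {
  carrier="$1"
  [ -r "$carrier" ] || { echo "carrier file not readable: $carrier" >&2; return 1; }
  # A bare file name would be looked up in PATH by '.', so anchor it to the
  # current directory when it contains no slash.
  case "$carrier" in
    */*) ;;
    *) carrier="./$carrier" ;;
  esac
  . "$carrier"
  [ -n "${TRACEPARENT:-}" ] || { echo "carrier file did not set TRACEPARENT" >&2; return 1; }
  export TRACEPARENT
}

# Possible use in a job script (commented out; <image> is a placeholder):
#   load_traceparent "$OTEL_ENV_FILE"
#   docker run -e TRACEPARENT="${TRACEPARENT}" <image> ...
```

Sourcing the file covers approach 1 for anything running in the same shell, and the exported variable is what the docker run -e form in approach 2 forwards into the container.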
Does this help?
I realized the carrier file was already being copied into the Docker container, as I'm copying the whole workdir during lint operations. I've tried anyway to pass the TRACEPARENT as an argument and then set it as an env var like you suggested, but with no luck.
I suspect all of the operations being traced with different trace IDs come from the IMPORTs in the Earthfile, but what I'm missing is: why are those operations being traced without an explicit otel-cli invocation from inside any of the Earthly targets?
It looks as if the single otel-cli span command keeps listening until the end of the CI job, even though I'm not using the background approach and I'm not using the otel-cli Docker image as a base image.
@tobert I just found out that Earthly pulls in the Go OTel library as an indirect dependency in order to trace analytics data.
So basically my problem was that I was setting most of otel-cli's parameters through env vars, which were also picked up by that dependency.
Even though Earthly's analytics data collection can be disabled from the .earthly/config.yaml file, setting the trace endpoint on the span command instead of through the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT env var did resolve the problem.
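For anyone hitting the same thing, here is a rough sketch of that separation. It rests on assumptions: run_without_otlp_env and MY_OTLP_ENDPOINT are hypothetical names, and the --endpoint flag should be checked against the otel-cli version in use. The wrapper is an extra safeguard on top of the fix above, for cases where the OTLP variables still have to exist in the job environment.

```shell
# Hypothetical wrapper: run a command with the standard OTLP endpoint
# variables removed, so an OTel SDK embedded in that command has no
# endpoint to pick up from the environment. The subshell keeps the
# caller's environment untouched.
run_without_otlp_env() {
  (
    unset OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
    "$@"
  )
}

# Possible use in the lint job (commented out; MY_OTLP_ENDPOINT is a
# hypothetical CI variable that no OTel SDK reads on its own):
#   otel-cli span --endpoint "$MY_OTLP_ENDPOINT" --name "$CI_JOB_NAME_SLUG" \
#     --tp-export --tp-carrier "$OTEL_ENV_FILE"
#   run_without_otlp_env earthly +lint --ENV="$TASK_ENV"
```

Passing the endpoint as a flag scopes it to the otel-cli invocation only, whereas an exported OTEL_EXPORTER_OTLP_* variable is visible to every process the job starts.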
Thank you very much for your help!