Trace Analytics not set up - otel-v1-apm-service-map index is blank
nnanoob opened this issue · 9 comments
Hi,
I tried to run the following combination but am not able to get Trace Analytics to show anything.
The otel-v1-apm-service-map index is always empty. Any idea what I could have misconfigured?
- AWS Managed Elasticsearch 7.9.3
- amazon/opendistro-for-elasticsearch-data-prepper:1.0.0
- otel/opentelemetry-collector-contrib:0.28.0
GET _cat/indices?v&s=i
health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   otel-v1-apm-service-map kkdvD7wEQWyyEEuV_Pl_9A   5   1          0            0        2kb            1kb
green  open   otel-v1-apm-span-000001 _QLY94WSTbS5ezSEazqg4Q   5   1         71            0      1.9mb        987.4kb
otel-collect.yml
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch: null
exporters:
  otlp/2:
    endpoint: "<ip>:<port>"
    insecure: true
  logging:
extensions:
  health_check: {}
service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers:
        - otlp
      exporters:
        - logging
    traces:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        - otlp/2
        - logging
pipelines.yml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - elasticsearch:
        hosts: ["<host>:<port>"]
        username: "<username>"
        password: "<password>"
        trace_analytics_raw: true
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - elasticsearch:
        hosts: ["<host>:<port>"]
        username: "<username>"
        password: "<password>"
        trace_analytics_service_map: true
data-prepper container logs
2021-06-14T07:12:55,395 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
2021-06-14T07:12:55,484 [entry-pipeline-prepper-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - entry-pipeline Worker: No records received from buffer
2021-06-14T07:12:55,486 [service-map-pipeline-prepper-worker-3-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - service-map-pipeline Worker: No records received from buffer
2021-06-14T07:12:58,392 [raw-pipeline-prepper-worker-5-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - raw-pipeline Worker: No records received from buffer
2021-06-14T07:13:07,565 [entry-pipeline-prepper-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - entry-pipeline Worker: Processing 1 records from buffer
2021-06-14T07:13:07,679 [service-map-pipeline-prepper-worker-3-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - service-map-pipeline Worker: Processing 1 records from buffer
2021-06-14T07:13:10,577 [raw-pipeline-prepper-worker-5-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - raw-pipeline Worker: Processing 1 records from buffer
2021-06-14T07:25:02,209 [entry-pipeline-prepper-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - entry-pipeline Worker: Processing 1 records from buffer
2021-06-14T07:25:02,313 [service-map-pipeline-prepper-worker-3-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - service-map-pipeline Worker: Processing 1 records from buffer
2021-06-14T07:25:05,210 [raw-pipeline-prepper-worker-5-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - raw-pipeline Worker: Processing 1 records from buffer
Regards,
Boon
Hi, thanks for filing an issue. This seems to be on the data-prepper side. @wrijeff could you take a look here please?
@nnanoob Thanks for reporting the issue. I did not find any problem with the service-map pipeline in pipelines.yml, so it could be a data processing bug in the service-map of data-prepper. For further investigation, could you share a snapshot of all raw spans belonging to a sample traceId? In particular, could you include at least the following fields?
- traceId
- spanId
- parentSpanId
- name
- serviceName
- traceGroup
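One way to keep the response small is to ask Elasticsearch for only these fields via `_source` filtering. Below is a minimal sketch in Python that just builds such a request body. The helper name `build_span_query` is hypothetical, the field names are spelled as they appear in the raw span documents, and `<traceId>` is a placeholder to replace with one of your own trace IDs.

```python
import json

# Fields requested above, spelled as they appear in the raw span documents.
FIELDS = ["traceId", "spanId", "parentSpanId", "name", "serviceName", "traceGroup"]


def build_span_query(trace_id, size=100):
    """Build a search body returning only FIELDS for one trace (hypothetical helper)."""
    return {
        "size": size,
        "_source": FIELDS,  # source filtering: return only the listed fields
        "query": {"term": {"traceId": {"value": trace_id}}},
    }


if __name__ == "__main__":
    # Paste the printed JSON under: GET otel-v1-apm-span-000001/_search
    print(json.dumps(build_span_query("<traceId>"), indent=2))
```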
Query
GET otel-v1-apm-span-000001/_search?
{
  "sort": [
    {
      "startTime": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "term": {
      "traceId": {
        "value": "d66fa6449679a8ed796e0aaddd676ad3"
      }
    }
  }
}
Results
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "otel-v1-apm-span-000001",
"_type" : "_doc",
"_id" : "79f3ca6ca90d6152",
"_score" : null,
"_source" : {
"traceId" : "d66fa6449679a8ed796e0aaddd676ad3",
"spanId" : "79f3ca6ca90d6152",
"traceState" : "",
"parentSpanId" : "01852c7b15e32fdd",
"name" : "HTTP GET",
"kind" : "SPAN_KIND_CLIENT",
"startTime" : "2021-06-14T23:55:00.055073908Z",
"endTime" : "2021-06-14T23:55:00.155158663Z",
"durationInNanos" : 100084755,
"serviceName" : "infra.monitoring",
"events" : [ ],
"links" : [ ],
"droppedAttributesCount" : 2,
"droppedEventsCount" : 0,
"droppedLinksCount" : 0,
"traceGroup" : "MonitoringService.monitors",
"traceGroupFields.endTime" : "2021-06-14T23:55:00.156641Z",
"traceGroupFields.statusCode" : 0,
"traceGroupFields.durationInNanos" : 155638120,
"span.attributes.http@url" : "<some url - masked on purpose>",
"instrumentationLibrary.version" : "1.2.0",
"span.attributes.thread@id" : 35,
"resource.attributes.process@pid" : 1,
"resource.attributes.host@arch" : "amd64",
"span.attributes.net@peer@name" : "<some hostname - masked on purpose>",
"resource.attributes.telemetry@sdk@version" : "1.2.0",
"resource.attributes.service@name" : "infra.monitoring",
"status.code" : 0,
"instrumentationLibrary.name" : "io.opentelemetry.javaagent.http-url-connection",
"resource.attributes.service@version" : "1.0.0",
"resource.attributes.process@runtime@name" : "OpenJDK Runtime Environment",
"resource.attributes.os@type" : "linux",
"span.attributes.http@flavor" : "1.1",
"resource.attributes.deployment@environment" : "poc",
"span.attributes.http@status_code" : 200,
"span.attributes.thread@name" : "scheduling-1",
"resource.attributes.telemetry@sdk@language" : "java",
"resource.attributes.host@name" : "a7ebcbdd1baa",
"resource.attributes.process@runtime@description" : "Oracle Corporation OpenJDK 64-Bit Server VM 25.282-b08",
"resource.attributes.process@executable@path" : "/var/lib/jre:bin:java",
"span.attributes.net@transport" : "ip_tcp",
"resource.attributes.process@command_line" : "/var/lib/jre:bin:java -javaagent:/apps/lib/ext/opentelemetry-javaagent-all.jar -Dotel.resource.attributes=service.name=infra.monitoring,service.version=1.0.0,deployment.environment=poc -Dotel.exporter.otlp.endpoint=<some url - masked on purpose> -Dspring.profiles.active=prod",
"span.attributes.http@method" : "GET",
"resource.attributes.process@runtime@version" : "1.8.0_282-b08",
"resource.attributes.telemetry@sdk@name" : "opentelemetry",
"resource.attributes.telemetry@auto@version" : "1.2.0",
"resource.attributes.os@description" : "Linux 3.10.0-1160.25.1.el7.x86_64"
},
"sort" : [
1623714900055073908
]
},
{
"_index" : "otel-v1-apm-span-000001",
"_type" : "_doc",
"_id" : "fcacb962a394991a",
"_score" : null,
"_source" : {
"traceId" : "d66fa6449679a8ed796e0aaddd676ad3",
"spanId" : "fcacb962a394991a",
"traceState" : "",
"parentSpanId" : "01852c7b15e32fdd",
"name" : "HTTP GET",
"kind" : "SPAN_KIND_CLIENT",
"startTime" : "2021-06-14T23:55:00.001449565Z",
"endTime" : "2021-06-14T23:55:00.048979017Z",
"durationInNanos" : 47529452,
"serviceName" : "infra.monitoring",
"events" : [ ],
"links" : [ ],
"droppedAttributesCount" : 2,
"droppedEventsCount" : 0,
"droppedLinksCount" : 0,
"traceGroup" : "MonitoringService.monitors",
"traceGroupFields.endTime" : "2021-06-14T23:55:00.156641Z",
"traceGroupFields.statusCode" : 0,
"traceGroupFields.durationInNanos" : 155638120,
"span.attributes.http@url" : "<some url - masked on purpose>",
"instrumentationLibrary.version" : "1.2.0",
"span.attributes.thread@id" : 35,
"resource.attributes.process@pid" : 1,
"resource.attributes.host@arch" : "amd64",
"span.attributes.net@peer@name" : "<some hostname - masked on purpose>",
"resource.attributes.telemetry@sdk@version" : "1.2.0",
"resource.attributes.service@name" : "infra.monitoring",
"status.code" : 0,
"instrumentationLibrary.name" : "io.opentelemetry.javaagent.http-url-connection",
"resource.attributes.service@version" : "1.0.0",
"resource.attributes.process@runtime@name" : "OpenJDK Runtime Environment",
"resource.attributes.os@type" : "linux",
"span.attributes.http@flavor" : "1.1",
"resource.attributes.deployment@environment" : "poc",
"span.attributes.http@status_code" : 200,
"span.attributes.thread@name" : "scheduling-1",
"resource.attributes.telemetry@sdk@language" : "java",
"resource.attributes.host@name" : "a7ebcbdd1baa",
"resource.attributes.process@runtime@description" : "Oracle Corporation OpenJDK 64-Bit Server VM 25.282-b08",
"resource.attributes.process@executable@path" : "/var/lib/jre:bin:java",
"span.attributes.net@transport" : "ip_tcp",
"resource.attributes.process@command_line" : "/var/lib/jre:bin:java -javaagent:/apps/lib/ext/opentelemetry-javaagent-all.jar -Dotel.resource.attributes=service.name=infra.monitoring,service.version=1.0.0,deployment.environment=poc -Dotel.exporter.otlp.endpoint=<some url - masked on purpose> -Dspring.profiles.active=prod",
"span.attributes.http@method" : "GET",
"resource.attributes.process@runtime@version" : "1.8.0_282-b08",
"resource.attributes.telemetry@sdk@name" : "opentelemetry",
"resource.attributes.telemetry@auto@version" : "1.2.0",
"resource.attributes.os@description" : "Linux 3.10.0-1160.25.1.el7.x86_64"
},
"sort" : [
1623714900001449565
]
},
{
"_index" : "otel-v1-apm-span-000001",
"_type" : "_doc",
"_id" : "01852c7b15e32fdd",
"_score" : null,
"_source" : {
"traceId" : "d66fa6449679a8ed796e0aaddd676ad3",
"spanId" : "01852c7b15e32fdd",
"traceState" : "",
"parentSpanId" : "",
"name" : "MonitoringService.monitors",
"kind" : "SPAN_KIND_INTERNAL",
"startTime" : "2021-06-14T23:55:00.001002880Z",
"endTime" : "2021-06-14T23:55:00.156641Z",
"durationInNanos" : 155638120,
"serviceName" : "infra.monitoring",
"events" : [ ],
"links" : [ ],
"droppedAttributesCount" : 0,
"droppedEventsCount" : 0,
"droppedLinksCount" : 0,
"traceGroup" : "MonitoringService.monitors",
"traceGroupFields.endTime" : "2021-06-14T23:55:00.156641Z",
"traceGroupFields.statusCode" : 0,
"traceGroupFields.durationInNanos" : 155638120,
"span.attributes.thread@name" : "scheduling-1",
"instrumentationLibrary.version" : "1.2.0",
"resource.attributes.telemetry@sdk@language" : "java",
"span.attributes.thread@id" : 35,
"resource.attributes.host@name" : "a7ebcbdd1baa",
"resource.attributes.process@pid" : 1,
"resource.attributes.host@arch" : "amd64",
"resource.attributes.process@runtime@description" : "Oracle Corporation OpenJDK 64-Bit Server VM 25.282-b08",
"resource.attributes.process@executable@path" : "/var/lib/jre:bin:java",
"resource.attributes.telemetry@sdk@version" : "1.2.0",
"resource.attributes.service@name" : "infra.monitoring",
"resource.attributes.process@command_line" : "/var/lib/jre:bin:java -javaagent:/apps/lib/ext/opentelemetry-javaagent-all.jar -Dotel.resource.attributes=service.name=infra.monitoring,service.version=1.0.0,deployment.environment=poc -Dotel.exporter.otlp.endpoint=<some url - masked on purpose> -Dspring.profiles.active=prod",
"status.code" : 0,
"instrumentationLibrary.name" : "io.opentelemetry.javaagent.spring-scheduling-3.1",
"resource.attributes.process@runtime@version" : "1.8.0_282-b08",
"resource.attributes.service@version" : "1.0.0",
"resource.attributes.telemetry@sdk@name" : "opentelemetry",
"resource.attributes.process@runtime@name" : "OpenJDK Runtime Environment",
"resource.attributes.os@type" : "linux",
"resource.attributes.telemetry@auto@version" : "1.2.0",
"resource.attributes.deployment@environment" : "poc",
"resource.attributes.os@description" : "Linux 3.10.0-1160.25.1.el7.x86_64"
},
"sort" : [
1623714900001002880
]
}
]
}
}
@nnanoob Thanks for sharing. From your sample trace group, it looks like all the spans share the same serviceName:
"serviceName" : "infra.monitoring"
Note that the service-map prepper does not produce data for calls internal to a single service, and services are identified by serviceName. This is explicit in our end-to-end integration test:
Could you therefore check whether the spans belonging to all other traceGroups/traceIds also share the same serviceName?
- If yes, does your client application architecture include multiple microservices? If so, we can dive deeper into why both the client spans and the internal spans share the same serviceName and see whether there is a possible enhancement on the data-prepper side.
- If no, I will need to look for another root cause.
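A quick way to run this check is a terms aggregation over `serviceName` across the span indices. The Python sketch below only builds the request body and reads the buckets back from a response. It assumes `serviceName` is mapped as a keyword field (if it is mapped as text, the aggregation would need a keyword sub-field instead). The helper names and the `otel-v1-apm-span-*` index pattern are illustrative.

```python
import json


def build_service_name_agg(max_services=50):
    """Build a body that counts spans per distinct serviceName (hypothetical helper)."""
    return {
        "size": 0,  # skip the hits; only the aggregation buckets matter here
        "aggs": {
            "services": {
                "terms": {"field": "serviceName", "size": max_services}
            }
        },
    }


def distinct_services(response):
    """Extract the distinct serviceName values from a search response dict."""
    buckets = response["aggregations"]["services"]["buckets"]
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    # Paste the printed JSON under: GET otel-v1-apm-span-*/_search
    print(json.dumps(build_service_name_agg(), indent=2))
```

If only one bucket comes back, every indexed span belongs to the same service.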
@chenqi0805, I'm actually doing a POC with a single standalone application to see what Trace Analytics can offer in OSS Kibana. The standalone application only makes periodic health-check API calls to 2 sample website URLs, and I can see those URLs captured under span.attributes.http@url
All the other traceGroups/traceIds share the same serviceName because I only deployed a single application, with agent attributes that look like this: -Dotel.resource.attributes=service.name=infra.monitoring,service.version=1.0.0,deployment.environment=poc
I was initially expecting to see something on Trace Analytics screen. Does it not work if the otel-v1-apm-service-map is empty?
I was initially expecting to see something on Trace Analytics screen. Does it not work if the otel-v1-apm-service-map is empty?
@nnanoob No. If the index is empty then nothing shows up. AFAICT, trace analytics UI aggregation is dependent on both raw span data and service map edges.
@chenqi0805, in that case, will a single application/service.name sending data to data-prepper generate any service map? If yes, I'm wondering why my example isn't inserting any docs into otel-v1-apm-service-map.
Or does it require, at a bare minimum, 2 applications, where application A invokes application B (and both have telemetry configured)?
in that case, will a single application/service.name sending data to data-prepper generate any service map?
Unfortunately no, as we use serviceName as the identifier for distinct services/service-map nodes.
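To illustrate why a single service yields no edges, here is a simplified Python sketch. It is not Data Prepper's actual implementation. It only assumes, as described in this thread, that a service-map edge needs a span and its parent to carry different `serviceName` values. The function name and the `backend` service in the second example are made up for illustration.

```python
def cross_service_edges(spans):
    """Return the set of (parent_service, child_service) pairs where the names differ."""
    service_by_span = {s["spanId"]: s["serviceName"] for s in spans}
    edges = set()
    for span in spans:
        parent_service = service_by_span.get(span["parentSpanId"])
        # Root spans (empty parentSpanId) and unknown parents produce no edge.
        if parent_service is not None and parent_service != span["serviceName"]:
            edges.add((parent_service, span["serviceName"]))
    return edges


# The three spans from the trace shared in this thread, reduced to the relevant fields.
single_service = [
    {"spanId": "01852c7b15e32fdd", "parentSpanId": "", "serviceName": "infra.monitoring"},
    {"spanId": "fcacb962a394991a", "parentSpanId": "01852c7b15e32fdd", "serviceName": "infra.monitoring"},
    {"spanId": "79f3ca6ca90d6152", "parentSpanId": "01852c7b15e32fdd", "serviceName": "infra.monitoring"},
]

# Hypothetical two-service trace: the child span comes from a second instrumented service.
two_services = [
    {"spanId": "a1", "parentSpanId": "", "serviceName": "infra.monitoring"},
    {"spanId": "b2", "parentSpanId": "a1", "serviceName": "backend"},
]

if __name__ == "__main__":
    print(cross_service_edges(single_service))  # set()
    print(cross_service_edges(two_services))    # {('infra.monitoring', 'backend')}
```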
does it require, at a bare minimum, 2 applications, where application A invokes application B (and both have telemetry configured)
Yes. You could refer to our sample-app services as an example:
@chenqi0805, thanks. I guess the issue I'm facing is that I only have 1 application, while it needs a minimum of 2 to work.