DataDog/datadog-agent

Can't attach host tags to metrics on AWS Fargate

Opened this issue · 11 comments

Describe what happened:
I launched the datadog-agent container in an ECS task (cf. https://www.datadoghq.com/blog/monitor-aws-fargate/) and attached host tags via the DD_TAGS environment variable.
My app sent custom metrics via dogstatsd.
These metrics are not found in the Metrics Explorer when filtering by host tags.

Describe what you expected:
The Metrics Explorer shows metrics with host tags sent from the Fargate container.

Steps to reproduce the issue:
Described above.

Additional environment details (Operating System, Cloud provider, etc):
It seems that associating metrics with host tags fails because the hostname is empty on Fargate.
https://github.com/DataDog/datadog-agent/pull/1182/files#diff-1d9d99d196299ff9a78c3e928058494aR111

I talked to support and the DD team about this on Slack a couple weeks ago, and the reason this isn't working as you expect is that you (just like me) are incorrectly assuming that host tags are applied BEFORE the metrics are sent to DD. In fact, the host tags are added AFTER the dogstatsd process has sent the metrics to the API.

This works by the DD service creating a "host object" for each Datadog agent process and associating tags with it. Then, when processing the statsd metrics, it checks whether they are associated with any host object, and if so they inherit its tags.

Since there's no "host object" when running the agent with ECS_FARGATE=true, there's no way to attach the tags from DD_TAGS to your statsd metrics.

The team suggested using the docker labels approach (DD_DOCKER_LABELS_AS_TAGS, see https://github.com/DataDog/datadog-agent/blob/master/Dockerfiles/agent/README.md#tagging) as the correct way to attach tags to statsd metrics, but this isn't working for me. Last I spoke to them they are aware this is something most people coming from an EC2 world into Fargate are going to run into and they are working on making it better.
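For anyone trying the labels approach, here is a minimal sketch of what it looks like in the agent container's environment. The label names below are hypothetical; the env var takes a JSON map from Docker label name to the tag name you want on collected metrics.

```shell
# Hypothetical label names: map Docker labels on your app containers
# to tag names on the metrics the agent collects from them.
export DD_DOCKER_LABELS_AS_TAGS='{"com.example.team":"team","com.example.service":"service"}'
```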

For now, I have resorted to explicitly sending the tags we care about on all of our metrics. Hopefully the auto-tagging will work itself out soon.
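For anyone wiring this up at the app level without a client library, a minimal sketch of explicit tagging using the DogStatsD plain-text datagram format over UDP. The metric name, tag values, and agent address here are hypothetical; adjust for your setup.

```shell
# build_metric NAME VALUE TYPE TAGS -> a DogStatsD payload string,
# e.g. "app.requests:1|c|#env:prod,service:checkout"
build_metric() {
  printf '%s:%s|%s|#%s' "$1" "$2" "$3" "$4"
}

# Attach the tags you would otherwise rely on DD_TAGS for to every metric.
payload=$(build_metric app.requests 1 c "env:prod,service:checkout")
echo "$payload"

# Fire-and-forget UDP send to the agent sidecar (bash-only /dev/udp).
printf '%s' "$payload" > /dev/udp/127.0.0.1/8125 || true
```

The downside, as noted above, is that every emitting service has to carry this logic itself.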

It's worth noting that if you're using the Python client library to emit stats from inside the container you can set DATADOG_TAGS in the env (https://github.com/DataDog/datadogpy/blob/a439467/datadog/dogstatsd/base.py#L79). This doesn't work for anything other than Python though, and in my use case I needed to support a mix of languages, so we wired it all up ourselves at the app level.
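For reference, the Python-only variable mentioned above is a comma-separated list that datadogpy reads when the client is initialized; a sketch with hypothetical values:

```shell
# Python clients only: datadogpy appends these constant tags to every
# metric it emits (tag values here are hypothetical).
export DATADOG_TAGS="env:prod,service:checkout"
```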

Here's my Slack convo with their product team that has more info: https://datadoghq.slack.com/archives/C4JREERCY/p1534910557000100

CC @irabinovitch

Thanks for the issue! We are working on making this work via Docker labels (rather than env vars) in 6.5.1, which is in testing now.

@borgstrom @mikezvi
Thanks for the reply and solution.
I came up with a workaround that overrides the entrypoint as follows, and it works well.

#!/bin/bash

# On Fargate there is no host object, so derive a stable hostname from
# the ECS task ID (task metadata endpoint v2) before starting the agent.
if [[ -n "${ECS_FARGATE}" ]]; then
  taskid=$(curl -s 169.254.170.2/v2/metadata | sed -r 's/^.*"TaskARN":".*:task\/([A-Za-z0-9-]+)".*$/\1/')
  export DD_HOSTNAME=$taskid
fi

/init
danbf commented

@skobaken7 I'm finding that the count metric from our application is under-reported if a service runs multiple task instances. I think it results from the same commit 5867aae, which ends up dropping all but one count metric since it thinks they are from the same source.

While your solution of a custom entrypoint.sh works, I think the Datadog agent should still set a hostname on Fargate for DogStatsD-relayed metrics while still dropping all host checks.

Thanks for the workaround @skobaken7. I hope Datadog will address this issue officially; it's been open for quite a while already, with Fargate usage skyrocketing. The agent works so well "out of the box" in so many other cases that it's unfortunate this kind of trickery is needed.

Another workaround that I stumbled on was duplicating the value of DD_TAGS into DD_DOGSTATSD_TAGS (mentioned here). (This fixes the global tagging issue as originally described, but not the undercounting described by @danbf.)
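A sketch of that duplication in the agent container's environment. Tag values are hypothetical, and the exact list syntax accepted for DD_DOGSTATSD_TAGS may vary by agent version, so check the docs for your release.

```shell
# Host-level tags, which do not reach statsd metrics on Fargate...
export DD_TAGS="env:prod team:payments"
# ...mirrored as DogStatsD-level tags, which the agent appends to every
# statsd metric it forwards.
export DD_DOGSTATSD_TAGS='["env:prod", "team:payments"]'
```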

Hi all!

As commented here: #3159 (comment)
We have a feature request open on our side to add the task_arn as a tag when sending custom metrics with dogstatsd (this would be the same task_arn as the agent's, since both containers run in the same task). It should resolve the issue by providing a unique, higher-cardinality tag without adding a hostname to the agent, which could cause billing issues.

Please reach out to our support team (support@datadoghq.com) if you'd like to open another feature request that you think is relevant.

Simon

Hi! Not sure if anyone is still following this issue, but it has been more-or-less fixed as of agent 7.29.1. More related enhancements were released with 7.35.0. You might need an updated tracing component if the metrics you want are APM metrics.

ndroo commented

@kwolff-chwy I'm not sure how this is fixed; this exact issue persists (on ECS Fargate) unless you manually set the hostname. Am I missing something?

This is still an issue in the latest version of the container

emj-io commented

Would this be an issue if we ran sidecar datadog agents on non-Fargate ECS tasks?