A small workshop to illustrate log exclusion and masking with the Datadog Agent in a Docker environment.
The aim of this repo is to be as low-overhead as possible. The steps to get started are:
- Install Docker.
- Clone the repo with `git clone git@github.com:nsuarezcanton/datadog-data-privacy-logs.git`.
- Make sure that the environment variables `DD_API_KEY` and `DD_SITE` are set in the terminal session that you're using. You can test this with `echo $DD_API_KEY`.
- Run `docker compose up --build`.
- (Optional) If you're on Windows or Linux, modify the volume mounts in the `datadog-agent` service within `docker-compose.yaml` to match your platform. The process is detailed in Datadog's documentation.
At this stage, you should have the Datadog Agent running as well as two services: (a) `cardpayment-raw` and (b) `cardpayment-masked`. You should see logs flowing into https://app.datadoghq.com/logs (or the equivalent page on the site where your Datadog account is hosted). For each section, uncomment the respective step in `docker-compose.yaml`. Each step has a `TODO` note so it can be identified easily.
From the logs view, you should be able to query for `service:agent` and see the Agent's logs. There are times when you want to exclude a container's (or an image's) logs from being collected. The approach here is to use the `DD_CONTAINER_EXCLUDE_LOGS` directive (i.e. environment variable) to exclude the Agent's image. We assign the value `image:datadog/agent` so that logs from containers running this image are not collected. You can refer to the Datadog documentation to understand what pattern matching techniques you can apply.
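As a rough sketch, the relevant part of the `datadog-agent` service in `docker-compose.yaml` might look like the fragment below. Only `DD_CONTAINER_EXCLUDE_LOGS` and its value come from this workshop. The surrounding keys are assumptions about a typical Agent setup and may differ from the repo's actual file.

```yaml
# Sketch only: everything except DD_CONTAINER_EXCLUDE_LOGS is an assumed, typical setup.
services:
  datadog-agent:
    image: datadog/agent
    environment:
      - DD_API_KEY=${DD_API_KEY}
      - DD_SITE=${DD_SITE}
      # Assumed: enable log collection for all containers.
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
      # Do not collect logs from containers running the Agent image.
      - DD_CONTAINER_EXCLUDE_LOGS=image:datadog/agent
```

After restarting with `docker compose up --build`, new results for `service:agent` should stop appearing while the other services keep sending logs.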
Now, if you search for `service:fw-datadog-data-masking-cardpayment-masked` or `service:fw-datadog-data-masking-cardpayment-raw`, you'll notice that these logs contain a credit card number. Though we want to keep track of the transaction, our objective is to avoid sending this number to the Datadog backend, so it has to be scrubbed before ingestion. To do so, we can use the `log_processing_rules` directive to scrub sensitive information from the logs. Datadog provides a set of examples with common patterns you may want to scrub. Otherwise, you'll need to write regular expressions that match the content you want to scrub.
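One way to attach such a rule is through a Docker label on the container whose logs should be scrubbed. The fragment below is a minimal sketch and not necessarily what the repo ships: the `source` and `service` values, the rule name, the placeholder text and the regular expression are all illustrative assumptions. The pattern only covers 13- or 16-digit numbers starting with 4 and 16-digit numbers starting with 3 or 5, written without separators.

```yaml
# Sketch only: label values below are assumptions, not the repo's actual configuration.
services:
  cardpayment-masked:
    labels:
      # Autodiscovery log configuration, written as JSON inside a YAML folded scalar.
      com.datadoghq.ad.logs: >-
        [{
          "source": "cardpayment",
          "service": "fw-datadog-data-masking-cardpayment-masked",
          "log_processing_rules": [{
            "type": "mask_sequences",
            "name": "mask_credit_cards",
            "replace_placeholder": "[masked_credit_card]",
            "pattern": "(?:4[0-9]{12}(?:[0-9]{3})?|[35][0-9]{15})"
          }]
        }]
```

With a `mask_sequences` rule like this, the Agent replaces each matched sequence with `[masked_credit_card]` before the log is sent, so the rest of the log line is preserved.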
We have some logs in our account that contain sensitive information. If you query for `service:fw-datadog-data-masking-cardpayment-raw`, you'll see that credit card information is displayed within the log event. This makes sense, as `fw-datadog-data-masking-cardpayment-raw` (i.e. `cardpayment-raw` in `docker-compose.yaml`) is running the same code but with no Agent-level processing rules applied. This example is useful to highlight Datadog's Sensitive Data Scanner.
Datadog allows you to exclude or obfuscate logs before they make it into the platform. That said, you may miss some patterns before forwarding your logs. With that in mind, Datadog also provides a Sensitive Data Scanner, which not only flags sensitive attributes but also lets you remove, mask or hash them. As a final note, you should review the documentation guide on reducing data-related risks.