DataDog/datadog-cdk-constructs

LogGroup SubscriptionFilter has a different resource name every deployment causing cloudformation failures

Closed · 7 comments

I am attempting to integrate datadog cdk construct v2 into my project like so:

if (props.datadog) {
  // For documentation: https://github.com/DataDog/datadog-cdk-constructs
  // NodeLayerVersion is gotten from: https://github.com/DataDog/datadog-lambda-js/releases
  const datadog = new Datadog(this, 'Datadog', {
    env: props.stackEnvironment,
    service: 'REDACTED',
    version: props.codeVersion,
    forwarderArn: Fn.importValue(props.datadog.forwarder!),
    nodeLayerVersion: props.datadog.nodeLayerVersion
  });

  datadog.addLambdaFunctions([...]);
}

It appears that the SubscriptionFilters are being created with a different resource name on every deployment. As a result, CloudFormation tries to add another subscription filter to each log group, and the deployment fails because the number of subscription filters per log group is limited. According to the AWS docs, this limit is 2 and cannot be changed: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html

This resource id should not change from deploy to deploy. The offending code:

const subscriptionFilterName = generateSubscriptionFilterName(Names.uniqueId(lam), forwarderArn);

As long as the construct is scoped correctly, CDK will take care of the rest with the resource name.

How to reproduce?

  • Have datadog construct in the stack
  • Leverage forwarderArn
  • Attach to lambdas
  • Deploy 2-3 times (depending on whether other filters exist)
  • Observe error
12:06:30 PM | CREATE_FAILED        | AWS::Logs::SubscriptionFilter            | REDACTED...er348769153C2BF03B
Resource limit exceeded. (Service: AWSLogs; Status Code: 400; Error Code: LimitExceededException; Request ID: 85c404c8-84e0-4c7a-a379-17e3cb512271; Proxy: null)

12:06:30 PM | CREATE_FAILED        | AWS::Logs::SubscriptionFilter            | REDACTED...er3f6649f5015872C4
Resource limit exceeded. (Service: AWSLogs; Status Code: 400; Error Code: LimitExceededException; Request ID: f02d0345-31eb-4108-96e6-ed610f5eb326; Proxy: null)

12:06:31 PM | CREATE_FAILED        | AWS::Logs::SubscriptionFilter            | REDACTED...er09c33c10EA3CA0A6
Resource limit exceeded. (Service: AWSLogs; Status Code: 400; Error Code: LimitExceededException; Request ID: 35d42acb-8785-4aad-ad4a-def5b25c15b5; Proxy: null)

The stack named xxxx-sandbox-bri failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource limit exceeded. (Service: AWSLogs; Status Code: 400; Error Code: LimitExceededException; Request ID: 85c404c8-84e0-4c7a-a379-17e3cb512271; Proxy: null), Resource limit exceeded. (Service: AWSLogs; Status Code: 400; Error Code: LimitExceededException; Request ID: f02d0345-31eb-4108-96e6-ed610f5eb326; Proxy: null), Resource limit exceeded. (Service: AWSLogs; Status Code: 400; Error Code: LimitExceededException; Request ID: 35d42acb-8785-4aad-ad4a-def5b25c15b5; Proxy: null)

I'd suggest checking for other resources doing the same thing.

My workaround below.

I will also note that the same thing will happen if you have the Datadog AWS Integration installed at the account level with CloudWatch checked under the log configurations. That setting causes subscription filters to be applied automatically to any new log groups. Be careful! :) This was also the case for me, so I simply removed the code that adds the filter, but if you don't have that setting enabled you should be OK to keep it in.

import { Construct } from 'constructs';
import { LambdaDestination } from 'aws-cdk-lib/aws-logs-destinations';
import { Datadog, DatadogProps } from 'datadog-cdk-constructs-v2';
import { Function } from 'aws-cdk-lib/aws-lambda';
import { FilterPattern } from 'aws-cdk-lib/aws-logs';
import { FooLambda } from './foo-lambda';

export interface FooDatadogProps extends DatadogProps {
    readonly fooLambdas: FooLambda[];
}

/**
 * Intended to workaround: https://github.com/DataDog/datadog-cdk-constructs/issues/108
 */
export class FooDatadog extends Construct {

    readonly _datadog: Datadog;

    constructor(scope: Construct, id: string, props: FooDatadogProps) {
        super(scope, id);
        this._datadog = new Datadog(this, 'Datadog', props);
        this._datadog.addLambdaFunctions(props.fooLambdas.map(fooLambda => fooLambda.function));

        // Fix for https://github.com/DataDog/datadog-cdk-constructs/issues/108
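        if (props.forwarderArn) {

            const forwarderLambda = Function.fromFunctionArn(this, 'ForwarderLambda', props.forwarderArn);
            const forwarderDestination = new LambdaDestination(forwarderLambda);

            props.fooLambdas.forEach((lambda) => {
                // findAll() lists the log group itself first, so index 1 is its first descendant.
                // This assumes that descendant is the filter added by the Datadog construct.
                const badFilterName = lambda.function.logGroup.node.findAll()[1].node.id;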
                lambda.function.logGroup.node.tryRemoveChild(badFilterName);

                // remove this if you have datadog collecting ALL cloudwatch logs
                lambda.function.logGroup.addSubscriptionFilter('DatadogSubscription', {
                    destination: forwarderDestination,
                    filterPattern: FilterPattern.allEvents()
                });
            });
        }
    }

    get datadog() {
        return this._datadog;
    }

}

Hi @zomgbre - thanks for this report. Looks like an oversight in the initial implementation. We'll target this fix in the next release for both v1 and v2.

In the meantime, in addition to the workaround you've provided, I'd suggest exploring the Datadog Extension, which obviates the need for the Datadog Forwarder.

Thanks again!

Hi @zomgbre, I'm having some issues reproducing this. We build the name from the unique ID of the Lambda function and the Forwarder ARN, combined with our own constant prefix. I'm pretty new to this repository, but my experiments lead me to believe that the resource name is deterministic: the SubscriptionFilterName from cdk synth is consistent over multiple runs, and multiple deploys of a demo project show that the filter name remains the same across deploys.
(Two screenshots were attached here in the original thread.)
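
To make the determinism argument concrete, here is a minimal sketch of a name built from a constant prefix, a construct's unique ID, and the Forwarder ARN. It is not the library's actual implementation: the function name `sketchSubscriptionFilterName`, the prefix value, and the choice of SHA-256 are all assumptions made for illustration. The point is only that a name derived purely from these inputs can change between deploys only if one of the inputs changes.

```typescript
import { createHash } from "crypto";

// Hypothetical prefix; the constant used by the real library may differ.
const FILTER_PREFIX = "DatadogSubscriptionFilter";

// Sketch only: derive a filter name from the construct's unique ID and the
// forwarder ARN. The same inputs always produce the same name.
function sketchSubscriptionFilterName(uniqueId: string, forwarderArn: string): string {
  const digest = createHash("sha256")
    .update(uniqueId)
    .update("\u0000") // separator so ("ab", "c") and ("a", "bc") hash differently
    .update(forwarderArn)
    .digest("hex")
    .slice(0, 8);
  return FILTER_PREFIX + digest;
}
```

If the generated name does differ between two synth runs, the thing to look for is which input differs, for example a construct path that changes or a forwarder ARN that is resolved differently.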

Could you run cdk synth a few times and share the output? I'm trying to identify which part of the name generation would be non-deterministic.
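
One way to compare synth outputs is to diff the logical IDs of the `AWS::Logs::SubscriptionFilter` resources between two synthesized templates. The helper below is a sketch with hypothetical names (`Template`, `subscriptionFilterIds`, `diffSubscriptionFilters`); it only assumes the standard CloudFormation template shape with a top-level `Resources` map.

```typescript
// Minimal shape of a synthesized CloudFormation template (only what this sketch reads).
interface Template {
  Resources?: Record<string, { Type: string; Properties?: Record<string, unknown> }>;
}

// Collect the logical IDs of all subscription filters in a template, sorted.
function subscriptionFilterIds(template: Template): string[] {
  const resources = template.Resources ?? {};
  return Object.keys(resources)
    .filter((id) => resources[id].Type === "AWS::Logs::SubscriptionFilter")
    .sort();
}

// Report the logical IDs that appear in only one of the two synth outputs.
function diffSubscriptionFilters(
  before: Template,
  after: Template
): { removed: string[]; added: string[] } {
  const a = subscriptionFilterIds(before);
  const b = subscriptionFilterIds(after);
  return {
    removed: a.filter((id) => b.indexOf(id) === -1),
    added: b.filter((id) => a.indexOf(id) === -1),
  };
}
```

To use it, copy the stack's template JSON from the `cdk.out` directory after each synth run, parse both copies with `JSON.parse`, and pass them in. Empty `removed` and `added` arrays mean the filter IDs were stable between the two runs.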

Thanks!

Hi @zomgbre - I'm wondering if you've had a chance to try the debugging steps I outlined above? I'd like to identify what specifically keeps changing about the subscription name, which is causing this issue for you.

Thanks!

> In the meantime, in addition to the workaround you've provided, I'd suggest exploring the Datadog Extension, which obviates the need for the Datadog Forwarder.

@astuyve, what's the recommended way to take advantage of addForwarderToNonLambdaLogGroups if one migrates to the extension? Instantiate Datadog twice in the CDK code, once with and once without a forwarder ARN specified?

This library wasn't intended to be instantiated twice, and I'm not sure what the impact of that would be.

If there are non-Lambda log groups, I'd recommend creating a new stack specifically for subscribing those log groups to Datadog, then using the addForwarderToNonLambdaLogGroups method to subscribe the forwarder to them.

Closing as there's been no reply to the original issue. Please feel free to re-open if there's more information you can provide based on the troubleshooting steps outlined above.
Thanks!