flyteorg/flyte

[BUG] Log Links should work with CloudWatch FluentD Out Of the Box

Opened this issue · 11 comments

Describe the bug

As it stands, users who want to enable CloudWatch logs must go through the AWS setup process and then manually edit the fluent-bit-config configmap so that the generated log stream name matches what Flyte expects.

kubectl edit cm -n amazon-cloudwatch fluent-bit-config 
[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   var.
    auto_create_group   true
    extra_user_agent    container-insights

And then use this configmap for flyte:

  task_logs.yaml: |
    plugins:
      logs:
        cloudwatch-template-uri: 'https://{vars.region}.console.aws.amazon.com/cloudwatch/home?region={vars.region}#logsV2:log-groups/log-group/$252Faws$252Fcontainerinsights$252F<log group name>$252Fapplication$3FlogStreamNameFilter$3Dvar.application.var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}'
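For reference, here is a minimal Python sketch of how a link like the one above is put together. It assumes the encoding visible in that URL (the log group path is percent-encoded twice, the `?` and `=` separators once, and every `%` is then written as `$`), and it assumes the stream name is the `var.` prefix followed by the Fluent Bit tag. The function names, cluster name, and pod values are hypothetical, and this is not how Flyte itself builds the link.

```python
from urllib.parse import quote


def console_escape(text: str, passes: int = 1) -> str:
    """Percent-encode `text` `passes` times, then write each '%' as '$'."""
    for _ in range(passes):
        text = quote(text, safe="")
    return text.replace("%", "$")


def render_log_link(region, cluster, pod_name, namespace, container_name):
    # Log group written by the [OUTPUT] section of fluent-bit-config.
    log_group = f"/aws/containerinsights/{cluster}/application"
    # Assumed stream naming: log_stream_prefix ("var.") + tag prefix + file stem.
    stream_filter = (
        "var.application.var.log.containers."
        f"{pod_name}_{namespace}_{container_name}"
    )
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#logsV2:log-groups/log-group/"
        f"{console_escape(log_group, passes=2)}"
        f"{console_escape('?')}logStreamNameFilter{console_escape('=')}"
        f"{stream_filter}"
    )


# Hypothetical values, for illustration only.
print(render_log_link("us-east-1", "my-cluster", "pod-a", "dev", "main"))
```

The double encoding is why `/` shows up as `$252F` in the template, while `?` and `=` appear as `$3F` and `$3D`.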

We should change the default template defined here to be: https://console.aws.amazon.com/cloudwatch/home?region=%s#logsV2:log-groups/log-group/%s$3FlogStreamNameFilter=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}
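A rough sketch of how the proposed default could be filled in, assuming the two `%s` slots take the region and the log group (in that order) and the `{{ ... }}` placeholders are substituted per task afterwards. The helper name and pod values are hypothetical; the region and log group are the ones from the expected configuration in this issue. The Python below only illustrates the substitution order and is not the plugin's code.

```python
import re

PROPOSED_DEFAULT = (
    "https://console.aws.amazon.com/cloudwatch/home?region=%s"
    "#logsV2:log-groups/log-group/%s$3FlogStreamNameFilter="
    "var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}"
)


def fill_template(template, region, log_group, task_vars):
    # Stage 1: static settings fill the two %s slots (region, then log group).
    uri = template % (region, log_group)
    # Stage 2: per-task values replace each {{ .name }} placeholder.
    return re.sub(
        r"\{\{\s*\.(\w+)\s*\}\}",
        lambda match: task_vars[match.group(1)],
        uri,
    )


# Pod, namespace, and container names are made up for illustration.
link = fill_template(
    PROPOSED_DEFAULT,
    "us-east-1",
    "bv-ml-pipelines",
    {"podName": "pod-a", "namespace": "dev", "containerName": "main"},
)
print(link)
```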

Expected behavior

  1. Follow the AWS guide to deploy FluentD: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html
  2. Enable CloudWatch logs in Flyte:
      task_logs.yaml: |
        plugins:
          logs:
            cloudwatch-enabled: true
            cloudwatch-log-group: 'bv-ml-pipelines'
            cloudwatch-region: 'us-east-1'
            kubernetes-enabled: false
    
  3. SUCCESS ✔️

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

I'd like to work on this.

#take

#self-assign

@samhita-alla Kindly review the PR and let me know if any changes are required. Thanks!

This assumes that the cluster is deployed with FluentBit by default, whereas FluentD is also a valid and recommended option, and it is the one the older configuration points to. This change will not apply to all users.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html

@andrusha do you mind elaborating on this? Are you commenting on the discussion we are having on the PR? flyteorg/flyteplugins#293 (comment)

Aha, I understand. I agree that changing the default can break things for other people. Do you know for a fact that FluentBit in compatibility mode works out of the box? I vaguely recall that the hostname was still a requirement, though, no?

Perhaps a better approach is to:

  1. Still add support for a host name template parameter
  2. Instead of changing the default, add another CloudWatchV2Enabled flag that defaults to the new version format...
  3. Add an "Enabling Cloud Provider-Native Logs" section where we can elaborate on templating for URLs, StackDriver, CloudWatch, FluentD and FluentBit.

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏