aws/aws-for-fluent-bit

Init process GetBucketLocation fails with 403

alicyn opened this issue · 3 comments

alicyn commented

Describe the question/issue

I am trying to set up Fluent Bit with FireLens to ship logs from my Ruby on Rails application containers in ECS Fargate to an external log monitoring tool (Coralogix).

I continue to get this error, "Cannot get bucket region of MYBUCKETNAME + path/tofile.conf, you must be the bucket owner to implement this operation", despite following the documentation exactly. Full error from the log below:

Logged on October 10, 2023 at 16:02 (UTC-4:00) by container log_router (96833da8cd3246fdb8b80a0a82b80c7e):

time="2023-10-10T20:02:18Z" level=fatal msg="[FluentBit Init Process] Cannot get bucket region of stagingdp + coralogix/base_filters.conf, you must be the bucket owner to implement this operation\n"
time="2023-10-10T20:02:18Z" level=error msg="AccessDenied: Access Denied\n\tstatus code: 403, request id: RYZCWEA6ZMDBC18F, host id: hFZ/VZQJkcpyxyJB0fEz2ea3JTFq+JMf5vLOjq/dVDtVck8oylwJFe8xUgt08Crm4ll20rZrhEQ="

Configuration

I have followed the instructions that are outlined here:
https://github.com/coralogix/telemetry-shippers/tree/master/logs/fluent-bit/ecs-fargate

and here:

https://github.com/aws/aws-for-fluent-bit/tree/mainline/use_cases/init-process-for-fluent-bit

Here is the relevant section of my ECS Task definition:

{
            "name": "log_router",
            "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:init-2.31.12",
            "cpu": 0,
            "portMappings": [],
            "essential": false,
            "environment": [
                {
                    "name": "aws_fluent_bit_init_s3_1",
                    "value": "arn:aws:s3:::stagingdp/coralogix/base_filters.conf"
                }
            ],
            "mountPoints": [],
            "volumesFrom": [],
            "user": "0",
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "bre-ecs-log-group",
                    "awslogs-region": "us-east-2",
                    "awslogs-stream-prefix": "firelens"
                }
            },
            "firelensConfiguration": {
                "type": "fluentbit"
            }
}

My task role has the following policy attached to it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::stagingdp/*"
        }
    ]
}

I am the owner and creator of the S3 bucket "stagingdp", and my user has full administrator permissions. Everything is in the us-east-2 region, and nothing spans multiple AWS accounts.

If I run "GetBucketLocation" locally using the AWS CLI, it succeeds and returns "us-east-2", with or without the "expected bucket owner" option.

aws s3api get-bucket-location --bucket stagingdp
aws s3api get-bucket-location --bucket stagingdp --expected-bucket-owner <ACCOUNTNUMBER>

Fluent Bit Version Info

docker image: public.ecr.aws/aws-observability/aws-for-fluent-bit:init-2.31.12

Cluster Details

ECS Fargate

alicyn commented

I ended up solving this. The instructions in the linked AWS documentation appear to be incorrect.

The correct policy to add to the task role is as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::stagingdp/*",
                "arn:aws:s3:::stagingdp"
            ]
        }
    ]
}

According to a separate AWS document on using FireLens, the s3:GetObject permission must be granted on the object ARN (the bucket plus the path to the file), while the s3:GetBucketLocation permission must be granted on the bucket ARN itself, with no file path: https://docs.aws.amazon.com/AmazonECS/latest/userguide/using_firelens.html
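To illustrate why the original policy fails, here is a minimal Python sketch of the resource matching described above. It is only a rough approximation of IAM evaluation, not the real engine: it handles a single Allow statement and uses `fnmatch` for wildcards. The helper name `statement_allows` is hypothetical.

```python
import fnmatch


def statement_allows(statement, action, resource_arn):
    """Simplified check: does one Allow statement cover `action` on `resource_arn`?

    Assumption: this ignores Deny statements, conditions, and every other
    part of real IAM evaluation. It only compares actions and resource ARNs.
    """
    if statement.get("Effect") != "Allow":
        return False
    actions = statement["Action"]
    resources = statement["Resource"]
    # Policy fields may be a single string or a list of strings.
    if isinstance(actions, str):
        actions = [actions]
    if isinstance(resources, str):
        resources = [resources]
    action_ok = any(fnmatch.fnmatchcase(action, a) for a in actions)
    resource_ok = any(fnmatch.fnmatchcase(resource_arn, r) for r in resources)
    return action_ok and resource_ok


BUCKET_ARN = "arn:aws:s3:::stagingdp"
OBJECT_ARN = "arn:aws:s3:::stagingdp/coralogix/base_filters.conf"

# The policy from the original question: objects only.
original = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:GetBucketLocation"],
    "Resource": "arn:aws:s3:::stagingdp/*",
}

# The corrected policy: objects and the bucket itself.
fixed = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:GetBucketLocation"],
    "Resource": ["arn:aws:s3:::stagingdp/*", "arn:aws:s3:::stagingdp"],
}

print(statement_allows(original, "s3:GetObject", OBJECT_ARN))          # True
print(statement_allows(original, "s3:GetBucketLocation", BUCKET_ARN))  # False
print(statement_allows(fixed, "s3:GetBucketLocation", BUCKET_ARN))     # True
```

The second result is the 403 from the question: `arn:aws:s3:::stagingdp/*` matches objects inside the bucket but not the bucket ARN, which is the resource that s3:GetBucketLocation acts on.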

The same document says that these permissions should also be added to the task execution role, so I did that as well for good measure:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::stagingdp/coralogix/base_filters.conf"
            ]
        },
        {
            "Action": [
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::stagingdp"
            ]
        }
    ]
}

I'm not sure which one is actually needed or if both are needed, but the errors have stopped and my logs are successfully going to Coralogix.

The first document (https://github.com/aws/aws-for-fluent-bit/tree/mainline/use_cases/init-process-for-fluent-bit) says that the policy should only be added to the ECS task role, not the execution role. The instructions seem to directly contradict each other, so it would be nice if there was a conclusive answer on this.

Thanks a lot @alicyn for the detail here; you helped me move forward, as I was having a very similar problem. In my case, I only updated the TaskRole (not the TaskExecutionRole) to use a policy as follows:

    - PolicyName: Blah
      PolicyDocument:
        Statement:
        - Effect: Allow
          Action:
            - 's3:GetBucketLocation'
            - 's3:GetObject'
          Resource:
            - 'arn:aws:s3:::some-bucket/*'
            - 'arn:aws:s3:::some-bucket'       
        - Effect: Allow
          Action:
            - 'kms:Decrypt'
            - 'kms:GenerateDataKey'
          Resource:
            - '{{resolve:ssm:/my/path/to/kms/key}}'

I was able to troubleshoot this by using ECS Exec to get into the container, installing the AWS CLI, and running aws s3 cp commands. It was also very helpful to configure the FireLens container to use the awslogs driver while troubleshooting:

LogConfiguration:
  LogDriver: awslogs
  Options:
    awslogs-create-group: true
    awslogs-region: us-west-2
    awslogs-group: firelens
    awslogs-stream-prefix: some-prefix
    mode: non-blocking

Note: These excerpts are taken from CloudFormation templates, hence any weirdness and the YAML format. I am also using Fargate. If it matters, I'm ultimately trying to ship my logs to Datadog.
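For the ECS Exec troubleshooting step, a small Python sketch like the one below can turn the ARN from an `aws_fluent_bit_init_s3_1` variable into CLI commands to try from inside the container, where they run with the task role's credentials. The helper names `split_s3_arn` and `debug_commands` are hypothetical, and the parsing assumes the standard `aws` partition.

```python
def split_s3_arn(arn):
    """Split an S3 ARN into (bucket, key). The key is empty for a bucket ARN."""
    prefix = "arn:aws:s3:::"  # assumption: standard "aws" partition only
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    bucket, _, key = arn[len(prefix):].partition("/")
    return bucket, key


def debug_commands(arn):
    """Return AWS CLI commands that exercise the two permissions involved."""
    bucket, key = split_s3_arn(arn)
    return [
        # Needs s3:GetBucketLocation on the bucket ARN.
        f"aws s3api get-bucket-location --bucket {bucket}",
        # Needs s3:GetObject on the object ARN; "-" streams the file to stdout.
        f"aws s3 cp s3://{bucket}/{key} -",
    ]


for cmd in debug_commands("arn:aws:s3:::stagingdp/coralogix/base_filters.conf"):
    print(cmd)
```

If the first command fails while the second succeeds, the bucket-level resource is probably missing from the task role policy.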

For anyone who finds this and is still stuck after trying the above: if you have VPC endpoints set up and their policies do not allow blanket access to S3, you will need to grant access to this bucket there too.
This was the cause of my issue, and it took me longer to find than I care to admit.
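As a rough illustration of the VPC endpoint point, the Python sketch below builds an endpoint policy statement for the config bucket. The actual endpoint policy was not shared in this thread, so the shape here is an assumption; `endpoint_policy_for_bucket` is a hypothetical helper, and a real endpoint policy would need to keep whatever statements it already has.

```python
import json


def endpoint_policy_for_bucket(bucket):
    """Build an assumed S3 VPC endpoint policy granting read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Endpoint policies are resource-style policies, so they
                # carry a Principal element, unlike the role policies above.
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:GetBucketLocation"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions
                ],
            }
        ],
    }


print(json.dumps(endpoint_policy_for_bucket("stagingdp"), indent=2))
```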