spreadshirt/backstage-plugin-s3

Folder structure, problem

Opened this issue · 19 comments

Hi

I would like to have a little guidance: so I have to upload a folder structure to s3:

my local structure is:

test1/
└── test2
    └── test3
        └── test.txt

on backstage gui structure

├─ test1
│  ├─ test2
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3

So my question is, what could be the reson of that or how can I solve this?
I do not really understand this behaviour. ( I used minio s3 client)

Thanks

That sounds weird to me. I've just tried the same setup in our demo (available inside the folder /demo in this repo). Replacing the example with a test1/test2/test3/test.txt like you have, and the data was displayed properly (the demo uses ceph):
Screenshot from 2023-09-25 14-31-23

So I would suspect the problem is how minio returns the data. We are using the aws-sdk packages internally, and there might be the case that those 2 clients are not fully understanding each other, but I don't really know, since we haven't tested this setup with minio.

What I would suggest you to try is implementing a custom S3Api using a Minio client. To do it, you would need to implement a class extending the S3Api that can be found here. Most likely, the code for this client would be almost the same as in the S3Client. Then, you have to overwrite the used client before you call the method S3Builder.build() with the function getClient.

If this fixes the issue. I would suggest to extend the list of clients to be able to use minio as well.

@ivangonzalezacuna Thank for your reply, I just used minio client for the upload to S3, which type of client did you used for that? I would like to try that one first. Maybe it just a minio client bug.

Hi @C1ph3R-s , to try it out locally I used a ceph image. You can find more info in this block of the plugin documentation: https://github.com/spreadshirt/backstage-plugin-s3#development

Basically, the contents of the folder ./demo/examples will be synced as soon as you start the container (it takes some time, just wait a bit until the sync is done, which is the last step). Add or remove things as you like, just for testing.

Then, if you start the test environment using yarn start you should see what I shared in the screenshot. If your app-config.yaml looks like the one in here and using ceph works, it might be an issue with that minio client then.

@ivangonzalezacuna I have tested a the setup again, so minio works well becase when i use mc ls command i got this:

user@host:# mc ls s3/test_bucket/test1
0B test2/
user@host:# mc ls s3/test_bucket/test1/test2
0B test3/
user@host:# mc ls s3/test_bucket/test1/test2/test3
5B STANDARD test.txt

but when I check the backstage GUI it got this below:

test1 folder
backstage_test1

test2 folder
backstage_test2

test3 folder
backstage_test3

test.txt file in every test3 folder
backstage_test_txt

Do you have any idea?

Thank you

@ivangonzalezacuna

I have tested with aws cli tool too, but unfortunatelly I got the same result on backstage GUI.

Here is my script for the poc:

#!/bin/bash
export AWS_ACCESS_KEY_ID="your access key"\
export AWS_SECRET_ACCESS_KEY="your secret access key"

BUCKET_NAME="test_bucket"
DIRECTORY="test1"

alias aws='aws --endpoint-url https://s3-redhat-openshift-storage.com'

for FILE in $DIRECTORY/*
do
  aws s3 cp $FILE s3://$BUCKET_NAME/ --recursive
done

Thanks you

@ivangonzalezacuna It seems to me that when I upload a file to the s3 bucket, it works fine. However, when I attempt to add a file to a folder or a folder structure, the issue I described earlier occurs.

That still looks super weird to me. Specially because replicating the same directory structure you suggest works in my case.

Also, the method that seems to be failing is the listBucketKets from the default S3Client: the code is here

The internal client should return an object containing several arrays. The result inside the CommonPrefixes are the folders, and the values in Contents are the files. Those 2 results are appended into one slice are returned to the frontend, so they are printed.

The file listing looks fine to me, since you get no duplications at all. So I would suggest checking what the output of CommonPrefixes is. What we could do in this case is extend that check, and make sure all the keys are unique, so only one testX/ will be present.

But for understanding what is going on in the background, could you try out to get the full result from the s3Client.listObjects() method? You could use your backstage instance for it: When you initialize the s3 plugin with S3Builder.createBuilder you can add a setS3Client() call and use your custom client for the tests. This client must be a copy of the default S3Client, but adding some logs to the method mentioned above.

Also, could you do a yarn why @aws-sdk/client-s3? Just to see if the version of the package you're using is messing up with the results.

Hi @ivangonzalezacuna,

I have find the problem with this, do you have any offical way to report it for you?

Thank you,

Hi @ivangonzalezacuna,

I have find the problem with this, do you have any offical way to report it for you?

Thank you,

Awesome!

You can simply follow the usual ways to contribute by creating a fork and then a MR to this repo. You can explain in there what was the issue and what fixed it. You can assign it to me, and I will review it as soon as I can.

Or is there something else you wanted to share in private?

Hi @ivangonzalezacuna,
I have find the problem with this, do you have any offical way to report it for you?
Thank you,

Awesome!

You can simply follow the usual ways to contribute by creating a fork and then a MR to this repo. You can explain in there what was the issue and what fixed it. You can assign it to me, and I will review it as soon as I can.

Or is there something else you wanted to share in private?

Unfortunately I cannot handle the fix, but I know what is the bug, private would be better :)

Unfortunately I cannot handle the fix, but I know what is the bug, private would be better :)

In such case, you could also send me a message via Discord for example: ivan.gonzalez

@ivangonzalezacuna
Sorry for the late response, so the plugin works perfectly on OpenShift 4.9.10, but the bug manifests itself in a higher version then 4.12.

Oh, that's such an interesting behavior. Is there any change in any version in between that might have caused the issue?

Anyways, we will proceed with an upgrade of the plugin to the latest backstage release. I'll check if the aws packages can be updated to a newer version. As soon as the MR is ready I can ping you, and you could try out that branch and see if this fixed the issue or not.

@ivangonzalezacuna
That is a good question, unfortunatelly I do not know :/
Thank you very much for your help!

Could you try the code from the MR mentioned above? (#72).

Let's see if this update of the aws-sdk package is fixing the issue you had in newer OpenShift versions

@ivangonzalezacuna how can I test it? I just have to update the plugin version?

You can do it by modifying the used version for these packages. As an example, this is how it should look like:

    "@spreadshirt/backstage-plugin-s3-viewer": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",

What it's after the # is the commitId, but you can also use the branch name for example.

EDIT: The new release has been published. You can also use the latest tags now

You can do it by modifying the used version for these packages. As an example, this is how it should look like:

    "@spreadshirt/backstage-plugin-s3-viewer": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",

What it's after the # is the commitId, but you can also use the branch name for example.

EDIT: The new release has been published. You can also use the latest tags now

@ivangonzalezacuna I have tested with the new version but unfortunately i got the same result :/ Could u test it pls?

Hm, such a pity. Then, the only thing I can think about is that the response when listing keys is broken for the newest versions of OpenShift. We could try to add a fix like I explained here: #68 (comment), by removing duplicated entries (since it's a directory style, I wouldn't expect more than 1 file with the same name, but the API for those versions return that) and see if that fixes the issue. Anyways, might be something that broke in the recent versions and something has been fixed in the meantime