Help with moving bandersnatch to python3.12
Closed this issue · 7 comments
Hi,
I am the maintainer of bandersnatch, but I have very little s3 skills in general - so would love help moving to >= 3.12 + S3Path and cleaning up our usage of s3path which I feel we hacked around issues rather than fixing bugs here when it was contributed.
Feel free to close if you feel this is an abuse of using issues. I'd also love a list of how you'd approach the move if you don't have time to do it, as I am confused in the order I should potentially take and if there is any hope maintaining backwards compatibility or if I should give up with that and just move to >= 3.12 only for S3 support?
Hi @cooperlees
As long that we don't have a lot of open issues to manage I'm OK with keeping this issue as a platform for helping you guys plan the refactor.
The beauty in s3path is that you don't need to know S3, you just need to think about it like a normal file system and use the regular pathlib API.
In this way you can also switch between S3Path and PosixPath when you want to do a local test.
Now about your code, in general I don't see any reason to keep your our S3Path object.
The glob method is optimize now and you are not really using the mkdir method that you write.
Except for that all looks greate
You are not using s3path for copy, this makes sense (it's more optimize then the API that we have from pathlib).
The only think that I change except for the local s3path inherent is not to use s3path privets
For example you have this code:
resource, _ = path._accessor.configuration_map.get_configuration(path)
In python 3.12 it will brake
I recommend changing to this:
import s3path
resource, _ = s3path.configuration_map.get_configuration(path)
not to use s3path privets
I tried to do this like you've said (Please correct on PR if I misuderstood) in regards to configuration_map but it seems to not hold authentication settings when using it this way.
PR: https://github.com/pypa/bandersnatch/pull/1724/files
Example CI Fails: https://github.com/pypa/bandersnatch/actions/runs/9055904268/job/24877694652?pr=1724
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Do we need a public API to do this with auth settings correct? Or does bandersnatch's CI need to set auth settings more than ENV variables and in the test initialization:
https://github.com/pypa/bandersnatch/blob/main/src/bandersnatch/tests/conftest.py#L195
Hi @cooperlees
I deployed a new version so you will have s3path.configuration_map in older Python versions
(s3path version 0.5.7)
About the issue that you have with boto credentials, can you give me a unit test (without your ecosystem) to understand that is the issue here?
I'm using s3 in integration tests and moto in unit tests and I don't see this issue
Thanks for the response. I guess moto mocks more than we do here. I don't know how to simplify the error situation, but I feel it's pretty clear it's something that works when we use the private instances that have been initialized with the auth settings we need vs. your public methods that do not. We just setup auth for minio and hit it via s3 APIs, the integration testing isn't that complex I feel for someone who understands s3 and s3path (which is not me).
I'll go try update to 0.5.7 and see if that helps, but I don't think it will.
All I want to do is move bandersnatch to 3.12, and s3path has made that difficult :(
Sadly same errors with 0.5.7 - I'm not sure what I am meant to do differently with it
@cooperlees
Maybe I'm missing something but it looks like something is weird with your setup
See the code here
The "public" PI that I added is the privet one, it's not really different
Maybe you changed your boto3 version? looks like something in the setup...
Ok, good news. I sat down a read your PRs for 0.5.7 and saw what I misunderstood and we got further! I was able to move to the new public API everywhere (missed a few but they are in the PR linked below).
I now even have CI "passing" in a 3.12 environment, so I thank you for all your help.
- 3.12 GitHib action Passing: pypa/bandersnatch#1728
I think we only have 1 remaining big item (apart from other unitest private usages I'll look into separately after working this out) tho. I'm sure you don't want us to be using from s3path.accessor import _generate_prefix
, so what is your recommendation here for this code?
- Sadly we do use the delete and glob through the "storage plugin" APIs, so I need to maintain them. It does look like we don't but we do (unless I've read things wrong - again, s3 was 100% contributed and I'm learning here) indirectly.