megvii-research/megfile

Cannot use s3 related functions

Closed this issue · 3 comments

Sysinfo:

(base) ubuntu@ip-10-53-8-252:~$ pip show megfile
Name: megfile
Version: 2.2.9.post3
Summary: Megvii file operation library
Home-page: https://github.com/megvii-research/megfile
Author: megvii
Author-email: megfile@megvii.com
License: 
Location: /home/ubuntu/miniconda3/lib/python3.10/site-packages
Requires: boto3, botocore, paramiko, pyyaml, requests, tqdm
Required-by: 

(base) ubuntu@ip-10-53-8-252:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

Issue:

I tried using it from the CLI, and here is what happened:

(base) ubuntu@ip-10-53-8-252:~$ aws s3 ls s3://dataset-ingested/user-preference/
                           PRE pref_100k_min513x768/
                           PRE pref_100k_min513x768_YIELD/
                           PRE sd-human-ft/
                           PRE sd-user-pref-50k-ft-gpt/
                           PRE sd-user-pref-75k-ft/
                           PRE sd-user-pref-v2-large-full/
2023-07-04 00:41:59          0 
2023-07-11 06:16:49       1444 README.md
(base) ubuntu@ip-10-53-8-252:~$ 
(base) ubuntu@ip-10-53-8-252:~$ megfile ls s3://dataset-ingested/user-preference/

[S3UnknownError] Unknown error encountered: 's3://dataset-ingested/user-preference/', error: botocore.exceptions.ClientError('An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.'), endpoint: 'https://s3.amazonaws.com'
(base) ubuntu@ip-10-53-8-252:~$ 

The AWS credentials were configured with aws configure, and they work with the AWS CLI.

I also tried using it in Python:

from megfile import smart_walk

s3_directory = 's3://dataset-ingested/user-preference/'

# Walking through the directory
for root, dirs, files in smart_walk(s3_directory):
    print(f"Current directory: {root}")
    print(f"Subdirectories: {dirs}")
    print(f"Files: {files}")
    print("-" * 20)

Error message:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File ~/miniconda3/lib/python3.10/site-packages/megfile/s3_path.py:1534, in S3Path.is_dir(self, followlinks)
   1533 try:
-> 1534     resp = self._client.list_objects_v2(
   1535         Bucket=bucket, Prefix=prefix, Delimiter='/', MaxKeys=1)
   1536 except Exception as error:

File ~/miniconda3/lib/python3.10/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    534 # The "self" in this scope is referring to the BaseClient.
--> 535 return self._make_api_call(operation_name, kwargs)

File ~/miniconda3/lib/python3.10/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
    979     error_class = self.exceptions.from_code(error_code)
--> 980     raise error_class(parsed_response, operation_name)
    981 else:

ClientError: An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

The above exception was the direct cause of the following exception:

S3UnknownError                            Traceback (most recent call last)
/home/ubuntu/dev/data-processings/ingested_gdl_twitter_processings.ipynb Cell 18 line 6
      3 s3_directory = 's3://dataset-ingested/user-preference/'
      5 # Walking through the directory
----> 6 for root, dirs, files in smart_walk(s3_directory):
      7     print(f"Current directory: {root}")
      8     print(f"Subdirectories: {dirs}")

File ~/miniconda3/lib/python3.10/site-packages/megfile/s3_path.py:2040, in S3Path.walk(self, followlinks)
   2037 if not bucket:
   2038     raise UnsupportedError('Walk whole s3', self.path_with_protocol)
-> 2040 if not self.is_dir():
   2041     return
   2043 stack = [key]

File ~/miniconda3/lib/python3.10/site-packages/megfile/s3_path.py:1540, in S3Path.is_dir(self, followlinks)
   1537     error = translate_s3_error(error, self.path_with_protocol)
   1538     if isinstance(error,
   1539                   (S3UnknownError, S3ConfigError, S3PermissionError)):
-> 1540         raise error
   1541     return False
   1543 if not key:  # bucket is accessible

S3UnknownError: Unknown error encountered: 's3://dataset-ingested/user-preference/', error: botocore.exceptions.ClientError('An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.'), endpoint: 'https://s3.amazonaws.com'

The bucket I'm trying to access is in the same region as my configuration (from aws configure). Any clue what happened? Thanks.

I guess it's a bug: megfile may not be reading all of the configuration from the file. Did you set region_name with aws configure?
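One quick way to answer the region question is to read the value straight from the file that aws configure writes. The following is only a minimal sketch: it assumes the usual layout of ~/.aws/config (a [default] section, and [profile NAME] sections for named profiles), and the helper name configured_region is made up for illustration. It is not part of megfile or boto3, and it does not look at environment variables such as AWS_DEFAULT_REGION, which can also influence the region.

```python
import configparser


def configured_region(config_text, profile="default"):
    """Return the region set for a profile in AWS-config-style text, or None.

    Hypothetical helper for illustration; it only parses text and does not
    contact AWS or consult environment variables.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, named profiles use "[profile NAME]" sections,
    # while the default profile uses plain "[default]".
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


# Usage, assuming the config file is at its usual location:
#   from pathlib import Path
#   text = (Path.home() / ".aws" / "config").read_text()
#   print(configured_region(text))
```

If this returns None, no region is set in the file for that profile, and the client may be falling back to a default endpoint such as the https://s3.amazonaws.com one shown in the error above.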

I tested it, and the region configuration in the file works.
This error message means the region you are using is different from the bucket's region, so please check your region configuration.
If the region is right, please show me the debug logs, enabled like this:

import logging
logging.basicConfig(level=logging.DEBUG)

Thanks.
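The region check suggested above can also be done in code. This is a rough sketch rather than anything megfile does internally: it assumes a boto3 S3 client is passed in, that the credentials are allowed to call GetBucketLocation, and the helper names bucket_region and region_mismatch are hypothetical. GetBucketLocation reports no LocationConstraint for us-east-1 and the legacy value EU for eu-west-1, so both are normalized.

```python
def normalize_location(constraint):
    """Map a GetBucketLocation LocationConstraint to a region name."""
    if not constraint:
        # Buckets in us-east-1 report no location constraint.
        return "us-east-1"
    if constraint == "EU":
        # Legacy value used for eu-west-1.
        return "eu-west-1"
    return constraint


def bucket_region(client, bucket):
    """Return the region of a bucket using an S3 client (e.g. from boto3)."""
    response = client.get_bucket_location(Bucket=bucket)
    return normalize_location(response.get("LocationConstraint"))


def region_mismatch(client, bucket, configured):
    """Return the bucket's actual region if it differs from the configured one, else None."""
    actual = bucket_region(client, bucket)
    return None if actual == configured else actual


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   session = boto3.session.Session()
#   client = session.client("s3")
#   print(region_mismatch(client, "dataset-ingested", session.region_name))
```

A non-None result means the bucket lives in a different region than the session is configured for, which is the situation a PermanentRedirect error describes.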

Please reopen this issue if the problem persists.