Can't download datasets if `.aws` config is present
pvk-developer opened this issue · 0 comments
Environment Details
Please indicate the following details about the environment in which you found the bug:
- SDGym version: 0.6.1
- Python version: Any
- Operating System: MacOS / Unix / Ubuntu
Error Description
When SDGym runs in a local environment that has a `~/.aws/` folder containing AWS configuration, dataset downloads fail with the following error:
ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
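To tell this failure apart from other S3 errors, the error code can be read from the exception. The helper below is a minimal sketch: the name `is_invalid_access_key_error` is hypothetical, and it assumes the exception carries a botocore-style `response` dictionary with an `Error` / `Code` entry, which is the structure visible in the traceback.

```python
def is_invalid_access_key_error(error):
    """Return True if the exception reports the InvalidAccessKeyId code.

    Assumes a botocore-style ``response`` attribute shaped like
    {'Error': {'Code': ..., 'Message': ...}}. Exceptions without that
    attribute are treated as unrelated errors.
    """
    response = getattr(error, 'response', None) or {}
    return response.get('Error', {}).get('Code') == 'InvalidAccessKeyId'


# Possible usage around a download call (hypothetical wrapper):
#
#     try:
#         download_dataset(...)
#     except Exception as error:
#         if is_invalid_access_key_error(error):
#             print('Local AWS credentials were picked up and rejected.')
#         raise
```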
Steps to reproduce
To reproduce, create a `.aws` folder in your home directory: `mkdir ~/.aws`
Then create a file called `credentials` inside it and add:
[default]
aws_access_key_id = <your id>
aws_secret_access_key = <your access key>
PS: For the error to reproduce, make sure you have cleared the cache of downloaded datasets first, since a dataset that already exists locally is not downloaded again.
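The setup above can also be scripted without touching a real `~/.aws` folder. The sketch below writes a credentials file with placeholder values into a temporary directory and points the `AWS_SHARED_CREDENTIALS_FILE` environment variable at it. The helper name `write_dummy_credentials` and the placeholder key values are made up for illustration. Credentials set through other environment variables would still take precedence over this file.

```python
import configparser
import os
import tempfile
from pathlib import Path


def write_dummy_credentials(directory):
    """Write a credentials file with a placeholder [default] profile."""
    parser = configparser.ConfigParser()
    parser['default'] = {
        # Placeholder values: these do not correspond to a real AWS account.
        'aws_access_key_id': 'AKIAEXAMPLEINVALID',
        'aws_secret_access_key': 'example-invalid-secret',
    }
    path = Path(directory) / 'credentials'
    with open(path, 'w') as handle:
        parser.write(handle)
    return path


# Use a temporary directory instead of ~/.aws so existing settings stay intact.
tmp_dir = tempfile.mkdtemp()
credentials_path = write_dummy_credentials(tmp_dir)

# Tell the AWS SDK to read the shared credentials file from this location.
os.environ['AWS_SHARED_CREDENTIALS_FILE'] = str(credentials_path)
```

Run this in the same Python session, before the benchmark call, so that the S3 client is created after the variable is set.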
import sdgym
In [4]: sdgym.benchmark_single_table(synthesizers=['GaussianCopulaSynthesizer'], sdv_datasets=['student_placements'], timeout=22)
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
Cell In[4], line 1
----> 1 sdgym.benchmark_single_table(synthesizers=['GaussianCopulaSynthesizer'], sdv_datasets=['student_placements'], timeout=22)
File ~/Projects/sdv-dev/SDGym/sdgym/benchmark.py:507, in benchmark_single_table(synthesizers, custom_synthesizers, sdv_datasets, additional_datasets_folder, limit_dataset_size, compute_quality_score, sdmetrics, timeout, output_filepath, detailed_results_folder, show_progress, multi_processing_config)
503 _validate_inputs(output_filepath, detailed_results_folder, synthesizers, custom_synthesizers)
505 _create_detailed_results_directory(detailed_results_folder)
--> 507 job_args_list = _generate_job_args_list(
508 limit_dataset_size, sdv_datasets, additional_datasets_folder, sdmetrics,
509 detailed_results_folder, timeout, compute_quality_score, synthesizers, custom_synthesizers)
511 scores = _run_jobs(multi_processing_config, job_args_list, show_progress)
512 if output_filepath:
File ~/Projects/sdv-dev/SDGym/sdgym/benchmark.py:90, in _generate_job_args_list(limit_dataset_size, sdv_datasets, additional_datasets_folder, sdmetrics, detailed_results_folder, timeout, compute_quality_score, synthesizers, custom_synthesizers)
88 datasets = []
89 if sdv_datasets is not None:
---> 90 datasets = get_dataset_paths(sdv_datasets, None, None, None, None)
92 if additional_datasets_folder:
93 additional_datasets = get_dataset_paths(None, None, additional_datasets_folder, None, None)
File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:200, in get_dataset_paths(datasets, datasets_path, bucket, aws_key, aws_secret)
196 else:
197 datasets = _get_available_datasets(
198 'single_table', bucket=bucket)['dataset_name'].tolist()
--> 200 return [
201 _get_dataset_path('single_table', dataset, datasets_path, bucket, aws_key, aws_secret)
202 for dataset in datasets
203 ]
File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:201, in <listcomp>(.0)
196 else:
197 datasets = _get_available_datasets(
198 'single_table', bucket=bucket)['dataset_name'].tolist()
200 return [
--> 201 _get_dataset_path('single_table', dataset, datasets_path, bucket, aws_key, aws_secret)
202 for dataset in datasets
203 ]
File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:60, in _get_dataset_path(modality, dataset, datasets_path, bucket, aws_key, aws_secret)
57 if local_path.exists():
58 return local_path
---> 60 download_dataset(
61 modality, dataset, dataset_path, bucket=bucket, aws_key=aws_key, aws_secret=aws_secret)
62 return dataset_path
File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:36, in download_dataset(modality, dataset_name, datasets_path, bucket, aws_key, aws_secret)
34 LOGGER.info('Downloading dataset %s from %s', dataset_name, bucket)
35 s3 = get_s3_client(aws_key, aws_secret)
---> 36 obj = s3.get_object(Bucket=bucket_name, Key=f'{modality.upper()}/{dataset_name}.zip')
37 bytes_io = io.BytesIO(obj['Body'].read())
39 LOGGER.info('Extracting dataset into %s', datasets_path)
File ~/.virtualenvs/SDGym/lib/python3.8/site-packages/botocore/client.py:530, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
526 raise TypeError(
527 f"{py_operation_name}() only accepts keyword arguments."
528 )
529 # The "self" in this scope is referring to the BaseClient.
--> 530 return self._make_api_call(operation_name, kwargs)
File ~/.virtualenvs/SDGym/lib/python3.8/site-packages/botocore/client.py:960, in BaseClient._make_api_call(self, operation_name, api_params)
958 error_code = parsed_response.get("Error", {}).get("Code")
959 error_class = self.exceptions.from_code(error_code)
--> 960 raise error_class(parsed_response, operation_name)
961 else:
962 return parsed_response
ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
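One possible direction for a fix, sketched under assumptions: when no key and secret are passed explicitly, build the S3 client with unsigned requests so that the local `~/.aws` credentials are never consulted. This is not SDGym's actual `get_s3_client` implementation; only the function name and its two arguments come from the traceback. It also assumes the dataset bucket allows anonymous reads. `has_explicit_credentials` is a hypothetical helper.

```python
def has_explicit_credentials(aws_key, aws_secret):
    """Return True only when both a key and a secret were supplied."""
    return bool(aws_key) and bool(aws_secret)


def get_s3_client(aws_key=None, aws_secret=None):
    """Build an S3 client that ignores ambient AWS configuration.

    Sketch only: assumes the target bucket is publicly readable.
    """
    # Imported inside the function so this sketch can be loaded
    # even where boto3 is not installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    if has_explicit_credentials(aws_key, aws_secret):
        # Explicit credentials take priority over anything on disk.
        return boto3.client(
            's3',
            aws_access_key_id=aws_key,
            aws_secret_access_key=aws_secret,
        )

    # No credentials given: send unsigned (anonymous) requests instead of
    # letting the SDK fall back to ~/.aws/credentials.
    return boto3.client('s3', config=Config(signature_version=UNSIGNED))
```

With a client built this way, the `get_object` call in `download_dataset` would be sent anonymously whenever `aws_key` and `aws_secret` are `None`, which is the case in the call path shown above.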