.------..------..------..------..------..------..------..------.
|R.--. ||E.--. ||D.--. ||I.--. ||F.--. ||E.--. ||S.--. ||T.--. |
| :(): || (\/) || :/\: || (\/) || :(): || (\/) || :/\: || :/\: |
| ()() || :\/: || (__) || :\/: || ()() || :\/: || :\/: || (__) |
| '--'R|| '--'E|| '--'D|| '--'I|| '--'F|| '--'E|| '--'S|| '--'T|
`------'`------'`------'`------'`------'`------'`------'`------'
Does what it says on the tin: generates a Redshift .manifest file from a list of S3 paths (a bucket name, optionally followed by a key prefix such as mybucket/folder1). It can write the manifest to a local file or back to S3.
Create a manifest generator with your AWS creds:
>>> gen = ManifestGenerator('aws_access_key_id', 'aws_secret_access_key')
If your credentials are set in environment variables following the boto convention (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), you do not need to pass them in.
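One possible reading of that fallback, as a minimal sketch: explicit arguments win, and otherwise the two environment variables boto reads by convention are used. resolve_credentials is a hypothetical helper for illustration, not part of redifest's documented API. It also ignores boto's other credential sources, such as config files and instance roles.

```python
import os


def resolve_credentials(access_key=None, secret_key=None):
    """Hypothetical helper: use explicit arguments if given, otherwise fall
    back to the environment variables boto reads by convention."""
    access_key = access_key or os.environ.get("AWS_ACCESS_KEY_ID")
    secret_key = secret_key or os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not (access_key and secret_key):
        raise ValueError("AWS credentials were not passed in or found in the environment")
    return access_key, secret_key
```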
Generate a manifest:
>>> manifest = gen.generate_manifest(['mybucket/folder1/folder2',
...                                   'mybucket/folder3'])
manifest now holds a dict like:
{'entries': [{'mandatory': True,
              'url': u's3://mybucket/folder1/folder2/foo.json'},
             {'mandatory': True,
              'url': u's3://mybucket/folder1/folder2/bar.json'},
             {'mandatory': True,
              'url': u's3://mybucket/folder3/bar.json'}]}
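The shape of that dict follows Redshift's manifest format: one entry per S3 object, each with a url and a mandatory flag. Below is a minimal sketch of how such a dict could be assembled. It assumes each input path is "bucket/prefix" and that every key under the prefix is included. split_s3_path, build_manifest and the list_keys callable are hypothetical names, and an in-memory listing stands in for S3 so the example runs offline.

```python
def split_s3_path(path):
    """Split 'bucket/prefix/...' into ('bucket', 'prefix/...')."""
    bucket, _, prefix = path.partition("/")
    return bucket, prefix


def build_manifest(paths, list_keys):
    """Build a Redshift manifest dict covering every key under each path.

    list_keys(bucket, prefix) is a hypothetical callable that returns the key
    names under a prefix; a real implementation would page through S3.
    """
    entries = []
    for path in paths:
        bucket, prefix = split_s3_path(path)
        for key in list_keys(bucket, prefix):
            entries.append({"url": "s3://%s/%s" % (bucket, key), "mandatory": True})
    return {"entries": entries}


# Usage with an in-memory stand-in for the S3 listing.
FAKE_BUCKETS = {
    "mybucket": [
        "folder1/folder2/foo.json",
        "folder1/folder2/bar.json",
        "folder3/bar.json",
        "other/ignored.json",
    ]
}


def fake_list_keys(bucket, prefix):
    # Mimics S3 prefix listing: a plain string-prefix match on the key.
    return [key for key in FAKE_BUCKETS[bucket] if key.startswith(prefix)]


manifest = build_manifest(["mybucket/folder1/folder2", "mybucket/folder3"], fake_list_keys)
```

Passing the listing in as a callable keeps the manifest-building logic separate from the S3 client, so it can be exercised without AWS access.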
Generate a manifest that only includes S3 keys matching a filter (e.g. any file whose name contains .bzip2):
>>> manifest = gen.generate_manifest(['mybucket/folder1/folder2',
...                                   'mybucket/folder3'], filter=".bzip2")
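The text above only says the filter keeps keys that contain the given string, so a plain substring test is the natural reading. The sketch below assumes exactly that, with no regex or glob matching. filter_keys is a hypothetical name; it would be applied to each listed key before its manifest entry is built.

```python
def filter_keys(keys, pattern=None):
    """Keep only keys whose full name contains `pattern` (plain,
    case-sensitive substring match). With no pattern, every key is kept."""
    if pattern is None:
        return list(keys)
    return [key for key in keys if pattern in key]


keys = ["folder3/part-0000.bzip2", "folder3/part-0001.json", "folder3/part-0002.bzip2"]
bzip_keys = filter_keys(keys, ".bzip2")
```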
You can pass an optional target path to write the manifest back to S3 in the same call:
>>> gen.generate_manifest(['mybucket/folder1/folder2', 'mybucket/folder3'],
...                        target='mybucket/manifest_files/qux.manifest')
Or write a manifest dict you already generated with generate_manifest:
>>> gen.write_manifest(manifest, 'mybucket/manifest_files/qux.manifest')
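Writing to S3 comes down to two steps. First, serialize the dict as JSON; Redshift expects a lowercase true, which json.dumps produces from Python's True. Second, split the target into a bucket and a key. Below is a minimal sketch of that preparation step, assuming the target always has the form "bucket/key". prepare_manifest_upload is a hypothetical helper, and the upload itself is left out so the example runs without AWS access.

```python
import json


def prepare_manifest_upload(manifest, target):
    """Turn a manifest dict and a 'bucket/key' target into the pieces an S3
    upload needs: (bucket, key, JSON body)."""
    bucket, _, key = target.partition("/")
    if not bucket or not key:
        raise ValueError("target must look like 'bucket/path/to/file.manifest'")
    body = json.dumps(manifest, indent=2)
    return bucket, key, body


bucket, key, body = prepare_manifest_upload(
    {"entries": [{"url": "s3://mybucket/folder3/bar.json", "mandatory": True}]},
    "mybucket/manifest_files/qux.manifest",
)
```

With boto3, for instance, the upload could then be boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=body). This README does not say which boto version redifest itself uses.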
You can also write the manifest to a local file:
>>> gen.generate_manifest(['mybucket/folder1/folder2', 'mybucket/folder3'],
...                        path='qux.manifest')
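Writing to a local file is the same JSON serialization, pointed at disk instead of S3. The following is a minimal sketch; write_manifest_file is a hypothetical stand-in for whatever generate_manifest does when given path=, and the usage writes into a temporary directory.

```python
import json
import os
import tempfile


def write_manifest_file(manifest, path):
    """Serialize the manifest dict to `path` as indented JSON."""
    with open(path, "w") as fh:
        json.dump(manifest, fh, indent=2)


# Usage: write a small manifest into a fresh temporary directory.
example = {"entries": [{"url": "s3://mybucket/folder3/bar.json", "mandatory": True}]}
out_path = os.path.join(tempfile.mkdtemp(), "qux.manifest")
write_manifest_file(example, out_path)
```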