/redifest

Generate a Redshift .manifest file for a given S3 bucket

Primary LanguagePythonMIT LicenseMIT

.------..------..------..------..------..------..------..------.
|R.--. ||E.--. ||D.--. ||I.--. ||F.--. ||E.--. ||S.--. ||T.--. |
| :(): || (\/) || :/\: || (\/) || :(): || (\/) || :/\: || :/\: |
| ()() || :\/: || (__) || :\/: || ()() || :\/: || :\/: || (__) |
| '--'R|| '--'E|| '--'D|| '--'I|| '--'F|| '--'E|| '--'S|| '--'T|
`------'`------'`------'`------'`------'`------'`------'`------'

Does what it says on the tin: Generates a Redshift .manifest file given a list of S3 buckets. Can write said manifest to file, or back to S3.

API

Create a manifest generator with your AWS creds:

>>> gen = ManifestGenerator('aws_access_key_id', 'aws_secret_access_key')

If following the boto convention for creds in env variables, you do not need to pass them in.

Generate a manifest:

>>> manifest = gen.generate_manifest(['mybucket/folder1/folder2', 
                                      'mybucket/folder3'])

{'entries': [{'mandatory': True,
              'url': u's3://mybucket/folder1/folder2/foo.json'},
             {'mandatory': True,
              'url': u's3://mybucket/folder1/folder2/bar.json'},
             {'mandatory': True,
               'url': u's3://mybucket/folder3/bar.json'}]}

Generate a manifest with filtering on S3 keys (e.g. any file that contains .bzip2 in its name):

>>> manifest = gen.generate_manifest(['mybucket/folder1/folder2', 
                                      'mybucket/folder3'], filter=".bzip2")

You can provide an optional path to write the manfifest back to S3 as part of the call to generate the manifest:

>>> gen.generate_manifest(['mybucket/folder1/folder2', 'mybucket/folder3'],
                          target='mybucket/manifest_files/qux.manifest')

Or write the dict you generated in generate_manifest:

>>> gen.write_manifest(manifest, 'mybucket/manifest_files/qux.manifest')

You can also write to file:

>>> gen.generate_manifest(['mybucket/folder1/folder2', 'mybucket/folder3'],
                          path='qux.manifest')