Singer target that uploads data to S3 in JSONL format following the Singer spec.
target-s3-jsonl is a Singer target intended to work with regular Singer taps. It takes the output of the tap and exports it as JSON Lines files.
First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.
It's recommended to use a virtualenv:
python -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install target-s3-jsonl
python -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install --upgrade https://github.com/ome9ax/target-s3-jsonl/archive/main.tar.gz
python -m venv ~/.virtualenvs/target-s3-jsonl
source ~/.virtualenvs/target-s3-jsonl/bin/activate
pip install target-s3-jsonl
deactivate
Alternative
python -m venv ~/.virtualenvs/target-s3-jsonl
~/.virtualenvs/target-s3-jsonl/bin/pip install target-s3-jsonl
Like any other target that follows the Singer specification:
some-singer-tap | target-s3-jsonl --config [config.json]
It reads incoming messages from STDIN and uses the properties in config.json to upload data to S3.
Note: To avoid version conflicts, run taps and targets in separate virtual environments.
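To make the data flow concrete, here is a minimal Python sketch of what a target of this kind does with its input: it reads Singer messages line by line and keeps the payload of each RECORD message as one JSON line per stream. The helper name `records_to_jsonl` and the sample messages are hypothetical. This is not the actual implementation of target-s3-jsonl, which also deals with buffering, temporary files, and the upload itself.

```python
import json
from collections import defaultdict


def records_to_jsonl(lines):
    """Group the RECORD payloads of a Singer message stream by stream name.

    Hypothetical helper for illustration only; returns {stream: jsonl_text}.
    """
    buffers = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        # SCHEMA and STATE messages are ignored in this sketch.
        if message.get("type") == "RECORD":
            buffers[message["stream"]].append(json.dumps(message["record"]))
    return {stream: "\n".join(rows) + "\n" for stream, rows in buffers.items()}


# Sample input, shaped like the output of a Singer tap.
sample = [
    '{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 2}}',
    '{"type": "STATE", "value": {"users": 2}}',
]

jsonl_by_stream = records_to_jsonl(sample)
print(jsonl_by_stream["users"], end="")
```

In a real pipeline the `lines` argument would be `sys.stdin`, fed by the tap on the left side of the pipe.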
Running the target connector requires a config.json file. An example with the minimal settings:
{
  "s3_bucket": "my_bucket"
}
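Since s3_bucket is the only mandatory property, a wrapper script may want to check the file before starting a pipeline. The sketch below parses the content of a config.json and rejects it when s3_bucket is missing. The function name `load_config` is hypothetical, and target-s3-jsonl may validate its configuration differently.

```python
import json


def load_config(text):
    """Parse config.json content and check the one mandatory property.

    Hypothetical helper; not part of target-s3-jsonl.
    """
    config = json.loads(text)
    if not config.get("s3_bucket"):
        raise ValueError("config.json must define 's3_bucket'")
    return config


config = load_config('{"s3_bucket": "my_bucket"}')
print(config["s3_bucket"])
```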
Profile-based authentication is used by default, with the default profile. To use another profile, set the aws_profile parameter in config.json or set the AWS_PROFILE environment variable.
For non-profile-based authentication, set the aws_access_key_id, aws_secret_access_key, and optionally the aws_session_token parameters in config.json. Alternatively, you can define them outside config.json by setting the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables.
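The lookup order described above (a value in config.json first, then the matching environment variable) can be sketched in Python as follows. The function `resolve_credentials` and the `CREDENTIAL_KEYS` mapping are hypothetical names used for illustration; how the target resolves credentials internally is an assumption here, not a description of its code.

```python
import os

# Config keys paired with the environment variables named in this section.
CREDENTIAL_KEYS = {
    "aws_profile": "AWS_PROFILE",
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_session_token": "AWS_SESSION_TOKEN",
}


def resolve_credentials(config, env=None):
    """Return the credential settings found in config, else in the environment.

    Hypothetical helper: a config value wins over the environment variable.
    """
    env = os.environ if env is None else env
    resolved = {}
    for key, env_name in CREDENTIAL_KEYS.items():
        value = config.get(key) or env.get(env_name)
        if value:
            resolved[key] = value
    return resolved


# The profile comes from config.json, so AWS_PROFILE is not consulted.
creds = resolve_credentials(
    {"s3_bucket": "my_bucket", "aws_profile": "analytics"},
    env={"AWS_PROFILE": "other"},
)
print(creds)
```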
Full list of options in config.json:
Property | Type | Mandatory? | Description |
---|---|---|---|
naming_convention | String | | (Default: None) Custom naming convention of the S3 key. Replaces the tokens date, stream, and timestamp with the appropriate values. Supports datetime and other advanced Python string formatting, e.g. {stream:_>8}_{timestamp:%Y%m%d_%H%M%S}.json or {stream}/{timestamp:%Y}/{timestamp:%m}/{timestamp:%d}/{timestamp:%Y%m%d_%H%M%S_%f}.json. Supports "folders" in S3 keys, e.g. folder/folder2/{stream}/export_date={date}/{timestamp}.json. Honors the s3_key_prefix, if set, by prepending it to the "filename". E.g. naming_convention = folder1/my_file.json and s3_key_prefix = prefix_ results in folder1/prefix_my_file.json |
timezone_offset | Integer | | Offset value in hours. Use an offset of 0 hours if you want the naming_convention to use the UTC time zone. Defaults to null. |
memory_buffer | Integer | | Size of the memory buffer used before storing the data in the temporary file. 64 MB is used by default if unspecified. |
temp_dir | String | | (Default: platform-dependent) Directory of temporary JSONL files with RECORD messages. |
compression | String | | The type of compression to apply before uploading. Supported options are none (default), gzip, and lzma. For gzip compression, the file extension is automatically changed to .json.gz for all files. For lzma compression, the file extension is automatically changed to .json.xz for all files. |
local | Boolean | | Keep the files in the temp_dir directory without uploading them to S3. |
s3_bucket | String | Yes | S3 bucket name |
s3_key_prefix | String | | (Default: None) A static prefix before the generated S3 key names. |
aws_profile | String | | AWS profile name for profile-based authentication. If not provided, the AWS_PROFILE environment variable will be used. |
aws_endpoint_url | String | | AWS endpoint URL. |
aws_access_key_id | String | | S3 Access Key Id. If not provided, the AWS_ACCESS_KEY_ID environment variable will be used. |
aws_secret_access_key | String | | S3 Secret Access Key. If not provided, the AWS_SECRET_ACCESS_KEY environment variable will be used. |
aws_session_token | String | | AWS session token. If not provided, the AWS_SESSION_TOKEN environment variable will be used. |
encryption_type | String | | (Default: 'none') The type of encryption to use. Currently supported options are 'none' and 'KMS'. |
encryption_key | String | | A reference to the encryption key to use for data encryption. For KMS encryption, this should be the KMS encryption key ID (e.g. '1234abcd-1234-1234-1234-1234abcd1234'). This field is ignored if 'encryption_type' is none or blank. |
role_arn | String | | The ARN of the role to assume |
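The naming_convention examples in the table rely on standard Python format specifications, which can be tried out in isolation. The sketch below fills the stream, date, and timestamp tokens with `str.format` and then prepends s3_key_prefix to the filename part of the key. The function `build_s3_key` is a hypothetical name, and the `%Y%m%d` format chosen for the date token is an assumption; the target's own defaults may differ.

```python
import posixpath
from datetime import datetime


def build_s3_key(naming_convention, stream, now, s3_key_prefix=""):
    """Expand the tokens of a naming convention into an S3 key.

    Hypothetical helper for illustration; not the target's implementation.
    """
    key = naming_convention.format(
        stream=stream,
        date=now.strftime("%Y%m%d"),  # assumed format for the date token
        timestamp=now,  # a datetime, so specs like {timestamp:%Y} work
    )
    # The prefix goes in front of the filename, after any "folders".
    folder, filename = posixpath.split(key)
    return posixpath.join(folder, s3_key_prefix + filename)


now = datetime(2023, 1, 2, 3, 4, 5)

padded = build_s3_key("{stream:_>8}_{timestamp:%Y%m%d_%H%M%S}.json", "users", now)
print(padded)  # ___users_20230102_030405.json

partitioned = build_s3_key(
    "{stream}/{timestamp:%Y}/{timestamp:%m}/{timestamp:%d}/{timestamp:%Y%m%d_%H%M%S_%f}.json",
    "users",
    now,
)
print(partitioned)  # users/2023/01/02/20230102_030405_000000.json

prefixed = build_s3_key("folder1/my_file.json", "users", now, s3_key_prefix="prefix_")
print(prefixed)  # folder1/prefix_my_file.json
```

The `{stream:_>8}` spec right-aligns the stream name in a field of 8 characters padded with underscores, which is why users becomes ___users.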
Install the tools
pip install .[test,lint]
Run pytest
pytest -p no:cacheprovider
- Update the version number at the beginning of target-s3-jsonl/target_s3_jsonl/__init__.py
- Merge the changes PR into main
- Create a tag: git tag -a 1.0.0 -m 'Release version 1.0.0'
- Release the new version in GitHub
Apache License Version 2.0