onyx-amazon-s3

Onyx plugin for Amazon S3.

Installation

In your project file:

[org.onyxplatform/onyx-amazon-s3 "0.10.0.0-technical-preview-4"]

ABS TODO

  • Need some way to control batch sizes. batch-timeout is not supported in ABS currently

Functions

Input Task

In your peer boot-up namespace:

(:require [onyx.plugin.s3-input])

Catalog entry:

{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-input/input
 :onyx/type :input
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/max-peers 1
 :s3/bucket "mybucket"
 :s3/prefix "filter-prefix/example/"
 :s3/deserializer-fn :my.ns/deserializer-fn
 :s3/buffer-size-bytes 10000000
 :onyx/doc "Reads segments from keys in an S3 bucket."}

Lifecycle entry:

{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-input/s3-input-calls}

Attributes

key | type | description
:s3/bucket | string | The name of the S3 bucket to read objects from.
:s3/deserializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that deserializes bytes to segments. Currently only newline separated values are supported, so the function must deserialize one line at a time.
:s3/prefix | string | Filter the keys to be read by a supplied prefix.
:s3/file-key | string | When set, the S3 key of the file from which a segment's line was read is added to the segment under this key.
Output Task

In your peer boot-up namespace:

(:require [onyx.plugin.s3-output])

Catalog entry:

{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-output/output
 :s3/bucket <<BUCKET_NAME>>
 :s3/encryption :none
 :s3/serializer-fn :my.ns/serializer-fn
 :s3/key-naming-fn :onyx.plugin.s3-output/default-naming-fn
 :s3/prefix "filter-prefix/example/"
 :s3/serialize-per-element? false
 :onyx/type :output
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/doc "Writes segments to s3 files, one file per batch"}

Segments received by this task are serialized to bytes by the function at :s3/serializer-fn, written as one file per batch, and placed at a key in the bucket named by the function at :s3/key-naming-fn. This function takes the event map and returns a string. The default naming function, :onyx.plugin.s3-output/default-naming-fn, names keys in the format "yyyy-MM-dd-hh.mm.ss.SSS_batch_BATCH_UUID", using UTC time.
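
If the default naming does not suit, :s3/key-naming-fn can point at your own function of the event map. A minimal sketch; the namespace my.ns and the choice of task name plus UUID are illustrative, and it assumes the task name is available in the event map under :onyx.core/task:

(ns my.ns)

;; Hypothetical key naming function: receives the Onyx event map and
;; returns the S3 key (a string) under which the batch will be written.
(defn key-naming-fn [event]
  (str (name (:onyx.core/task event)) "_" (java.util.UUID/randomUUID)))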

You can define :s3/encryption to be :aes256 if your S3 bucket has encryption enabled. The default value is :none.

When :s3/serialize-per-element? is set to true, the serializer is called on each individual segment rather than on the whole batch, and the serialized elements are separated by the string value set in :s3/serialize-per-element-separator.
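
For reference, a per-element serializer might look like the following sketch. It assumes :s3/serialize-per-element? is true, so the function receives a single segment and returns its bytes; with per-element serialization off it would receive the whole batch instead. The namespace my.ns and the use of pr-str are illustrative:

(ns my.ns)

;; Hypothetical per-element serializer: receives one segment and returns
;; the bytes to write for it. The plugin joins elements using
;; :s3/serialize-per-element-separator (a newline by default).
(defn serializer-fn [segment]
  (.getBytes (pr-str segment) "UTF-8"))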

Lifecycle entry:

{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-output/s3-output-calls}

Attributes

key | type | description
:s3/bucket | string | The name of the S3 bucket to write to.
:s3/serializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that serializes a batch of segments to bytes.
:s3/key-naming-fn | keyword | A namespaced keyword pointing to a fully qualified function that is supplied with the Onyx event map and produces an S3 key for the batch.
:s3/prefix | string | A prefix to prepend to the keys generated by :s3/key-naming-fn.
:s3/content-type | string | Optional content type for the written value.
:s3/encryption | keyword | Optional server side encryption setting. One of :aes256 or :none.
:s3/endpoint | string | The S3 endpoint to write objects to.
:s3/region | string | The S3 region to write objects to.
:s3/serialize-per-element? | boolean | Whether to serialize the entire batch at once, or to serialize per element and separate elements with the configured separator.
:s3/serialize-per-element-separator | string | String used to separate per-element output. Defaults to a newline character.

Acknowledgments

Many thanks to AdGoji for allowing this work to be open sourced and contributed back to the Onyx Platform community.

Contributing

Pull requests into the master branch are welcomed.

License

Copyright © 2017 Distributed Masonry LLC

Distributed under the Eclipse Public License, the same as Clojure.