A Singer target that egresses data via CSV+SFTP. Ultimately the destination is configured via a Singer config.json, although currently this is only targeted to outbound to Storeforce.
To use,
- Install Python 3 (see .python-version).
- Install the application and its dependencies.
- Create a
config.json
file which will configure the component. An example follows, with descriptions of the configuration fields following. - Run the target. Usually this involves also running a Singer tap and piping the tap's output to the target:
tap-snowflake | target-storeforce --config ./config.json
{
"delimiter" : ",",
"destination_path" : "/destination/",
"file_mapping": {
"stream_name": "DATA_NONPROD-DA3-V_STOREFORCE_POS",
"file_name": "pos.txt",
"output_map": [
"STORE_CODE",
"TRANSACTION_DATE",
"TRANSACTION_SLOT",
"SALE_TRANS",
"SALE_VALUE",
"SALE_UNITS",
"REFUND_TRANS",
"REFUND_VALUE",
"REFUND_UNITS"
]
},
"header" : true,
"host" : "sftp_server_address",
"password" : "sftp_password",
"port" : 22,
"private_key_file" : "/location/id_rsa",
"quotechar" : "'",
"username" : "user",
"timestamp_file" : true
}
delimiter
: The field delimiter used in the output file. Note that carriage-returns are used for the record delimiter. Defaults to,
.destination_path
: The destination directory on the remote SFTP server, relative to the SFTP user's mount point. In the above sample, if theuser
SFTP user's SFTP mount point is/home/user
, then the destination directory's fully-qualified path would be/home/user/destination
.file_mapping
: Configures how to mapRECORD
message fields to the tabular format:stream_name
: The data stream to pull records from (as indicated by thestream
attribute in theRECORD
messages.file_name
: Overrides the filename to be used when uploading the file, in conjunction with thedestination_path
. Using the above sample, the file would be uploaded to the SFTP host at/home/user/destination/pos.txt
. IMPORTANT NOTE: The current vendor, Storeforce, requires a singular, previously-defined filename, that gets overwritten on every execution. Currently the value for this ispos.txt
(i.e."file_name": "pos.txt"
). Unless circumstances change, using any value other thanpos.txt
will break the integration. You have been warned.output_map
: Indicates the order of columns in the output file. Each message is parsed and the fields extracted; they are then written to the tabular data in the order indicated byoutput_map
. In the above sample, the first column of each record would be the value of theSTORE_CODE
message property; the second column would beTRANSACTION_DATE
, and so on.
header
: Toggles to write a header line to the output file. Defaults tofalse
.host
: The SFTP host to upload to.password
: The SFTP user's password. Eitherpassword
orprivate_key_file
can be used, althoughprivate_key_file
is strongly recommended.port
: The SFTP port to connect to (defaults to port 22).private_key_file
: The path, local to the machine the app is running on, to the RSA key file that should be used to authenticate to the SFTP host. Eitherpassword
orprivate_key_file
can be used, althoughprivate_key_file
is strongly recommended.quotechar
: The character to use for strings containing embeddeddelimeter
s. Defaults to'
.username
: The SFTP username to use.timestamp_file
: Toggles whether or not to mark the staged file (not the final, uploaded file) using a timestamp. Defaults totrue
.
- Install Python 3.* as you see fit (I like
asdf
but whatever tickles your fancy). - Run
make setup
. This does the following (feel free to execute these steps yourself as an alternative): - Installs
poetry
. - Installs
direnv
. - Runs
poetry
to pull dependencies. - Runs
pytest
. - Calculates and reports test coverage.
- Lints the code.
The project is built and managed by poetry
, which
uses pyproject.toml
for just about everything. It's not
perfect, but it seems like the best option, at least right now.
The project has been set up to enforce good practices, including linted code and
test coverage. As currently configured, the project will fail at CircleCI if
tests fail, test coverage drops below 70%, or the linter produces any errors.
make build
will check all of these, and each one can be run independently
using make test
, make coverage
, or make lint
, respectively. Circle will
run each as an independent step for troubleshooting and traceability purposes.
Use semantic versioning. Use Git tags to tag releases; e.g. git tag -a 0.1.0 -m 0.1.0 && git push --tags
.
The project is built and managed by poetry
, which as of this writing shows a
lot of promise but is a bit buggy. Namely, private package hosting and
retrieval (e.g. Gemfury) is quite wonky, so for the time being the release
target make dist
has been commented.
The pattern being used here is in 3 parts: the entrypoint, the target, and the generator.
This is marked as the package executable under the [tools.poetry.scripts]
directive in the project defintion. In other words, poetry run target-storeforce
(or the equivalent) kicks this off. It simply parses the
standard Singer configuration file (see below),
instantiates a Target
, and then calls Target#run
, passing in
stdin
and stdout
.
The Target
's responsibility is to manage the application's lifecycle. In the
case of this application, this entails
- performing some setup,
- initiating the generator to build a CSV file,
- SFTP-ing the file (using the
bonobos-singer-support
package), - and finally emtting state to the output stream (i.e.
stdout
) for Singer.
The generator's responsibility is to consume messages from the input stream
(i.e. stdin
) and build a data file by handling the various message type(s)
appropriately. Currently this is implemented as a simple function
(target_storeforce.generator:generate_csv_from_messages
), but this may not
always be the case. Note also that the module itself has some cruft, left over
from the prior implementation in target-sftp
; namely
target_storeforce.generator:output_state
,
target_storeforce.generator:persist_messages_csv
and
target_storeforce.generator:send_file
. These have been deprecated and should
eventually be removed for clarity.