embulk/embulk-output-sftp

Multiple output files - SFTP

shbadawy opened this issue · 3 comments

Hello,

I am trying to load data from Redshift / S3, using their input plugins, onto an SFTP server as CSV using the SFTP output plugin. The output is always split into 4 sequentially numbered files.

For example, if my data size is 4MB, I get 4 files of 1MB each (0_test.csv, 1_test.csv, 2_test.csv, 3_test.csv).

Is there a way to get them into one file?

Thanks

Hi @shbadawy, try the following exec settings. They may solve the problem.

exec:
  max_threads: 1
  min_output_tasks: 1

in:
  type: something
  ...

out:
  type: sftp
  ...

This document may also be helpful.
https://www.embulk.org/docs/built-in.html

The min_output_tasks option enables “page scattering”. The feature is enabled if number of input tasks is less than min_output_tasks. It uses multiple filter & output threads for each input task so that one input task can use multiple threads. Setting larger number here is useful if embulk doesn’t use multi-threading with enough concurrency due to too few number of input tasks. Setting 1 here disables page scattering completely.
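To tie the quoted description back to the question: when there is a single input task and min_output_tasks is left at its default (1x the available CPU cores, per the same page), page scattering can split that one task across several output tasks, and each output task writes its own file. Below is a rough sketch of what a full config might look like with scattering turned off. The host, user, key path, and path_prefix values are placeholders made up for illustration, and the input section is left out because it depends on which input plugin is used.

```yaml
# Sketch only: all values under "out" are hypothetical placeholders.
exec:
  max_threads: 1        # run tasks on a single thread
  min_output_tasks: 1   # 1 disables page scattering completely

# in: (unchanged; the Redshift or S3 input plugin settings go here)

out:
  type: sftp
  host: sftp.example.com           # placeholder host
  user: embulk                     # placeholder user
  secret_key_file: /path/to/id_rsa # placeholder key path
  path_prefix: /upload/test        # placeholder remote path prefix
  file_ext: csv
  formatter:
    type: csv
```

Note that this only removes the extra splitting caused by scattering. If the input plugin itself produces several tasks (for example, one per S3 object), there will probably still be one output file per input task.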

Thanks @morihaya for sharing