streamdal/plumber

Support input from STDIN

christos-h opened this issue · 8 comments

It would be cool if I could pipe data into plumber. Something like:

$ cat data.json | plumber write kafka --topic stdinpls

Fantastic idea! How would you see event delimitation working? One event per line? Something else?

There are 2 options I see:

  1. Separate events by newlines.
{ "a" : 5 }
{ "a" : 10 }

This is how reading from a file works with plumber, I believe, so the behavior would be consistent. The main drawback here is that newline-delimited JSON isn't part of the JSON spec.

  2. Events are submitted as a JSON array.
[ { "a" : 5 }, { "a" : 10 } ]

This would conform to the JSON spec (i.e. it can be parsed without string-fu). The question here is how to disambiguate between 'an array of events' and 'my event is literally an array'. I guess if your event is an array you could have:

[ [...], [...] ]

Naively I would opt for 2, but I'm not sure. Are there other tools out there which receive collections of JSON objects as newline-delimited input, or is this unique to plumber?

I'd opt for #1 as it doesn't require you to transform your input JSON. I think the fact that it's not valid JSON is irrelevant - it's just a transport stream - the resulting JSON input (the single line) is what matters.

Another reason - if you're piping data into plumber via the CLI, you'll be using various other tools to get that done. Minifying JSON into one line per event would be trivial, while transforming the input JSON into a single blob would be quite difficult. Also, if you're piping in 100M events, does that mean you have a single JSON array with 100 million entries?
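
Minifying, for example, is a jq one-liner (just a sketch - pretty.json is a made-up file name):

# jq -c compacts each input document onto a single line
$ cat pretty.json | jq -c . | plumber write kafka --topic stdinpls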

Finally - streaming input data - if you do arrays, streaming would be rough.
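
To illustrate (hypothetical file name, assuming plumber reads line by line): newline-delimited input can be piped through as it's produced, e.g.

# each line can be written the moment it arrives
$ tail -f events.ndjson | plumber write kafka --topic stdinpls

whereas a single big array can't be parsed until the closing ] arrives.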


As for other folks that do newline-delimited JSON - you can store newline-delimited JSON in S3 and have it be searchable using Athena (not optimal, but hey).


Another option I see potentially viable - delimiters between entries:

{{BEGIN}} 
{"foo": 
  {
  "bar": "baz"
  }
}
{{END}}
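
Splitting such a stream is doable - roughly something like this awk sketch (delimiter names as above, made-up file name), which joins everything between the markers onto one line:

# collect lines between {{BEGIN}} and {{END}}, emit one event per marker pair
$ awk 'index($0,"{{BEGIN}}"){buf="";next} index($0,"{{END}}"){print buf;next} {buf=buf $0}' events.txt

Workable, but it forces a custom framing format onto every producer.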

I think I still go for option #1. Minifying is easy and less intrusive than having to manage delimiters.

Thoughts?

Option 1 sounds good to me :)

Are there other tools out there which receive collections of JSON objects as newline-delimited input, or is this unique to plumber?

The MongoDB mongoimport tool, which is used to import data into MongoDB, has an optional --jsonArray flag which toggles accepting JSON in array format. For whatever reason this is limited to 16MB, but I think this is a good compromise.
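
For reference, a typical invocation looks like this (db/collection/file names are made up):

# import a JSON array from a file, one document per array element
$ mongoimport --db=test --collection=events --file=data.json --jsonArray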

You know, we could just support both haha

Yeah! That is the point of the --jsonArray flag 💯

@christoshadjiaslanis Piping data has been added in v0.30.0.

The default behavior is to treat each line as a separate message.
If you specify the --json-array flag, it will expect a JSON array as input and treat each object in the top-level array as a separate message.
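
For example (a sketch based on the description above; file names are made up):

# default: one message per line
$ cat events.ndjson | plumber write kafka --topic stdinpls

# with --json-array: each element of the top-level array becomes a message
$ cat events-array.json | plumber write kafka --topic stdinpls --json-array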