Support input from STDIN
christos-h opened this issue · 8 comments
It would be cool if I could pipe data into plumber. Something like:
$ cat data.json | plumber write kafka --topic stdinpls
Fantastic idea! How do you see event delimitation working? One event per line? Something else?
There are 2 options I see:
- Separate events by newlines:
{ "a" : 5 }
{ "a" : 10 }
This is how reading from a file works with plumber, I believe, so behavior would be consistent. The main drawback here is that I don't think this is part of the JSON spec.
- Events are submitted as a single JSON array:
[ { "a" : 5 }, { "a" : 10 } ]
This would conform to the JSON spec (i.e. it can be parsed without string-fu). The question here is how to disambiguate between 'an array of events' and 'my event is literally an array'. I guess if your event is an array you could have:
[ [...], [...] ]
Naively I would opt for 2, but I'm not sure. Are there other tools out there which receive collections of JSON objects as newline-delimited input, or is this unique to plumber?
I'd opt for #1 as it doesn't require you to transform your input JSON. I think the fact that it's not valid JSON is irrelevant - it is just a transport stream - the resulting JSON input (the 1 line) is what matters.
Another reason - if you're piping data into plumber via the CLI, you will be using various other tools to get that done. Minifying JSON into one object per line would be trivial, while transforming the input JSON into a single blob would be quite difficult. Also, if you're piping in 100M events, does that mean you have a single JSON array with 100 million entries?
Finally - streaming input data - if you do arrays, streaming would be rough.
As for other folks that do newline-delimited JSON - you can have newline-delimited JSON in S3 and have it be searchable using Athena (not optimal, but hey).
Another option I see potentially viable - delimiters between entries:
{{BEGIN}}
{"foo":
{
"bar": "baz"
}
}
{{END}}
I think I still go for option #1. Minifying is easy and less intrusive than having to manage delimiters.
Thoughts?
Option 1 sounds good to me :)
Are there other tools out there which receive collections of json objects as new-line delimited or is this unique to plumber?
The MongoDB mongoimport tool, which is used to import data into Mongo, has an optional --jsonArray
flag which toggles accepting JSON in array format. For whatever reason this is limited to 16MB, but I think this is a good compromise.
You know, we could just support both haha
Yeah! That is the point of the --jsonArray
flag 💯