Add support for saving the raw value on GCS
I'm using the following configuration:
format.output.type: jsonl
format.output.fields: value
As a result, the connector saves objects on GCS in this format:
{"value":{"id":"foo"}}
{"value":{"id":"bar"}}
What I'd like to have is:
{"id":"foo"}
{"id":"bar"}
This would be helpful, for example, when using GCS as a source for BigQuery: we could avoid unnecessary nesting both when defining the schema and when querying the data.
One solution would be adding a new output field, plainValue, and here checking the output field: if it is plainValue, we could use ValuePlainWriter, otherwise JsonLinesOutputWriter as the default.
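As a rough illustration of the proposed dispatch, here is a sketch in Python rather than the connector's own code. The two writer functions are hypothetical stand-ins for ValuePlainWriter and JsonLinesOutputWriter, and plainValue is the field name proposed in this issue, not an existing option.

```python
import json

def write_plain_value(record):
    # Hypothetical stand-in for ValuePlainWriter: emit only the record value.
    return json.dumps(record["value"], separators=(",", ":"))

def write_json_line(record, fields):
    # Hypothetical stand-in for JsonLinesOutputWriter: wrap the selected
    # fields in an envelope keyed by field name.
    return json.dumps({name: record[name] for name in fields}, separators=(",", ":"))

def render(record, fields):
    # "plainValue" is the output field proposed in this issue (an assumption).
    if fields == ["plainValue"]:
        return write_plain_value(record)
    return write_json_line(record, fields)

record = {"key": "k1", "value": {"id": "foo"}}
print(render(record, ["value"]))       # {"value":{"id":"foo"}}
print(render(record, ["plainValue"]))  # {"id":"foo"}
```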
Any other suggestions are welcome too. If you like the idea, I am willing to raise a PR. Thank you!
Hi @neemiasjnr
Looks like our configuration could be improved!
I believe there's a workaround to achieve what you want. Could you please try:
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
format.output.type=csv
format.output.fields=value
format.output.fields.value.encoding=none
However, this will work only if record values are already single-line JSON strings.
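The caveat can be illustrated with a small check. This is a sketch in Python, not connector code, and is_single_line_json is a hypothetical helper. It assumes the csv writer emits the raw value bytes followed by a newline, so a value that contains a line break or is not valid JSON would break the one-object-per-line layout.

```python
import json

def is_single_line_json(raw: bytes) -> bool:
    """Return True if raw is valid JSON that fits on a single line."""
    # A line break inside the value would split one record across lines.
    if b"\n" in raw or b"\r" in raw:
        return False
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True

print(is_single_line_json(b'{"id":"foo"}'))         # True
print(is_single_line_json(b'{\n  "id": "foo"\n}'))  # False
print(is_single_line_json(b'not json'))             # False
```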
Your contribution would be very much welcome!
I can suggest a slightly different idea, something like this: we can introduce a new configuration, format.output.json.envelope (true by default to keep backward compatibility), which controls whether JSON outputs should be enveloped in this "key": ... "value": ... thing. A check is needed to allow disabling the envelope only when there is exactly one field in format.output.fields. The setting can be ignored when the output type is not json or jsonl.
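A minimal sketch of that rule, written in Python for brevity rather than as connector code. The option name format.output.json.envelope comes from the suggestion above; the function names and the error type are hypothetical.

```python
import json

JSON_TYPES = {"json", "jsonl"}

def validate_envelope(output_type, fields, envelope=True):
    # The setting only matters for JSON output types; ignore it otherwise.
    if output_type not in JSON_TYPES:
        return
    # Dropping the envelope is only unambiguous with exactly one output field.
    if not envelope and len(fields) != 1:
        raise ValueError(
            "format.output.json.envelope=false requires exactly one field "
            "in format.output.fields"
        )

def render_line(record, fields, envelope=True):
    if envelope:
        payload = {name: record[name] for name in fields}
    else:
        payload = record[fields[0]]
    return json.dumps(payload, separators=(",", ":"))

record = {"key": "k1", "value": {"id": "foo"}}
validate_envelope("jsonl", ["value"], envelope=False)
print(render_line(record, ["value"]))                  # {"value":{"id":"foo"}}
print(render_line(record, ["value"], envelope=False))  # {"id":"foo"}
```

Keeping the default at true means existing configurations would produce the same output as before.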
How does it sound?
@HelenMel would be happy to help you.