Support for JSON Stream
francescorivola opened this issue · 4 comments
Hi ClickHouse Js team,
We are currently migrating our solution from https://github.com/TimonKK/clickhouse to your lib. In the former there is support for streaming JSON (using the JSONStream lib under the hood). However, so far we haven't found stream support for the JSON format in this lib.
In this issue I would like to ask you two questions:
- Is there any plan to provide stream support for the JSON format in the future?
- At the moment we plan to overcome this limitation by using a stream with the CSVWithNamesAndTypes format and transforming it into JS objects using the following stream Transformer: https://gist.github.com/francescorivola/c9438128f3a4de8f63e6dce9fbbe11da (not complete yet, but it should explain the idea). Do you know of, or can you recommend, a better way to achieve the same goal?
BTW: I thought about writing this question in the Slack community (as @slvrtrn pointed out to me a few days ago), but I haven't found a specific Node.js driver channel there, so I created a question issue here instead. Please let me know if you prefer a different venue for this type of question/issue.
Thank you in advance for your help and support. It is really appreciated.
Is there any plan to provide stream support to JSON Format in the future?
The client supports streaming for the following JSON family formats (note the EachRow in the format name; JSONObjectEachRow is the exception and cannot be streamed):
- JSONEachRow <- easiest to use.
- JSONStringsEachRow
- JSONCompactEachRow
- JSONCompactStringsEachRow
- JSONCompactEachRowWithNames
- JSONCompactEachRowWithNamesAndTypes
- JSONCompactStringsEachRowWithNames
- JSONCompactStringsEachRowWithNamesAndTypes
"Raw" formats can be streamed in and out, too:
- CSV
- CSVWithNames
- CSVWithNamesAndTypes
- TabSeparated
- TabSeparatedRaw
- TabSeparatedWithNames
- TabSeparatedWithNamesAndTypes
- CustomSeparated
- CustomSeparatedWithNames
- CustomSeparatedWithNamesAndTypes
- Parquet
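On the select side, a streamed EachRow result is traversed with for/await. Here is a minimal sketch; `collectJSONEachRow` is a hypothetical helper name, written against any async-iterable of row batches so it can run without a live server. The batch shape (arrays of rows, each exposing a `json()` method) matches what `rs.stream()` emits in @clickhouse/client after `client.query({ ..., format: 'JSONEachRow' })`.

```javascript
// Traverse a stream of row batches; each batch is an array of rows, and each
// row exposes the raw line as .text plus a .json() method that parses it.
async function collectJSONEachRow(rowBatches) {
  const results = []
  for await (const rows of rowBatches) {
    for (const row of rows) {
      results.push(row.json()) // parse each streamed line into an object
    }
  }
  return results
}
```

With the real client this would be called as `await collectJSONEachRow(rs.stream())`.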
I checked the doc entry: https://clickhouse.com/docs/en/integrations/language-clients/javascript#insert-streaming-in-nodejs - I understand now why this could be unclear.
"JSON format" in the doc means ClickHouse JSON Format, which is a single object (NOT to be confused with JSON itself).
The following ClickHouse formats could not be streamed (see the returned object shape here):
Thanks for raising this issue; the doc needs to be updated for clarity (plus add a few more "non-streamable" formats there).
There is an example using an NDJSON file and the JSONCompactEachRow format.
If you want to use the regular JSONEachRow format with a stream, which simply accepts "normal" fully serialized objects, it looks something like this:
// Assuming we read a few objects from somewhere.
const Stream = require('stream') // or: import Stream from 'stream'

function getStream() {
  const stream = new Stream.Readable({
    objectMode: true, // required for JSON* family formats
    read() {
      // no-op: data is pushed manually below
    },
  })
  stream.push({ id: 42, name: 'foo' })
  stream.push({ id: 144, name: 'bar' })
  stream.push(null) // this closes the stream
  return stream
}

const stream = getStream()
await client.insert({
  table: tableName,
  values: stream,
  format: 'JSONEachRow',
})

// check that the table has 2 records now
const rs = await client.query({
  query: `SELECT * from ${tableName}`,
  format: 'JSONEachRow',
})
console.log(await rs.json())
Hi @slvrtrn ,
First of all thank you very much for your detailed reply.
Our use case involves reading a stream, not inserting one. If I understand the library correctly, no JSON format is currently supported in that scenario. Am I right?
Thank you
Ok, sorry, my bad. I checked the output of a reading stream with JSONEachRow and it is working. You are right. Sorry :)
[
  {
    text: `{"productBrand":"brand'1","numberOfPurchases":2,"numberOfBuyers":1}`,
    json: [Function: json]
  },
  {
    text: `{"productBrand":"brand'2","numberOfPurchases":2,"numberOfBuyers":2}`,
    json: [Function: json]
  }
]
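Each streamed row exposes the raw line as `text` plus a `json()` helper that parses it. A stand-in illustration: `mockRow` below mimics the shape printed above and is not part of the client API.

```javascript
// mockRow mimics the row shape from the streamed ResultSet output above;
// in the real client, json() parses the raw text line into an object.
const mockRow = {
  text: `{"productBrand":"brand'1","numberOfPurchases":2,"numberOfBuyers":1}`,
  json() {
    return JSON.parse(this.text)
  },
}

const obj = mockRow.json()
console.log(obj.numberOfPurchases) // 2
```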
@francescorivola, we actually have an example with JSONEachRow streaming for selects if that is what you are looking for as well.
NB: when using query and traversing the stream from the ResultSet, the for/await and on(data) approaches are interchangeable; it just turns out that on(data) has less runtime overhead (at least it did as of Node.js 16, the last time I checked). Here is a CSV example with on(data); it can be applied to JSONEachRow, too: just replace the format in the query call. Also, you might want to use row.json() instead of row.text if you want actual objects to work with, depending on your use case.