superfly/corrosion

Streaming rows as comma-separated JSON that form and array

jeromegn opened this issue · 2 comments

Not all languages / frameworks / libraries have an easy time dealing with NDJSON out of the box, even if they have streaming capabilities.

It might be interesting to explore and alternate format: streaming as a normal JSON array.

How? First we send the [ character, then we send each row as an independent JSON object (well, an array? could be configurable to set the column names properly), send a comma in between, send a ] at the end.

This way we don't have to hold onto all the rows in-memory but can still stream a normal JSON array.

I just submitted a basic implementation in #110.

This works:

lillian@mbp corrosion % curl -s -H content-type:application/json 'localhost:8080/v1/queries?ndjson=false' -d '"select * from todos"' | jq
[
  {
    "columns": [
      "id",
      "title",
      "completed_at"
    ]
  },
  {
    "row": [
      1,
      [
        1,
        "get groceries",
        1701627653
      ]
    ]
  },
  {
    "row": [
      2,
      [
        2,
        "write some code",
        null
      ]
    ]
  },
  {
    "eoq": {
      "time": 9.59E-7
    }
  }
]

NDJSON still works as well (at least it seems like it works? the test is failing!):

lillian@mbp corrosion % curl -s -H content-type:application/json 'localhost:8080/v1/queries?ndjson=true' -d '"select * from todos"'
{"columns":["id","title","completed_at"]}
{"row":[1,[1,"get groceries",1701627653]]}
{"row":[2,[2,"write some code",null]]}
{"eoq":{"time":2.583e-6}}

(it took me a minute to figure out that it still returns NDJSON, jq apparently knows how to parse it)

Unfortunately my current implementation requires the client to specify whether it wants NDJSON or not; there's no default at the moment.

Also... I'm not sure I like getting data like this in the first place. For a client that doesn't support (or want) NDJSON, it would be better to parse a JSON object like this:

{
  "columns": [
    "id",
    "title",
    "completed_at"
  ],
  "rows": [
    [
      1,
      [
        1,
        "get groceries",
        1701627653
      ]
    ],
    [
      2,
      [
        2,
        "write some code",
        null
      ]
    ]
  ],
  "eoq": {
    "time": 9.59E-7
  }
}

Maybe we could have NDJSON, regular comma-separated JSON, and this third method as options, while defaulting to NDJSON? Let me know!

I think this might be more practical if there was a way to specify more settings about the expected format, e.g. "just send me JSON objects in an array" where the column names would be used as fields in the JSON object, no need for a columns object at the start.

eoq is not be necessary, but it doesn't hurt anything. Possibly something configurable too.

The reason for NDJSON is so clients don't need to buffer large quantities of data. It's not always possible or required to do that so the alternative comma-separated format sounds nice.

I think the default format for a client that just wants a big valid JSON array should be:

[
  {"col1": "value", "col2": 12345, "col3": null},
  {"col1": "value", "col2": 12345, "col3": null},
  {"col1": "value", "col2": 12345, "col3": null}
]

because they'll probably have to re-arrange the data themselves if we're sending it in a different format.