mgirlich/jsontools

RSQLite (JSON functions) vs jqr/jq


Hi there, I stumbled upon your package repo while going through some issues I've found with jqr, and I noticed you recently made a large commit (8f6fe42) that suggests you've decided to use RSQLite's JSON utilities instead of jqr. I'm curious how you arrived at that decision (since I'm probably in a similar situation myself) and whether you've done any performance evaluations of that architecture change. My impression is that jqr is quite fast, and I'm wondering if the overhead of communicating with SQLite adds much of a performance penalty. (I also don't know whether SQLite is perhaps just using jqr, or something similar, under the hood, too :-).)

I agree that jq is very fast and extremely powerful. But I had some issues with jq itself and the R package jqr:

  • Major: jq doesn't work correctly with big integers (see jqlang/jq#2182). It is fixed on GitHub but not yet released.
  • Major: jqr sometimes returns incorrect results (see ropensci/jqr#80).
  • Minor: jqr pastes bare numbers together (see ropensci/jqr#79).
  • Minor: jqr throws an error for NA input (see ropensci/jqr#78). It's easy to work around, but I still don't like it.
  • I struggle with the pipe evaluation mechanics of jqr.
  • Many people have RSQLite installed but not jqr.
  • I wanted to follow the ideas and logic of the JSON functions added in SQL:2016 more closely. That makes a dbplyr translation easier and more realistic, and people will be more familiar with the approach.
  • I think the syntax in SQLite is easier to work with.
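To illustrate the SQLite syntax mentioned above, here is a minimal sketch. It uses Python's standard-library `sqlite3` module rather than RSQLite, but the SQL calls the same SQLite JSON functions (`json_extract`, `json_array_length`) that RSQLite exposes. It assumes the bundled SQLite has JSON support, which is built in since SQLite 3.38 and was an optional JSON1 extension before that. The `id` value is 2^53 + 1, a number that cannot be represented exactly as an IEEE 754 double. SQLite parses integers that fit in 64 bits as integers, so the value comes back unchanged.

```python
import sqlite3

# In-memory database; the JSON functions are part of SQLite itself
# (assumption: the SQLite linked into Python was built with JSON support).
con = sqlite3.connect(":memory:")

# 9007199254740993 == 2**53 + 1, not exactly representable as a double.
doc = '{"id": 9007199254740993, "tags": ["a", "b"], "user": {"name": "x"}}'

# Paths use SQLite's '$.key' syntax; json_array_length counts array elements.
row = con.execute(
    "SELECT json_extract(?, '$.id'), "
    "       json_extract(?, '$.user.name'), "
    "       json_array_length(?, '$.tags')",
    (doc, doc, doc),
).fetchone()

print(row)  # (9007199254740993, 'x', 2)
```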

I haven't done any performance tests but SQLite seemed pretty fast to me.

So I think both options are good to work with. If you are familiar with SQL and either don't need overly complex operations or need to end up with tabular data, I would suggest you try SQLite.
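For the "tabular data in the end" case, SQLite's table-valued `json_each` function can unnest a JSON array into one row per element. The sketch below uses Python's `sqlite3` as a stand-in for RSQLite, and the `events` table and its columns are hypothetical. It turns each order's `items` array into rows of `(id, sku, qty)`.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical table: one JSON document per row.
con.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, '{"items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}'),
        (2, '{"items": [{"sku": "C", "qty": 5}]}'),
    ],
)

# json_each(payload, '$.items') yields one row per array element;
# item.value holds the element, item.key its array index.
rows = con.execute(
    """
    SELECT e.id,
           json_extract(item.value, '$.sku') AS sku,
           json_extract(item.value, '$.qty') AS qty
    FROM events AS e, json_each(e.payload, '$.items') AS item
    ORDER BY e.id, item.key
    """
).fetchall()

print(rows)  # [(1, 'A', 2), (1, 'B', 1), (2, 'C', 5)]
```

Because the result is ordinary rows, it maps naturally onto a data frame, which is the kind of workflow where the SQL approach feels simpler than chaining jq filters.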

Btw, if speed is your concern, you might also have a look at the JSON parser RcppSimdJson.