layerware/hugsql

Batch Inserts?

jbaiter opened this issue · 5 comments

Is it possible to do "real" batch inserts (à la clojure.java.jdbc/insert!) with hugsql?

I just tried to import ~8000 records with the syntax from the documentation:

-- :name create-sentences! :! :n
-- :doc creates a number of new sentence records
INSERT INTO sentences
(tokens, tags, document_id)
VALUES :t*:sentences
(->> sentences
     (map #(vec (map % [:tokens :tags :document_id])))
     vec
     (#(q/create-sentences! {:sentences %}))))

This yields a "Too many SQL variables" exception (I'm using SQLite).

Doing the same with clojure.java.jdbc/insert!, however, works like a charm:

(->> sentences                                                                          
     (apply (partial jdbc/insert! q/conn :sentences [:tokens :tags :document_id]))
     (reduce +)))

The Tuple List Parameter supports multi-record insert for the INSERT...VALUES (...),(...) syntax, but you can still hit some JDBC driver limits with this, which is what is happening here. The limits may be higher with other database drivers.

For these kinds of batch cases, you can map or doseq through your batch calling your single-record insert or multi-record insert (cut up your batch with partition-all).

On the other hand, I say know your tools and use them to their full extent. Using clojure.java.jdbc/insert! is fine. It's already internally mapping over insert-single-row-sql, so why not use it instead of needing to write your own?

But, I can see how this is confusing, and I think this at least needs some more documentation with an example or two.

Thanks for the clarification! I had a solution with map-ing over the single insert statement before and I was under the impression that the insert! method was faster, but that may just as well have been the placebo effect, I did not do any proper profiling.
Also, thank you for creating the library, I've been looking for a replacement to yesql and this looks very much like it :-)

Thanks again for the report!

...I was under the impression that the insert! method was faster, but that may just as well have been the placebo effect...

insert! may have been faster if it was a single transaction and your map-ing solution was multiple transactions. See Transactions.

I was curious on the actual limit of parameters. It looks like ~32k for postgresql, and ~2k for sql server.

http://www.postgresql.org/message-id/10218.1208960238@sss.pgh.pa.us
http://stackoverflow.com/questions/14631346/java-jdbc-prepared-statement-maximum-parameter-markers

edit: Also mentioned in the SO post is that for postgres, before 9.4 every record in a batch did a round trip if a generated key was returned. Am I reading that right? Or is that a different kind of jdbc "batch"?