andrewoma/kwery

Batch upsert fails when some raws already exist

Closed this issue · 1 comments

Consider following scenario: I want to batch insert raws into my postgres table. But some raws already exist. So I use UPSERT.

If I do this in kwery, I get exception: java.lang.IllegalArgumentException: Expected 10 keys but received 8.

DDL

CREATE TABLE user (
    name TEXT NOT NULL PRIMARY KEY
);

Script:

import com.github.andrewoma.kwery.core.SessionFactory
import com.github.andrewoma.kwery.core.dialect.PostgresDialect
import org.postgresql.ds.PGSimpleDataSource
import java.io.File
import javax.sql.DataSource

fun main(args: Array<String>) {
    val ds: DataSource = PGSimpleDataSource().apply {
        url = "jdbc:postgresql://localhost:5432/user_db"
        user = "root"
        password = ""
    }
    val factory = SessionFactory(ds, PostgresDialect())

    val users = File("users.csv").readLines().map { mapOf("name" to it) }
    factory.use { 
        it.batchInsert("""
            INSERT INTO user(name)
            VALUES (:name)
            ON CONFLICT (name) DO NOTHING""", users, f = {}).sumBy { it.first } 
    }
    println("done")
}

Exception stack trace:

Exception in thread "main" java.lang.IllegalArgumentException: Expected 100 keys but received 0
	at com.github.andrewoma.kwery.core.DefaultSession$batchInsert$2.invoke(DefaultSession.kt:117)
	at com.github.andrewoma.kwery.core.DefaultSession$batchInsert$2.invoke(DefaultSession.kt:46)
	at com.github.andrewoma.kwery.core.DefaultSession.withPreparedStatement(DefaultSession.kt:252)
	at com.github.andrewoma.kwery.core.DefaultSession.batchInsert(DefaultSession.kt:103)
	at com.github.andrewoma.kwery.core.Session$DefaultImpls.batchInsert$default(Session.kt:86)

Kwery version: 0.17

I believe root cause is that kwery expects sql server to send generated key for each entry from parameter list (which is no longer the case in my scenario).
DefaultSession.kt:117:
require(keys.size == parametersList.size) { "Expected ${parametersList.size} keys but received ${keys.size}" }

Hi,

*insert has a very specific use case, handling statements that generate keys.

It doesn't make much sense to have a batch of statements that may or may not generate keys (as presumably you'd want to use those keys and associate them with something).

Use batchUpdate instead to do batches of inserts or updates that do not generate keys.