Error with reliabilityLevel = NO_DUPLICATES and sequence in primary key
alexey-abrosin opened this issue · 0 comments
alexey-abrosin commented
Hello guys.
I ran into a bug when writing to a table that has both a sequence-backed primary key and the reliabilityLevel = NO_DUPLICATES option.
Steps to reproduce:
- Create target table and PK sequence for it
drop sequence if exists stg.test_table_seq;
create sequence stg.test_table_seq as int start with 1 increment by 1;
drop table if exists stg.test_table;
create table stg.test_table (
pk int not null primary key clustered default (next value for stg.test_table_seq),
some_long_value int not null unique with (ignore_dup_key = on)
);
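For context, the sequence default on pk only fires when the column is omitted from the insert; an explicit NULL bypasses the default and violates the NOT NULL constraint. A minimal plain-JDBC sketch illustrating this (connection details are the hypothetical ones from this report, untested):

```scala
import java.sql.DriverManager

object DefaultCheck {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:sqlserver://localhost:1433;database=MSSQLDB", "admin", "qwerty123")
    val st = conn.createStatement()
    // Succeeds: pk is omitted, so SQL Server applies the column default
    // (next value for stg.test_table_seq).
    st.executeUpdate("insert into stg.test_table (some_long_value) values (1000)")
    // Fails with "Cannot insert the value NULL into column 'pk'":
    // an explicit NULL bypasses the default.
    // st.executeUpdate(
    //   "insert into stg.test_table (pk, some_long_value) values (NULL, 1001)")
    conn.close()
  }
}
```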
- Build and run spark application
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
import java.util
object Main {
def main(args: Array[String]): Unit = {
val spark: SparkSession = SparkSession
.builder()
.appName("TestApp")
.getOrCreate()
val schema: StructType = StructType(
Seq(
StructField("pk", IntegerType),
StructField("some_long_value", IntegerType)
)
)
val rows: java.util.List[Row] = new util.LinkedList[Row]()
rows.add(Row(null, 1000))
rows.add(Row(null, 1001))
rows.add(Row(null, 1003))
rows.add(Row(null, 1003))
rows.add(Row(null, 1004))
rows.add(Row(null, 1004))
rows.add(Row(null, 1005))
val df: DataFrame = spark.createDataFrame(rows, schema)
df.write
.format("com.microsoft.sqlserver.jdbc.spark")
.mode(SaveMode.Append)
.option("url", "jdbc:sqlserver://localhost:1433;database=MSSQLDB")
.option("dbtable", "stg.test_table")
.option("user", "admin")
.option("password", "qwerty123")
.option("truncate", "true")
.option("batchsize", "10000")
.option("reliabilityLevel", "NO_DUPLICATES")
.option("tableLock", "true")
.option("schemaCheckEnabled", "false")
.save()
}
}
- Get the following errors:
23/07/03 13:18:53 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) (os-2912.ajax.io executor 3): com.microsoft.sqlserver.jdbc.SQLServerException: Cannot insert the value NULL into column 'pk', table 'tempdb.dbo.##application_1670426294525_70004_stg.test_table_4'; column does not allow nulls. INSERT fails.
23/07/03 13:18:53 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) (os-2913.ajax.io executor 2): com.microsoft.sqlserver.jdbc.SQLServerException: Cannot insert the value NULL into column 'pk', table 'tempdb.dbo.##application_1670426294525_70004_stg.test_table_3'; column does not allow nulls. INSERT fails.
23/07/03 13:18:53 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) (os-2910.ajax.io executor 1): com.microsoft.sqlserver.jdbc.SQLServerException: Cannot insert the value NULL into column 'pk', table 'tempdb.dbo.##application_1670426294525_70004_stg.test_table_2'; column does not allow nulls. INSERT fails.
...
If I use BEST_EFFORT instead of NO_DUPLICATES, the app works like a charm and fills the target table. Judging by the temp-table names in the errors above, NO_DUPLICATES appears to stage rows in a global temp table in tempdb that copies the NOT NULL constraint on pk but not its sequence default, so the NULL pk values are rejected at the staging step.
select * from stg.test_table;
+--+---------------+
|pk|some_long_value|
+--+---------------+
|1 |1003 |
|2 |1004 |
|4 |1000 |
|6 |1005 |
|7 |1001 |
+--+---------------+
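As a possible workaround (untested sketch), writing only the non-key column so the connector never sends an explicit NULL for pk might let SQL Server fill pk from the sequence default; whether this helps under NO_DUPLICATES depends on how the connector derives the staging-table schema:

```scala
// Untested workaround sketch: drop pk from the DataFrame before writing,
// so no explicit NULL is ever sent for that column.
val workaroundDf = df.select("some_long_value")
workaroundDf.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode(SaveMode.Append)
  .option("url", "jdbc:sqlserver://localhost:1433;database=MSSQLDB")
  .option("dbtable", "stg.test_table")
  .option("user", "admin")
  .option("password", "qwerty123")
  .option("reliabilityLevel", "NO_DUPLICATES")
  .option("tableLock", "true")
  .option("schemaCheckEnabled", "false")
  .save()
```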
My env:
- Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64)
Sep 24 2019 13:48:23
Copyright (C) 2019 Microsoft Corporation
Developer Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: )
- Spark 3.2.1
- Scala 2.12.15
- spark-mssql-connector 1.2.0