microsoft/mssql-jdbc

SQLServerBulkCSVFileRecord doesn't revert IDENTITY_INSERT to OFF after a failure

ingvard opened this issue · 5 comments

SQLServerBulkCopy doesn't revert IDENTITY_INSERT to OFF after a failure

Driver version

mssql-jdbc-12.6.1.jre11

SQL Server version

Microsoft SQL Server 2017 (RTM-CU31-GDR) (KB5021126) - 14.0.3460.9 (X64) Developer Edition (64-bit) on Linux (Ubuntu 18.04.6 LTS)

Client Operating System

Linux (Ubuntu 18.04.6 LTS)

JAVA/JVM version

Kotlin 1.9.22 (Open JDK 17)

Table schema

CREATE TABLE "TEST_IDENTITY_TABLE" (
    "ID" "int"  IDENTITY (1, 1) NOT NULL,
    "TEXT" DATETIME
)

CREATE TABLE "TEST_IDENTITY_TABLE2" (
    "ID" "int"  IDENTITY (1, 1) NOT NULL
)

Problem description

After using SQLServerBulkCopy and it finishes with an exception, a connection will have IDENTITY_INSERT ON. The example below shows and repeats the problem step by step:

  1. We had to make two tables. One for data and the other to try to find out if the first one has IDENTITY_INSERT ON state.
CREATE TABLE "TEST_IDENTITY_TABLE" (
    "ID" "int"  IDENTITY (1, 1) NOT NULL,
    "TEXT" DATETIME
)

CREATE TABLE "TEST_IDENTITY_TABLE2" (
    "ID" "int"  IDENTITY (1, 1) NOT NULL
)
  1. Open the connection. In this example, we're using one connection for all actions. However, in real production code, the connection is usually returned to the connection pool with a faulty state.
val conn: Connection = DriverManager.getConnection(
        "jdbc:sqlserver://localhost:1433;encrypt=false;databaseName=test;username=sa;password=A_Str0ng_Required_Password"
    )
  1. Make a CSV file for inserting data at once, and then insert it into the database.
val csvFile = Files.createTempFile("bulk_csv", ".csv")
csvFile.toFile().writeText("1, 2022/7/28 12:21:00.0000000")


val bulkRecord = SQLServerBulkCSVFileRecord(
    csvFile.toAbsolutePath().toString(),
    StandardCharsets.UTF_8.name(),
     ",",
     false
).apply {
    addColumnMetadata(1, "ID", 4, 1, 1)
     addColumnMetadata(2, "TEXT", 93 /* timestamp */, 50, 3)
}

try {
   SQLServerBulkCopy(conn).use { bulkCopy ->
      bulkCopy.bulkCopyOptions = SQLServerBulkCopyOptions().apply {
         isKeepIdentity = true // Insert identity keys as is
       }
       bulkCopy.destinationTableName = "TEST_IDENTITY_TABLE"
       bulkCopy.writeToServer(bulkRecord)
    }
} catch (e: Exception) {
   e.printStackTrace()

    // Ignore exception ...
}

I have examined various situations where the insertion process breaks. From what I observe, it seems necessary to insert data that the server will accept, and then throw an exception during the response parsing stage.

com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:261)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:316)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:137)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:31)
	at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1605)
	at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:673)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7739)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4384)
	at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:707)
	at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1670)
	at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:630)
	at db.SQLServerBulkCopyReproducerKt.main(SQLServerBulkCopyReproducer.kt:53)
	at db.SQLServerBulkCopyReproducerKt.main(SQLServerBulkCopyReproducer.kt)
  1. Attempt to reuse the connection, but with a different identity.
val statement = conn.createStatement()

statement.execute("SET IDENTITY_INSERT dbo.TEST_IDENTITY_TABLE2 ON;")

The code above leads to the following exception:

Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: IDENTITY_INSERT is already ON for table 'test.dbo.TEST_IDENTITY_TABLE'. Cannot perform SET operation for table 'dbo.TEST_IDENTITY_TABLE2'.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:261)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1752)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:946)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:840)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7739)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4384)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:293)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:263)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:813)
	at db.SQLServerBulkCopyReproducerKt.main(SQLServerBulkCopyReproducer.kt:64)
	at db.SQLServerBulkCopyReproducerKt.main(SQLServerBulkCopyReproducer.kt)

because SQLServerBulkCSVFileRecord doesn't revert IDENTITY_INSERT ON after the failure.

Expected behavior

The expected behavior is that the connection returns to its original state as it was before starting. For instance, in a successful case, it reverts back to its initial state.

Actual behavior

The actual behavior is that the connection state for each conn.createStatement() or another SQLServerBulkCSVFileRecord will remain as IDENTITY_INSERT dbo.TEST_IDENTITY_TABLE ON.

Hi @ingvard,

I'm not sure this is a mistake within the driver.

It's in the nature of IDENTITY_INSERT to persist for that connection, regardless of whether the connection is being re-used or not. If we're 'resetting' the value of IDENTITY_INSERT after failure, we're changing the state of the connection as its being returned to the pool, and that's not something the driver should do.

Reading up on best practices for IDENTITY_INSERT, the recommendation is to be sure to set IDENTITY_INSERT to OFF for a table, if there is a plan to set it to ON for another. This can easily be added to your catch clause for the case above.

Let us know if you have more to add regarding this topic. If not, we will move forward with closing the issue.

@Jeffery-Wasty, thank you for your response.

Your argument seems logical, but it's worth noting that SQLServerBulkCopy for SQLServerBulkCSVFileRecord already restores the state to its original form upon successful execution. This results in inconsistent behavior between two scenarios:

  1. Returning to the original state after a successful execution.
  2. Remaining in a broken state after a failed execution.

Reading up on best practices for IDENTITY_INSERT, the recommendation is to be sure to set IDENTITY_INSERT to OFF for a table, if there is a plan to set it to ON for another. This can easily be added to your catch clause for the case above.

However, it's important to acknowledge that checking whether the current connection has IDENTITY_INSERT set to ON for a specific table isn't a trivial task, as per information from StackOverflow.

It looks like the leaked abstraction, as the user of the SQLServerBulkCopy API needs to remember that certain errors during its execution can lead to the IDENTITY_INSERT state being left hanging, and they must manually reset it. This requirement adds complexity and burden to the user, which could be avoided with a more robust abstraction.

What do you think about it?

Hi @ingvard,

I agree that the inconsistency you have mentioned is not something we desire, and you make a good point about the the burden of resetting IDENTITY_INSERT for the user. I'll spend more time looking into this/check in with the team, and see what more we can do from our end.

Hi @ingvard,

So I spent additional time looking into this. After trying this other drivers and with just SQL through SSMS, I was able to replicate the same behavior as seen in JDBC. That is JDBC was consistent with other sources. Furthermore, the driver does no modification to IDENTITY_INSERT, regardless of success or failure. We just pass along the value and use it to determine whether additional actions are taken against the identity column.

I did try to add in behavior to set IDENTITY_INSERT for the most recent used table to OFF, in case of a failure, but was not able to find a solution.

Addressing your concerns from you last comment:

it's worth noting that SQLServerBulkCopy for SQLServerBulkCSVFileRecord already restores the state to its original form upon successful execution

As noted above. This 'inconsistent' behavior is consistent with other sources, and we can not determine how to change this in the desired way.

it's important to acknowledge that checking whether the current connection has IDENTITY_INSERT set to ON for a specific table isn't a trivial task

No, but it would be trivial to remember to set IDENTITY_INSERT to OFF after setting it to ON. In fact, I would urge that this behavior be extended to all scenarios. If you set to ON at the start of a program or query, set to OFF after transactions with that table have completed.

Given the above, I'm inclined to chalk this up as 'intended driver behavior' and mark the issue closed. Please let me know if you have any additional questions, concerns, or information.

Closing for above reasons.